Support Envelopes

This web site provides a brief tutorial introduction to support envelopes, a new type of association pattern, along with some MATLAB routines for working with them.

A paper describing support envelopes can be found at http://www.cs.umn.edu/~steinbac/se/se.pdf. Briefly, the support envelope for a (binary) transaction data set and a specified pair of positive integers (m, n) is the set of items and transactions that needs to be searched to find any association pattern involving m or more transactions and n or more items.
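To make the definition concrete, here is a minimal sketch on a toy matrix. It is not the code in find_envelope.m. It assumes that rows are transactions and columns are items, and that an envelope can be obtained by repeatedly dropping transactions with fewer than n remaining items and items with fewer than m remaining transactions until nothing changes. The variable names (row_keep, col_keep, env_rows, env_cols) are made up for this illustration.

```matlab
% Toy binary matrix: rows = transactions, columns = items (assumed orientation)
A = [1 1 1 0;
     1 1 1 0;
     1 1 0 1;
     0 0 0 1] > 0;
m = 2;   % minimum number of transactions
n = 2;   % minimum number of items

row_keep = true(size(A, 1), 1);   % transactions still in the candidate block
col_keep = true(1, size(A, 2));   % items still in the candidate block

changed = true;
while changed
    % Restrict A to the rows and columns that are still kept
    B = A;
    B(~row_keep, :) = false;
    B(:, ~col_keep) = false;

    % Drop transactions that contain fewer than n of the remaining items
    new_rows = row_keep & (sum(B, 2) >= n);
    B(~new_rows, :) = false;

    % Drop items that occur in fewer than m of the remaining transactions
    new_cols = col_keep & (sum(B, 1) >= m);

    changed = any(new_rows ~= row_keep) || any(new_cols ~= col_keep);
    row_keep = new_rows;
    col_keep = new_cols;
end

env_rows = find(row_keep)';   % indices of the remaining transactions
env_cols = find(col_keep);    % indices of the remaining items
```

In this toy case the last transaction and the last item are removed, leaving the 3 x 3 block in the upper left. If m or n is too large for the data, both index lists come back empty.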

A MATLAB routine, find_envelope.m, can be used to find a support envelope given a binary matrix and specified values of m (min support) and n (min items).

The following commands show how this routine can be used. env is a structure containing information about the support envelope; it is documented in find_envelope.m.

A = rand( 20, 30 ) > 0.6;          % Generate a random binary matrix
env = find_envelope( A, 5, 10 );   % Find the envelope
env                                % Display the envelope results

(Since A is randomly generated, there is some chance env will be empty, but typically it is not.)

A support envelope defines a sub-block of the binary transaction matrix. This can be visualized using the routine plot_envelope.m, which sorts the rows and columns of the sub-block (envelope) so that the denser region of the sub-block is at the top left. (The column and row labels in the figures do not map to the indices of the support envelope.)

plot_envelope( A, env );   % Plot the envelope
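The reordering idea can be illustrated on a small block. The sketch below is only one plausible way to push the dense region toward the top left (sorting rows and columns by their number of ones); it is an assumption, not necessarily what plot_envelope.m does, and S is a made-up toy sub-block.

```matlab
% Toy sub-block (a made-up example, not output of find_envelope)
S = [0 1 0;
     1 1 1;
     0 1 1] > 0;

% Order rows and columns by decreasing number of ones
[~, r] = sort(sum(S, 2), 'descend');
[~, c] = sort(sum(S, 1), 'descend');
S_sorted = S(r, c);

% Display the reordered block; axis labels are positions, not original indices
imagesc(double(S_sorted));
colormap(gray);
```

After reordering, the axis positions no longer correspond to the original row and column indices, which is the same caveat noted above for the figure labels.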

To find all support envelopes of a matrix, you can use the routine find_envelopes.m.

envs = find_envelopes( A );   % Find the set of all envelopes

The (m, n) values of these envelopes can be plotted in a scatter plot by using the routine envelope_scatter_plot.m.

envelope_scatter_plot( envs ); % Scatter plot the envelopes

To find only the envelopes on the support boundary, use find_boundary.m.

boundary_envs = find_boundary( A ); % Find the boundary envelopes

The (m, n) values of these boundary envelopes can also be plotted in a scatter plot.

envelope_scatter_plot( boundary_envs ); % Scatter plot the boundary envelopes

Finally, a data file with transaction data, e.g., the mushroom or chess data sets (http://fimi.cs.helsinki.fi/testdata.html), can be read in using read_transaction_data.m. (This routine has minimal error checking; for example, it won't read in the kosarak data set from the same location without some tweaking.)
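For readers who want to see what such a file looks like once loaded, here is a rough sketch. It is not read_transaction_data.m. It assumes the file has one transaction per line, with the items given as whitespace-separated positive integers, and it writes a tiny temporary file so that the example is self-contained. The names fname, tline, rows and cols are made up for the illustration.

```matlab
% Write a three-transaction example file in the assumed format
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '1 3 4\n2 3\n1 2 3 5\n');
fclose(fid);

% Read the file line by line, collecting (transaction, item) index pairs
fid = fopen(fname, 'r');
rows = [];
cols = [];
t = 0;                                  % number of transactions read so far
tline = fgetl(fid);
while ischar(tline)                     % fgetl returns -1 at end of file
    items = sscanf(tline, '%d')';       % item ids on this line, as a row vector
    if ~isempty(items)                  % skip blank lines
        t = t + 1;
        rows = [rows, repmat(t, 1, numel(items))];
        cols = [cols, items];
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Build a sparse binary matrix: rows = transactions, columns = items
A = sparse(rows, cols, 1, t, max(cols)) > 0;
```

Whether the routines on this page accept a sparse matrix is not something this sketch assumes; full(A) gives a dense logical matrix if one is needed. Files that use 0 as an item id would need the ids shifted by one before building the matrix.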

When running the above scripts with larger files, you may want to uncomment some of the 'disp' commands so that you can see the progress.

This is a very brief introduction. I will continue to update this page. There are C/C++ versions of some of the above routines, which are much
faster. I hope to make them available later on.

Questions: Please email me at steinbac@cs.umn.edu.