This web site provides a brief tutorial introduction (and some MATLAB routines) for support envelopes, a new type of association pattern.
A paper describing support envelopes can be found at http://www.cs.umn.edu/~steinbac/se/se.pdf. Briefly, the support envelope for a (binary) transaction data set and a specified pair of positive integers (m, n) is the set of items and transactions that needs to be searched to find any association pattern involving m or more transactions and n or more items.
A MATLAB routine, find_envelope.m, can be used to find a support envelope given a binary matrix and specified values of m (min support) and n (min items).
The following commands show how this routine can be used. env is a structure containing information about the support envelope and is documented in find_envelope.m.
A = rand( 20, 30 ) > 0.6; % Generate a random binary matrix
env = find_envelope( A, 5, 10 ); % Find the envelope
env % Display the envelope results
(Since A is randomly generated, there is some chance env will be empty, but typically it is not.)
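One way to read the definition of a support envelope given earlier: a transaction with fewer than n items, or an item that appears in fewer than m transactions, cannot take part in a pattern with m or more transactions and n or more items, so it can be dropped, and dropping it may make further rows or columns fall below the thresholds. The sketch below repeats this pruning until nothing changes. It is only an illustration under stated assumptions, not the code in find_envelope.m: it assumes rows are transactions and columns are items, and the names B, keep_rows and keep_cols are made up for this example.

```matlab
% Sketch only: iterative pruning on a small fixed matrix.
% Assumption: rows = transactions, columns = items.
B = logical([1 1 1 0;
             1 1 1 0;
             1 1 0 1;
             0 0 0 1]);
m = 2;   % minimum number of transactions (support)
n = 2;   % minimum number of items

keep_rows = true(size(B, 1), 1);   % transactions still in the candidate block
keep_cols = true(1, size(B, 2));   % items still in the candidate block
changed = true;
while changed
    % Count ones only within the rows/columns that are still kept.
    items_per_row = sum(B(:, keep_cols), 2);
    rows_per_item = sum(B(keep_rows, :), 1);
    new_rows = keep_rows & (items_per_row >= n);
    new_cols = keep_cols & (rows_per_item >= m);
    changed = any(new_rows ~= keep_rows) || any(new_cols ~= keep_cols);
    keep_rows = new_rows;
    keep_cols = new_cols;
end

kept_transactions = find(keep_rows)';   % here: 1 2 3
kept_items        = find(keep_cols);    % here: 1 2 3
```

In this small example the fourth transaction is dropped first (it has only one item), after which the fourth item is supported by only one remaining transaction and is dropped as well.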
A support envelope defines a sub-block of the binary transaction matrix. This can be visualized using the routine plot_envelope.m. It sorts the rows and columns of the sub-block (envelope) so that the denser region of the sub-block is at the top left-hand side. (The column and row labels in the figures do not map to the indices of the support envelope.)
plot_envelope( A, env ); % Plot the envelope
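To give a feel for the kind of reordering described above, the sketch below sorts the rows and columns of a small binary block by their number of ones, so that the densest part ends up at the top left. This is an assumption about one simple way to get that effect, not necessarily what plot_envelope.m does; B, row_order and col_order are names invented for the example.

```matlab
% Sketch only: reorder a binary block so the dense region is at the top left.
B = logical([0 1 0;
             1 1 1;
             0 1 1]);

% Sort rows and columns by their count of ones, largest first.
[~, row_order] = sort(sum(B, 2), 'descend');
[~, col_order] = sort(sum(B, 1), 'descend');
B_sorted = B(row_order, col_order);
```

For this input, B_sorted is [1 1 1; 1 1 0; 1 0 0]. A call such as imagesc(B_sorted) could then be used to display the reordered block.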
To find all support envelopes of a matrix, you can use the routine find_envelopes.m.
envs = find_envelopes( A ); % Find the set of all envelopes
The (m, n) values of these envelopes can be plotted in a scatter plot by using the routine envelope_scatter_plot.m.
envelope_scatter_plot( envs ); % Scatter plot the envelope
To find only the envelopes on the support boundary, use find_boundary.m.
boundary_envs = find_boundary( A ); % Find the boundary envelopes
The (m, n) values of these boundary envelopes can also be plotted in a scatter plot.
envelope_scatter_plot( boundary_envs ); % Scatter plot the boundary envelope
Finally, a data file with transaction data, e.g., the mushroom or chess data, can be read in using read_transaction_data.m. (This routine has minimal error checking, e.g., it won't read in the kosarak dataset from the same location without some tweaking.)
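For readers who want to see what turning such a file into a binary matrix involves, here is a minimal sketch. It is not read_transaction_data.m, whose interface is not described here. It assumes a text file with one transaction per line, each line holding space-separated positive integer item ids, and it writes a tiny sample file of its own so that it runs as is; fname, T and the other variable names are hypothetical.

```matlab
% Sketch only: read "one transaction per line" item ids into a binary matrix.
% Assumption: item ids are positive integers separated by spaces.
fname = [tempname '.dat'];             % small sample file for the example
fid = fopen(fname, 'w');
fprintf(fid, '1 3 4\n2 3\n1 2 3 5\n');
fclose(fid);

fid = fopen(fname, 'r');
row_idx = [];                          % transaction number of each entry
col_idx = [];                          % item id of each entry
t = 0;
txt = fgetl(fid);
while ischar(txt)                      % fgetl returns -1 at end of file
    items = sscanf(txt, '%d')';        % item ids on this line, as a row vector
    if ~isempty(items)
        t = t + 1;
        row_idx = [row_idx, repmat(t, 1, numel(items))]; %#ok<AGROW>
        col_idx = [col_idx, items];                      %#ok<AGROW>
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);

% Rows = transactions, columns = item ids; repeated ids on a line collapse to 1.
T = full(sparse(row_idx, col_idx, 1) > 0);
```

Growing the index vectors inside the loop is fine for small files; for large data sets it would be better to preallocate or to read the file in bulk.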
When running the above scripts with larger files, you may want to uncomment some of the 'disp' commands so that you can see the progress.
This is a very brief introduction. I will continue to update this page. There
are C/C++ versions of some of the above routines, which are much
faster. I hope to make them available later on.
Questions: Please email me at email@example.com.
The code in this directory is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use these routines only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies are not sold or distributed and are used under the same terms and conditions. As unestablished research software, this code is provided on an "as is" basis without warranty of any kind, either expressed or implied. Downloading or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.