SLIM
1.0
Sparse Linear Methods (SLIM) for top-n recommender systems
SLIM is a library that implements the Sparse LInear Methods (SLIM) for top-n recommendation. The algorithm is described in the following paper:
Xia Ning and George Karypis, "SLIM: Sparse Linear Methods for Top-N Recommender Systems", Proceedings of the 2011 IEEE 11th International Conference on Data Mining, 497–506.
SLIM is open-source software and is also provided as a binary distribution with pre-built executables for Linux (64-bit architecture). Additional binaries can be provided upon request. The source code can be downloaded here:
slim-1.0.tar.gz Linux (x86_64)
A pdf version of the manual is available here
Once you download SLIM, uncompress and untar it using the following command:
> tar -xzf slim-1.0.tar.gz
This will create a directory named slim-1.0 with the following structure:
slim-1.0\
    build\
    examples\
    include\
    src\
Compiling the source code and building the SLIM library requires CMake 2.8 (http://www.cmake.org/) and gcc 4.4. It also requires GSL (the GNU Scientific Library) to be installed. Assuming CMake, gcc and GSL are installed, run the following commands to compile and build:
> cd slim-1.0
> cd build
> cmake ..
> make
> make install
To remove all the objects generated by make, run the following command:
> make clean
After you run the above commands, a libSLIM.a library will be generated in the build/lib directory, all the *.h files will be placed in the build/include directory, and two executables, slim_learn and slim_predict, will be generated in the build/examples directory. You can use slim_learn and slim_predict as stand-alone programs, or you can use the library by linking against it and including the header files.
The SLIM executables are named slim_learn and slim_predict, and they are located under build/examples. Both programs are invoked from the command line within a shell window (e.g., GNOME Terminal).
The manpage for SLIM is the following (it can be obtained by typing slim_learn -help):
Usage
    slim_learn [options]

    -train_file=string
        Specifies the input file which contains the training data. This file should be in .csr format.

    -test_file=string
        Specifies the input file which contains the testing data. This file should be in .csr format.

    -model_file=string
        Specifies the output file which will contain a model matrix. The output file will be in .csr format.

    -fs_file=string
        Specifies the input file which contains a matrix for feature selection purposes. This input file should be in .csr format. This option takes effect only when the -fs option is specified.

    -pred_file=string
        Specifies the output file which will contain the top-n prediction for each user. The output file will be in .csr format. If this option is not specified, no prediction scores will be output.

    -lambda=float
        Specifies the regularization parameter for the $\ell_1$ norm.

    -beta=float
        Specifies the regularization parameter for the $\ell_2$ norm.

    -starti=int
        Specifies the index of the first column (C-style indexing) from which the sparse coefficient matrix will be calculated. The default value is 0.

    -endi=int
        Specifies the index of the last column (exclusive) up to which the sparse coefficient matrix will be calculated. The default value is the total number of columns.

    -transpose
        Specifies that the input feature selection matrix needs to be transposed.

    -fs
        Specifies that feature selection is required so as to accelerate the learning.

    -k=int
        Specifies the number of features if feature selection is applied. The default value is 50.

    -dbglvl=int
        Specifies the debug level. The default value is 0.

    -optTol=float
        Specifies the threshold which controls the optimization. Once the error from two optimization iterations is smaller than this value, the optimization process will be terminated. The default value is 1e-5.

    -max_bcls_niters=int
        Specifies the maximum number of iterations that is allowed for optimization. Once the number of iterations reaches this value, the optimization process will be terminated. The default value is 1e5.

    -bsize=int
        Specifies the block size for output. Once the calculation for these bsize blocks is done, they are dumped into the output file. The default value is 1000.

    -nratings=int
        Specifies the number of unique rating values in the testing set. The rating values should be integers starting from 1. The default value is 1.

    -topn=int
        Specifies the number of recommendations to be produced for each user. The default value is 10.

    -help
        Print this message.
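For reference, the roles of -lambda and -beta can be read off the optimization problem stated in the paper cited at the top of this manual. There, A is the user-item training matrix and W is the sparse aggregation coefficient matrix. The mapping of -lambda to $\lambda$ and -beta to $\beta$ follows the option descriptions above; whether the program scales the terms exactly as written here is an assumption.

```latex
\begin{aligned}
\min_{W}\quad & \tfrac{1}{2}\,\lVert A - AW \rVert_F^2
               + \tfrac{\beta}{2}\,\lVert W \rVert_F^2
               + \lambda\,\lVert W \rVert_1 \\
\text{subject to}\quad & W \ge 0, \quad \operatorname{diag}(W) = 0
\end{aligned}
```

A larger $\lambda$ yields a sparser W, while $\beta$ penalizes large coefficients. Because the problem decouples over the columns of W, each column can be solved independently.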
Both slim_learn and slim_predict accept and produce sparse matrices in a plain-text format (with extension .csr), which is specified as follows.
A sparse matrix A with n rows and m columns is stored in a plain text file that contains n lines, one for each row of A. In SLIM's sparse matrix format only the non-zero entries of the matrix are stored. In particular, the i-th line of the file contains information about the non-zero entries of the i-th row of the matrix. The non-zero entries of each row are specified as a space-separated list of pairs. Each pair contains the column number followed by the value for that particular column. The column numbers are assumed to be integers and their corresponding values are assumed to be binary. Note that the columns are numbered starting from 1 (not from 0 as is often done in C). The following example shows a 7 × 8 matrix and its corresponding representation in SLIM's matrix format.
matrix:
0 1 0 0 1 0 0 1
1 1 0 1 0 0 0 0
0 0 1 0 0 1 0 1
1 0 0 0 0 0 0 0
0 1 0 1 0 0 1 0
0 0 1 0 1 1 0 0
0 1 0 1 1 0 1 1

matrix .csr file:
2 1 5 1 8 1
1 1 2 1 4 1
3 1 6 1 8 1
1 1
2 1 4 1 7 1
3 1 5 1 6 1
2 1 4 1 5 1 7 1 8 1
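The format described above can be produced with a few lines of Python. This is a minimal sketch, not part of SLIM; the function name dense_to_slim_csr is made up for illustration. It writes one line per row and lists only the non-zero entries as 1-based column/value pairs.

```python
def dense_to_slim_csr(matrix):
    """Convert a dense matrix (list of rows) to SLIM .csr text lines."""
    lines = []
    for row in matrix:
        # Columns are numbered from 1; zero entries are skipped.
        pairs = [f"{j} {value}" for j, value in enumerate(row, start=1) if value != 0]
        lines.append(" ".join(pairs))
    return lines


# The 7 x 8 example matrix from this manual.
example = [
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 1],
]

for line in dense_to_slim_csr(example):
    print(line)
```

Running the sketch on the example matrix prints the seven lines of the .csr representation given above.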
slim_learn generates a model file in the .csr format specified above. The matrix it contains is the transpose of the aggregation coefficient matrix.
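Since the model file stores the transpose of the aggregation coefficient matrix, it can be handy to convert between the two orientations. The following is a minimal sketch that works on .csr text lines; it is not part of SLIM, and the helper names parse_csr_line and transpose_csr are hypothetical. The number of columns has to be supplied by the caller because the .csr format does not record it.

```python
def parse_csr_line(line):
    """Parse one .csr line into a list of (column, value) pairs (1-based columns)."""
    tokens = line.split()
    return [(int(tokens[k]), float(tokens[k + 1])) for k in range(0, len(tokens), 2)]


def transpose_csr(lines, n_cols):
    """Transpose a matrix given as .csr text lines; n_cols is its column count."""
    out = [[] for _ in range(n_cols)]
    for i, line in enumerate(lines, start=1):
        for j, value in parse_csr_line(line):
            # Entry (i, j) of the input becomes entry (j, i) of the output.
            out[j - 1].append((i, value))
    # Rows are visited in increasing order, so each output line is already
    # sorted by column number.
    return [" ".join(f"{i} {value!r}" for i, value in row) for row in out]


# A small made-up 2 x 3 matrix, for illustration only.
small = ["2 0.5", "1 0.25 3 1"]
print(transpose_csr(small, n_cols=3))
```

Transposing twice returns the original matrix, which is a quick sanity check before feeding a converted file to the programs.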
slim_predict generates a prediction file, if one is specified by -pred_file, in .csr format. In this file, each row corresponds to a testing user, the column numbers correspond to the items that have been recommended, and the corresponding values are the recommendation scores. Within each row, the entries are sorted by score in decreasing order.
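To make the content of a prediction row concrete, here is a minimal sketch of the scoring rule from the cited paper: the score of item j for a user is the dot product of the user's training row with column j of the coefficient matrix, and only items absent from the user's training row are ranked. This illustrates the idea and is not the slim_predict implementation; the function names, the tie-breaking rule, the dropping of zero scores, and the sample numbers are all assumptions made for illustration.

```python
def top_n(user_items, model_rows, n):
    """Rank unseen items for one user.

    user_items: set of 1-based item ids present in the user's training row.
    model_rows: dict mapping item id j to {item id k: weight}, i.e. row j of
                the transposed coefficient matrix (column j of W).
    """
    scores = {}
    for j, weights in model_rows.items():
        if j in user_items:
            continue  # only items the user has not interacted with are ranked
        # With binary training data the dot product reduces to a sum of weights.
        score = sum(w for k, w in weights.items() if k in user_items)
        if score > 0:  # assumption: zero-score items are not reported
            scores[j] = score
    # Highest score first; ties broken by item id (an arbitrary choice).
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def format_prediction_row(ranked):
    """Format (item, score) pairs as one .csr line."""
    return " ".join(f"{item} {score!r}" for item, score in ranked)


# Made-up data for illustration only.
user_items = {1, 2}
model_rows = {
    1: {2: 0.5},
    2: {1: 0.5},
    3: {1: 0.25, 2: 0.5},
    4: {2: 0.125},
    5: {3: 0.75},
}
ranked = top_n(user_items, model_rows, n=10)
print(format_prediction_row(ranked))
```

Because the model is stored transposed, each candidate item needs only one row of the model file, which is why the sketch indexes model_rows by item.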
The following shows how to run slim_learn:
> slim_learn -train_file=train.mat -model_file=model.mat -starti=0 -endi=1682 -lambda=2 -beta=5 -optTol=0.00001 -max_bcls_niters=10000
The model is written to model.mat, which contains the transpose of the sparse aggregation coefficient matrix.
The following shows how to run slim_predict:
> slim_predict -train_file=train.mat -test_file=test.mat -model_file=model.mat -pred_file=prediction.txt -topn=10
If model.mat contains an aggregation coefficient matrix or an item-item similarity matrix (i.e., not the transposed one) computed by another method, it needs to be transposed first.
You can run slim_learn to calculate only a chunk (i.e., a set of consecutive columns, specified by -starti and -endi) of the aggregation coefficient matrix. In this way, you can run multiple slim_learn programs in parallel (e.g., on a Hadoop cluster) to calculate different chunks of the aggregation coefficient matrix concurrently, and then collect all the outputs and concatenate them in the right order to obtain the entire aggregation coefficient matrix.
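The chunked runs described above can be prepared with a short script. The following is a minimal sketch that splits the column range into consecutive, non-overlapping -starti/-endi intervals (with -endi exclusive, as in the manpage) and builds one slim_learn command per chunk. The helper names and the per-chunk output names model.partN.mat are hypothetical. How the per-chunk outputs are concatenated is left out, since that depends on what each run writes.

```python
def column_chunks(n_columns, n_chunks):
    """Split [0, n_columns) into consecutive (starti, endi) ranges, endi exclusive."""
    base, extra = divmod(n_columns, n_chunks)
    ranges = []
    start = 0
    for c in range(n_chunks):
        # The first `extra` chunks get one additional column.
        end = start + base + (1 if c < extra else 0)
        if end > start:  # skip empty chunks when n_chunks > n_columns
            ranges.append((start, end))
        start = end
    return ranges


def learn_commands(n_columns, n_chunks, train_file="train.mat"):
    """Build one slim_learn command line per chunk (output names are made up)."""
    commands = []
    for part, (starti, endi) in enumerate(column_chunks(n_columns, n_chunks)):
        commands.append(
            f"slim_learn -train_file={train_file} -model_file=model.part{part}.mat "
            f"-starti={starti} -endi={endi} -lambda=2 -beta=5"
        )
    return commands


for command in learn_commands(1682, 4):
    print(command)
```

Each printed command can then be submitted as a separate job. Keeping the chunk index in the output file name makes it straightforward to process the results in column order afterwards.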
SLIM was written by Xia Ning.
Thanks to Prof. Michael P. Friedlander for providing the BCLS library.
Thanks to Prof. George Karypis for providing the GKlib library.
If you encounter any problems or have any suggestions, please contact Xia Ning via email at xning@cs.umn.edu.
Copyright and License Notice
----------------------------
The SLIM package is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use SLIM only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies, are not sold or distributed, are used under the same terms and conditions. As unestablished research software, this code is provided on an ``as is'' basis without warranty of any kind, either expressed or implied. The downloading, or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.