Information retrieval and clustering using vector-based similarities
1 Computational Methods for Intelligent Information Access. M.W.
Berry, S.T. Dumais, and T.A. Letsche. Proceedings of Supercomputing '95,
San Diego, CA, December 1995.
http://www.supercomp.org/sc95/proceedings/473_MBER/SC95.HTM
day 1 linear algebra background ____________________________________________
day 2 LSI example ____________________________________________
2 Latent semantic indexing via a semi-discrete matrix decomposition
Tamara G. Kolda and Dianne P. O'Leary, in The Mathematics
of Information Coding, Extraction and Distribution, G. Cybenko et al.,
eds., vol. 107 of IMA Volumes in Mathematics and Its Applications.
Springer-Verlag, 1999, pp. 73-80.
http://csmr.ca.sandia.gov/~tgkolda/pubs/ACM-TOIS-1998.pdf
day 3 scalings; SDD variation ____________________________________________
3 Concept Decompositions for Large Sparse Text Data using Clustering.
I.S. Dhillon, D.S. Modha, IBM Research Report RJ 10147, July 8, 1999,
Machine Learning, 42:1, pages 143-175, January 2001.
http://www.cs.utexas.edu/users/inderjit/public_papers/concept_mlj.pdf
day 4 alternative representation of dataset for text documents, matrix repr
____________________________________________
4 A Scalable Hierarchical Algorithm for Unsupervised Clustering. D. L.
Boley, in Data Mining for Scientific and Engineering Applications, R.
Grossman, C. Kamath, P. Kegelmeyer, V. Kumar, R. Namburu (eds),
Kluwer, pp. 383-400, 2001.
ftp://ftp.cs.umn.edu/dept/users/boley/reports/ARC00paper.ps.gz
day 5 use of sparse lin alg methods -> fast ____________________________________________
5 Using Low-Memory Representations to Cluster Very Large Data Sets
(ps), D Littau, D Boley, Third SIAM Conference on Data Mining, 2003.
file://localhost/web/classes03/Spring-2003/csci8363/littau.ps
day X ____________________________________________
Link Analysis -- Page Rank
X The World's Largest Matrix Computation
Google's PageRank is an eigenvector of a matrix of order 2.7 billion
by Cleve Moler
http://www.mathworks.com/company/newsletters/news_notes/clevescorner/oct02_cleve.html
day 7 - addendum ___________________________________________
6 The PageRank Citation Ranking: Bringing Order to the Web. Page,
Lawrence; Brin, Sergey; Motwani, Rajeev; Winograd, Terry. Stanford
Univ. Computer Science Dept technical report. Oct. 2001
http://dbpubs.stanford.edu/pub/1999-66
day 7 PageRank idea ____________________________________________
7 * Sepandar D. Kamvar, Taher H. Haveliwala, and Gene H. Golub,
"Adaptive Methods for the Computation of PageRank", To appear in
Linear Algebra and its Applications, Special Issue on the
Numerical Solution of Markov Chains, November, 2003.
http://www.stanford.edu/~sdkamvar/papers/adaptive.pdf
day 8 implementation issues for PageRank ____________________________________________
8 Link Analysis, Eigenvectors and Stability, Andrew Y. Ng and Alice
X. Zheng and Michael I. Jordan. IJCAI 2001, p 903-910.
http://citeseer.ist.psu.edu/ng01link.html
day 9 numerical properties of PageRank ____________________________________________
Eigenfaces
X Face Recognition using View-Based and Modular Eigenspaces (1994)
Baback Moghaddam, Alex Pentland
SPIE'94 [this has more examples]
http://citeseer.ist.psu.edu/moghaddam94face.html
day X -- skip ____________________________________________
9 Probabilistic Visual Learning for Object Representation. Baback
Moghaddam, Alex Pentland. Early Visual Learning, Oxford University
Press, 1996.
http://citeseer.ist.psu.edu/moghaddam96probabilistic.html
day 10 face detection ____________________________________________
A View based and modular eigenspaces for face recognition.
A Pentland, B Moghaddam, T Starner.
IEEE Conf on Computer Vision & Pattern Recognition,
Seattle, June 1994.
day 11 face recognition ??? ____________________________________________
Locally Linear Embedding and visualization
B An Introduction to Locally Linear Embedding. Lawrence Saul & Sam
Roweis. [draft version (Jan.01)]
http://www.cs.toronto.edu/%7Eroweis/lle/papers/lleintro.pdf
day 12 Introduction to Locally Linear Embedding ______________________________________
F Local Context Finder (LCF) reveals multidimensional relationships among
mRNA expression profiles of Arabidopsis responding to pathogen infection.
Fumiaki Katagiri* and Jane Glazebrook
PNAS | September 16, 2003 | vol. 100 | no. 19 | pp. 10842-10847
http://www.pnas.org/cgi/content/full/100/19/10842
day X ____________________________________________
Non-Negative Matrix Factorization
C Algorithms for Non-Negative Matrix Factorization. Daniel D. Lee & H. Sebastian
Seung.
hebb.mit.edu/people/seung/papers/nmfconverge.pdf
day 13 intro to NMF (perhaps other paper needed too) ___________________________________
D When Does Non-Negative Matrix Factorization Give a Correct Decomposition
into Parts? David Donoho & Victoria Stodden
www-stat.stanford.edu/~donoho/Reports/2003/NMFCDP.pdf
day 14 properties of NMF ____________________________________________
E A Weighted Non-negative Matrix Factorization for Local Representations.
(Poster). David Guillamet, Marco Bressan, Jordi Vitrià
www.cvc.uab.es/~davidg/PosterCVPR/PosterCVPR2001.pdf
day X ____________________________________________
Matrix Factorizations in Biology
G Metagenes and molecular pattern discovery using matrix factorization.
Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, Jill P. Mesirov.
PNAS | March 23, 2004 | vol. 101 | no. 12 | 4164-4169.
http://intl.pnas.org/cgi/content/abstract/101/12/4164
day 15 biological application of Lin Alg ____________________________________________
H O. Alter and G. H. Golub, "Integrative Analysis of Genome-Scale
Data Using Pseudoinverse Projection Predicts Novel Correlation
Between DNA Replication and RNA Transcription," Proceedings of
the National Academy of Sciences 101 (47), pp. 16577-16582
(November 2004) (supplemental material at
http://www.bme.utexas.edu/research/orly/pseudoinverse/).
I O. Alter, P. O. Brown and D. Botstein, "Generalized Singular Value
Decomposition For Comparative Analysis of Genome-Scale Expression
Datasets of Two Different Organisms," Proceedings of the
National Academy of Sciences 100 (6), pp. 3351-3356 (March 2003)
(supplemental material at http://genome-www.stanford.edu/GSVD/
and at http://www.bme.utexas.edu/research/orly/GSVD/).
day 16 GSVD interpretation ____________________________________________
Support Vector Machines
J Support Vector Machines: Hype or Hallelujah? K. P. Bennett, C.
Campbell. SIGKDD Explorations, Vol. 2, Issue 2, 2000.
http://www.acm.org/sigs/sigkdd/explorations/issue2-2/bennett.pdf
day 17 basic idea -- linear kernel primal problem ______________________________________
day 18 non linear kernel, dual problem ____________________________________________
K Kernel principal component analysis, B. Scholkopf, A. Smola, and
K.-R. Muller. In B. Scholkopf, C. J. C. Burges, and A. J. Smola,
editors: Advances in Kernel Methods - SV Learning, pages 327-352. MIT
Press, Cambridge, MA, 1999.
http://citeseer.ist.psu.edu/25296.html
L Sparse Kernel Principal Component Analysis, Michael E. Tipping. In
Advances in Neural Information Processing Systems (NIPS) 13,
Vancouver, Dec. 2001.
http://citeseer.ist.psu.edu/385655.html
day 19 example of use of nonlinear kernel ____________________________________________
Eigenvalue analysis
M Sepandar D. Kamvar, Dan Klein, and Christopher D. Manning,
"Spectral Learning", To appear in Proceedings of the Eighteenth
International Joint Conference on Artificial Intelligence, August
2003.
http://www.stanford.edu/~sdkamvar/papers/spectral.pdf
day 20 spectral clustering ____________________________________________
N Example: Unstructured Biomedical Databases
Sepandar D. Kamvar, Diane E. Oliver, Christopher D. Manning, and
Russ B. Altman. "Inducing Novel Gene-Drug Interactions from the
Biomedical Literature," Preprint, December, 2002.
http://www.stanford.edu/~sdkamvar/papers/pharmgkb.ps
http://www.stanford.edu/~sdkamvar/papers/pharmgkb.pdf
day X ____________________________________________
Linear Least Squares Fit.
An example-based mapping method for text classification and
retrieval. (pdf) Yang, Y., Chute, C.G., ACM Transactions on Information
Systems (TOIS), 1994; 12(3): 252-277. (moved from 2/25/03)
http://portal.acm.org/citation.cfm?id=183424
http://doi.acm.org/10.1145/183422.183424
day X ____________________________________________
Information Theoretic Clustering, Co-clustering and Matrix Approximations
P Information Theoretic Clustering, Co-clustering and Matrix Approximations
http://www.cs.utexas.edu/users/inderjit/Talks/InfoTheoryCoClust.ppt
O Information-Theoretic Co-clustering
I. S. Dhillon, S. Mallela, and D. S. Modha
Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD), pages 89-98, August 2003.
http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf
Also UTCS Technical Report #TR-03-12, April 2003.
http://www.cs.utexas.edu/users/UTCS/techreports/index/html/Abstracts.2003.html#TR-03-12
day 21 use ppt slides to introduce info theory _________________________________________
Q Clustering with Bregman Divergences
A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh
Proceedings of the Fourth SIAM International Conference on Data Mining,
pages 234-245, April 2004
http://www.cs.utexas.edu/users/inderjit/public_papers/sdm-breg.pdf
day 22 continue using ppt slides ____________________________________________
Wavelets
R Wavelet notes; Also available is some Matlab Software (see
also the Wavelet toolbox in Matlab).
http://www.cs.umn.edu/%7Eboley/wavelet/wavelet.html
http://www-dsp.rice.edu/software/wavebook.shtml
day 23 wavelet notes. ____________________________________________
[Non-Neg Matrix Factorization]
Algorithms for Non-negative Matrix Factorization
Daniel D. Lee, H. Sebastian Seung
hebb.mit.edu/people/seung/papers/nmfconverge.pdf
Weighted Non-negative Matrix Factorization for Local Representations
David Guillamet, Marco Bressan, Jordi Vitrià. Computer Vision Center (CVC) and
...
www.cvc.uab.es/~davidg/PosterCVPR/PosterCVPR2001.pdf
When Does Non-Negative Matrix Factorization Give a Correct Decomposition
into Parts? David Donoho. Department of Statistics ...
www-stat.stanford.edu/~donoho/Reports/2003/NMFCDP.pdf
Non-negative matrix factorization for gene expression and scientific
texts analysis. AD Pascual-Montano, P. Carmona-Saez, M ...
www.iscb.org/ismb2003/posters/pascualATcnb.uam.es_226.html