Information retrieval and clustering using vector-based similarities
  
  1 Computational Methods for Intelligent Information Access. M.W.
    Berry, S.T. Dumais, and T.A. Letsche. Proceedings of Supercomputing '95,
    San Diego, CA, December 1995.
    http://www.supercomp.org/sc95/proceedings/473_MBER/SC95.HTM
  
    day 1 linear algebra background		____________________________________________
  
    day 2 LSI example				____________________________________________
  
  2 Latent semantic indexing via a semi-discrete matrix decomposition
    Tamara G. Kolda and Dianne P. O'Leary, in The Mathematics
    of Information Coding, Extraction and Distribution, G. Cybenko et al.,
    eds., vol. 107 of IMA Volumes in Mathematics and Its Applications.
    Springer-Verlag, 1999, pp. 73-80.
    http://csmr.ca.sandia.gov/~tgkolda/pubs/ACM-TOIS-1998.pdf
  
    day 3 scalings; SDD variation		____________________________________________
  
  3 Concept Decompositions for Large Sparse Text Data using Clustering.
    I.S. Dhillon, D.S. Modha, IBM Research Report RJ 10147, July 8, 1999,
    Machine Learning, 42:1, pages 143-175, January 2001.
    http://www.cs.utexas.edu/users/inderjit/public_papers/concept_mlj.pdf
  
    day 4 alternative representation of dataset for text documents, matrix repr
  
    						____________________________________________
  
  4 A Scalable Hierarchical Algorithm for Unsupervised Clustering. D. L.
    Boley, in Data Mining for Scientific and Engineering Applications, R.
    Grossman, C. Kamath, P. Kegelmeyer, V. Kumar, R. Namburu (eds),
    Kluwer, pp. 383-400, 2001.
    ftp://ftp.cs.umn.edu/dept/users/boley/reports/ARC00paper.ps.gz
  
    day 5 use of sparse lin alg methods -> fast	____________________________________________
  
  5 Using Low-Memory Representations to Cluster Very Large Data Sets
    (ps), D Littau, D Boley, Third SIAM Conference on Data Mining, 2003.
    file://localhost/web/classes03/Spring-2003/csci8363/littau.ps
  
    day X					____________________________________________
  
Link Analysis -- Page Rank
  
  X The World's Largest Matrix Computation
    Google's PageRank is an eigenvector of a matrix of order 2.7 billion
    by Cleve Moler
    http://www.mathworks.com/company/newsletters/news_notes/clevescorner/oct02_cleve.html
  
    day 7 - addendum				____________________________________________
  
  6 The PageRank Citation Ranking: Bringing Order to the Web. Page,
    Lawrence; Brin, Sergey; Motwani, Rajeev; Winograd, Terry. Stanford
    Univ. Computer Science Dept technical report. Oct. 2001
    http://dbpubs.stanford.edu/pub/1999-66
  
    day 7 PageRank idea				____________________________________________
  
  7 * Sepandar D. Kamvar, Taher H. Haveliwala, and Gene H. Golub,
    "Adaptive Methods for the Computation of PageRank", To appear in
    Linear Algebra and its Applications, Special Issue on the
    Numerical Solution of Markov Chains, November, 2003.
    http://www.stanford.edu/~sdkamvar/papers/adaptive.pdf
  
    day 8 implementation issues for PageRank	____________________________________________
  
  8 Link Analysis, Eigenvectors and Stability, Andrew Y. Ng and Alice
    X. Zheng and Michael I. Jordan. IJCAI 2001, pp. 903-910.
    http://citeseer.ist.psu.edu/ng01link.html
  
    day 9 numerical properties of PageRank	____________________________________________
  
Eigenfaces
  
  X Face Recognition using View-Based and Modular Eigenspaces (1994)
    Baback Moghaddam, Alex Pentland
    SPIE'94 [this has more examples]
    http://citeseer.ist.psu.edu/moghaddam94face.html
  
    day X -- skip				____________________________________________
  
  9 Probabilistic Visual Learning for Object Representation. Baback
    Moghaddam, Alex Pentland. In Early Visual Learning, Oxford University
    Press, 1996.
    http://citeseer.ist.psu.edu/moghaddam96probabilistic.html
  
    day 10 face detection			____________________________________________
  
  A View based and modular eigenspaces for face recognition.
    A Pentland, B Moghaddam, T Starner.
    IEEE Conf on Computer Vision & Pattern Recognition,
    Seattle, June 1994.
  
    day 11 face recognition ???			____________________________________________
  
Locally Linear Embedding and visualization
  
  B An Introduction to Locally Linear Embedding. Lawrence Saul & Sam
    Roweis. [draft version (Jan.01)]
    http://www.cs.toronto.edu/%7Eroweis/lle/papers/lleintro.pdf
  
    day 12 Introduction to Locally Linear Embedding ______________________________________
  
  F Local Context Finder (LCF) reveals multidimensional relationships among
    mRNA expression profiles of Arabidopsis responding to pathogen infection.
    Fumiaki Katagiri* and Jane Glazebrook 
    PNAS | September 16, 2003 | vol. 100 | no. 19 | pp. 10842-10847
    http://www.pnas.org/cgi/content/full/100/19/10842
  
    day X					____________________________________________
  
Non-Negative Matrix Factorization
  
  C Algorithms for Non-Negative Matrix Factorization.  Daniel D. Lee & H. Sebastian
    Seung.
    hebb.mit.edu/people/seung/papers/nmfconverge.pdf
  
    day 13 intro to NMF (perhaps other paper needed too) ___________________________________
  
  D When Does Non-Negative Matrix Factorization Give a Correct Decomposition
    into Parts?  David Donoho & Victoria Stodden
    www-stat.stanford.edu/~donoho/Reports/2003/NMFCDP.pdf
  
    day 14 properties of NMF			____________________________________________
  
  E A Weighted Non-negative Matrix Factorization for Local Representations.
    (Poster). David Guillamet, Marco Bressan, Jordi Vitria
    www.cvc.uab.es/~davidg/PosterCVPR/PosterCVPR2001.pdf
  
    day X					____________________________________________
  
Matrix Factorizations in Biology
  
  G Metagenes and molecular pattern discovery using matrix factorization.
    Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, Jill P. Mesirov.
    PNAS | March 23, 2004 | vol. 101 | no. 12 | 4164-4169.
    http://intl.pnas.org/cgi/content/abstract/101/12/4164
  
    day 15 biological application of Lin Alg	____________________________________________
  
  H O. Alter and G. H. Golub, "Integrative Analysis of Genome-Scale
    Data Using Pseudoinverse Projection Predicts Novel Correlation
    Between DNA Replication and RNA Transcription," Proceedings of
    the National Academy of Sciences 101 (47), pp. 16577-16582
    (November 2004) (supplemental material at
    http://www.bme.utexas.edu/research/orly/pseudoinverse/).
  
  I O. Alter, P. O. Brown and D. Botstein, "Generalized Singular Value
    Decomposition For Comparative Analysis of Genome-Scale Expression
    Datasets of Two Different Organisms," Proceedings of the
    National Academy of Sciences 100 (6), pp. 3351-3356 (March 2003)
    (supplemental material at http://genome-www.stanford.edu/GSVD/
    and at http://www.bme.utexas.edu/research/orly/GSVD/).
  
    day 16 GSVD interpretation			____________________________________________
  
Support Vector Machines
  
  J Support Vector Machines: Hype or Hallelujah? K. P. Bennett, C.
    Campbell. SIGKDD Explorations, Vol. 2, Issue 2, 2000.
    http://www.acm.org/sigs/sigkdd/explorations/issue2-2/bennett.pdf
  
    day 17 basic idea -- linear kernel primal problem ______________________________________
  
    day 18 non linear kernel, dual problem	____________________________________________
  
  K Kernel principal component analysis, B. Scholkopf, A. Smola, and
    K.-R. Muller. In B. Scholkopf, C. J. C. Burges, and A. J. Smola,
    editors: Advances in Kernel Methods - SV Learning, pages 327-352. MIT
    Press, Cambridge, MA, 1999.
    http://citeseer.ist.psu.edu/25296.html
  
  L Sparse Kernel Principal Component Analysis, Michael E. Tipping. In
    Advances in Neural Information Processing Systems (NIPS) 13,
    Vancouver, Dec. 2001. 
    http://citeseer.ist.psu.edu/385655.html
  
    day 19 example of use of nonlinear kernel	____________________________________________
  
Eigenvalue analysis
  
  M Sepandar D. Kamvar, Dan Klein, and Christopher D. Manning,
    "Spectral Learning", To appear in Proceedings of the Eighteenth
    International Joint Conference on Artificial Intelligence, August
    2003. 
    http://www.stanford.edu/~sdkamvar/papers/spectral.pdf
  
    day 20 spectral clustering			____________________________________________
  
  N Example: Unstructured Biomedical Databases
    Sepandar D. Kamvar, Diane E. Oliver, Christopher D. Manning, and
    Russ B. Altman. "Inducing Novel Gene-Drug Interactions from the
    Biomedical Literature," Preprint, December, 2002. 
    http://www.stanford.edu/~sdkamvar/papers/pharmgkb.ps
    http://www.stanford.edu/~sdkamvar/papers/pharmgkb.pdf
  
    day X					____________________________________________
  
Linear Least Squares Fit.  
  
    An example-based mapping method for text classification and
    retrieval. (pdf) Yang, Y., Chute, C.G., ACM Transactions on Information
    Systems (TOIS) 1994;12(3):252-77. (moved from 2/25/03)
    http://portal.acm.org/citation.cfm?id=183424
    http://doi.acm.org/10.1145/183422.183424
  
    day X					____________________________________________
  
Information Theoretic Clustering, Co-clustering and Matrix Approximations
  
  P Information Theoretic Clustering, Co-clustering and Matrix Approximations
    http://www.cs.utexas.edu/users/inderjit/Talks/InfoTheoryCoClust.ppt
  
  O Information-Theoretic Co-clustering
    I. S. Dhillon, S. Mallela, and D. S. Modha
    Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge
    Discovery and Data Mining (KDD), pages 89-98, August 2003.
    http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_cocluster.pdf
    Also UTCS Technical Report #TR-03-12, April 2003. 
    http://www.cs.utexas.edu/users/UTCS/techreports/index/html/Abstracts.2003.html#TR-03-12
  
    day 21 use ppt slides to introduce info theory _________________________________________
  
  Q Clustering with Bregman Divergences
    A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh
    Proceedings of the Fourth SIAM International Conference on Data Mining,
    pages 234-245, April 2004
    http://www.cs.utexas.edu/users/inderjit/public_papers/sdm-breg.pdf
  
    day 22 continue using ppt slides		____________________________________________
  
Wavelets
  
  R Wavelet notes; Also available is some Matlab Software (see
    also the Wavelet toolbox in Matlab).
    http://www.cs.umn.edu/%7Eboley/wavelet/wavelet.html
    http://www-dsp.rice.edu/software/wavebook.shtml
  
    day 23 wavelet notes.			____________________________________________
  
  [Non-Neg Matrix Factorization]
  
    Algorithms for Non-negative Matrix Factorization
    Daniel D. Lee, H. Sebastian Seung
    hebb.mit.edu/people/seung/papers/nmfconverge.pdf
  
     Weighted Non-negative Matrix Factorization for Local Representations
     David Guillamet, Marco Bressan, Jordi Vitrià, Computer Vision Center (CVC) and
     ...
     www.cvc.uab.es/~davidg/PosterCVPR/PosterCVPR2001.pdf
  
     When Does Non-Negative Matrix Factorization Give a Correct Decomposition
     into Parts? David Donoho Department of Statistics ...
     www-stat.stanford.edu/~donoho/Reports/2003/NMFCDP.pdf 
  
     Non-negative matrix factorization for gene expression and scientific
     texts analysis. A.D. Pascual-Montano, P. Carmona-Saez, M ...
     www.iscb.org/ismb2003/posters/pascualATcnb.uam.es_226.html