CSci 8980: Data Mining (Fall 2000)

Instructor: Vipin Kumar

Time and Place: Monday 3:45PM - 6:15PM, AmundH 104

Office Hours: Friday, 11:30AM - 12:15PM (at AHPCRC)
Instructor's Office: 5-215 (EE/CSci Bldg)

This page's address: http://www.cs.umn.edu/~han/dmclass/index.html



Solutions to homeworks 6 and 7 can be picked up at Prof Kumar's office in EE/CS 5-215.

Please note that the class exam will be next time, Monday, Nov 13, at the usual time and place: 3:45PM, AmundH 104. The test will be 2 1/2 hours long and will be open book, open notes. However, if you need to consult your notes often, you will not have enough time to finish the test.

Also note that the choice of topic for your term paper/presentations is due next time, Nov 13.

The Clementine project will be due in two weeks, on Nov 20.

See the homeworks and projects sections for more info on the Clementine project and the term paper/presentation. You should contact the TAs for references once you have selected a project.



Course Overview:

The last decade has seen explosive growth in database technology and in the amount of data collected. This has created an unprecedented opportunity for "data mining", the process of efficiently discovering interesting information hidden in data, with or without supervision. Some of the common tasks in data mining are classification, discovery of association rules, clustering, and pattern discovery in sequential data. This course will provide a rapid and vigorous introduction to the field of data mining, and is meant for students who are planning to do research in this area. The course will consist of about 7 weeks of lectures followed by 7 weeks of student presentations on selected research topics in the area of spatio-temporal data mining.


Course outline:

• Introduction
    ▷ What is data mining?
    ▷ Intro to Data Mining Tasks (Classification, Clustering, Association Rules, Sequential Patterns, Regression, Deviation Detection)
• Classification Algorithms
    ▷ Decision-Tree Based Approach (e.g. C4.5)
    ▷ Rule-set Based Approach
    ▷ Bayesian Approach: Naive Bayes and Bayesian Networks
    ▷ Instance Based Classifiers (e.g. k-Nearest Neighbor)
    ▷ Neural Network Based Approach
    ▷ Feature Selection Techniques
• Clustering
    ▷ k-means, k-medoids methods
    ▷ Hierarchical methods
    ▷ Graph based methods
    ▷ Spatial Clustering
    ▷ Dimensionality Reduction Techniques
    ▷ Commercial and Scientific Applications
• Association Rule Discovery
    ▷ Apriori Principle and its extensions
    ▷ Sequential Associations
    ▷ Handling attributes with continuous values and type hierarchies
    ▷ Interestingness measures


Background Required:

General background in Computer Science (algorithms, etc.), and motivation to learn.


Workload and Grading Scheme:

Four to six homeworks (30%), exams (30%), project/paper/presentation (30%), and class participation (10%).


Lecture Notes and Reading Material:

Note: To view notes in PDF format, you will need Acrobat Reader.


Links to Tools and Datasets:

Follow this link.


Data Mining Bibliography:

We are constantly updating our list of references. Here is the latest version.


Other Interesting Data Mining Links:



Mahesh Joshi
2000-01-17