Links to Tools and Datasets


You may find some of the following tools and datasets useful for this course. More links to tools and datasets can be found at the KDnuggets site.
$\bullet$ Tools:
$\triangleright$ C4.5: This is a link to Ross Quinlan's home page and the C4.5 decision tree program that he created. [NOTE: Source code is available. Can be compiled on Windows and Unix platforms.]
$\triangleright$ MLC++: Provides a suite of machine learning algorithms, including decision-tree-based classifiers, nearest-neighbor (instance-based) classifiers, and naive Bayes classifiers. [NOTE: Source code is available. Can be compiled on many Unix platforms.]
$\triangleright$ SIPINA_W: Provides a suite of classification algorithms, including CART, ID3, C4.5, and CHAID implementations. [NOTE: Only a binary executable is available, for Windows.]
$\triangleright$ OC1: Provides algorithms for building decision trees that contain a linear combination of one or more attributes at each internal node. [NOTE: Source code is available. Unix platform.]
$\triangleright$ Weka: Provides machine learning techniques for instance-based classification (PEBLS, K*), rule-based classification (FOIL), etc. [NOTE: Unix platform.]
$\triangleright$ DBMiner: A suite of data mining tools for various tasks, including classification, market basket analysis (association rules), and prediction. [NOTE: Only a binary executable is available, for Windows NT.]
$\bullet$ Datasets:
$\triangleright$ UCI Machine Learning Repository: Around 70 datasets are available for evaluating learning algorithms. The README and SUMMARY-TABLE files are a good starting point.



Prepared by Mahesh Joshi. Please mail any additions or corrections.