Data Mining for Scientific and Engineering Applications

edited by
Robert L. Grossman
University of Illinois at Chicago, USA
Chandrika Kamath
Lawrence Livermore National Laboratory, CA, USA
Philip Kegelmeyer
Sandia National Laboratories, Livermore, CA, USA
Vipin Kumar
Army High Performance Computing Research Center (AHPCRC), Minneapolis, MN, USA
Raju R. Namburu
Army Research Laboratory, Aberdeen Proving Ground, MD, USA

Copyright © 2001
Kluwer Academic Publishers
ISBN 1-4020-0033-2


Advances in technology are making massive data sets common in many scientific disciplines, such as astronomy, medical imaging, bioinformatics, combinatorial chemistry, remote sensing, and physics. To find useful information in these data sets, scientists and engineers are turning to data mining techniques. This book is a collection of papers based on the first two in a series of workshops on mining scientific datasets. It illustrates the diversity of problems and application areas that can benefit from data mining, as well as the issues and challenges that differentiate scientific data mining from its commercial counterpart. While the focus of the book is on mining scientific data, the work is of broader interest, as many of the techniques can be applied equally well to data arising in business and web applications.

Audience: This work would be an excellent text for students and researchers who are familiar with the basic principles of data mining and want to learn more about the application of data mining to their problem in science or engineering.


Dr. N. Radhakrishnan

List of Contributors
List of Reviewers
Preface

On Mining Scientific Datasets

Chandrika Kamath


Understanding High Dimensional and Large Data Sets: Some Mathematical Challenges and Opportunities 

Jagadish Chandra


Data Mining at the Interface of Computer Science and Statistics 

Padhraic Smyth


Mining Large Image Collections

Michael C. Burl


Mining Astronomical Databases 

Roberta M. Humphreys, Juan E. Cabanela, and Jeffrey Kriessler


Searching for Bent-Double Galaxies in the FIRST Survey

Chandrika Kamath, Erick Cantú-Paz, Imola K. Fodor and Nu Ai Tang


A Dataspace Infrastructure for Astronomical Data 

Robert Grossman, Emory Creel, Marco Mazzucco, and Roy Williams


Data Mining Applications in Bioinformatics 

Naren Ramakrishnan and Ananth Y. Grama


Mining Residue Contacts in Proteins 

Mohammed J. Zaki and Chris Bystroff


KDD Services at the Goddard Earth Sciences Distributed Archive Center 

Christopher Lynnes and Robert Mack


Data Mining in Integrated Data Access and Data Analysis Systems 

Ruixin Yang, Menas Kafatos, Kwang-Su Yang, and X. Sean Wang


Spatial Data Mining for Classification, Visualisation and Interpretation with ARTMAP Neural Network

Weiguo Liu, Sucharita Gopal, and Curtis Woodcock


Real Time Feature Extraction for the Analysis of Turbulent Flows 

I. Marusic, G.V. Candler, V. Interrante, P.K. Subbareddy, and A. Moss


Data Mining for Turbulent Flows 

Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar


EVITA: Efficient Visualization and Interrogation of Tera-Scale Data

Raghu Machiraju, James E. Fowler, David Thompson, Bharat Soni, and Will Schroeder


Towards Ubiquitous Mining of Distributed Data 

Hillol Kargupta, Krishnamoorthy Sivakumar, Weiyun Huang, Rajeev Ayyagari, Rong Chen, Byung-Hoon Park, and Erik Johnson


Decomposable Algorithms for Data Mining 

Raj Bhatnagar


HDDI™: Hierarchical Distributed Dynamic Indexing 

William M. Pottenger, Yong-Bin Kim, and Daryl D. Meling 


Parallel Algorithms for Clustering High-Dimensional Large-Scale Datasets 

Harsha Nagesh, Sanjay Goil, and Alok Choudhary


Efficient Clustering of Very Large Document Collections 

Inderjit S. Dhillon, James Fan, and Yuqiang Guan


A Scalable Hierarchical Algorithm for Unsupervised Clustering

Daniel Boley


High-Performance Singular Value Decomposition 

David B. Skillicorn and Xiaolan Yang


Mining High-Dimensional Scientific Data Sets Using Singular Value Decomposition 

Ekaterina Maltseva, Clara Pizzuti, and Domenico Talia


Spatial Dependence in Data Mining 

James P. LeSage and R. Kelley Pace


SPARC: Spatial Association Rule-Based Classification

Jiawei Han, Anthony K.H. Tung, and Jing He


What's Spatial About Spatial Data Mining: Three Case Studies

Shashi Shekhar, Yan Huang, Weili Wu, C.T. Lu, and S. Chawla


Predicting Failures in Event Sequences 

Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara


Efficient Algorithms for Mining Long Patterns in Scientific Data Sets 

Ramesh C. Agarwal and Charu C. Aggarwal


Probabilistic Estimation in Data Mining 

Edwin P.D. Pednault and Chidanand Apte


Classification Using Association Rules: Weaknesses and Enhancements 

Bing Liu, Yiming Ma, and Ching-Kian Wong