Mining colocation patterns from spatial datasets


Shashi Shekhar: Biography, Homepage


Computer Science Department, University of Minnesota.


The importance of spatial data mining is growing with the increasing prevalence and importance of large geo-spatial datasets such as maps, repositories of remote-sensing images, and the decennial census. Applications include the mobile commerce (M-commerce) industry (location-based services), NASA (studying the climatological effects of El Niño, land-use classification and global change using satellite imagery), the National Institutes of Health (predicting the spread of disease), the National Imagery and Mapping Agency (creating high-resolution three-dimensional maps from satellite imagery), the National Institute of Justice (finding crime hot spots), and transportation agencies (detecting local instability in traffic). However, classical data mining techniques are often inadequate for spatial data, and different techniques need to be developed. The talk illustrates this point in the context of co-location pattern mining over spatial datasets.

Given a collection of boolean spatial features, the co-location rule discovery process finds the subsets of features whose instances are frequently located together in geographic space. For example, symbiotic plant species and predator-prey animal species are likely co-locations in ecology datasets. The co-location rule discovery problem is different from the association rule discovery problem. Even though the boolean spatial features may be considered as item types, there is no natural notion of transactions. Partitioning a spatial dataset into transactions can lead to incorrect estimates of the interest measures for many spatial co-location patterns whose instances lie near transaction boundaries. This makes it difficult to use traditional interest measures, e.g. support, and traditional association rule mining algorithms, which are based on ideas like support-based pruning and compression of transaction data.
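The boundary problem can be illustrated with a small sketch. The toy points, the grid cell size, and the function names below are all hypothetical, chosen only for illustration: the same instances are paired once by grid cell (an artificial "transaction") and once by a distance-based neighborhood, and the two pairings disagree.

```python
from itertools import combinations
from math import dist, floor

# Hypothetical toy data: (feature, x, y). Not taken from any real dataset.
points = [
    ("A", 0.9, 0.5), ("B", 1.1, 0.5),  # close pair split by the cell boundary x = 1
    ("A", 0.2, 0.2), ("B", 0.4, 0.2),  # close pair inside one cell
    ("A", 1.5, 1.5), ("B", 1.9, 1.9),  # same cell, but farther apart than the radius
]

def neighbor_pairs(points, radius):
    """Index pairs of different-feature instances within `radius` of each other."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if p[0] != q[0] and dist(p[1:], q[1:]) <= radius
    ]

def grid_pairs(points, cell):
    """Index pairs of different-feature instances that fall in the same grid cell."""
    def cell_of(p):
        return (floor(p[1] / cell), floor(p[2] / cell))
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if p[0] != q[0] and cell_of(p) == cell_of(q)
    ]

by_distance = neighbor_pairs(points, radius=0.3)
by_grid = grid_pairs(points, cell=1.0)

# Pairs that are true neighbors but are lost when the grid defines transactions.
missed = [pair for pair in by_distance if pair not in by_grid]
# Pairs that share a cell without being neighbors.
spurious = [pair for pair in by_grid if pair not in by_distance]
```

In this toy setup the pair straddling the line x = 1 is lost by the grid, while two pairs that merely share a cell are counted, so a support value computed from the grid would be off in both directions.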

The proposed approach formalizes the notion of co-locations using user-specified spatial neighborhoods in place of transactions. It defines new interest measures based on the neighborhoods, along with a model for interpreting the co-location rules. It provides a simple, correct, and complete algorithm for mining co-location rules. In addition, it proposes to advance the development of co-location mining by addressing three basic issues, namely, scalability, ascertaining the quality of inferred patterns, and discovery of high-confidence, low-support rules.
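As a rough illustration of a neighborhood-based interest measure, the sketch below computes a participation index for feature pairs. The abstract itself does not name the measures; this assumes the participation ratio and participation index described in the TKDE 2004 paper listed below, restricted to size-2 patterns and a Euclidean distance neighborhood. The function name, toy points, and radius are hypothetical, and the brute-force pairwise scan is a simplification, not the mining algorithm from the talk.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def pair_participation_index(points, radius):
    """Participation index of each feature pair that has at least one neighbor pair.

    points: list of (feature, x, y) tuples.
    For a pair {f, g}, the participation ratio of f is the fraction of f's
    instances with at least one g-instance within `radius`; the participation
    index is the minimum of the two ratios.
    """
    totals = defaultdict(int)
    for feature, _, _ in points:
        totals[feature] += 1

    # participating[(f, g)][f] holds the ids of f-instances that have a g-neighbor.
    participating = defaultdict(lambda: defaultdict(set))
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if p[0] != q[0] and dist(p[1:], q[1:]) <= radius:
            key = tuple(sorted((p[0], q[0])))
            participating[key][p[0]].add(i)
            participating[key][q[0]].add(j)

    return {
        key: min(len(ids[f]) / totals[f] for f in key)
        for key, ids in participating.items()
    }

# Hypothetical toy data: one A-instance has two B-neighbors, the other A has none.
toy = [("A", 0.0, 0.0), ("B", 0.1, 0.0), ("A", 5.0, 5.0), ("B", 0.0, 0.1), ("C", 9.0, 9.0)]
indices = pair_participation_index(toy, radius=0.5)

# Keep only pairs whose index meets a user-chosen prevalence threshold.
prevalent = {pair: pi for pair, pi in indices.items() if pi >= 0.5}
```

Here half of the A-instances and all of the B-instances take part in an A-B neighbor pair, so the index for {A, B} is the smaller of the two ratios. Because the measure counts instances in neighborhoods rather than in transactions, no partitioning of space is needed.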

KEYWORDS: Data Mining, Spatial Datasets, Colocation patterns, Association rules, Apriori algorithm.

NOTE: Some of the results discussed in this talk appeared in the following publications:

  1. Spatial Databases: A Tour, (Chapter 7), (S. Shekhar and S. Chawla), Prentice Hall 2003, ISBN 0-13-017480-7.
  2. Colocation-related entries (amenable to a broad audience) in the Encyclopedia of GIS, Springer, 2008.
  3. Discovering Spatial Co-location Patterns: A General Approach (with Y. Huang, H. Xiong), IEEE Transactions on Knowledge and Data Eng., 16(12), December 2004. (PDF versions are at 1, 2, 3, 4 and 5.) (First results appeared in Proc. of 7th Intl. Symposium on Spatial and Temporal Databases (SSTD), 2001, Springer Verlag LNCS 2121.) Another subset of results was reported in Multi-resolution Co-location Miner: A New Algorithm to Find Co-location Patterns from Spatial Datasets (S. Shekhar and Y. Huang), SIAM Intl. Conf. on Data Mining 2002 Workshop on Mining Scientific Datasets.
  4. Architecture Technology Corporation, Spatial Data Mining Toolkit for Generating MSDS (aka TopoAssistant) (Topic No. A03-129), SBIR Phase I, US Army Topographic Eng. Center, June 2004, Final Report, Slides.
  5. A Join-less Approach for Co-location Pattern Mining: A Summary of Results, J. Yoo, S. Shekhar, M. Celik, IEEE Trans. on Knowledge and Data Eng., 18(10), Oct. 2006. A subset of results appeared in Proc. IEEE ICDM 2005. Earlier results were reported in A Partial Join Approach for Mining Co-location Patterns (J. Yoo, S. Shekhar), Proc. ACMGIS, 2004.
  6. Zonal Co-location Pattern Discovery with Dynamic Parameters (M. Celik, J. Kang, S. Shekhar), IEEE Intl. Conf. on Data Mining, 2006.