Mining colocation patterns from spatial datasets
Shashi Shekhar
Computer Science Department, University of Minnesota.
Spatial data mining is growing in importance as large geo-spatial datasets, such as maps,
repositories of remote-sensing images, and the decennial census, become more common and
more valuable. Applications include the M(obile)-commerce industry (location-based
services), NASA (studying the climatological effects of El Niño, land-use classification,
and global change using satellite imagery), the National Institutes of Health (predicting
the spread of disease), the National Imagery and Mapping Agency (creating high-resolution
three-dimensional maps from satellite imagery), the National Institute of Justice (finding
crime hot spots), and transportation agencies (detecting local instability in traffic).
However, classical data mining techniques are often inadequate for spatial data, and new
techniques need to be developed. The talk illustrates this point in the context of mining
co-location patterns from spatial datasets.
Given a collection of boolean spatial features, the co-location
rule discovery process finds the subsets of features whose instances are frequently
located together in geographic space. For example, symbiotic plant species and
predator-prey animal species are likely co-locations in ecology datasets.
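As a minimal illustration of the neighborhood idea, the sketch below assumes a Euclidean distance threshold `d` as the user-specified neighbor relation and a hypothetical `(feature, instance_id, x, y)` record layout; neither is prescribed by the talk. It lists the instance pairs of two features that are neighbors, i.e., the size-2 instances of a candidate co-location.

```python
from itertools import product
from math import dist

# Hypothetical toy data: each instance is (feature, instance_id, x, y).
instances = [
    ("A", 1, 0.0, 0.0), ("A", 2, 5.0, 5.0),
    ("B", 1, 0.5, 0.0), ("B", 2, 9.0, 9.0),
    ("C", 1, 5.2, 5.1),
]

def neighbors(p, q, d):
    """Assumed neighbor relation: Euclidean distance of at most d."""
    return dist(p[2:], q[2:]) <= d

def pair_instances(instances, f1, f2, d):
    """Ids of all (f1 instance, f2 instance) pairs that are neighbors."""
    a = [i for i in instances if i[0] == f1]
    b = [i for i in instances if i[0] == f2]
    return [(p[1], q[1]) for p, q in product(a, b) if neighbors(p, q, d)]

print(pair_instances(instances, "A", "B", d=1.0))  # [(1, 1)]
print(pair_instances(instances, "A", "C", d=1.0))  # [(2, 1)]
```

No transactions are involved: a pair qualifies purely because its two instances are close in space.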
The co-location rule discovery problem differs from the association rule discovery problem.
Even though the boolean spatial features may be considered as item types,
there is no natural notion of transactions.
Partitioning a spatial dataset into transactions can lead to incorrect estimates of the interest
measures for many spatial co-location patterns whose instances lie near transaction boundaries.
This makes it difficult to use traditional interest measures, e.g., support, and traditional
association rule mining algorithms, which rely on ideas such as support-based
pruning and compression of transaction data.
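The boundary effect can be seen in a toy example. The sketch assumes a hypothetical "transactionization" that treats each square grid cell of side 10 as one transaction, and a Euclidean neighbor threshold `d = 0.5`; the coordinates are invented for illustration. Both A-B pairs are equally close, but the pair straddling the cell edge at x = 10 never lands in the same transaction, so a transaction-based count sees only half of the co-location instances.

```python
from math import dist, floor

# Hypothetical toy data: (feature, x, y). Coordinates are illustrative only.
points = [
    ("A", 3.0, 3.0), ("B", 3.2, 3.0),   # neighbors inside one grid cell
    ("A", 9.9, 1.0), ("B", 10.1, 1.0),  # neighbors, but split by the cell edge at x = 10
]

def cell_of(p, cell):
    """Grid cell (one assumed 'transaction') that contains point p."""
    _, x, y = p
    return (floor(x / cell), floor(y / cell))

def neighbor_pairs(points, f1, f2, d):
    """(f1, f2) instance pairs within Euclidean distance d of each other."""
    return [(p, q) for p in points if p[0] == f1
            for q in points if q[0] == f2 and dist(p[1:], q[1:]) <= d]

pairs = neighbor_pairs(points, "A", "B", d=0.5)
# Pairs that the grid-based transactions would still see together.
kept = [(p, q) for p, q in pairs if cell_of(p, 10.0) == cell_of(q, 10.0)]
print(len(pairs), len(kept))  # 2 1
```

Shifting or resizing the grid only moves the problem: some neighboring pair can always straddle a cell edge.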
The proposed approach formalizes the notion of co-locations using user-specified spatial
neighborhoods in place of transactions. It defines new interest measures based on
the neighborhoods along with a model for interpreting the co-location rules.
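The abstract does not spell out the neighborhood-based interest measures. As a sketch, assuming the participation-ratio formulation used in the co-location literature (for example, the TKDE 2004 paper listed below), let $I(f)$ be the set of instances of feature $f$, let $T(C)$ be the table instance of a co-location $C = \{f_1, \dots, f_k\}$ (all rows of $k$ instances, one per feature, that are pairwise neighbors), and let $\pi$ denote relational projection:

```latex
\begin{aligned}
\mathrm{pr}(C, f_i) &= \frac{\lvert \pi_{f_i}(T(C)) \rvert}{\lvert I(f_i) \rvert},
\qquad \mathrm{PI}(C) = \min_{1 \le i \le k} \mathrm{pr}(C, f_i), \\
\mathrm{cp}(C_1 \Rightarrow C_2) &= \frac{\lvert \pi_{C_1}(T(C_1 \cup C_2)) \rvert}{\lvert T(C_1) \rvert}.
\end{aligned}
```

For instance, if both instances of A and 2 of the 3 instances of B take part in some A-B neighbor pair, then pr({A,B}, A) = 1, pr({A,B}, B) = 2/3, and PI({A,B}) = 2/3. Because PI(C) can only stay the same or decrease as features are added to C, it can play the role that support plays in Apriori-style pruning.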
It provides a simple, correct, and complete algorithm for mining co-location rules.
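One way to picture a level-wise co-location miner is the Apriori-style sketch below. It is not the author's published algorithm. It assumes Euclidean neighborhoods with threshold `d`, measures prevalence with the participation index (the minimum, over the features of a co-location, of the fraction of that feature's instances appearing in some row instance) against a hypothetical threshold `min_prev`, and joins instances with brute-force nested loops where a practical miner would use a spatial join or index. Size-k candidates come from pairs of prevalent size-(k-1) co-locations sharing a prefix; their row instances come from joining the parents' rows on that prefix and checking that the two last instances are neighbors.

```python
from itertools import combinations
from math import dist


def mine_colocations(instances, d, min_prev):
    """Level-wise (Apriori-style) co-location mining sketch.

    instances: dict feature name -> list of (x, y) points (hypothetical format).
    d:         assumed neighbor relation, Euclidean distance <= d.
    min_prev:  minimum participation index for a co-location to be prevalent.
    Returns {sorted feature tuple (size >= 2): participation index}.
    """
    # Table instances of size-1 co-locations: one row (instance index) per instance.
    tables = {(f,): [(i,) for i in range(len(pts))] for f, pts in instances.items()}
    prevalent = {}
    level = sorted(tables)
    while level:
        # Candidate generation: join two size-(k-1) co-locations sharing a prefix,
        # then prune candidates that have a non-prevalent (k-1)-subset.
        candidates = set()
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:
                c = a + (b[-1],)
                if all(s in tables for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
        next_level = []
        for c in sorted(candidates):
            a, b = c[:-1], c[:-2] + (c[-1],)
            # Instance join: rows agreeing on the first k-2 instances whose last
            # instances are neighbors form a clique, i.e. a row instance of c.
            rows = [
                ra + (rb[-1],)
                for ra in tables[a]
                for rb in tables[b]
                if ra[:-1] == rb[:-1]
                and dist(instances[c[-2]][ra[-1]], instances[c[-1]][rb[-1]]) <= d
            ]
            # Participation index: minimum participation ratio over features of c.
            pi = min(len({r[i] for r in rows}) / len(instances[f])
                     for i, f in enumerate(c))
            if pi >= min_prev:
                tables[c] = rows
                prevalent[c] = pi
                next_level.append(c)
        level = next_level
    return prevalent


# Hypothetical toy data.
instances = {
    "A": [(0, 0), (10, 10)],
    "B": [(1, 0), (10, 11), (20, 20)],
    "C": [(0, 1), (30, 30)],
}
result = mine_colocations(instances, d=1.5, min_prev=0.3)
print(sorted(result))  # [('A', 'B'), ('A', 'B', 'C'), ('A', 'C'), ('B', 'C')]
```

Pruning a candidate with a non-prevalent subset is safe because the participation index never increases when a feature is added; with `min_prev=0.5` in this example, {B, C} fails and {A, B, C} is never generated.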
In addition, it proposes to advance the development of co-location mining
by addressing three basic issues, namely, scalability, ascertaining quality of
inferred patterns, and discovery of high-confidence, low-support rules.
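To make "high confidence, low support" concrete, the sketch uses hypothetical data in which a rare feature `R` is always near a common feature `F`. It assumes a Euclidean neighborhood and participation-ratio measures; for the rule {R} => {F}, the conditional probability equals the participation ratio of `R`, while the participation index of {R, F} is the smaller of the two ratios.

```python
from math import dist

# Hypothetical data: a rare feature R (2 instances) always found next to a
# common feature F (100 instances spread along a line).
instances = {
    "R": [(0.0, 0.5), (50.0, 0.5)],
    "F": [(float(x), 0.0) for x in range(100)],
}
d = 1.0  # assumed neighbor distance

def pr(instances, f, other, d):
    """Participation ratio of f in the pair {f, other}: share of f's
    instances with at least one 'other' instance within distance d."""
    pts = instances[f]
    hits = sum(any(dist(p, q) <= d for q in instances[other]) for p in pts)
    return hits / len(pts)

pr_r = pr(instances, "R", "F", d)  # confidence of {R} => {F}
pr_f = pr(instances, "F", "R", d)  # few F instances are near an R
print(pr_r, pr_f, min(pr_r, pr_f))  # 1.0 0.02 0.02
```

A participation-index threshold of, say, 0.1 would discard {R, F} even though the rule {R} => {F} holds for every instance of R.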
Keywords: Data Mining, Spatial Datasets, Colocation Patterns,
Association Rules, Apriori Algorithm.
Some of the results discussed in this talk appeared in the following publications:
- Spatial Databases: A Tour, Chapter 7 (S. Shekhar and S. Chawla),
Prentice Hall, 2003, ISBN 0-13-017480-7.
- Colocation-related entries (accessible to a broad audience) in the
Encyclopedia of GIS, Springer, 2008:
  - Colocation Pattern (N. Mamoulis)
  - Colocation Patterns, Interestingness Measures (M. Salmenkivi)
  - Colocation Patterns Discovery (W. Hu)
  - Colocation Patterns, Algorithms (N. Mamoulis)
- Discovering Spatial Co-location Patterns: A General Approach (with Y. Huang and H. Xiong),
IEEE Transactions on Knowledge and Data Engineering, 16(12), December 2004.
(A subset of results appeared in Proc. of the 7th Intl. Symposium on Spatial and Temporal
Databases (SSTD), 2001, Springer-Verlag LNCS 2121.)
Another subset of results was reported in Multi-resolution Co-location Miner: A New Algorithm
to Find Co-location Patterns from Spatial Datasets (S. Shekhar and Y. Huang),
SIAM Intl. Conf. on Data Mining 2002 Workshop on Mining Scientific Datasets.
- Architecture Technology Corporation, Spatial Data Mining Toolkit for Generating MSDS
(aka TopoAssistant) (Topic No. A03-129), SBIR Phase I, US Army Topographic Eng. Center,
June 2004, Final Report.
- A Join-less Approach for Co-location Pattern Mining: A Summary of Results
(J. Yoo, S. Shekhar, M. Celik), IEEE Trans. on Knowledge and Data Eng., 18(10).
(A subset of results appeared in Proc. IEEE ICDM 2005.)
Earlier results were reported in A Partial Join Approach for Mining Co-location Patterns
(J. Yoo, S. Shekhar), Proc. ACM GIS, 2004.
- Zonal Co-location Pattern Discovery with Dynamic Parameters
(M. Celik, J. Kang, S. Shekhar), IEEE Intl. Conf. on Data Mining, 2006.