Knowledge Discovery in Spatial-Temporal Databases
Investigators
Fiez, Tim
Lazarevic, Aleksandar
Obradovic, Zoran
Pokrajac, Dragoljub
Vucetic, Slobodan
Problem
The objective of this project is to develop a system for knowledge
discovery and local model integration in large spatial and
spatial-temporal databases without the exchange of confidential and
proprietary information.
Results
To achieve this objective, we have
developed and tested novel exploratory data analysis methods for spatial data;
developed machine learning algorithms for building and selectively applying multiple expert modules;
tested model development and prediction capabilities using multiple non-centralized data sets; and
prototyped a software package for knowledge discovery from spatial data.
Sampling optimization
A procedure for evaluating spatial sampling techniques in terms of
sampling cost and interpolated feature accuracy was developed in our
lab and applied to modify grid sampling so as to achieve similar
expected accuracy with half as much data.
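
The cost-versus-accuracy evaluation described above can be illustrated with a minimal sketch. It is not the procedure developed in the lab: the synthetic surface `true_field`, the use of inverse distance weighting as the interpolator, and the use of the sample count as the cost are all assumptions made for illustration.

```python
import math

def true_field(x, y):
    # Hypothetical smooth surface standing in for a measured spatial feature.
    return math.sin(x / 15.0) + math.cos(y / 20.0)

def grid_sample(size, step):
    # Regular grid sampling: one sample every `step` units in each direction.
    return [(x, y, true_field(x, y))
            for x in range(0, size, step)
            for y in range(0, size, step)]

def idw(samples, x, y, power=2.0):
    # Inverse distance weighting interpolation at location (x, y).
    num = den = 0.0
    for sx, sy, sv in samples:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0:
            return sv  # exactly on a sample location
        w = 1.0 / d2 ** (power / 2.0)
        num += w * sv
        den += w
    return num / den

def evaluate(size, step):
    # Returns (cost, rmse): number of samples taken and the interpolation
    # error measured against the known surface on every cell of the field.
    samples = grid_sample(size, step)
    squared_error = 0.0
    for x in range(size):
        for y in range(size):
            squared_error += (idw(samples, x, y) - true_field(x, y)) ** 2
    return len(samples), math.sqrt(squared_error / (size * size))

cost_dense, rmse_dense = evaluate(60, 5)
cost_sparse, rmse_sparse = evaluate(60, 10)
print(cost_dense, rmse_dense)
print(cost_sparse, rmse_sparse)
```

Comparing the two (cost, error) pairs is the kind of trade-off a sampling design would be judged on; a modified design is attractive when it lowers the cost without a matching loss in accuracy.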
Exploratory data analysis
A spatial data partitioning procedure was developed for training and
testing spatial regression methods.
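
One common way to partition spatial data for training and testing is to hold out whole contiguous blocks, so that test points are not immediate neighbors of training points. The sketch below assumes this block-based approach; the project's actual procedure is not described here, and the function name `spatial_block_split` and its parameters are hypothetical.

```python
import random

def spatial_block_split(points, block_size, test_fraction=0.25, seed=0):
    """Split (x, y, value) records into train and test sets by whole blocks."""
    # Group records by the square block that contains them.
    blocks = {}
    for p in points:
        key = (int(p[0] // block_size), int(p[1] // block_size))
        blocks.setdefault(key, []).append(p)

    # Pick a reproducible random subset of blocks for testing.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [p for k in keys[n_test:] for p in blocks[k]]
    test = [p for k in keys[:n_test] for p in blocks[k]]
    return train, test, test_keys

# A 40 x 40 field cut into 10 x 10 blocks gives 16 blocks, 4 of them held out.
points = [(x, y, 0.0) for x in range(40) for y in range(40)]
train, test, test_keys = spatial_block_split(points, block_size=10)
print(len(train), len(test))
```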
For heterogeneous spatial databases with unstable driving attributes
(typical in earth sciences), an adaptive and spatial attribute
boosting algorithm is proposed as an effective technique for
increasing modeling accuracy by manipulating training data
distributions.
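
Boosting manipulates the training distribution by giving more weight to examples that the current model predicts poorly. The sketch below shows a single weight update in the style of AdaBoost.R2 for regression. It is a generic illustration, not the adaptive spatial attribute boosting algorithm proposed in this project, and the function name `update_distribution` is hypothetical.

```python
def update_distribution(weights, abs_errors):
    """One AdaBoost.R2-style reweighting step for regression examples."""
    max_err = max(abs_errors)
    total = sum(weights)
    normalized = [w / total for w in weights]
    if max_err == 0:
        return normalized  # perfect fit, nothing to emphasize

    # Linear loss in [0, 1] for each example.
    losses = [e / max_err for e in abs_errors]
    avg_loss = sum(w * l for w, l in zip(normalized, losses))
    if avg_loss >= 0.5:
        return normalized  # model too weak; AdaBoost.R2 would stop here

    # beta < 1, so well-predicted examples (small loss) are shrunk the most.
    beta = avg_loss / (1.0 - avg_loss)
    updated = [w * beta ** (1.0 - l) for w, l in zip(normalized, losses)]
    norm = sum(updated)
    return [w / norm for w in updated]

# Four examples with equal weight; the last one is predicted poorly.
new_weights = update_distribution([0.25, 0.25, 0.25, 0.25], [0.1, 0.2, 0.1, 1.0])
print(new_weights)
```

After the update, the poorly predicted example carries the largest weight, so the next model in the ensemble concentrates on it.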
Data partitioning
To identify more homogeneous sub-fields and design corresponding
expert models, we have developed data partitioning methods based on
spatial clustering, sequential development of local regressors and
the corresponding data distribution models, and iterative data
partitioning using spatial error analysis. All of the multiple expert
approaches have resulted in better prediction than a single global
model when tested on real-life agricultural data. In addition, the
data partitioning and local regression algorithms were successfully
adapted to a distributed environment where data mining is restricted
to the exchange of local models and essential statistics, without
communication of raw data.
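
The idea of exchanging only essential statistics can be illustrated with ordinary least squares on a single attribute. In this minimal sketch, each site shares a handful of sums rather than its records, and the merged sums yield the same line as a fit on the pooled data. The site data and function names are hypothetical, and the project's distributed algorithms are more general than this example.

```python
def local_statistics(xs, ys):
    # Summary a site can share without revealing individual records.
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }

def merge_statistics(stats_list):
    # The sums are additive, so merging sites is element-wise addition.
    return {k: sum(s[k] for s in stats_list)
            for k in ("n", "sx", "sy", "sxx", "sxy")}

def fit_line(s):
    # Closed-form least-squares slope and intercept from the sums alone.
    denom = s["n"] * s["sxx"] - s["sx"] ** 2
    slope = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / denom
    intercept = (s["sy"] - slope * s["sx"]) / s["n"]
    return slope, intercept

# Two hypothetical sites holding disjoint samples of y = 2x + 1.
site_a = local_statistics([1, 2, 3], [3, 5, 7])
site_b = local_statistics([4, 5, 6], [9, 11, 13])

slope, intercept = fit_line(merge_statistics([site_a, site_b]))
pooled = fit_line(local_statistics([1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 13]))
print(slope, intercept)
```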
Models characterization
To fully characterize our knowledge discovery algorithms for a large
distributed system, we have developed a spatial data simulator that
generates feature layers statistically similar to real spatial
data and computes a target layer according to previously
observed rules and expert knowledge.
The simulator is used to analyze the influence of sensor error,
unexplained variance, sampling density, and data distribution on
spatial data prediction quality in precision agriculture.
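
A toy version of such a simulator can be sketched as follows. This is not the project's simulator: the layer names, the moving-average smoothing used to obtain spatially correlated layers, and the rule in `target_rule` are all assumptions made for illustration. Sensor error is added to the observed features, and unexplained variance is added to the target.

```python
import random

def smooth_layer(size, radius, rng):
    # Spatially correlated layer: moving average of white Gaussian noise.
    noise = [[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
    layer = []
    for i in range(size):
        row = []
        for j in range(size):
            window = [noise[a][b]
                      for a in range(max(0, i - radius), min(size, i + radius + 1))
                      for b in range(max(0, j - radius), min(size, j + radius + 1))]
            row.append(sum(window) / len(window))
        layer.append(row)
    return layer

def target_rule(nitrogen, moisture):
    # Hypothetical expert rule: moisture helps only when it is above average.
    return 2.0 * nitrogen + (moisture if moisture > 0 else -0.5 * moisture)

def simulate(size=30, radius=3, sensor_sd=0.05, unexplained_sd=0.1, seed=0):
    rng = random.Random(seed)
    nitrogen = smooth_layer(size, radius, rng)
    moisture = smooth_layer(size, radius, rng)
    records = []
    for i in range(size):
        for j in range(size):
            # Target follows the rule on the true features plus unexplained variance.
            y = target_rule(nitrogen[i][j], moisture[i][j]) + rng.gauss(0, unexplained_sd)
            # Observed features are the true features corrupted by sensor error.
            n_obs = nitrogen[i][j] + rng.gauss(0, sensor_sd)
            m_obs = moisture[i][j] + rng.gauss(0, sensor_sd)
            records.append((i, j, n_obs, m_obs, y))
    return records

records = simulate(size=20)
print(len(records), records[0])
```

Varying `sensor_sd`, `unexplained_sd`, or the subset of records kept lets one study how each factor affects prediction quality under controlled conditions.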
Technology transfer
Currently, we are developing a data mining software package that
integrates our algorithms for spatial and distributed data inspection,
preprocessing, and partitioning into an easy-to-use toolbox.