research-article

Finding regional co-location patterns for sets of continuous variables in spatial datasets

Authors:
Christoph F. Eick, University of Houston, Houston, TX
Rachana Parmar, University of Houston, Houston, TX
Wei Ding, University of Massachusetts, Boston, MA
Tomasz F. Stepinski, Lunar and Planetary Institute, Houston, TX
Jean-Philippe Nicot, University of Texas at Austin, Austin, TX

GIS '08: Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
November 2008, Article No.: 30, Pages 1–10
https://doi.org/10.1145/1463434.1463472

Published: 05 November 2008

ABSTRACT

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables, each taking values from the wings of its statistical distribution, are co-located. The framework operates in the continuous domain and treats regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. The interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The framework is evaluated by a domain expert in a case study that analyzes arsenic contamination in Texas water wells, centering on regional co-location patterns. The approach identifies both known and previously unknown regional co-location patterns, and different sets of algorithm parameters characterize the arsenic distribution at different scales. Moreover, inconsistent co-location sets are found for regions in South Texas and West Texas that can be clearly attributed to geological differences between the two regions, emphasizing the need for regional co-location mining techniques. Finally, a novel prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing and searches over a variable number of clusters and larger neighborhood sizes.
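
To make the idea of scoring a region by products of z-scores concrete, the following is a minimal sketch in Python. It is not the paper's exact interestingness or fitness function, which the abstract does not spell out. The function names, the `signs` parameter (+1 for the high wing, -1 for the low wing), the rule that an object contributes zero unless every variable lies in its requested wing, and the toy values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values with the population mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no information about the wings.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def region_interestingness(z_columns, region, signs):
    """Average product of signed z-scores over the objects in a region.

    z_columns: one list of z-scores per variable (hypothetical layout).
    region:    indices of the objects that belong to the region.
    signs:     +1 to look for high values, -1 for low values, per variable.
    An object contributes 0 unless every variable is in its requested wing
    (an assumption of this sketch, not a rule stated in the abstract).
    """
    if not region:
        return 0.0
    total = 0.0
    for i in region:
        product = 1.0
        for z, sign in zip(z_columns, signs):
            signed = sign * z[i]
            if signed <= 0:
                product = 0.0
                break
            product *= signed
        total += product
    return total / len(region)


# Hypothetical toy data: six wells, two continuous variables.
arsenic = [1.0, 2.0, 1.5, 9.0, 10.0, 9.5]
other_variable = [0.2, 0.3, 0.1, 2.0, 2.2, 2.1]
z_cols = [z_scores(arsenic), z_scores(other_variable)]

low_region = [0, 1, 2]
high_region = [3, 4, 5]

score_high = region_interestingness(z_cols, high_region, [+1, +1])
score_low = region_interestingness(z_cols, low_region, [+1, +1])
score_low_wing = region_interestingness(z_cols, low_region, [-1, -1])
```

In this toy setting, the region of wells where both variables are high gets a positive score for the high/high pattern, while the other region scores zero for that pattern and positive for the low/low pattern. A clustering algorithm that maximizes a fitness built from such per-region scores would then favor partitions whose regions show strong co-location.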

Published in
GIS '08: Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
November 2008, 559 pages
ISBN: 9781605583235
DOI: 10.1145/1463434
Program Chairs: Walid G. Aref (Purdue University), Mohamed F. Mokbel (University of Minnesota), Markus Schneider (University of Florida)
Copyright © 2008 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
clustering
finding associations between continuous variables
regional co-location mining
regional knowledge discovery
spatial data mining
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 220 of 1,116 submissions, 20%