Mining coverage statistics for websource selection in a mediator

Authors:
Zaiqing Nie, Arizona State University, Tempe, AZ
Ullas Nambiar, Arizona State University, Tempe, AZ
Sreelakshmi Vaddi, Arizona State University, Tempe, AZ
Subbarao Kambhampati, Arizona State University, Tempe, AZ

CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management, November 2002, pages 678–680. https://doi.org/10.1145/584792.584917

Published: 4 November 2002

ABSTRACT

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping their number low enough that storage and learning costs remain manageable; naive approaches can quickly become infeasible. In this paper we present a set of connected techniques that estimate coverage and overlap statistics while keeping the number of needed statistics tightly under control. Our approach uses a hierarchical classification of queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.
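The abstract only names the ingredients (coverage and overlap statistics, a query-class hierarchy, threshold-based data mining), so the following Python is a minimal sketch under explicit assumptions, not the authors' algorithm. It assumes that, for each sample query in a class, we know which answer tuples each source returned. It defines the overlap of a source set as the fraction of the query's distinct answers returned by every source in the set (a single source's coverage is then the overlap of the singleton set), takes the class-level statistic to be the mean over the class's sample queries, and substitutes Apriori-style level-wise frequent-itemset mining for the paper's threshold-based technique. The names `overlap`, `mine_class_overlaps`, `resolve_class`, `min_overlap` and `min_queries`, and the parent-map encoding of the hierarchy, are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set

# Assumed input: for one sample query, source name -> set of answer-tuple IDs it returned.
QueryResult = Dict[str, Set[int]]


def overlap(result: QueryResult, sources: FrozenSet[str]) -> float:
    """Fraction of the query's distinct answers returned by every source in `sources`."""
    all_answers = set().union(*result.values())
    if not all_answers:
        return 0.0
    # A source that returned nothing for this query contributes an empty set.
    common = set.intersection(*(result.get(s, set()) for s in sources))
    return len(common) / len(all_answers)


def mine_class_overlaps(
    queries: List[QueryResult], min_overlap: float
) -> Dict[FrozenSet[str], float]:
    """Level-wise (Apriori-style) mining of source-set overlaps for one query class.

    The class statistic is the mean per-query overlap. Overlap is anti-monotone
    (a superset's intersection is never larger), so supersets of any source set
    below `min_overlap` are never generated.
    """
    if not queries:
        return {}

    def mean_overlap(source_set: FrozenSet[str]) -> float:
        return sum(overlap(q, source_set) for q in queries) / len(queries)

    all_sources = sorted(set().union(*(q.keys() for q in queries)))
    stats: Dict[FrozenSet[str], float] = {}
    candidates = {frozenset([s]) for s in all_sources}
    k = 1
    while candidates:
        # Keep only the candidates that clear the threshold at this level.
        frequent = []
        for c in candidates:
            value = mean_overlap(c)
            if value >= min_overlap:
                stats[c] = value
                frequent.append(c)
        k += 1
        # Join surviving sets into size-k candidates, then drop any candidate
        # with a (k-1)-subset that did not survive (the Apriori property).
        survivors = set(frequent)
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        }
    return stats


def resolve_class(
    query_class: Optional[str],
    parent: Dict[str, Optional[str]],
    query_count: Dict[str, int],
    min_queries: int,
) -> Optional[str]:
    """Walk up the (hypothetical) class hierarchy to the nearest class with enough sample queries."""
    while query_class is not None and query_count.get(query_class, 0) < min_queries:
        query_class = parent.get(query_class)
    return query_class
```

Pruning is sound here because adding a source can only shrink the intersection, so a superset never has a higher overlap than any of its subsets; the number of stored statistics is therefore governed by the threshold rather than by all 2^n source subsets. The fallback in `resolve_class` is one simple way to pick a resolution level: a class with too few sample queries borrows the statistics of its nearest sufficiently sampled ancestor.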

Published in
CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management
November 2002
704 pages
ISBN: 1581134924
DOI: 10.1145/584792
General Chair: Charles Nicholas, University of Maryland Baltimore County
Program Chairs: David Grossman, Illinois Institute of Technology; Konstantinos Kalpakis, University of Maryland Baltimore County; Sajda Qureshi, Erasmus University, Rotterdam; Han van Dissel, Erasmus University, Rotterdam; Len Seligman, The MITRE Corporation
Copyright © 2002 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
coverage statistics
web-based data integration
webmining to support query optimization
Qualifiers
- Article
Conference

Acceptance Rates
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
Article Metrics
- Total citations: 8
- Total downloads: 320
- Downloads (last 12 months): 0
- Downloads (last 6 weeks): 0
