DOI: 10.1145/584792.584917
Article

Mining coverage statistics for websource selection in a mediator

Published: 04 November 2002

ABSTRACT

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping their number low enough that storage and learning costs stay manageable. Naive approaches can become infeasible very quickly. In this paper we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the number of needed statistics tightly under control. Our approach uses a hierarchical classification of the queries, together with threshold-based variants of familiar data mining techniques, to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.
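
To make the terms in the abstract concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes that coverage is the share of known answer tuples a source returns, that overlap is the share returned by every source in a set, and that a query class falling below a frequency threshold is resolved to its nearest sufficiently frequent ancestor. The hierarchy `PARENT`, the function names, and the sample data are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical query-class hierarchy (child -> parent); the root maps to None.
PARENT: Dict[str, Optional[str]] = {"root": None, "db": "root", "ai": "root"}


def coverage(answers: Dict[str, Set[int]], sources: Tuple[str, ...]) -> float:
    """Share of all known answer tuples returned by every source in `sources`.

    With one source this is its coverage; with several it is their overlap.
    """
    universe = set().union(*answers.values())
    if not universe:
        return 0.0
    common = set.intersection(*(answers[s] for s in sources))
    return len(common) / len(universe)


def resolve_class(cls: str, query_share: Dict[str, float], min_share: float) -> str:
    """Climb to the nearest ancestor whose query share meets the threshold."""
    while query_share.get(cls, 0.0) < min_share and PARENT.get(cls) is not None:
        cls = PARENT[cls]
    return cls


# Made-up answer sets for two sources, for illustration only.
answers = {"S1": {1, 2, 3}, "S2": {3, 4}}
shares = {"root": 1.0, "db": 0.6, "ai": 0.05}

print(coverage(answers, ("S1",)))        # 0.75
print(coverage(answers, ("S1", "S2")))   # 0.25
print(resolve_class("ai", shares, 0.1))  # root
print(resolve_class("db", shares, 0.1))  # db
```

Under these assumptions, statistics are stored only for classes that are queried often enough, which is one simple way a threshold can bound the number of statistics kept.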


            • Published in

              CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management
              November 2002
              704 pages
              ISBN:1581134924
              DOI:10.1145/584792

              Copyright © 2002 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Acceptance Rates

              Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
