DOI: 10.1145/1031171.1031180
Article

Unified utility maximization framework for resource selection

Published: 13 November 2004

Abstract

This paper presents a unified utility framework for resource selection in distributed text information retrieval. The framework provides an efficient and effective way to infer the probabilities of relevance of all documents across the text databases. With this estimated relevance information, resource selection can be made by explicitly optimizing the goal of each application. Specifically, for database recommendation, the selection is optimized for high recall (including as many relevant documents as possible in the selected databases); for distributed document retrieval, the selection targets high precision in the final merged list of documents. The model thus provides a more solid foundation for distributed information retrieval. Empirical studies show that it is at least as effective as other state-of-the-art algorithms.
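The two selection goals described in the abstract can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not give; it is a simplified illustration under stated assumptions. It assumes that each database already has an estimated probability of relevance for each of its documents, that high recall is approximated by the expected number of relevant documents in a database, and that high precision is approximated by the expected number of relevant documents among a fixed number of top documents retrieved from each selected database. The function names, the `docs_per_db` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List

# Hypothetical input: database name -> estimated probabilities of relevance
# for that database's documents (how these are estimated is out of scope here).
Estimates = Dict[str, List[float]]


def select_high_recall(est: Estimates, n_select: int) -> List[str]:
    """Pick the databases with the largest expected number of relevant documents.

    Assumption: the utility of a database is the sum of its documents'
    relevance probabilities, so selecting the top databases maximizes the
    expected number of relevant documents covered.
    """
    ranked = sorted(est, key=lambda db: (-sum(est[db]), db))
    return ranked[:n_select]


def select_high_precision(est: Estimates, n_select: int, docs_per_db: int) -> List[str]:
    """Pick the databases whose top-ranked documents are most likely relevant.

    Assumption: each selected database contributes exactly `docs_per_db`
    documents to the merged list, so its utility is the sum of its
    `docs_per_db` highest relevance probabilities.
    """
    def top_k_utility(db: str) -> float:
        return sum(sorted(est[db], reverse=True)[:docs_per_db])

    ranked = sorted(est, key=lambda db: (-top_k_utility(db), db))
    return ranked[:n_select]


# Made-up estimates: db_b is large but weakly relevant,
# db_a and db_c are small with a few strong documents.
estimates: Estimates = {
    "db_a": [0.9, 0.8, 0.1],
    "db_b": [0.4] * 10,
    "db_c": [0.7, 0.2],
}

recall_choice = select_high_recall(estimates, n_select=2)
precision_choice = select_high_precision(estimates, n_select=2, docs_per_db=2)
print(recall_choice)      # ['db_b', 'db_a']
print(precision_choice)   # ['db_a', 'db_c']
```

With this toy data the two goals disagree: the large database wins under the recall-oriented utility because it holds the most expected relevant documents overall, but it drops out under the precision-oriented utility because its best two documents are weaker than those of the smaller databases.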

    Published In

    CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
    November 2004
    678 pages
    ISBN:1581138741
    DOI:10.1145/1031171

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed information retrieval
    2. resource selection

    Qualifiers

    • Article

    Conference

    CIKM04: Conference on Information and Knowledge Management
    November 8 - 13, 2004
    Washington, D.C., USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2020) Deep Web Selection Based on Entity Association. Proceedings of the 9th International Conference on Computer Engineering and Networks, pp. 815-825. DOI: 10.1007/978-981-15-3753-0_79. Online publication date: 1-Jul-2020
    • (2018) Source selection of long tail sources for federated search in an uncooperative setting. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 720-727. DOI: 10.1145/3167132.3167212. Online publication date: 9-Apr-2018
    • (2017) Integration of deep web sources. Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, pp. 1-4. DOI: 10.1145/3102254.3102291. Online publication date: 19-Jun-2017
    • (2017) Enhancing information source selection using a genetic algorithm and social tagging. International Journal of Information Management: The Journal for Information Professionals, 37:6, pp. 741-749. DOI: 10.1016/j.ijinfomgt.2017.07.011. Online publication date: 1-Dec-2017
    • (2016) Evaluating Document Retrieval Methods for Resource Selection in Clustered P2P IR. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2073-2076. DOI: 10.1145/2983323.2983912. Online publication date: 24-Oct-2016
    • (2016) Efficient distributed selective search. Information Retrieval Journal, 20:3, pp. 221-252. DOI: 10.1007/s10791-016-9290-6. Online publication date: 25-Nov-2016
    • (2015) Ranking Deep Web Text Collections for Scalable Information Extraction. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 153-162. DOI: 10.1145/2806416.2806581. Online publication date: 17-Oct-2015
    • (2015) PERSONALIZED Source Selection Process: A Social Profile Adaptation Technique. Intelligent Data Analysis and Applications, pp. 203-213. DOI: 10.1007/978-3-319-21206-7_18. Online publication date: 26-Jun-2015
    • (2014) Theoretical, Qualitative, and Quantitative Analyses of Small-Document Approaches to Resource Selection. ACM Transactions on Information Systems, 32:2, pp. 1-37. DOI: 10.1145/2590975. Online publication date: 1-Apr-2014
    • (2011) Usercentric Operational Decision Making in Distributed Information Retrieval. Information Systems Research, 22:4, pp. 739-755. DOI: 10.1287/isre.1100.0287. Online publication date: 1-Dec-2011
