DOI: 10.1145/1557019.1557158
research-article

Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing

Published: 28 June 2009

ABSTRACT

A growing number of service providers let consumers learn their service terms over the web. The service terms, such as the price and the time to completion of the service, depend on the consumer's particular specifications. For instance, a printing services provider would need specifications such as paper size, ink type, proofing, and perforation from its customers. In a few sectors, marketplace sites provide consumers with specification forms, which a consumer can fill out to learn the service terms of multiple service providers. Unfortunately, only a few such marketplace sites exist, and they cover only a few sectors.

At HP Labs, we are working towards building a universal marketplace site, i.e., a marketplace site that covers thousands of sectors and hundreds of providers per sector. One issue in this domain is the automated discovery and retrieval of the specifications for each sector. We address it by extracting and analyzing content from the websites of the service providers listed in business directories. The challenge is that each service provider is often listed under multiple service categories in a business directory, which makes standard supervised learning techniques infeasible. We address this challenge by employing a multilabel statistical clustering approach within an expectation-maximization framework. We implement our solution to retrieve specifications for 3000 sectors, representing more than 300,000 service providers. We discuss our results in the context of the services needed to design a marketing campaign for a small business.
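
To make the multilabel setting concrete, the following is a minimal sketch of an expectation-maximization loop for a multilabel mixture model. It is not the authors' algorithm: it assumes each provider is a bag of words, that every word is generated by one of the categories the provider is listed under, and it omits the simulated annealing named in the title. The function name `multilabel_em`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def multilabel_em(docs, labels, n_iter=50, alpha=0.1):
    """Illustrative EM for a multilabel mixture (an assumption, not the paper's model).

    docs:   list of dicts mapping word -> count (one per provider).
    labels: list of sets of category names the provider is listed under.
    Returns (theta, pi): theta[c][w] ~ P(word w | category c),
    pi[d][c] ~ share of provider d's words attributed to category c.
    """
    vocab = sorted({w for doc in docs for w in doc})
    cats = sorted({c for ls in labels for c in ls})
    # Start from uniform word distributions and uniform per-provider weights.
    theta = {c: {w: 1.0 / len(vocab) for w in vocab} for c in cats}
    pi = [{c: 1.0 / len(ls) for c in ls} for ls in labels]
    for _ in range(n_iter):
        counts = {c: defaultdict(float) for c in cats}
        new_pi = []
        for doc, ls, weights in zip(docs, labels, pi):
            mass = {c: 0.0 for c in ls}
            for w, n in doc.items():
                # E-step: split each word's count among the listed categories only.
                scores = {c: weights[c] * theta[c][w] for c in ls}
                z = sum(scores.values())
                for c in ls:
                    r = scores[c] / z
                    counts[c][w] += n * r
                    mass[c] += n * r
            total = sum(mass.values())
            # Keep the old weights if the provider has no words at all.
            new_pi.append({c: mass[c] / total for c in ls} if total else dict(weights))
        # M-step: re-estimate smoothed word distributions per category.
        for c in cats:
            denom = sum(counts[c].values()) + alpha * len(vocab)
            theta[c] = {w: (counts[c][w] + alpha) / denom for w in vocab}
        pi = new_pi
    return theta, pi


# Toy data (hypothetical): the second provider is listed under two categories.
docs = [
    {"paper": 1, "ink": 1},
    {"paper": 1, "ink": 1, "logo": 2},
    {"logo": 1, "branding": 1},
]
labels = [{"printing"}, {"printing", "design"}, {"design"}]
theta, pi = multilabel_em(docs, labels)
```

In this sketch, providers listed under a single category anchor that category's word distribution, which lets the E-step divide the words of multiply listed providers among their categories.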

Published in

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019

Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
