ABSTRACT
A growing number of service providers let consumers learn their service terms over the web. The service terms, such as the price and the time to complete the service, depend on the consumer's particular specifications. For instance, a printing services provider needs specifications from its customers such as paper size, ink type, proofing, and perforation. In a few sectors, marketplace sites provide consumers with specification forms that they can fill out to learn the service terms of multiple service providers. Unfortunately, such marketplace sites are few, and they cover only a few sectors.
At HP Labs, we are working towards building a universal marketplace site, i.e., a marketplace site that covers thousands of sectors and hundreds of providers per sector. One issue in this domain is the automated discovery and retrieval of the specifications for each sector. We address it by extracting and analyzing content from the websites of the service providers listed in business directories. The challenge is that a business directory often lists each service provider under multiple service categories, so a provider's content cannot be tied to a single category, which makes standard supervised learning techniques infeasible. We address this challenge by employing a multilabel statistical clustering approach within an expectation-maximization (EM) framework. We apply our solution to retrieve specifications for 3,000 sectors, representing more than 300,000 service providers. We discuss our results in the context of the services needed to design a marketing campaign for a small business.
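The abstract does not give the model's equations. As a hedged illustration of how EM can cope with providers listed under several categories, the Python sketch below fits a multilabel multinomial mixture. Each provider's website text is treated as a bag of words, and each word is assumed to be generated by one of the provider's listed categories. The E-step estimates which listed category produced each word; the M-step re-estimates per-category word distributions and per-provider category weights. The function name `train_multilabel_em`, the smoothing parameter `alpha`, the iteration count, and the toy data are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def train_multilabel_em(docs, label_sets, n_iter=30, alpha=0.01):
    """Fit a multilabel multinomial mixture model with EM (illustrative sketch).

    docs: one Counter(word -> count) per provider website (must be non-empty)
    label_sets: one non-empty set of directory categories per provider
    Returns (theta, lam), where theta[c][w] estimates P(word w | category c)
    and lam[d][c] is the share of provider d's text attributed to category c.
    """
    vocab = sorted(set().union(*docs))
    cats = sorted(set().union(*label_sets))
    # Uniform starting point; single-category providers break the symmetry.
    theta = {c: {w: 1.0 / len(vocab) for w in vocab} for c in cats}
    lam = [{c: 1.0 / len(ls) for c in ls} for ls in label_sets]

    for _ in range(n_iter):
        # Expected word counts per category, with additive smoothing alpha.
        expected = {c: dict.fromkeys(vocab, alpha) for c in cats}
        new_lam = []
        for doc, ls, weights in zip(docs, label_sets, lam):
            share = dict.fromkeys(ls, 0.0)
            for w, n in doc.items():
                # E-step: posterior that each listed category generated word w.
                joint = {c: weights[c] * theta[c][w] for c in ls}
                z = sum(joint.values())
                for c in ls:
                    r = n * joint[c] / z
                    expected[c][w] += r
                    share[c] += r
            total = sum(doc.values())
            new_lam.append({c: share[c] / total for c in ls})
        # M-step: renormalise expected counts into per-category word distributions.
        for c in cats:
            norm = sum(expected[c].values())
            theta[c] = {w: v / norm for w, v in expected[c].items()}
        lam = new_lam
    return theta, lam


# Hypothetical toy data: the second provider is listed under both categories.
docs = [
    Counter("paper ink proofing paper".split()),
    Counter("logo font paper ink".split()),
    Counter("logo font layout".split()),
]
labels = [{"printing"}, {"printing", "graphic design"}, {"graphic design"}]
theta, lam = train_multilabel_em(docs, labels)
printing = theta["printing"]
print(sorted(printing, key=printing.get, reverse=True)[:2])  # ['paper', 'ink']
```

In this sketch, text from single-category providers anchors each category's word distribution, and EM then uses those estimates to apportion the words of multi-category providers, so "paper" and "ink" end up attributed mainly to printing. Turning each category's high-probability vocabulary into concrete specification fields would be a further step not shown here.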
Index Terms
- Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing