DOI: 10.1145/2505515.2505570
research-article

Intelligently querying incomplete instances for improving classification performance

Published: 27 October 2013

ABSTRACT

The problem of intelligently acquiring missing input information, given a limited number of queries, to enhance classification performance has gained substantial interest over the last decade. This is primarily due to the emergence of the targeted advertising industry, which tries to best match products to its potential consumer base in the absence of complete consumer profile information. In this paper, we propose a novel active feature acquisition technique to tackle the problem of instance completion that is prevalent in these domains. We show theoretically that our technique is optimal given the current classifier, and we derive a probabilistic lower bound on the error reduction it achieves. We also show that a simplification of our technique is equivalent to the Expected Utility approach, which is one of the most sophisticated solutions to this problem in the existing literature. We then demonstrate the efficacy of our approach through experiments on real data. Finally, we show that our technique can be easily extended to the scenario where a cost matrix is associated with acquiring missing information for each instance or instance-feature combination.
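To make the problem setting concrete, the following is a minimal sketch of expected-utility-style query selection under a fixed classifier and a query budget. It is an illustration only and not the technique proposed in the paper, whose details are not given in the abstract. Everything in it is an assumption made for the example: the logistic classifier with hand-picked weights, the known distribution over each missing value (`value_probs`), the use of min(p, 1 - p) as the error estimate, mean imputation of other gaps, and all function names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error_estimate(p):
    # Error of predicting the more likely class when P(y = 1) = p.
    return min(p, 1.0 - p)


def fill_missing(x, value_probs):
    # Assumption: a missing entry (None) is replaced by the mean of its distribution.
    return [
        sum(v * q for v, q in value_probs[j].items()) if xj is None else xj
        for j, xj in enumerate(x)
    ]


def predict_proba(weights, bias, x):
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def expected_utility(weights, bias, x, j, value_probs):
    # Expected drop in estimated error from acquiring feature j of instance x.
    base = fill_missing(x, value_probs)  # any other gaps stay mean-imputed
    probs_by_value = []
    for v, q in value_probs[j].items():
        candidate = list(base)
        candidate[j] = v
        probs_by_value.append((q, predict_proba(weights, bias, candidate)))
    p_before = sum(q * p for q, p in probs_by_value)  # marginal over the gap
    err_after = sum(q * error_estimate(p) for q, p in probs_by_value)
    return error_estimate(p_before) - err_after


def select_queries(weights, bias, instances, value_probs, budget):
    # Rank every missing (instance, feature) pair and keep the top `budget`.
    scored = [
        (expected_utility(weights, bias, x, j, value_probs), i, j)
        for i, x in enumerate(instances)
        for j, xj in enumerate(x)
        if xj is None
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(i, j) for _, i, j in scored[:budget]]


# Hypothetical toy data: two binary features, one gap per instance.
weights, bias = [2.0, -1.0], -0.5
value_probs = {0: {0.0: 0.5, 1.0: 0.5}, 1: {0.0: 0.5, 1.0: 0.5}}
instances = [[None, 0.0], [1.0, None], [0.0, None]]

chosen = select_queries(weights, bias, instances, value_probs, budget=1)
print(chosen)
```

In this toy data only the first instance has a missing value that can flip the predicted class, so it receives the single available query; the other two gaps have an estimated utility of about zero because either value leads to the same decision.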


Published in

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '13 paper acceptance rate: 143 of 848 submissions (17%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
