ABSTRACT
The problem of intelligently acquiring missing input information under a limited query budget in order to improve classification performance has attracted substantial interest over the last decade. This interest is driven primarily by the emergence of the targeted advertising industry, which tries to match products to potential consumers in the absence of complete consumer profile information. In this paper, we propose a novel active feature acquisition technique to tackle this instance-completion problem, which is prevalent in such domains. We show theoretically that our technique is optimal given the current classifier, and we derive a probabilistic lower bound on the error reduction it achieves. We also show that a simplification of our technique is equivalent to the Expected Utility approach, one of the most sophisticated solutions to this problem in the existing literature. We then demonstrate the efficacy of our approach through experiments on real data. Finally, we show that our technique extends easily to the scenario in which a cost matrix is associated with acquiring missing information for each instance or each instance-feature combination.
Index Terms
Intelligently querying incomplete instances for improving classification performance