research-article

Learning from labeled features using generalized expectation criteria

Authors:

Andrew McCallum

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 595 - 602

https://doi.org/10.1145/1390334.1390436

Published: 20 July 2008

Abstract

It is difficult to apply machine learning to new domains because we often lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word puck is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. We propose a method for training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria: terms in a parameter estimation objective function that express preferences on values of a model expectation. We train multinomial logistic regression models using GE criteria, but the method we develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes of labeling documents results in an accuracy of only 77%.
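The objective described in the abstract can be illustrated with a small sketch. The abstract says only that GE criteria express preferences on a model expectation, so several choices here are assumptions made for illustration: the preference is scored as a KL divergence between a hand-picked reference class distribution and the model's average prediction over unlabeled documents that contain the labeled feature; the reference values (0.9 / 0.1), the toy vocabulary and documents, and the finite-difference gradient ascent are likewise hypothetical. All function names are invented for this example.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(theta, x):
    # Multinomial logistic regression; theta[c][j] is the weight of feature j for class c.
    return softmax([sum(w * v for w, v in zip(row, x)) for row in theta])

def ge_objective(theta, unlabeled, labeled_features, sigma2):
    # Assumed form: negative KL(reference || expected prediction) per labeled
    # feature, minus a zero-mean Gaussian prior penalty on all weights.
    score = 0.0
    for k, reference in labeled_features.items():
        docs = [x for x in unlabeled if x[k] > 0]
        if not docs:
            continue
        preds = [predict(theta, x) for x in docs]
        expected = [sum(p[c] for p in preds) / len(preds)
                    for c in range(len(reference))]
        score -= sum(r * math.log(r / e)
                     for r, e in zip(reference, expected) if r > 0)
    prior = sum(w * w for row in theta for w in row) / (2.0 * sigma2)
    return score - prior

def train(unlabeled, labeled_features, n_classes,
          sigma2=10.0, steps=200, lr=0.5, eps=1e-6):
    # Plain gradient ascent with forward-difference gradients: slow, but it
    # keeps the sketch short and dependency-free.
    n_features = len(unlabeled[0])
    theta = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(steps):
        base = ge_objective(theta, unlabeled, labeled_features, sigma2)
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for c in range(n_classes):
            for j in range(n_features):
                theta[c][j] += eps
                bumped = ge_objective(theta, unlabeled, labeled_features, sigma2)
                grad[c][j] = (bumped - base) / eps
                theta[c][j] -= eps
        for c in range(n_classes):
            for j in range(n_features):
                theta[c][j] += lr * grad[c][j]
    return theta

# Toy data (hypothetical). Classes: 0 = hockey, 1 = baseball.
VOCAB = ["puck", "bat", "goal", "inning"]
UNLABELED = [
    [1, 0, 1, 0], [1, 0, 1, 0],   # contain "puck" and "goal"
    [0, 1, 0, 1], [0, 1, 0, 1],   # contain "bat" and "inning"
    [0, 0, 1, 0], [0, 0, 0, 1],   # contain no labeled feature
]
# Labeled features: "puck" suggests hockey, "bat" suggests baseball.
LABELED_FEATURES = {0: [0.9, 0.1], 1: [0.1, 0.9]}

theta = train(UNLABELED, LABELED_FEATURES, n_classes=2)
print("p(class | 'goal' only):", predict(theta, [0, 0, 1, 0]))
```

In this toy setup the word "goal" is never labeled, yet it receives weight toward hockey because it co-occurs with "puck" in the unlabeled documents. That is one way to read the abstract's remark about parameter weight spreading to unlabeled features. A practical implementation would use analytic gradients and a quasi-Newton optimizer instead of finite differences.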

Index Terms

Learning from labeled features using generalized expectation criteria
1. Computing methodologies
  1. Machine learning

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN:9781605581644

DOI:10.1145/1390334

General Chairs:
Tat-Seng Chua, National University of Singapore
Mun-Kew Leong, National Library Board, Singapore

Program Chairs:
Syung Hyon Myaeng, Information and Communications University, Korea
Douglas W. Oard, University of Maryland, College Park, USA
Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Italy

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '08

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 144
Total Downloads: 1,060
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 1

Reflects downloads up to 16 Feb 2025

Cited By

Lu Y, Song W, Arachie C, Huang B (2025). Weakly supervised label learning flows. Neural Networks 182 (106892). Online publication date: Feb-2025.
https://doi.org/10.1016/j.neunet.2024.106892
Riccio C, Menanno M, Zennaro I, Savino M (2024). A New Methodological Framework for Optimizing Predictive Maintenance Using Machine Learning Combined with Product Quality Parameters. Machines 12:7 (443). Online publication date: 27-Jun-2024.
https://doi.org/10.3390/machines12070443
Zhang X, Wang Z, Jiang L, Gao W, Wang P, Liu K, Larson K (2024). TFWT. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2570-2578). Online publication date: 3-Aug-2024.
https://dl.acm.org/doi/10.24963/ijcai.2024/284
Karamanolakis G, Hsu D, Gravaano L (2024). Interactive Machine Teaching by Labeling Rules and Instances. Transactions of the Association for Computational Linguistics 12 (1441-1459). Online publication date: 18-Nov-2024.
https://doi.org/10.1162/tacl_a_00707
Lee Y, Lam M, Vasconcelos H, Bernstein M, Finn C (2024). Clarify: Improving Model Robustness With Natural Language Corrections. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-19). Online publication date: 13-Oct-2024.
https://dl.acm.org/doi/10.1145/3654777.3676362
Ejlali M, Arian E, Taghiyeh S, Chambers K, Sadeghi A, Taghiye E, Cakdi D, Handfield R (2024). Developing hybrid machine learning models to assign health score to railcar fleets for optimal decision making. Expert Systems with Applications 250 (123931). Online publication date: Sep-2024.
https://doi.org/10.1016/j.eswa.2024.123931
Kavargyris D, Georgiou K, Angelis L (2024). EchoSense: a framework for analyzing the echo chambers phenomenon: a case study on Qatar events. Social Network Analysis and Mining 14:1. Online publication date: 10-Jun-2024.
https://doi.org/10.1007/s13278-024-01275-0
Wood M (2023). Measuring Cultural Diversity in Text with Word Counts. Social Psychology Quarterly 87:3 (205-226). Online publication date: 16-Sep-2023.
https://doi.org/10.1177/01902725231194356
Effland T, Collins M (2023). Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization. Transactions of the Association for Computational Linguistics 11 (122-138). Online publication date: 12-Jan-2023.
https://doi.org/10.1162/tacl_a_00537
Lee B, Downey D, Lo K, Weld D (2023). LIMEADE: From AI Explanations to Advice Taking. ACM Transactions on Interactive Intelligent Systems 13:4 (1-29). Online publication date: 28-Mar-2023.
https://dl.acm.org/doi/10.1145/3589345
