DOI: 10.1145/1390334.1390436
research-article

Learning from labeled features using generalized expectation criteria

Published: 20 July 2008

Abstract

It is often difficult to apply machine learning to new domains because labeled problem instances are scarce. We address this problem by leveraging domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word "puck" is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. We propose a method for training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria: terms in a parameter estimation objective function that express preferences on values of a model expectation. We train multinomial logistic regression models using GE criteria, but the method we develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features than labeling instances. For example, after only one minute of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes of labeling documents results in an accuracy of only 77%.
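
The objective described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the model expectation is compared with the preferred value, so the sketch assumes a KL divergence between a reference class distribution for each labeled feature and the model's average prediction over unlabeled documents containing that feature. The function names, the toy vocabulary, the 0.9/0.1 reference distributions, and the prior variance are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

def ge_objective(theta, X, labeled_features, sigma2=10.0):
    """Hypothetical GE-style objective (higher is better).

    theta: (n_features, n_classes) weights of a multinomial logistic regression.
    X: (n_docs, n_features) feature counts for UNLABELED documents.
    labeled_features: dict mapping feature index -> reference class distribution.
    sigma2: variance of the Gaussian prior on the weights (assumed value).
    """
    probs = softmax(X @ theta)  # model predictions p(y | x) for every document
    total = 0.0
    for k, reference in labeled_features.items():
        mask = X[:, k] > 0  # documents that contain feature k
        if not mask.any():
            continue
        # Model expectation: average predicted class distribution on those docs.
        expected = probs[mask].mean(axis=0)
        reference = np.asarray(reference, dtype=float)
        nz = reference > 0  # treat 0 * log 0 as 0
        # Assumed score: KL(reference || expected), subtracted as a penalty.
        total -= np.sum(reference[nz] * np.log(reference[nz] / expected[nz]))
    # Gaussian prior on the parameters (L2 penalty).
    total -= np.sum(theta ** 2) / (2.0 * sigma2)
    return total

# Toy data: vocabulary [puck, bat, ice, game]; classes [hockey, baseball].
X = np.array([
    [1, 0, 1, 0],   # puck, ice
    [1, 0, 0, 1],   # puck, game
    [0, 1, 0, 1],   # bat, game
    [0, 0, 1, 1],   # ice, game
], dtype=float)

# Labeled features: "puck" suggests hockey, "bat" suggests baseball.
labeled_features = {0: [0.9, 0.1], 1: [0.1, 0.9]}

theta_zero = np.zeros((4, 2))             # uninformed model
theta_good = np.zeros((4, 2))
theta_good[0] = [1.0, -1.0]               # puck -> hockey
theta_good[1] = [-1.0, 1.0]               # bat -> baseball

print(ge_objective(theta_zero, X, labeled_features))
print(ge_objective(theta_good, X, labeled_features))
```

In this toy setting the weights that agree with the labeled features score higher than the all-zero weights, even though no document carries a label. In practice the objective would be maximized with a gradient-based optimizer rather than evaluated at hand-picked weights.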


    Published In

    SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
    July 2008
    934 pages
    ISBN:9781605581644
    DOI:10.1145/1390334

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. labeled features
    2. learning with domain knowledge
    3. semi-supervised learning
    4. text classification

    Conference

    SIGIR '08

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
