research-article

A unified architecture for natural language processing: deep neural networks with multitask learning

Authors:
Ronan Collobert (NEC Labs America, Princeton, NJ), Jason Weston (NEC Labs America, Princeton, NJ)
ICML '08: Proceedings of the 25th international conference on Machine learning, July 2008, Pages 160–167, https://doi.org/10.1145/1390156.1390177

Published: 05 July 2008

ABSTRACT

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
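The weight-sharing idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's convolutional architecture: it assumes a single embedding table shared by two toy tasks, one linear softmax head per task, and training that picks a task at random at each step. The class name `SharedEmbeddingMultitask`, the task names `pos` and `ner`, and the data in `TOY_DATASETS` are all hypothetical and exist only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedEmbeddingMultitask:
    """Toy model: one embedding table shared by several linear softmax heads."""

    def __init__(self, vocab_size, dim, classes_per_task, seed=0):
        rng = random.Random(seed)
        # Shared parameters: every task reads and updates this table.
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(vocab_size)]
        # Task-specific parameters: one weight matrix per task.
        self.heads = {
            task: [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                   for _ in range(n_classes)]
            for task, n_classes in classes_per_task.items()
        }

    def predict_proba(self, task, word_id):
        x = self.embed[word_id]
        scores = [sum(w * xi for w, xi in zip(row, x))
                  for row in self.heads[task]]
        return softmax(scores)

    def sgd_step(self, task, word_id, label, lr=0.2):
        """One gradient step on the cross-entropy loss of a single example."""
        x = list(self.embed[word_id])  # copy: both gradients use the old input
        probs = self.predict_proba(task, word_id)
        head = self.heads[task]
        # d(loss)/d(score_k) = p_k - 1[k == label]
        grad_scores = [p - (1.0 if k == label else 0.0)
                       for k, p in enumerate(probs)]
        # Gradient for the shared embedding, computed before the head changes.
        grad_x = [sum(grad_scores[k] * head[k][j] for k in range(len(head)))
                  for j in range(len(x))]
        for k, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * grad_scores[k] * x[j]
        for j in range(len(x)):
            self.embed[word_id][j] -= lr * grad_x[j]
        return -math.log(probs[label])


def train_jointly(model, datasets, steps, seed=0):
    """Each step: pick a task at random, pick one of its examples, update."""
    rng = random.Random(seed)
    tasks = sorted(datasets)
    for _ in range(steps):
        task = rng.choice(tasks)
        word_id, label = rng.choice(datasets[task])
        model.sgd_step(task, word_id, label)


def mean_loss(model, datasets):
    losses = [-math.log(model.predict_proba(task, w)[y])
              for task, examples in datasets.items() for w, y in examples]
    return sum(losses) / len(losses)


# Hypothetical toy data: (word_id, label) pairs for two made-up tasks.
TOY_DATASETS = {
    "pos": [(0, 0), (1, 0), (2, 1), (3, 1)],
    "ner": [(0, 0), (1, 1), (2, 0), (3, 1)],
}

if __name__ == "__main__":
    model = SharedEmbeddingMultitask(4, 4, {"pos": 2, "ner": 2})
    print("mean loss before:", mean_loss(model, TOY_DATASETS))
    train_jointly(model, TOY_DATASETS, steps=3000)
    print("mean loss after: ", mean_loss(model, TOY_DATASETS))
```

Because both heads read the same embedding rows, a gradient step taken for one task also changes the input seen by the other. That coupling is the mechanism multitask weight-sharing relies on, here reduced to its simplest form.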

Index Terms

Computing methodologies

Published in
ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
General Chair: William Cohen (Carnegie Mellon University)
Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)
Copyright © 2008 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 140 of 548 submissions, 26%
Article Metrics
- Total Citations: 2,700
- Total Downloads: 16,555
- Downloads (Last 12 months): 829
- Downloads (Last 6 weeks): 93