
A unified architecture for natural language processing: deep neural networks with multitask learning

Research article · Published: 5 July 2008 · DOI: 10.1145/1390156.1390177

ABSTRACT

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
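
To make the weight-sharing scheme concrete, the sketch below pairs one shared word-embedding and convolution stack with a separate linear output head per task, trained jointly. It is a minimal illustration written in PyTorch; the class names, layer sizes, window width, and task set are assumptions chosen for exposition, not the configuration reported in the paper.

import torch
import torch.nn as nn

class SharedSentenceEncoder(nn.Module):
    # Word-embedding lookup table plus a convolution over the sentence,
    # shared by every task (the "weight-sharing" the abstract refers to).
    def __init__(self, vocab_size=30000, embed_dim=50, hidden_dim=100, window=5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, hidden_dim,
                              kernel_size=window, padding=window // 2)

    def forward(self, token_ids):                         # (batch, seq_len) word indices
        x = self.embed(token_ids).transpose(1, 2)         # (batch, embed_dim, seq_len)
        return torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq_len, hidden_dim)

class MultitaskTagger(nn.Module):
    # One shared encoder, one small linear head per task (POS, chunking, NER, ...).
    def __init__(self, encoder, task_label_counts):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict({
            task: nn.Linear(encoder.hidden_dim, n_labels)
            for task, n_labels in task_label_counts.items()
        })

    def forward(self, token_ids, task):
        features = self.encoder(token_ids)    # features shared across all tasks
        return self.heads[task](features)     # per-token label scores for one task

# Joint training would alternate minibatches across tasks, so gradients from
# every task update the shared embedding and convolution weights.
encoder = SharedSentenceEncoder()
model = MultitaskTagger(encoder, {"pos": 45, "chunk": 23, "ner": 9})
scores = model(torch.randint(0, 30000, (2, 12)), task="pos")   # shape (2, 12, 45)

In the same spirit, the language model trained on unlabeled text would be attached as one more head over the same shared features, which is how the semi-supervised signal can benefit the supervised tasks.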

Published in

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

            Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
