Deep Active Learning for Text Classification

Published: 27 August 2018

ABSTRACT

In recent years, Active Learning (AL) has been applied successfully to text classification. However, traditional methods require researchers to attend carefully to feature extraction, and the choice of features strongly affects the final accuracy. In this paper, we propose a new method, Deep Active Learning (DAL), which uses a Recurrent Neural Network (RNN) as the acquisition function in Active Learning. With DAL there is no need to design features by hand, because the RNN uses its internal state to process input sequences directly. We show that DAL reaches accuracy on text classification that traditional Active Learning methods cannot, and that it reduces the large number of labeled instances that Deep Learning (DL) normally requires.
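The abstract gives no code; as a rough illustration of the loop it describes, the sketch below pairs a PyTorch LSTM classifier with least-confidence uncertainty sampling. The acquisition rule and all names here are our assumptions, since the abstract only states that an RNN serves as the acquisition model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNClassifier(nn.Module):
    """LSTM text classifier: token ids -> class logits."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                     # x: (batch, seq_len) token ids
        _, (h, _) = self.lstm(self.embed(x))  # h: (1, batch, hidden_dim)
        return self.fc(h[-1])                 # logits: (batch, num_classes)

def least_confidence(model, pool, k):
    """Return indices of the k pool instances the model is least sure about
    (one plausible acquisition rule; the paper may use a different one)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(pool), dim=1)
    uncertainty = 1.0 - probs.max(dim=1).values
    return uncertainty.topk(k).indices

# Sketch of the active-learning loop (training and labeling omitted):
#   1. train the RNN on the current labeled set
#   2. idx = least_confidence(model, unlabeled_pool, k)
#   3. send unlabeled_pool[idx] to annotators, add them to the labeled set, repeat
```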

At the same time, we design a strategy for distributing labeling work among different workers. We show that with a properly chosen batch size we can save substantial time without reducing the model's accuracy. Building on this, we assign each worker a batch of instances whose size is determined by the worker's ability and the scale of the dataset, and we update that size as the worker's performance changes.
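The abstract does not spell out the allocation rule; one hedged reading is to size each worker's batch in proportion to an ability score and to update that score from the worker's observed labeling accuracy. Everything below (the Worker class, the proportional sizing, the update rate) is our illustration, not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    ability: float  # hypothetical skill score in (0, 1]

def assign_batches(instances, workers, base_batch):
    """Split unlabeled instances across workers, sizing each batch by
    ability (the sizing rule is illustrative; the paper only states that
    size depends on worker ability and dataset scale)."""
    total = sum(w.ability for w in workers)
    assignments, start = {}, 0
    for w in workers:
        size = min(round(base_batch * len(workers) * w.ability / total),
                   len(instances) - start)
        assignments[w.name] = instances[start:start + size]
        start += size
    return assignments

def update_ability(worker, accuracy, rate=0.3):
    """Move a worker's ability toward their observed labeling accuracy,
    so future batch sizes track performance."""
    worker.ability = (1 - rate) * worker.ability + rate * accuracy
```

For example, with two workers of ability 0.9 and 0.3 and base_batch = 50, the stronger worker would receive roughly 75 instances and the weaker one 25, given a large enough pool.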

Published in

ICVISP 2018: Proceedings of the 2nd International Conference on Vision, Image and Signal Processing
August 2018
402 pages
ISBN: 9781450365291
DOI: 10.1145/3271553

      Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall Acceptance Rate: 186 of 424 submissions, 44%
