ABSTRACT
With the development of the Internet, the amount of information has grown exponentially, and automatic text summarization has become increasingly important. At present, most research on automatic summarization targets widely used languages such as Chinese and English, while little work addresses low-resource languages. In this paper, we construct an automatic summarization dataset for Indonesian and conduct research on Indonesian automatic summarization. We propose a new and efficient extractive text summarization method based on sentence similarity clustering. Following the idea of clustering, the method takes the semantics of sentences into account and clusters sentences according to the pairwise similarity between them. Summary sentences are then extracted according to a set of rules to obtain the final summary. This method not only ensures the integrity, criticality, and importance of the summary, but also reduces its information redundancy. In the evaluation, our method achieved good results and outperformed all baselines on ROUGE-1, ROUGE-2, and ROUGE-3 scores.
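The abstract does not specify the sentence representation, the clustering procedure, or the extraction rules, so the following is only a minimal sketch of the general idea under stated assumptions. Sentences are represented as bag-of-words term-frequency vectors compared by cosine similarity (a stand-in for whatever semantic representation the paper uses). They are grouped with a simple single-pass "leader" clustering governed by a hypothetical similarity `threshold`. Then one representative sentence, the member most similar to the rest of its cluster, is taken from each of the `k` largest clusters. The names `summarize`, `cluster_sentences`, `threshold`, and `k` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase and split on non-word characters (no Indonesian stemming or stopwords)."""
    return re.findall(r"\w+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_sentences(vectors: list[Counter], threshold: float) -> list[list[int]]:
    """Assumed single-pass (leader) clustering: each sentence joins the cluster whose
    first sentence (leader) it is most similar to, if that similarity reaches
    `threshold`; otherwise it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for c, members in enumerate(clusters):
            sim = cosine(vec, vectors[members[0]])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            clusters.append([i])
        else:
            clusters[best].append(i)
    return clusters


def summarize(sentences: list[str], k: int = 2, threshold: float = 0.3) -> list[str]:
    """Hypothetical extraction rule: one representative per cluster, largest clusters first."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    clusters = cluster_sentences(vectors, threshold)
    # Larger clusters are assumed to cover more central content; the sort is stable.
    clusters.sort(key=len, reverse=True)
    chosen = []
    for members in clusters[:k]:
        # Representative = member with the highest total similarity to its cluster-mates.
        rep = max(
            members,
            key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members if j != i),
        )
        chosen.append(rep)
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "Pemerintah meluncurkan program vaksinasi nasional.",
        "Program vaksinasi nasional dimulai di Jakarta.",
        "Harga beras naik di pasar tradisional.",
        "Vaksinasi nasional ditargetkan selesai tahun depan.",
        "Pedagang pasar mengeluhkan harga beras.",
    ]
    # Prints the first and third sentences, one from each topic cluster.
    for line in summarize(doc, k=2, threshold=0.3):
        print(line)
```

Taking a single representative per cluster is what keeps near-duplicate sentences out of the summary, which corresponds to the redundancy reduction the abstract describes. A practical Indonesian system would likely add stopword removal and stemming, and could replace raw term counts with semantic sentence vectors; those choices are not specified here.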
Index Terms
- Indonesian Automatic Text Summarization Based on A New Clustering Method in Sentence Level