research-article

Indonesian Automatic Text Summarization Based on A New Clustering Method in Sentence Level

Authors:
Zefeng Cai, School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Nankai Lin, School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Chuyu Ma, School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Shengyi Jiang, Eastern Language Processing Center, Guangzhou, China

BDE '19: Proceedings of the 2019 International Conference on Big Data Engineering, June 2019, Pages 30–35. https://doi.org/10.1145/3341620.3341626

Published: 11 June 2019


ABSTRACT

With the development of the Internet, the amount of information is growing exponentially, and automatic text summarization is becoming increasingly important. At present, most research on automatic summarization targets widely used languages such as Chinese and English, and little of it addresses low-resource languages. In this paper, we construct an automatic summarization dataset for Indonesian and conduct related research on Indonesian automatic summarization. We also propose a new and efficient extractive summarization method based on sentence similarity clustering. The method takes the semantics of sentences into account and clusters them according to their pairwise similarity. Summary sentences are then extracted from the clusters according to a set of rules to produce the final summary. This method not only ensures the integrity, criticality and importance of the summary but also reduces its information redundancy. In the evaluation, our method achieved good results and outperformed all baselines on ROUGE-1, ROUGE-2, and ROUGE-3.
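For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of ROUGE-N computed from clipped n-gram overlap. It is not the authors' evaluation script: the whitespace tokenization, the function names `ngrams` and `rouge_n`, and the choice to report recall, precision and F1 together are all assumptions made for illustration, and the abstract does not say which variant was reported.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """ROUGE-N between a candidate summary and a single reference summary.

    Tokenization is plain lowercased whitespace splitting (an assumption).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    candidate = "harga beras naik di pasar"
    reference = "harga beras naik tajam di pasar"
    for n in (1, 2, 3):
        print(n, rouge_n(candidate, reference, n))
```

Higher-order n-grams penalize a summary that contains the right words in the wrong order, which is one reason ROUGE-2 and ROUGE-3 are usually reported alongside ROUGE-1.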


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction


Reviews

Reviewer: Mariana Damova

A sentence-centered account of text summarization, this work can be applied to any language. Characteristics of the Indonesian language are irrelevant to the proposed approach and are not discussed, nor is the suitability of the approach for Indonesian. The paper's contribution is an extractive summarization method whose accuracy comes close to that of generative summarization. It first converts the sentences into sentence vectors, then calculates the similarity between sentences, clusters them, and extracts and sorts the selected sentences from the clusters. The first step adopts Google's word2vec model. It is not clear, however, what method is used to calculate sentence similarity, beyond the use of a predefined similarity threshold. Further, clustering determines "the sentence with the largest amount of information" to be the core sentence of each cluster, and this sentence is extracted for inclusion in the summary. The experiments were carried out on a specially constructed corpus of Indonesian texts. ROUGE-1, ROUGE-2, and ROUGE-3 served as the evaluation measures, and the results were compared with those of six popular state-of-the-art text summarization algorithms. The comparison clearly shows the proposed method's superior performance. This well-described paper, though lacking some technical details, includes a quite substantial list of related work. It is a good read for scholars and practitioners of text summarization in general, regardless of the language.
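The pipeline outlined in the review (sentence vectors, pairwise similarity, threshold-based clustering, one core sentence per cluster, sorting) can be sketched as follows. This is a rough illustration under several assumptions and not the paper's actual method: the tiny `TOY_VECTORS` table is a hypothetical stand-in for a trained word2vec model, sentence vectors are plain averages of word vectors, similarity is cosine, clustering is a single pass against each cluster's first sentence, and the "largest amount of information" criterion is replaced by highest total similarity to the cluster members. All function names are hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_VECTORS = {
    "banjir": [1.0, 0.0, 0.0],
    "hujan": [0.9, 0.1, 0.0],
    "jakarta": [0.7, 0.3, 0.0],
    "harga": [0.0, 1.0, 0.0],
    "beras": [0.0, 0.9, 0.1],
    "naik": [0.1, 0.8, 0.1],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of known words (assumed composition scheme)."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        # No known word: fall back to a zero vector of the right dimension.
        return [0.0] * len(next(iter(vectors.values())))
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_sentences(vecs, threshold):
    """Single-pass clustering: a sentence joins the first cluster whose
    seed (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vecs):
        for cluster in clusters:
            if cosine(vec, vecs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def core_sentence(cluster, vecs):
    """Stand-in for 'largest amount of information': the member with the
    highest total similarity to all members of its cluster."""
    return max(cluster, key=lambda i: sum(cosine(vecs[i], vecs[j]) for j in cluster))


def summarize(sentences, vectors, threshold=0.8):
    """Return one core sentence per cluster, in original document order."""
    vecs = [sentence_vector(s, vectors) for s in sentences]
    clusters = cluster_sentences(vecs, threshold)
    chosen = sorted(core_sentence(c, vecs) for c in clusters)
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ["banjir jakarta", "hujan banjir", "harga beras naik", "beras naik"]
    print(summarize(doc, TOY_VECTORS))
```

Taking one sentence per cluster is what limits redundancy in this sketch: near-duplicate sentences fall into the same cluster and only one of them reaches the summary. The threshold value of 0.8 is arbitrary here; the review notes only that the paper predefines such a threshold.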


Published in

BDE '19: Proceedings of the 2019 International Conference on Big Data Engineering
June 2019
137 pages
ISBN: 9781450360913
DOI: 10.1145/3341620

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Extractive Summarization
Indonesian
Sentences Clustering
Sentences Similarity
Qualifiers
- research-article
- Research
- Refereed limited
