DOI: 10.1145/3341620.3341626
research-article

Indonesian Automatic Text Summarization Based on A New Clustering Method in Sentence Level

Published: 11 June 2019

ABSTRACT

With the development of the Internet, the amount of information grows exponentially, and automatic text summarization becomes increasingly important. At present, most research on automatic summarization targets widely used languages such as Chinese and English, and little of it addresses low-resource languages. In this paper, we construct an automatic summarization dataset for the Indonesian language and conduct related research on Indonesian automatic summarization. We propose a new and efficient extractive text summarization method based on sentence similarity clustering. Building on the idea of clustering, the method considers the semantics of sentences and clusters them according to their pairwise similarity. Following a set of rules, we then extract sentences from the clusters to obtain the final summary. This method not only ensures the integrity, criticality and importance of the summary, but also reduces its information redundancy. In the evaluation, our method achieved good results and exceeded all the baselines on the ROUGE-1, ROUGE-2, and ROUGE-3 scores.
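For readers unfamiliar with the ROUGE-N scores mentioned in the abstract, the following is a minimal sketch of recall-oriented n-gram overlap between a candidate summary and a single reference. It is an illustration only and not the paper's evaluation setup: lowercased whitespace tokenization, the single-reference form, and the names `ngrams` and `rouge_n_recall` are assumptions made here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also appear in the candidate.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; a real evaluation would use a proper tokenizer.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
scores = {n: rouge_n_recall(candidate, reference, n) for n in (1, 2, 3)}
```

Higher n rewards longer exact matches, which is why ROUGE-2 and ROUGE-3 are typically lower than ROUGE-1 for the same pair of texts.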


    Reviews

    Mariana Damova

    A sentence-centered account of text summarization, this work can be applied to any language. Characteristics of the Indonesian language are irrelevant to the proposed approach and are not discussed. Neither is the suitability of the proposed approach to the Indonesian language. The paper's contribution is an extractive summarization method whose accuracy comes close to that of generative summarization. It first converts the sentences into sentence vectors, then calculates the similarity between the sentences, clusters them, and extracts and sorts the selected sentences from the clusters. The first step adopts Google's word2vec model. It is not clear, however, what method is used to calculate the sentence similarity, beyond the use of a predefined similarity threshold. Further, clustering determines "the sentence with the largest amount of information" to be the core sentence of the cluster. This sentence is extracted for inclusion in the summary. The experiments were carried out on a specially constructed corpus of Indonesian texts. Tests used ROUGE-1, ROUGE-2, and ROUGE-3 as evaluation metrics, and the results were compared with those of six popular state-of-the-art text summarization algorithms. The comparison clearly shows the proposed method's superior performance. This well-described paper, though lacking some technical details, includes a quite substantial list of related work. It is a good read for scholars and practitioners of text summarization in general, regardless of the language.
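The pipeline the review describes (sentence vectors, pairwise similarity, threshold clustering, one core sentence per cluster, sorting) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the review itself notes that the similarity measure is unspecified, so cosine similarity is assumed here; sentence vectors are assumed to be averages of word vectors; the greedy single-pass clustering is a guess; and "the sentence with the largest amount of information" is approximated by the sentence most similar to the other members of its cluster. The tiny 2-dimensional `EMBEDDINGS` table stands in for a trained word2vec model, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(sentence, embeddings, dim):
    """Assumption: a sentence vector is the mean of its known word vectors."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cluster_sentences(vectors, threshold):
    """Assumed greedy clustering: join the first cluster whose seed sentence
    is at least `threshold` similar, otherwise start a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def core_sentence(cluster, vectors):
    """Assumed proxy for 'largest amount of information': the member with
    the highest total similarity to the rest of its cluster."""
    def centrality(i):
        return sum(cosine(vectors[i], vectors[j]) for j in cluster if j != i)
    return max(cluster, key=centrality)


def summarize(sentences, embeddings, dim, threshold=0.8):
    """Pick one core sentence per cluster and return them in document order."""
    vectors = [sentence_vector(s, embeddings, dim) for s in sentences]
    clusters = cluster_sentences(vectors, threshold)
    picked = sorted(core_sentence(c, vectors) for c in clusters)
    return [sentences[i] for i in picked]


# Toy stand-in for word2vec: hand-made 2-D vectors for four Indonesian words.
EMBEDDINGS = {
    "banjir": [1.0, 0.0], "hujan": [0.9, 0.1],
    "pasar": [0.0, 1.0], "harga": [0.1, 0.9],
}
SENTENCES = ["banjir hujan", "hujan banjir banjir", "harga pasar"]
VECTORS = [sentence_vector(s, EMBEDDINGS, 2) for s in SENTENCES]
CLUSTERS = cluster_sentences(VECTORS, 0.8)
SUMMARY = summarize(SENTENCES, EMBEDDINGS, 2, threshold=0.8)
```

Because only one sentence is kept per cluster, near-duplicate sentences collapse into a single summary sentence, which is the redundancy reduction the abstract claims. The threshold value of 0.8 is arbitrary here and would need tuning on real data.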

    • Published in

      ACM Other conferences
      BDE '19: Proceedings of the 2019 International Conference on Big Data Engineering
      June 2019
      137 pages
      ISBN: 9781450360913
      DOI: 10.1145/3341620

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited
