ABSTRACT
Text compression is the task of identifying the main information in a source text and expressing it as a single short sentence. A common approach is to find a path through shared vertices in a word graph model. This approach has two issues. First, the path-finding algorithm can separate the words of a phrase that expresses a single piece of content, which can produce new sentences whose meaning differs from the original ones. Second, the same information may be expressed by different words or phrases, a situation known as co-reference. Because the approach has no mechanism for handling co-reference, the compressed sentence may miss information. In this paper, we propose a method that overcomes both issues. The core of the new method is an improved graph model in which each vertex represents a phrase together with its Part-of-Speech label. The vertices where branches intersect are produced by a mechanism for handling co-references. The compression algorithm then reduces the graph and forms the final sentence. We use the ROUGE measure to compare our method with two word graph-based baselines. The experimental results show that our method creates short sentences that contain rich information.
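The idea of a phrasal graph with co-reference merging can be illustrated with a minimal sketch. It assumes the input sentences are already chunked into phrases and labelled, and that co-reference links are supplied as a mapping. The vertex format `(phrase, label)`, the `1 / count` edge weight, and the names `build_phrasal_graph`, `compress`, and `coref` are illustrative assumptions, not the authors' implementation. The abstract does not detail the paper's graph-reduction step, so a plain shortest-path search (Dijkstra) stands in for it here.

```python
import heapq
from collections import defaultdict

# A vertex is a (phrase, label) pair, so a multi-word phrase such as
# "the minister" is one node and a path cannot split its words apart.
Vertex = tuple[str, str]
START: Vertex = ("<s>", "START")
END: Vertex = ("</s>", "END")


def build_phrasal_graph(sentences, coref=None):
    """Count adjacencies between phrase vertices across all input sentences.

    sentences: list of sentences, each a list of (phrase, label) pairs that are
               assumed to be chunked and tagged already.
    coref:     hypothetical mapping from a mention vertex to the canonical vertex
               it co-refers with; mapped mentions share one vertex, so branches
               from different sentences intersect there.
    """
    coref = coref or {}
    edges = defaultdict(int)  # (u, v) -> number of times u is followed by v
    for sentence in sentences:
        path = [START] + [coref.get(v, v) for v in sentence] + [END]
        for u, v in zip(path, path[1:]):
            edges[(u, v)] += 1
    return edges


def compress(edges):
    """Return the cheapest START-to-END path as a sentence (Dijkstra).

    Edge weight = 1 / count is an assumed heuristic that makes transitions
    shared by many sentences cheaper; it is not the paper's reduction step.
    """
    adj = defaultdict(list)
    for (u, v), count in edges.items():
        adj[u].append((v, 1.0 / count))

    dist = {START: 0.0}
    prev = {}
    heap = [(0.0, START)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == END:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if END not in prev:
        return ""  # no path (e.g. empty input)
    phrases = []
    node = prev[END]
    while node != START:
        phrases.append(node[0])
        node = prev[node]
    return " ".join(reversed(phrases))


sentences = [
    [("the minister", "NP"), ("visited", "VP"), ("Hanoi", "NP")],
    [("he", "NP"), ("visited", "VP"), ("Hanoi", "NP"), ("on Monday", "PP")],
]
coref = {("he", "NP"): ("the minister", "NP")}

print(compress(build_phrasal_graph(sentences)))         # he visited Hanoi
print(compress(build_phrasal_graph(sentences, coref)))  # the minister visited Hanoi
```

Without the co-reference map, the two subject vertices tie and this sketch's tie-breaking returns the pronoun, so the output no longer says who visited. Merging "he" into "the minister" doubles the support for that branch, and the path keeps the full noun phrase intact.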
Index Terms
- Phrasal Graph-based Method for Abstractive Vietnamese Paragraph Compression