Phrasal Graph-based Method for Abstractive Vietnamese Paragraph Compression

Authors:
Dang Tuan Nguyen
University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam

Trung Tran
University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam

SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology, December 2017, Pages 143–150, https://doi.org/10.1145/3155133.3155177

Published: 07 December 2017

ABSTRACT

Text compression is the task of identifying the main information in a source text and expressing it as a single short sentence. A common approach is to find a path through the shared vertices of a word graph model. This approach has two problems. First, the path-finding algorithm can separate words from the phrase that together express one piece of content, which produces new sentences whose meaning differs from that of the originals. Second, the same information may be expressed by different words or phrases, a situation known as co-reference; without a mechanism for handling it, the compressed sentence loses information. In this paper we propose a method that addresses both problems. Its core is an improved graph model in which each vertex represents a phrase together with its Part-of-Speech label. The vertices where branches intersect are produced by a co-reference handling mechanism. The compression algorithm then reduces the graph and forms the final sentence. We use the ROUGE measure to compare our method with two word graph-based baselines. The experimental results show that our method produces short sentences that are rich in information.
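The idea of a graph whose vertices are phrases with Part-of-Speech labels can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the graph is reduced, so the sketch assumes a generic shortest-path search with edge weights equal to the inverse of edge frequency. It also assumes co-references are resolved by a hand-written lookup table. All function names, the `coref` table, and the English example phrases are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical sentinel vertices marking sentence boundaries.
START, END = ("<s>", "START"), ("</s>", "END")


def build_phrase_graph(sentences):
    """Build a graph whose vertices are (phrase, POS label) pairs.

    Identical pairs from different sentences share one vertex, so
    branches intersect there. Edge weight = 1 / (times the edge occurs).
    """
    edge_count = defaultdict(int)
    for sent in sentences:
        path = [START] + list(sent) + [END]
        for u, v in zip(path, path[1:]):
            edge_count[(u, v)] += 1
    graph = defaultdict(dict)
    for (u, v), count in edge_count.items():
        graph[u][v] = 1.0 / count
    return graph


def shortest_path(graph, source=START, target=END):
    """Dijkstra search; returns the cheapest vertex sequence or None."""
    queue = [(0.0, 0, source, [source])]
    visited = set()
    counter = 1  # tie-breaker so the heap never compares paths
    while queue:
        cost, _, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, counter, nxt, path + [nxt]))
                counter += 1
    return None


def compress(sentences, coref=None):
    """Resolve co-references with a lookup table (an assumption made for
    this sketch), then read the compressed sentence off the cheapest path."""
    coref = coref or {}
    resolved = [[(coref.get(p, p), pos) for p, pos in s] for s in sentences]
    path = shortest_path(build_phrase_graph(resolved))
    if path is None:
        return ""
    return " ".join(phrase for phrase, _ in path[1:-1])


# Toy paragraph, already chunked into phrases and tagged (illustrative only).
example = [
    [("Nam", "NP"), ("bought", "VP"), ("a new book", "NP")],
    [("he", "NP"), ("bought", "VP"), ("a new book", "NP"), ("yesterday", "ADVP")],
    [("Nam", "NP"), ("read", "VP"), ("a new book", "NP")],
]
print(compress(example, coref={"he": "Nam"}))
```

Because every vertex is a whole phrase, the path search cannot split "a new book" into separate words, and mapping "he" to "Nam" before graph construction merges the two mentions into one intersection vertex.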


Index Terms

Phrasal Graph-based Method for Abstractive Vietnamese Paragraph Compression
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Discourse, dialogue and pragmatics
      2. Information extraction

Published in
SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology
December 2017
486 pages
ISBN: 9781450353281
DOI: 10.1145/3155133
General Chairs:
Huynh Quyet Thang, HUST, Vietnam
Zhenjiang Hu, NII, Japan
Program Chairs:
Marc Bui, EPHE, France
Biplab Sikdar, NUS, Singapore
Ichiro IDE, Nagoya, Japan
Huynh Thi Thanh Binh, HUST, Vietnam
Publications Chairs:
Worrawat Engchuan, Canada
Dinh Viet Sang, HUST, Vietnam
Nguyen Thi Oanh, HUST, Vietnam
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Co-Reference Resolution
Graph Construction
Text Compression
Text Tagging
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 147 of 318 submissions, 46%