ABSTRACT
Because natural languages are flexible and diverse, extracting the subject of a text is one of the most difficult but important tasks in natural language processing (NLP). Owing to the distinctive linguistic and grammatical structure of Chinese, we can currently adopt only non-semantic approaches to extract the subject from Chinese text. This paper presents three such approaches: the first is based on a component-word dictionary, the second on a subject-word dictionary, and the third on a statistical method. We describe the process of each approach. To test them, we develop three independent systems and design a comparison experiment. The experimental results are illuminating: every system can extract the text's subject to some extent; however, the approaches may need to be combined to obtain a better one.
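The abstract does not give the details of the three approaches, so the following is only a rough illustration of what a dictionary-based and a statistical, non-semantic extraction could look like. It is not the authors' method: the function names, the sample subject-word list, and the choice of character bigrams as the statistical unit are all assumptions made for this sketch.

```python
from collections import Counter


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def dictionary_subjects(text, subject_words):
    """Hypothetical dictionary approach: rank dictionary entries by how
    often they occur in the text, dropping entries that never occur."""
    hits = {word: text.count(word) for word in subject_words}
    found = [word for word, n in hits.items() if n > 0]
    return sorted(found, key=lambda word: -hits[word])


def statistical_subjects(text, top_n=3):
    """Hypothetical statistical approach: count overlapping two-character
    sequences (no word segmentation) and return the most frequent ones."""
    counts = Counter(
        a + b for a, b in zip(text, text[1:]) if is_cjk(a) and is_cjk(b)
    )
    return [bigram for bigram, _ in counts.most_common(top_n)]


# Illustrative input; the dictionary below is made up for the example.
sample = "经济增长放缓，经济政策调整，经济改革"
by_dictionary = dictionary_subjects(sample, ["经济", "体育", "政策"])
by_statistics = statistical_subjects(sample, top_n=1)
```

On this sample both functions rank 经济 first, which is the kind of agreement that would make combining approaches plausible. On real text the two rankings would often differ.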
- Research on extracting subject from Chinese text (poster session)