ABSTRACT
This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured in a unified framework by a bipartite graph model. Because it addresses several technical challenges of the problem, including order insensitivity and partial matching, this approach is less rigid than existing approaches and highly robust. One major component is a phonetic matching model that exploits similarity at the phoneme level. Two algorithms are investigated for learning, from training examples, the similarity of basic phonemic matching units. By applying the proposed named entity matching model, a mining system is developed for discovering new named entity translations from daily Web news. The system is able to discover new name translations that cannot be found in the existing bilingual dictionary.
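As a rough illustration of how a bipartite matching can be order-insensitive and tolerate partial matches, the sketch below pairs the tokens of two names so that the summed similarity is maximal. It is not the model described in the article: the function names `token_similarity` and `match_names`, the `threshold` parameter, and the use of a plain string-similarity ratio in place of learned semantic and phonetic scores are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import permutations


def token_similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 1]. ASSUMPTION: a character-level ratio
    stands in for the semantic and phonetic similarity of the article."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_names(left, right, threshold=0.5):
    """Exhaustive maximum-weight bipartite matching between two token lists.

    Token order is ignored (every assignment is tried), and pairs scoring
    below `threshold` are left unmatched, which allows partial matches.
    Returns (normalized_score, matched_pairs). Hypothetical helper.
    """
    if not left or not right:
        return 0.0, []
    # Always assign tokens of the shorter name to tokens of the longer one.
    swapped = len(left) > len(right)
    small, large = (right, left) if swapped else (left, right)

    best_score, best_pairs = 0.0, []
    for perm in permutations(range(len(large)), len(small)):
        score, pairs = 0.0, []
        for i, j in enumerate(perm):
            s = token_similarity(small[i], large[j])
            if s >= threshold:  # weak edges are dropped, not forced
                score += s
                pairs.append((j, i) if swapped else (i, j))
        if score > best_score:
            best_score, best_pairs = score, pairs

    best_pairs.sort()  # report pairs in the order of the left-hand name
    matched = [(left[i], right[j]) for i, j in best_pairs]
    # Normalizing by the longer name penalizes unmatched tokens.
    return best_score / max(len(left), len(right)), matched


if __name__ == "__main__":
    print(match_names(["Bill", "Clinton"], ["Clinton", "Bill"]))
    print(match_names(["George", "Bush"], ["George", "W", "Bush"]))
```

The exhaustive search is only practical for short names; for longer token lists, a polynomial-time assignment solver would be the usual replacement.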