ABSTRACT
The traditional entity extraction problem is that of extracting named entities from plain text using natural language processing techniques and intensive training on large document collections. Examples of named entities include organisations, people, locations, and dates. Among the many research activities involving named entities, we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. Wikipedia offers a number of interesting features for entity identification and ranking, which we introduce first. We then describe the principles and architecture of our entity ranking system, and introduce our evaluation methodology. Our preliminary results show that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness.