Entity ranking in Wikipedia
Source: Symposium on Applied Computing
Proceedings of the 2008 ACM Symposium on Applied Computing
Fortaleza, Ceara, Brazil
SESSION: Information access and retrieval
Pages 1101-1106  
Year of Publication: 2008
ISBN:978-1-59593-753-7
Authors
Anne-Marie Vercoustre  INRIA, Rocquencourt, France
James A. Thom  RMIT University, Melbourne, Australia
Jovan Pehcevski  INRIA, Rocquencourt, France
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1363686.1363943

ABSTRACT

The traditional entity extraction problem is to extract named entities from plain text using natural language processing techniques and intensive training on large document collections. Examples of named entities include organisations, people, locations, and dates. Named entities are the subject of many research activities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. We first introduce the features of Wikipedia that are useful for entity identification and ranking. We then describe the principles and architecture of our entity ranking system and introduce our evaluation methodology. Our preliminary results show that using the categories and link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness.
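The abstract names three sources of evidence (categories, link structure, and entity examples) but gives no scoring formulas. The sketch below is therefore not the authors' system. It is a minimal illustration, under stated assumptions, of how such evidence might be combined: a Jaccard overlap between a candidate's categories and those of the example entities, mixed linearly with the fraction of top-ranked pages that link to the candidate. The function names, the `alpha` weight, and the toy data in the usage note are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def category_score(candidate_cats: Set[str], target_cats: Set[str]) -> float:
    """Jaccard overlap between a candidate's categories and the target set."""
    if not candidate_cats or not target_cats:
        return 0.0
    return len(candidate_cats & target_cats) / len(candidate_cats | target_cats)


def link_score(candidate: str, links: Dict[str, Set[str]], top_pages: List[str]) -> float:
    """Fraction of the top-ranked pages that contain a link to the candidate."""
    if not top_pages:
        return 0.0
    hits = sum(1 for page in top_pages if candidate in links.get(page, set()))
    return hits / len(top_pages)


def rank_entities(
    candidates: List[str],
    categories: Dict[str, Set[str]],
    links: Dict[str, Set[str]],
    top_pages: List[str],
    examples: List[str],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> List[Tuple[str, float]]:
    """Rank candidates by a linear mix of category and link evidence."""
    # Assumption: the target categories are the union of the examples' categories.
    target: Set[str] = set()
    for example in examples:
        target |= categories.get(example, set())

    scored = []
    for cand in candidates:
        if cand in examples:
            continue  # the examples are already known answers
        score = (alpha * category_score(categories.get(cand, set()), target)
                 + (1 - alpha) * link_score(cand, links, top_pages))
        scored.append((cand, score))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

As a usage illustration with invented data: given "Paris" as the example entity, a candidate such as "Lyon" that shares a category with Paris and is linked from a top-ranked page would score above a candidate such as "Eiffel Tower" that is linked from the same page but shares no category.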


