poster

XML-aided phrase indexing for hypertext documents

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 843 - 844

https://doi.org/10.1145/1390334.1390534

Published: 20 July 2008

Abstract

We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up text, we strengthen phrase boundaries so that they are more obvious to the algorithms that extract multiword sequences from text. Consequently, the quality of the indexed phrases improves, which has a positive effect on the average precision measured by the INEX 2007 standards.
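
The idea of letting markup strengthen phrase boundaries can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that every tag boundary ends a phrase candidate and that candidates are plain contiguous bigrams. The function names `segments` and `bigrams` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def segments(elem):
    """Yield lowercased word lists, one per run of text not interrupted by a tag."""
    # Text between the opening tag and the first child (or the closing tag).
    if elem.text and elem.text.split():
        yield elem.text.lower().split()
    for child in elem:
        # Text inside the child element forms its own segments.
        yield from segments(child)
        # Text between the end of the child and the next tag.
        if child.tail and child.tail.split():
            yield child.tail.lower().split()


def bigrams(xml_string):
    """Count adjacent word pairs that do not cross any tag boundary."""
    root = ET.fromstring(xml_string)
    counts = Counter()
    for words in segments(root):
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical sample: the link text is a natural phrase candidate.
DOC = "<p>See the <a>information retrieval</a> course notes</p>"
counts = bigrams(DOC)
for pair in sorted(counts):
    print(" ".join(pair))
```

In this sample the pair "information retrieval" is counted, while pairs that straddle the link boundary, such as "the information" or "retrieval course", are never formed. A flat word sequence without the markup would have produced both of those spurious candidates.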

Cited By

M. Lehtonen and A. Doucet (2009). Enhancing Keyword Search with a Keyphrase Index. Advances in Focused Retrieval, pages 65--70. DOI: 10.1007/978-3-642-03761-0_7. Online publication date: 3 Sep 2009.
https://dl.acm.org/doi/10.1007/978-3-642-03761-0_7

Index Terms

XML-aided phrase indexing for hypertext documents
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN: 9781605581644

DOI: 10.1145/1390334

General Chairs: Tat-Seng Chua (National University of Singapore), Mun-Kew Leong (National Library Board, Singapore)

Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea), Douglas W. Oard (University of Maryland, College Park, USA), Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '08

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
