Use of ranked cross document evidence trails for hypothesis generation
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
San Jose, California, USA
SESSION: Research track papers
Pages: 677-686
Year of Publication: 2007
ISBN:978-1-59593-609-7
Authors
Rohini K. Srihari, State University of New York at Buffalo
Li Xu, State University of New York at Buffalo
Tushar Saxena, State University of New York at Buffalo
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281265
ABSTRACT

This paper focuses on detecting how concepts are linked across multiple text documents by generating an evidence trail explaining the connection. A traditional search involving, for example, two or more person names will attempt to find documents mentioning both of these individuals. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that explains a connection between these individuals? For example, all may be good golfers. A generalization of this task involves query terms representing general concepts (e.g., indictment, foreign policy). Such queries reflect a special case of text mining. Previous attempts to solve this problem have focused on graph approaches involving hyperlinked documents, and link analysis tools exploiting named entities. A new robust framework is presented, based on (i) generating concept chain graphs, a hybrid content representation, (ii) performing graph matching to select candidate subgraphs, and (iii) subsequently using graphical models to validate hypotheses using ranked evidence trails. We adapt the DUC data set for cross-document summarization to evaluate evidence trails generated by this approach.
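
To make the notion of a cross-document evidence trail concrete, the following is a minimal sketch in Python. It is not the paper's method: a plain concept co-occurrence graph stands in for the concept chain graph, exhaustive path enumeration stands in for graph matching, and a toy score stands in for validation with graphical models. The function names, the sample documents, and the scoring rule are all assumptions made for illustration.

```python
from itertools import combinations


def build_concept_graph(docs):
    """Link two concepts when they co-occur in a document.

    graph[a][b] is the set of document ids that mention both a and b.
    (Assumption: a simplified stand-in for a concept chain graph.)
    """
    graph = {}
    for doc_id, concepts in docs.items():
        for a, b in combinations(sorted(set(concepts)), 2):
            graph.setdefault(a, {}).setdefault(b, set()).add(doc_id)
            graph.setdefault(b, {}).setdefault(a, set()).add(doc_id)
    return graph


def evidence_trails(graph, source, target, max_hops=3):
    """Enumerate simple concept paths from source to target, up to max_hops edges."""
    trails = []
    if source == target:
        return trails

    def walk(node, path):
        if node == target:
            trails.append(list(path))
            return
        if len(path) - 1 >= max_hops:
            return
        for nxt in sorted(graph.get(node, {})):
            if nxt not in path:
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(source, [source])
    return trails


def rank_trails(graph, source, target, max_hops=3):
    """Rank trails by a toy score: weakest edge support divided by hop count."""
    ranked = []
    for trail in evidence_trails(graph, source, target, max_hops):
        edges = list(zip(trail, trail[1:]))
        support = [sorted(graph[a][b]) for a, b in edges]
        score = min(len(s) for s in support) / len(edges)
        ranked.append((score, trail, support))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Hypothetical corpus: no single document mentions both "Alice" and "Bob".
docs = {
    "d1": ["Alice", "golf club"],
    "d2": ["golf club", "Bob"],
    "d3": ["Alice", "charity"],
    "d4": ["charity", "board"],
    "d5": ["board", "Bob"],
}

ranked = rank_trails(build_concept_graph(docs), "Alice", "Bob")
for score, trail, support in ranked:
    print(round(score, 3), " -> ".join(trail), support)
```

A traditional conjunctive search for "Alice" and "Bob" would return nothing on this corpus, whereas the sketch surfaces two trails and prefers the shorter one through "golf club", supported by documents d1 and d2.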


