Text categorization for multiple users based on semantic features from a machine-readable dictionary
Source: ACM Transactions on Information Systems (TOIS)
Volume 12, Issue 3 (July 1994)
Pages: 278-295
Year of Publication: 1994
ISSN: 1046-8188
Authors
Elizabeth D. Liddy  Syracuse Univ., Syracuse, NY
Woojin Paik  Syracuse Univ., Syracuse, NY
Edmund S. Yu  Syracuse Univ., Syracuse, NY
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/183422.183425

ABSTRACT

The text categorization module described here provides a front-end filtering function for the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The model evaluates a large incoming stream of documents to determine which documents are sufficiently similar to a profile at the broad subject level to warrant more refined representation and matching. To accomplish this task, each substantive word in a text is first categorized using a feature set based on the semantic Subject Field Codes (SFCs) assigned to individual word senses in a machine-readable dictionary. When tested on 50 user profiles and 550 megabytes of documents, results indicate that the feature set that is the basis of the text categorization module and the algorithm that establishes the boundary of categories of potentially relevant documents accomplish their tasks with a high level of performance. This means that the category of potentially relevant documents for most profiles would contain at least 80% of all documents later determined to be relevant to the profile. The number of documents in this set would be uniquely determined by the system's category-boundary predictor, and this set is likely to contain less than 5% of the incoming stream of documents.
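
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption: the toy lexicon and its codes (EC, LW, MD) are invented, each word gets a single code with no word-sense disambiguation, cosine similarity stands in for whatever matching function DR-LINK uses, and a fixed cutoff stands in for the category-boundary predictor.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: word -> Subject Field Code (SFC).
# Real SFCs are attached to word senses in a machine-readable dictionary;
# these entries and code names are invented for illustration.
TOY_LEXICON = {
    "bank": "EC", "loan": "EC", "stock": "EC",
    "court": "LW", "judge": "LW",
    "virus": "MD", "vaccine": "MD",
}

def sfc_vector(text, lexicon=TOY_LEXICON):
    """Return normalized SFC frequencies for the known words in a text."""
    codes = Counter(lexicon[w] for w in text.lower().split() if w in lexicon)
    total = sum(codes.values())
    return {code: n / total for code, n in codes.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse SFC vectors (an assumed measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_stream(profile_text, documents, cutoff):
    """Keep documents whose SFC vector is close enough to the profile's.

    The fixed cutoff is a stand-in for a learned category boundary.
    """
    profile = sfc_vector(profile_text)
    scored = [(cosine(profile, sfc_vector(d)), d) for d in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked if score >= cutoff]
```

Calling `filter_stream("bank stock loan", docs, 0.5)` on a small list of documents returns only those dominated by the same subject code as the profile, which mirrors the broad subject-level filtering the abstract describes before any finer matching takes place.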


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
CHOUEKA, Y. AND LUSIGNAN, S. 1985. Disambiguation by short contexts. Comput. Hum. 19, 3, 147-157.
 
4
HALE, R. L. 1990. MYSTAT Statistical Applications. Course Technology, Inc., Cambridge, Mass.
 
5
 
6
 
7
KELLY, E. F. AND STONE, P.J. 1975. Computer Recognition of English Word Senses. North Holland, Amsterdam.
 
8
KROVETZ, R. 1991. Lexical acquisition and information retrieval. In Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, U. Zernik, Ed., Lawrence Erlbaum, Hillsdale, N.J.
 
9
 
10
 
11
LIDDY, E. D. 1994. Development and implementation of a discourse model for newspaper texts. In Proceedings of the Dagstuhl on Summarizing Text for Intelligent Communication (Saarbrücken, Germany). International Conference and Research Center for Computer Science in Schloss Dagstuhl. To be published.
 
12
 
13
LIDDY, E. D. AND PAIK, W. 1992. Statistically-guided word sense disambiguation. In Proceedings of AAAI Fall '92 Symposium on Probabilistic Approaches to Natural Language (Boston, Mass.). AAAI, Menlo Park, Calif.
14
 
15
MCGILL, M., KOLL, M., AND NOREAULT, T. 1979. An evaluation of factors affecting document ranking by information retrieval systems. Final Report to National Science Foundation. Syracuse Univ., Syracuse, N.Y.
 
16
METEER, M., SCHWARTZ, R., AND WEISCHEDEL, R. 1991. POST: Using probabilities in language processing. In Proceedings of the 12th International Joint Conference on Artificial Intelligence (Sydney, Australia). Morgan Kaufmann, San Mateo, Calif.
 
17
PAIK, W., LIDDY, E. D., YU, E. S., AND MCKENNA, M. 1993. Extracting and classifying proper nouns in documents. In Proceedings of the Human Language Technology Workshop (Princeton, N.J.). ARPA, Washington, D.C.
 
18
SAGER, W. K. H. AND LOCKEMANN, P. C. 1976. Classification of ranking algorithms. Int. Forum Inf. Doc. 1, 4, 2-25.
 
19
SLATOR, B. 1991. Using context for sense preference. In Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, U. Zernik, Ed., Lawrence Erlbaum, Hillsdale, N.J.
 
20
 
21
TANIMOTO, T. 1958. An elementary mathematical theory of classification and prediction. Int. Rep., IBM Corp., Watson Research Center, Kingston, N.Y.
 
22
WALKER, D. E. AND AMSLER, R. A. 1986. The use of machine-readable dictionaries in sublanguage analysis. In Analyzing Language in Restricted Domains: Sublanguage Description and Processing, R. Grishman and R. Kittredge, Eds., Lawrence Erlbaum, Hillsdale, N.J.



REVIEW

Reviewer: Richard S. Marcus

The authors describe a module of their DR-LINK text retrieval system. This module filters texts (in this case from a database of Wall Street Journal news stories) as likely to be relevant to a Text Retrieval ...
