Intelligent GP fusion from multiple sources for text classification
Source: Conference on Information and Knowledge Management (CIKM)
Proceedings of the 14th ACM International Conference on Information and Knowledge Management
Bremen, Germany
Session: Paper session IR-5 (information retrieval): machine learning and collaborative filtering
Pages: 477-484
Year of Publication: 2005
ISBN: 1-59593-140-6
Authors
Baoping Zhang, Virginia Tech, Blacksburg, VA
Yuxin Chen, Virginia Tech, Blacksburg, VA
Weiguo Fan, Virginia Tech, Blacksburg, VA
Edward A. Fox, Virginia Tech, Blacksburg, VA
Marcos Gonçalves, Federal University of Minas Gerais, Belo Horizonte, Brazil
Marco Cristo, Federal University of Minas Gerais, Belo Horizonte, Brazil
Pável Calado, IST/INESC-ID, Lisbon, Portugal
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1099554.1099688

ABSTRACT

This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve the classification of text documents into predefined categories. We evaluate eight measures of similarity, five derived from the citation information of the collection and three derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP). Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. When the discovered similarity functions are combined through simple majority voting, the resulting classifier is more effective than both content-based and combination-based Support Vector Machine (SVM) classifiers. We also compared GP with other fusion techniques, such as Genetic Algorithms (GA) and linear fusion; the empirical results show that GP discovered better similarity functions than GA or the other fusion techniques.
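The abstract describes three ideas: similarity functions that fuse several evidence scores, a GP search over such functions, and a final decision by majority voting. The Python sketch below illustrates only the parts that can be shown compactly, under our own assumptions. Each GP individual is represented as an expression tree over per-pair evidence scores; only two hypothetical evidence sources are used here (Kessler-style bibliographic coupling and a title cosine), not the paper's eight measures. Each tree serves as the similarity function of a similarity-weighted kNN classifier (an assumed way of turning a similarity function into a classifier), and the resulting predictions are combined by majority vote. The evolutionary search itself (selection, crossover, mutation, fitness evaluation) is omitted, and the trees, documents, and category labels are hypothetical.

```python
from collections import Counter
import math

# --- Hypothetical toy collection (not data from the paper) ---
refs = {  # document id -> set of cited reference ids
    "d1": {"r1", "r2"},
    "d2": {"r1", "r2", "r3"},
    "d3": {"r7", "r8"},
    "d4": {"r8", "r9"},
    "q": {"r2", "r3"},
}
titles = {  # document id -> term-frequency vector of the title
    "d1": {"genetic": 1, "programming": 1},
    "d2": {"genetic": 1, "classification": 1},
    "d3": {"query": 1, "retrieval": 1},
    "d4": {"retrieval": 1, "ranking": 1},
    "q": {"genetic": 1, "classification": 1},
}
labels = {"d1": "I.2", "d2": "I.2", "d3": "H.3", "d4": "H.3"}  # hypothetical labels


def bibliographic_coupling(refs_a, refs_b):
    """Citation evidence: number of references the two documents share."""
    return len(refs_a & refs_b)


def cosine(u, v):
    """Content evidence: cosine similarity of two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def evidence(a, b):
    """Collect the per-pair evidence scores that trees may use as leaves."""
    return {"bib": bibliographic_coupling(refs[a], refs[b]),
            "title": cosine(titles[a], titles[b])}


def evaluate(tree, ev):
    """Evaluate an expression tree: leaves are evidence names or constants."""
    if isinstance(tree, str):
        return float(ev[tree])
    if isinstance(tree, (int, float)):
        return float(tree)
    op, *args = tree
    vals = [evaluate(arg, ev) for arg in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "/":  # protected division: returns 1.0 on a zero denominator
        return vals[0] / vals[1] if vals[1] != 0 else 1.0
    if op == "sqrt":  # protected square root of the absolute value
        return math.sqrt(abs(vals[0]))
    raise ValueError(f"unknown operator {op!r}")


def knn_predict(query, train, sim, k=3):
    """Similarity-weighted kNN: the k most similar training docs vote."""
    scored = sorted(((d, sim(query, d)) for d in train),
                    key=lambda pair: pair[1], reverse=True)[:k]
    votes = Counter()
    for d, s in scored:
        votes[labels[d]] += s
    return votes.most_common(1)[0][0]


def majority_vote(predictions):
    """Final label: the category predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Three hand-written trees standing in for GP-discovered similarity functions.
trees = [
    ("+", ("*", 0.5, "bib"), "title"),
    ("sqrt", ("*", "bib", "title")),
    ("+", "bib", ("/", "title", 2)),
]
train = ["d1", "d2", "d3", "d4"]
predictions = [
    knn_predict("q", train, lambda a, b, t=tree: evaluate(t, evidence(a, b)))
    for tree in trees
]
print(predictions, majority_vote(predictions))
```

Protected operators are a common GP convention that keeps every randomly generated tree evaluable; the operator set, the kNN classifier, and the toy evidence here are illustrative choices, not details taken from the paper.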



