DOI: 10.1145/2970398.2970408
research-article

On Horizontal and Vertical Separation in Hierarchical Text Classification

Published: 12 September 2016

Abstract

Hierarchy is an effective and common way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers.
Our main findings are as follows. First, we analyse the importance of separability in the data representation for the classification task and, based on this analysis, introduce the "Strong Separation Principle" for optimizing the expected effectiveness of classifier decisions based on the separation property. Second, we present Significant Words Language Models (SWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how SWLM improves the accuracy of classification and how it provides transferable models over time. Although the discussion in this paper focuses on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure.
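SWLM builds on parsimonious language model estimation (Hiemstra et al.), which uses expectation-maximization to concentrate probability mass on the terms that distinguish an entity from the general background collection — the mechanism behind capturing "all, and only, the essential features". The following is a minimal sketch of that background-discounting EM step, not the authors' implementation; the function name, `lam`, and the toy data are illustrative. SWLM extends this idea with additional mixture components for the other layers of the hierarchy:

```python
from collections import Counter

def parsimonious_lm(doc_tokens, collection_probs, lam=0.5, iters=50):
    """EM re-estimation that shifts probability mass toward terms that
    distinguish a document (or hierarchy node) from the background
    collection. Sketch of a parsimonious language model; illustrative only."""
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Initialize with the maximum-likelihood estimate.
    p = {w: c / total for w, c in tf.items()}
    for _ in range(iters):
        # E-step: expected counts, discounting occurrences that are
        # better explained by the background collection model.
        e = {}
        for w, c in tf.items():
            num = lam * p[w]
            denom = num + (1 - lam) * collection_probs.get(w, 1e-9)
            e[w] = c * (num / denom)
        # M-step: renormalize the expected counts into a distribution.
        z = sum(e.values())
        p = {w: v / z for w, v in e.items()}
    return p
```

Run on a node's text, this suppresses terms that the collection model already explains (e.g. stopwords), leaving a compact, separable representation; the full SWLM adds analogous components to also discount terms explained by other levels of the hierarchy.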




Published In

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN:9781450344975
DOI:10.1145/2970398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Badges

  • Best Paper

Author Tags

  1. hierarchical significant words language models
  2. hierarchical text classification
  3. hswlm
  4. language models
  5. significant words language models
  6. swlm

Qualifiers

  • Research-article

Funding Sources

  • Netherlands Organization for Scientific Research

Conference

ICTIR '16

Acceptance Rates

ICTIR '16 Paper Acceptance Rate 41 of 79 submissions, 52%;
Overall Acceptance Rate 235 of 527 submissions, 45%


Cited By

  • (2024) Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers. ACM Transactions on Information Systems 42(5): 1–27. DOI: 10.1145/3640460
  • (2020) Learning to rank for multi-label text classification: Combining different sources of information. Natural Language Engineering: 1–23. DOI: 10.1017/S1351324920000029
  • (2020) Improving Topic Coherence Using Parsimonious Language Model and Latent Semantic Indexing. ICDSMLA 2019: 823–830. DOI: 10.1007/978-981-15-1420-3_89
  • (2019) HiTR: Hierarchical Topic Model Re-Estimation for Measuring Topical Diversity of Documents. IEEE Transactions on Knowledge and Data Engineering 31(11): 2124–2137. DOI: 10.1109/TKDE.2018.2874246
  • (2019) How Many Labels? Determining the Number of Labels in Multi-Label Text Classification. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 156–163. DOI: 10.1007/978-3-030-28577-7_11
  • (2017) Words are Malleable. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: 1509–1518. DOI: 10.1145/3132847.3132878
  • (2017) Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity. Advances in Information Retrieval: 68–81. DOI: 10.1007/978-3-319-56608-5_6
  • (2016) Luhn Revisited. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management: 1301–1310. DOI: 10.1145/2983323.2983814
  • (2016) Two-Way Parsimonious Classification Models for Evolving Hierarchies. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 69–82. DOI: 10.1007/978-3-319-44564-9_6
