DOI: 10.1145/2970398.2970408
research-article

On Horizontal and Vertical Separation in Hierarchical Text Classification

Published: 12 September 2016

Abstract

Hierarchy is an effective and common way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers.
Our main findings are as follows. First, we analyse the importance of separability in the data representation for the classification task and, based on this analysis, introduce the "Strong Separation Principle" for optimizing the expected effectiveness of classifier decisions based on the separation property. Second, we present Significant Words Language Models (SWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how SWLM improves the accuracy of classification and how it provides transferable models over time. Although the discussion in this paper focuses on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure.
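SWLM builds on parsimonious language model estimation (Hiemstra et al.), which uses expectation-maximization to concentrate probability mass on the terms that distinguish an entity from the general background collection — the mechanism behind capturing "all, and only, the essential features". The following is a minimal sketch of that background-discounting EM step, not the authors' implementation; the function name, `lam`, and the toy data are illustrative. SWLM extends this idea with additional mixture components for the other layers of the hierarchy:

```python
from collections import Counter

def parsimonious_lm(doc_tokens, collection_probs, lam=0.5, iters=50):
    """EM re-estimation that shifts probability mass toward terms that
    distinguish a document (or hierarchy node) from the background
    collection. Sketch of a parsimonious language model; illustrative only."""
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Initialize with the maximum-likelihood estimate.
    p = {w: c / total for w, c in tf.items()}
    for _ in range(iters):
        # E-step: expected counts, discounting occurrences that are
        # better explained by the background collection model.
        e = {}
        for w, c in tf.items():
            num = lam * p[w]
            denom = num + (1 - lam) * collection_probs.get(w, 1e-9)
            e[w] = c * (num / denom)
        # M-step: renormalize the expected counts into a distribution.
        z = sum(e.values())
        p = {w: v / z for w, v in e.items()}
    return p
```

Run on a node's text, this suppresses terms that the collection model already explains (e.g. stopwords), leaving a compact, separable representation; the full SWLM adds analogous components to also discount terms explained by other levels of the hierarchy.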




Published In

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN:9781450344975
DOI:10.1145/2970398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Badges

  • Best Paper

Author Tags

  1. hierarchical significant words language models
  2. hierarchical text classification
  3. hswlm
  4. language models
  5. significant words language models
  6. swlm

Qualifiers

  • Research-article

Funding Sources

  • Netherlands Organization for Scientific Research

Conference

ICTIR '16

Acceptance Rates

ICTIR '16 Paper Acceptance Rate 41 of 79 submissions, 52%;
Overall Acceptance Rate 235 of 527 submissions, 45%


Cited By

  • (2024) Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers. ACM Transactions on Information Systems 42(5): 1–27. DOI: 10.1145/3640460
  • (2020) Learning to rank for multi-label text classification: Combining different sources of information. Natural Language Engineering: 1–23. DOI: 10.1017/S1351324920000029
  • (2020) Improving Topic Coherence Using Parsimonious Language Model and Latent Semantic Indexing. ICDSMLA 2019: 823–830. DOI: 10.1007/978-981-15-1420-3_89
  • (2019) HiTR: Hierarchical Topic Model Re-Estimation for Measuring Topical Diversity of Documents. IEEE Transactions on Knowledge and Data Engineering 31(11): 2124–2137. DOI: 10.1109/TKDE.2018.2874246
  • (2019) How Many Labels? Determining the Number of Labels in Multi-Label Text Classification. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 156–163. DOI: 10.1007/978-3-030-28577-7_11
  • (2017) Words are Malleable. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: 1509–1518. DOI: 10.1145/3132847.3132878
  • (2017) Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity. Advances in Information Retrieval: 68–81. DOI: 10.1007/978-3-319-56608-5_6
  • (2016) Luhn Revisited. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management: 1301–1310. DOI: 10.1145/2983323.2983814
  • (2016) Two-Way Parsimonious Classification Models for Evolving Hierarchies. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 69–82. DOI: 10.1007/978-3-319-44564-9_6
