research-article

Topic taxonomy adaptation for group profiling

Published: 01 February 2008

Abstract

A topic taxonomy is an effective representation for describing the salient features of virtual groups or online communities. A topic taxonomy consists of topic nodes. Each internal node is defined by its vertical path (i.e., its ancestor and child nodes) and its horizontal list of attributes (or terms). In a text-dominant environment, a topic taxonomy can be used to flexibly describe a group's interests at varying granularity. However, the static nature of a taxonomy may prevent it from capturing changes in a group's interests in a timely manner. This article addresses the problem of how to adapt a topic taxonomy to accumulated data that reflects changes in a group's interests, so as to achieve dynamic group profiling. We first discuss the issues related to topic taxonomies. We next formulate taxonomy adaptation as an optimization problem: finding the taxonomy that best fits the data. We then present a viable algorithm that accomplishes taxonomy adaptation efficiently. We conduct extensive experiments to evaluate the efficacy of our approach for group profiling, compare it with several alternatives, and study its performance for dynamic group profiling. Finally, we point out various applications of taxonomy adaptation and suggest future work that can take advantage of burgeoning Web 2.0 services for online targeted marketing, counterterrorism (connecting the dots), and community tracking.
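The abstract frames adaptation as a search for the taxonomy that best fits accumulated data. The following Python sketch illustrates that framing under stated assumptions only; it is not the article's algorithm. The parent-map encoding, the hypothetical `reattach_moves` edit operation, the greedy hill-climbing loop in `adapt`, and the toy `cohesion` score (which rewards a node for sharing a term with its parent's attribute list) are all illustrative stand-ins. A real fitness measure would be computed on the group's accumulated documents, which this sketch does not model.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical encoding: a taxonomy maps each topic node to its parent (root -> None).
# Each node's horizontal attribute list is kept separately as a set of terms.
Taxonomy = Dict[str, Optional[str]]


def path_to_root(tax: Taxonomy, node: str) -> List[str]:
    """Vertical path of a node: the node itself followed by its ancestors."""
    path = [node]
    while tax[path[-1]] is not None:
        path.append(tax[path[-1]])
    return path


def reattach_moves(tax: Taxonomy) -> List[Taxonomy]:
    """Hypothetical edit operation: move one non-root node (and its subtree) to a new parent."""
    moves = []
    for node in sorted(tax):
        if tax[node] is None:
            continue  # the root is never moved
        for new_parent in sorted(tax):
            # Skip no-op moves and moves that would create a cycle.
            if new_parent == tax[node] or node in path_to_root(tax, new_parent):
                continue
            candidate = dict(tax)  # copy, so the input taxonomy is never mutated
            candidate[node] = new_parent
            moves.append(candidate)
    return moves


def adapt(tax: Taxonomy, score: Callable[[Taxonomy], float], max_iters: int = 100) -> Taxonomy:
    """Greedy hill climbing: apply the best-scoring move until no move improves the score."""
    best, best_score = tax, score(tax)
    for _ in range(max_iters):
        neighbours = reattach_moves(best)
        if not neighbours:
            break
        candidate = max(neighbours, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # local optimum reached
        best, best_score = candidate, candidate_score
    return best


def cohesion(tax: Taxonomy, terms: Dict[str, Set[str]]) -> int:
    """Toy fitness (assumption): +1 if a node shares a term with its parent, -1 otherwise.
    The root and top-level topics are not scored."""
    total = 0
    for node, parent in tax.items():
        if parent is None or tax[parent] is None:
            continue
        total += 1 if terms[node] & terms[parent] else -1
    return total


# Usage with made-up topics and terms: "elections" starts under the wrong parent.
terms = {
    "root": set(),
    "sports": {"sport", "game"},
    "soccer": {"game", "goal"},
    "politics": {"vote", "policy"},
    "elections": {"vote", "poll"},
}
initial = {"root": None, "sports": "root", "politics": "root",
           "soccer": "sports", "elections": "sports"}
adapted = adapt(initial, lambda t: cohesion(t, terms))
print(path_to_root(adapted, "elections"))  # ['elections', 'politics', 'root']
```

With these toy terms, the misplaced `elections` node moves from `sports` to `politics`, and the search stops once no single move raises the score. Greedy search keeps each step cheap but, like any hill climber, offers no guarantee of reaching the globally best taxonomy.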

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 1, Issue 4
January 2008
143 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1324172
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2008
Accepted: 01 August 2007
Revised: 01 May 2007
Received: 01 February 2007
Published in TKDD Volume 1, Issue 4

Author Tags

  1. Topic taxonomy
  2. dynamic profiling
  3. group interest
  4. taxonomy adjustment
  5. text hierarchical classification

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 1
Reflects downloads up to 05 Mar 2025

Citations

Cited By

  • (2024) Navigating the Digital Public Sphere: An AI-Driven Analysis of Interaction Dynamics across Societal Domains. Societies 14:10, 195. DOI: 10.3390/soc14100195. Online publication date: 26-Sep-2024.
  • (2020) Scalable Taxonomy Generation and Evolution on Apache Spark. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 634-639. DOI: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00110. Online publication date: Aug-2020.
  • (2020) Identification of Salient Attributes in Social Network: A Data Mining Approach. Data Science and Analytics, 173-185. DOI: 10.1007/978-981-15-5830-6_16. Online publication date: 28-May-2020.
  • (2019) A Multivariate Method for Group Profiling Using Subgroup Discovery. 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), 371-376. DOI: 10.1109/BRACIS.2019.00072. Online publication date: Oct-2019.
  • (2019) Process Mining in Social Media: Applying Object-Centric Behavioral Constraint Models. IEEE Access 7, 84360-84373. DOI: 10.1109/ACCESS.2019.2925105. Online publication date: 2019.
  • (2018) Social Network Analysis Based on Topic Model with Temporal Factor. International Journal of Knowledge and Systems Science 9:1, 82-97. DOI: 10.4018/IJKSS.2018010105. Online publication date: 1-Jan-2018.
  • (2018) CRS. Student Engagement and Participation, 553-570. DOI: 10.4018/978-1-5225-2584-4.ch028. Online publication date: 2018.
  • (2018) Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks. New Generation Computing 36:1, 59-89. DOI: 10.1007/s00354-017-0028-9. Online publication date: 1-Jan-2018.
  • (2016) A Comparative Study of Group Profiling Techniques in Co-authorship Networks. 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), 373-378. DOI: 10.1109/BRACIS.2016.074. Online publication date: Oct-2016.
  • (2015) Hierarchical label partitioning for large scale classification. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 1-10. DOI: 10.1109/DSAA.2015.7344792. Online publication date: Oct-2015.