ABSTRACT
Networked data, extracted from social media, web pages, and bibliographic databases, can contain entities of multiple classes, interconnected through different types of links. In this paper, we focus on the problem of performing multi-label classification on networked data, where the instances in the network can be assigned multiple labels. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, instances in a network can be linked for various causal reasons, hence treating all links in a homogeneous way can limit the performance of relational classifiers.
We propose a multi-label iterative relational neighbor classifier that employs social context features (SCRN). Our classifier incorporates a class-propagation probability distribution obtained from instances' social features, which are in turn extracted from the network topology. This class-propagation probability captures the node's intrinsic likelihood of belonging to each class, and serves as a prior weight for each class when aggregating the neighbors' class labels in the collective inference procedure. Experiments on several real-world datasets demonstrate that our proposed classifier boosts classification performance over common benchmarks on networked multi-label data.
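The abstract does not give the update rule, so the following is only a minimal sketch of the general idea: a relational neighbor classifier that averages neighbors' label scores and scales each class by a per-node prior weight, repeated over several rounds of collective inference. The function name `scrn_sketch`, the multiplicative combination of prior and neighbor average, and the `class_prior` input (assumed here to be precomputed from social features) are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def scrn_sketch(adj, labels, known, class_prior, n_iter=10):
    """Iterative relational-neighbor inference with per-class prior weights.

    adj         : (n, n) symmetric non-negative adjacency matrix
    labels      : (n, k) 0/1 label matrix; rows of unlabeled nodes are ignored
    known       : (n,) boolean mask marking nodes with observed labels
    class_prior : (n, k) hypothetical class-propagation probabilities in [0, 1]
    """
    adj = np.asarray(adj, dtype=float)
    known = np.asarray(known, dtype=bool)
    class_prior = np.asarray(class_prior, dtype=float)

    # Observed nodes start at their labels; unlabeled nodes start at zero.
    scores = np.where(known[:, None], np.asarray(labels, dtype=float), 0.0)

    degree = adj.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # avoid division by zero for isolated nodes

    for _ in range(n_iter):
        # Average the current class scores of each node's neighbors.
        neighbor_avg = adj @ scores / degree
        # Assumed combination: weight each class by the node's prior.
        updated = class_prior * neighbor_avg
        # Observed labels stay clamped; only unlabeled nodes are updated.
        scores = np.where(known[:, None], scores, updated)
    return scores
```

The returned scores are per-class values in [0, 1]; turning them into a multi-label prediction would still require a thresholding step, which this sketch leaves out.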