skip to main content
10.1145/1610555.1610559acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Identifying graphs from noisy and incomplete data

Published:28 June 2009Publication History

ABSTRACT

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences.

References

  1. I. Bhattacharya and L. Getoor. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 1:1--36, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. I. Bhattacharya, S. Godbole, and S. Joshi. Structured entity identification and document categorization: two tasks with one joint model. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 25--33, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Bilgic and L. Getoor. Effective label acquisition for collective classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43--51, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Bilgic, G. M. Namata, and L. Getoor. Combining collective classification and link prediction. In ICDMW '07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 381--386, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y. Choi, E. Breck, and C. Cardie. Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431--439. Association for Computational Linguistics, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration, pages 73--78, August 2003.Google ScholarGoogle Scholar
  7. L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations Newsletter, 7:3--12, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Machine Learning, 3:679--707, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery Data, 1(1):2, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Intl. Conf. on Information and Knowledge Management, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Q. Lu and L. Getoor. Link-based classification. In Proceedings of the International Conference on Machine Learning, 2003.Google ScholarGoogle Scholar
  12. L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. M. Namata, P. Sen, M. Bilgic, and L. Getoor. Collective classification for text classification. In M. Sahami and A. Srivastava, editors, Text Mining: Classification, Clustering, and Applications. Taylor and Francis Group, 2009.Google ScholarGoogle Scholar
  14. M. J. Rattigan, M. Maier, and D. Jensen. Exploiting network structure for active inference in collective classification. Technical report, University of Massachusetts Amherst, 2007.Google ScholarGoogle Scholar
  15. M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62:107--136, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Roth, K. Small, and I. Titov. Sequential learning of classifiers for structured prediction problems. In Artificial Intelligence STAT, April 2009.Google ScholarGoogle Scholar
  17. D. Roth and W. Yih. A linear programming formulation for global inference in natural language tasks. In Proc. of CoNLL-2004, pages 1--8. Boston, MA, USA, 2004.Google ScholarGoogle Scholar
  18. B. Taskar, A. Pieter, and D. Koller. Discriminative probabilistic models for relational data. In Conf. on Uncertainty in Artificial Intelligence, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Advances in Neural Information Processing Systems, 2003.Google ScholarGoogle Scholar
  20. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, October 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Identifying graphs from noisy and incomplete data

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
                June 2009
                66 pages
                ISBN:9781605586755
                DOI:10.1145/1610555

                Copyright © 2009 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 28 June 2009

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • research-article

                Upcoming Conference

                KDD '24

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader