research-article

Identifying graphs from noisy and incomplete data

Authors:
Galileo Mark S. Namata

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

,
Lise Getoor

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain DataJune 2009Pages 23–29https://doi.org/10.1145/1610555.1610559

Published:28 June 2009Publication History

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data

Pages 23–29

ABSTRACT

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences.

References

I. Bhattacharya and L. Getoor. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 1:1--36, 2007. Google ScholarDigital Library
I. Bhattacharya, S. Godbole, and S. Joshi. Structured entity identification and document categorization: two tasks with one joint model. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 25--33, 2008. Google ScholarDigital Library
M. Bilgic and L. Getoor. Effective label acquisition for collective classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43--51, 2008. Google ScholarDigital Library
M. Bilgic, G. M. Namata, and L. Getoor. Combining collective classification and link prediction. In ICDMW '07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 381--386, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarDigital Library
Y. Choi, E. Breck, and C. Cardie. Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431--439. Association for Computational Linguistics, 2006. Google ScholarDigital Library
W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration, pages 73--78, August 2003.Google Scholar
L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations Newsletter, 7:3--12, 2005. Google ScholarDigital Library
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Machine Learning, 3:679--707, 2003. Google ScholarDigital Library
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery Data, 1(1):2, 2007. Google ScholarDigital Library
D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Intl. Conf. on Information and Knowledge Management, 2003. Google ScholarDigital Library
Q. Lu and L. Getoor. Link-based classification. In Proceedings of the International Conference on Machine Learning, 2003.Google Scholar
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007. Google ScholarDigital Library
G. M. Namata, P. Sen, M. Bilgic, and L. Getoor. Collective classification for text classification. In M. Sahami and A. Srivastava, editors, Text Mining: Classification, Clustering, and Applications. Taylor and Francis Group, 2009.Google Scholar
M. J. Rattigan, M. Maier, and D. Jensen. Exploiting network structure for active inference in collective classification. Technical report, University of Massachusetts Amherst, 2007.Google Scholar
M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62:107--136, 2006. Google ScholarDigital Library
D. Roth, K. Small, and I. Titov. Sequential learning of classifiers for structured prediction problems. In Artificial Intelligence STAT, April 2009.Google Scholar
D. Roth and W. Yih. A linear programming formulation for global inference in natural language tasks. In Proc. of CoNLL-2004, pages 1--8. Boston, MA, USA, 2004.Google Scholar
B. Taskar, A. Pieter, and D. Koller. Discriminative probabilistic models for relational data. In Conf. on Uncertainty in Artificial Intelligence, 2002. Google ScholarDigital Library
B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Advances in Neural Information Processing Systems, 2003.Google Scholar
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, October 1999. Google ScholarDigital Library

Index Terms

Recommendations

Identifying graphs from noisy and incomplete data

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing ...
Read More
Collective graph identification
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Data describing networks (communication networks, transaction networks, disease transmission networks, collaboration networks, etc.) is becoming increasingly ubiquitous. While this observational data is useful, it often only hints at the actual ...
Read More
Collective Graph Identification

Data describing networks—such as communication networks, transaction networks, disease transmission networks, collaboration networks, etc.—are becoming increasingly available. While observational data can be useful, it often only hints at the actual ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
June 2009
66 pages
ISBN:9781605586755
DOI:10.1145/1610555
Editors:
Jian Pei
Simon Fraser University
,
Lise Getoor
University of Maryland College Park
,
Ander de Keijzer
University of Twente
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 June 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
classification
data mining
entity resolution
link prediction
social networks
statistical relational learning
Qualifiers
- research-article
Conference
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 210
  Total Downloads
- Downloads (Last 12 months)8
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Identifying graphs from noisy and incomplete data

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Identifying graphs from noisy and incomplete data

Collective graph identification

Collective Graph Identification

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Identifying graphs from noisy and incomplete data

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Identifying graphs from noisy and incomplete data

Collective graph identification

Collective Graph Identification

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media