Link mining: a new data mining challenge

Published: 01 July 2003
Abstract

A key challenge for data mining is tackling the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled largely by interest in web and hypertext mining, but also by interest in mining social networks, security and law enforcement data, bibliographic citations and epidemiological records.

Published in

ACM SIGKDD Explorations Newsletter, Volume 5, Issue 1
July 2003
101 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/959242

Copyright © 2003 Author

Publisher: Association for Computing Machinery, New York, NY, United States