Abstract
A key challenge for data mining is handling richly structured datasets in which the objects are linked in some way. Links among the objects may exhibit patterns that are helpful for many data mining tasks and that are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled largely by interest in web and hypertext mining, but also by interest in mining social networks, security and law enforcement data, bibliographic citations, and epidemiological records.