DOI: 10.1145/1526709.1526724

StatSnowball: a statistical approach to extracting entity relationships

Published: 20 April 2009


Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples required, but they usually rely on heuristic methods to combine a set of strict hard rules, which limits their ability to generalize and thus leads to low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring them to be specified in advance. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE.
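The "strict hard rules" limitation can be made concrete with a toy bootstrapping loop in the style of DIPRE and Snowball. This is a hypothetical Python sketch, not code from any of these systems: a pattern is simply the literal text between two entity mentions, patterns are applied by exact string match, and the corpus, entity list, `extract_middle`, `bootstrap`, and `min_support` are all invented for illustration.

```python
"""Toy DIPRE/Snowball-style bootstrapping loop (illustrative sketch only).

Hypothetical simplifications: the corpus is a list of strings, a "pattern"
is the literal text between two entity mentions, and patterns are applied
as hard rules (exact string match), with no confidence scores.
"""
from collections import Counter


def extract_middle(sentence, e1, e2):
    """Return the stripped text between e1 and a later e2, or None if absent."""
    i = sentence.find(e1)
    if i == -1:
        return None
    j = sentence.find(e2, i + len(e1))
    if j == -1:
        return None
    return sentence[i + len(e1):j].strip()


def bootstrap(corpus, seeds, entities, iterations=2, min_support=1):
    known = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Step 1: induce patterns from sentences that mention a known pair.
        counts = Counter()
        for sentence in corpus:
            for e1, e2 in known:
                middle = extract_middle(sentence, e1, e2)
                if middle:
                    counts[middle] += 1
        patterns = {p for p, c in counts.items() if c >= min_support}
        # Step 2: apply the patterns as hard rules to harvest new pairs.
        for sentence in corpus:
            for e1 in entities:
                for e2 in entities:
                    if e1 != e2 and extract_middle(sentence, e1, e2) in patterns:
                        known.add((e1, e2))
    return known, patterns


# Hypothetical mini-corpus with a single seed pair.
corpus = [
    "Bill Gates founded Microsoft in 1975.",
    "Larry Page founded Google with Sergey Brin.",
    "Steve Jobs co-founded Apple.",
]
entities = ["Bill Gates", "Microsoft", "Larry Page", "Google", "Steve Jobs", "Apple"]
known, patterns = bootstrap(corpus, [("Bill Gates", "Microsoft")], entities)
print(sorted(known))  # [('Bill Gates', 'Microsoft'), ('Larry Page', 'Google')]
```

The paraphrase "co-founded" in the third sentence never matches the learned pattern "founded", so the pair (Steve Jobs, Apple) is missed: a small instance of the recall problem that exact-match hard rules cause.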
StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights through maximum likelihood estimation. MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation problem, which has a well-founded theory and efficient solvers. We extensively evaluate StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, starting from a small number of seeds, StatSnowball can achieve significantly higher recall without sacrificing high precision across iterations, and that the joint inference of MLNs can improve performance. Finally, StatSnowball is efficient, and we have built a working entity relation search engine called Renlifang on top of it.
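The abstract states that pattern selection is solved as l1-norm penalized maximum likelihood estimation but gives no algorithmic detail. The following is a minimal, hypothetical sketch of that general idea, not StatSnowball's implementation. In a discriminative MLN, P(y | x) ∝ exp(Σ_i w_i n_i(x, y)), where n_i counts the true groundings of formula i; in the special case where every formula contains a single query atom, the conditional model factorizes into per-atom logistic regressions. The sketch therefore fits an l1-regularized logistic regression over binary pattern features using proximal gradient descent (ISTA), a swapped-in solver. Weights that the soft-thresholding step drives exactly to zero mark patterns that would be discarded. The feature names, data, `lam`, `step`, and `iters` are assumptions.

```python
"""Sketch: pattern selection via l1-penalized maximum likelihood.

Assumptions (not taken from the paper): each candidate tuple is a binary vector
of "pattern fired" features, the model is plain logistic regression rather than
a full MLN, and the solver is ISTA (proximal gradient descent).
"""
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink toward 0, clip small values to exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def l1_logistic(X, y, lam, step=1.0, iters=2000):
    """Minimize mean negative log-likelihood + lam * sum(|w_j|); bias unpenalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual p_i - y_i is the per-example gradient factor of the log-loss.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(d):
                gw[j] += r * xi[j] / n
        b -= step * gb  # plain gradient step on the unpenalized bias
        # Gradient step followed by the l1 proximal step on the pattern weights.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, gw)]
    return w, b


# Hypothetical data: feature 0 = pattern "X founded Y", feature 1 = "X visited Y".
# Pattern 0 fires mostly on correct tuples; pattern 1 is unrelated to the label.
X = [[1, 1]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4
y = [1, 1, 1, 0] * 2 + [0, 0, 0, 1] * 2
w, b = l1_logistic(X, y, lam=0.05)
print(w)  # w[1] is exactly 0.0, so pattern 1 would be pruned
```

ISTA converges for a step no larger than 1/L, where L is the Lipschitz constant of the log-loss gradient; `step=1.0` satisfies this for the toy data above but may need to be reduced for other feature matrices. Larger values of `lam` prune more patterns; at `lam=0.05` the informative pattern keeps a nonzero weight.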






    Information & Contributors


    Published In

    WWW '09: Proceedings of the 18th international conference on World wide web
    April 2009
    1280 pages



    Association for Computing Machinery

    New York, NY, United States




    Author Tags

    1. Markov logic networks
    2. relationship extraction
    3. statistical models


    • Research-article


    WWW '09

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%



    Bibliometrics & Citations


    Article Metrics

    • Downloads (last 12 months): 26
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 15 Feb 2025



    Cited By

    • (2024) RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction. PeerJ Computer Science 10 (e1856). DOI: 10.7717/peerj-cs.1856. Online publication date: 9-Feb-2024
    • (2024) Verifiable Strong Privacy-Preserving Any-Hop Reachability Query on Blockchain-Assisted Cloud. IEEE Internet of Things Journal 11:24 (39637-39650). DOI: 10.1109/JIOT.2024.3445431. Online publication date: 15-Dec-2024
    • (2024) Extraction of object-action and object-state associations from Knowledge Graphs. Journal of Web Semantics 81 (100816). DOI: 10.1016/j.websem.2024.100816. Online publication date: Jul-2024
    • (2024) Cnosso, a novel method for business document automation based on open information extraction. Expert Systems with Applications: An International Journal 245:C. DOI: 10.1016/j.eswa.2023.123038. Online publication date: 2-Jul-2024
    • (2024) OIE4PA: open information extraction for the public administration. Journal of Intelligent Information Systems 62:1 (273-294). DOI: 10.1007/s10844-023-00814-z. Online publication date: 1-Feb-2024
    • (2023) Entity Relationship Extraction Based on a Multi-Neural Network Cooperation Model. Applied Sciences 13:11 (6812). DOI: 10.3390/app13116812. Online publication date: 3-Jun-2023
    • (2023) A Comprehensive Survey on Automatic Knowledge Graph Construction. ACM Computing Surveys 56:4 (1-62). DOI: 10.1145/3618295. Online publication date: 5-Sep-2023
    • (2023) Application of DA-Bi-SRU and Improved RoBERTa Model in Entity Relationship Extraction for High-Speed Train Bogie. 2023 6th International Conference on Data Science and Information Technology (DSIT) (89-96). DOI: 10.1109/DSIT60026.2023.00023. Online publication date: 28-Jul-2023
    • (2023) Answering reachability queries with ordered label constraints over labeled graphs. Frontiers of Computer Science 18:1. DOI: 10.1007/s11704-022-2368-y. Online publication date: 12-Aug-2023
    • (2023) A system review on bootstrapping information extraction. Multimedia Tools and Applications 83:13 (38329-38353). DOI: 10.1007/s11042-023-17005-1. Online publication date: 5-Oct-2023