
Improving discriminative sequential learning by discovering important association of statistics

Published: 01 December 2006

Abstract

Discriminative sequential learning models such as Conditional Random Fields (CRFs) have achieved significant success in areas such as natural language processing and information extraction. Their key advantage is the ability to capture various non-independent and overlapping features of the input. However, several unexpected pitfalls degrade these models' performance; they stem mainly from a high imbalance among classes, irregular phenomena, and potential ambiguity in the training data. This article presents a data-driven approach that handles such difficult data instances by discovering important conjunctions, or associations, of statistics hidden in the training data and emphasizing them; the discovered associations are then incorporated into the models. Experimental results on phrase chunking and named entity recognition using CRFs show a significant improvement in accuracy. Beyond the technical contribution, our approach also highlights a potential connection between association mining and statistical learning, offering an alternative strategy for enhancing learning performance with interesting and useful patterns discovered from large datasets.
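To give a concrete flavor of the idea, the following Python sketch mines associations of statistics in the spirit the abstract describes: it enumerates conjunctions of per-token features from labeled training data and keeps those that co-occur with a label above support and confidence thresholds, so they could later be added as conjunction features to a CRF. This is a minimal illustration, not the paper's exact algorithm; the function name, feature encoding, and thresholds are all assumptions.

```python
from itertools import combinations
from collections import Counter

def mine_feature_associations(dataset, min_support=2, min_conf=0.8, max_len=3):
    """Find feature conjunctions that strongly predict a label.
    `dataset` is a list of (feature_set, label) pairs, one per token.
    Returns (conjunction, label, confidence) triples."""
    assoc = []
    for k in range(1, max_len + 1):
        counts = Counter()   # conjunction -> total occurrences
        hits = Counter()     # (conjunction, label) -> co-occurrences
        for feats, label in dataset:
            # Enumerate every size-k conjunction of this token's features.
            for combo in combinations(sorted(feats), k):
                counts[combo] += 1
                hits[(combo, label)] += 1
        for (combo, label), n in hits.items():
            if counts[combo] >= min_support and n / counts[combo] >= min_conf:
                assoc.append((combo, label, n / counts[combo]))
    return assoc

# Toy data: tokens described by word/POS/context features with chunk labels
# (feature names are purely illustrative).
data = [
    ({"w=York", "pos=NNP", "prev=New"}, "I-LOC"),
    ({"w=York", "pos=NNP", "prev=New"}, "I-LOC"),
    ({"w=York", "pos=NNP", "prev=in"}, "B-LOC"),
    ({"w=new", "pos=JJ", "prev=a"}, "O"),
]
rules = mine_feature_associations(data)
```

For brevity the sketch enumerates all size-k combinations directly rather than pruning with the Apriori property (Agrawal and Srikant [1]) or an FP-tree (Han et al. [17]), which a practical implementation over a large corpus would need.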

References

[1]
Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). 487--499.
[2]
Altun, Y., Hofmann, T., and Johnson, M. 2002. Discriminative learning for label sequences via boosting. In Proceedings of Neural Information Processing Systems (NIPS).
[3]
Ando, R. and Zhang, T. 2005. A high-performance semi-supervised learning method for text chunking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL). 1--9.
[4]
Berger, A., Pietra, A. D., and Pietra, J. D. 1996. A maximum entropy approach to natural language processing. Comput. Linguist. 22, 1, 39--71.
[5]
Carreras, X. and Marquez, L. 2003. Phrase recognition by filtering and ranking with perceptrons. In Proceedings of the Recent Advances in Natural Language Processing (RANLP). 205--216.
[6]
Chen, S. F. and Rosenfeld, R. 1999. A Gaussian prior for smoothing maximum entropy models. Tech. Rep. CMU-CS-99-108. Carnegie Mellon University.
[7]
Chieu, H. L. and Ng, H. T. 2003. Named entity recognition with a maximum entropy approach. In Proceedings of the Conference on Natural Language Learning (CoNLL). 160--163.
[8]
Collins, M. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 1--8.
[9]
Daumé III, H. and Marcu, D. 2005. Learning as search optimization: Approximate large margin methods for structured prediction. In Proceedings of the International Conference on Machine Learning (ICML).
[10]
Dietterich, T. G. 2004. Training conditional random fields via gradient tree boosting. In Proceedings of the 21st International Conference on Machine Learning (ICML). 169--176.
[11]
Florian, R., Ittycheriah, A., Jing, H., and Zhang, T. 2003. Named entity recognition through classifier combination. In Proceedings of the Conference on Natural Language Learning (CoNLL). 168--171.
[12]
Freund, Y. and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and application to boosting. J. Comput. Syst. Sci. 55, 119--139.
[13]
Klein, D., Smarr, J., Nguyen, H., and Manning, C. D. 2003. Named entity recognition with character-level models. In Proceedings of the Conference on Natural Language Learning (CoNLL). 180--183.
[14]
Kristjansson, T., Culotta, A., Viola, P., and McCallum, A. 2004. Interactive information extraction with constrained conditional random fields. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI). 412--418.
[15]
Kudo, T. and Matsumoto, Y. 2001. Chunking with support vector machines. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL). 1--8.
[16]
Kumar, S. and Hebert, M. 2003. Discriminative random fields: a discriminative framework for contextual interaction in classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1150--1157.
[17]
Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the ACM International Conference on Management of Data (ACM SIGMOD). 1--12.
[18]
Han, J., Pei, J., Yin, Y., and Mao, R. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Disc. 8, 53--87.
[19]
He, X., Zemel, R. S., and Carreira-Perpinan, M. A. 2004. Multiscale conditional random fields for image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 695--702.
[20]
Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML). 282--289.
[21]
Li, W., Han, J., and Pei, J. 2001. Accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE International Conference on Data Mining (ICDM). 369--376.
[22]
Liu, D. and Nocedal, J. 1989. On the limited memory BFGS method for large-scale optimization. Math. Program. 45, 503--528.
[23]
Liu, B., Hsu, W., and Ma, Y. 1998. Integrating classification and association rule mining. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 80--86.
[24]
Liu, B. 1999. Finding interesting patterns using user expectations. IEEE Trans. Knowl. Data Eng. 11, 817--832.
[25]
Malouf, R. 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL). 1--7.
[26]
McCallum, A., Freitag, D., and Pereira, F. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the 17th International Conference on Machine Learning (ICML). 591--598.
[27]
McCallum, A. 2003. Efficiently inducing features of conditional random fields. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI). 403--410.
[28]
Padmanabhan, B. and Tuzhilin, A. 1998. A belief-driven method for discovering unexpected patterns. In Proceedings of International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 94--100.
[29]
Peng, F., Feng, F., and McCallum, A. 2004. Chinese segmentation and new word detection using conditional random fields. In Proceedings of the 20th International Conference on Computational Linguistics (COLING).
[30]
Pietra, S. D., Pietra, V. D., and Lafferty, J. 1997. Inducing features of random fields. IEEE Trans. Pattern Anal. Mach. Intell. 19, 4, 380--393.
[31]
Pinto, D., McCallum, A., Wei, X., and Croft, W. B. 2003. Table extraction using conditional random fields. In Proceedings of the 26th ACM International Conference on Information Retrieval (ACM SIGIR). 235--242.
[32]
Phan, X. H., Nguyen, L. M., Ho, T. B., and Horiguchi, S. 2005. Improving discriminative sequential learning with rare-but-important associations. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 304--313.
[33]
Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2, 257--286.
[34]
Ratnaparkhi, A. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[35]
Sha, F. and Pereira, F. 2003. Shallow parsing with conditional random fields. In Proceedings of Human Language Technology/The North American Chapter of the Association for Computational Linguistics (HLT/NAACL). 134--141.
[36]
Silberschatz, A. and Tuzhilin, A. 1996. What makes patterns interesting in knowledge discovery systems. IEEE Trans. Knowl. Data Eng. 8, 970--974.
[37]
Suzuki, E. 1997. Autonomous discovery of reliable exception rules. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 259--262.
[38]
Suzuki, E. and Shimura, M. 1996. Exceptional knowledge discovery in databases based on information theory. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 295--298.
[39]
Torralba, A., Murphy, K. P., and Freeman, W. T. 2004. Contextual models for object detection using boosted random fields. In Proceedings of the Conference on Neural Information Processing Systems (NIPS).
[40]
Yeo, G. and Burge, C. B. 2003. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. In Proceedings of the Conference on Computational Molecular Biology. 322--331.
[41]
Zhang, T., Damerau, F., and Johnson, D. 2002. Text chunking based on a generalization of Winnow. J. Mach. Learn. Res. 2, 615--637.

Cited By

  • (2013) From Creative Ideas Generation to Real World Solutions. Multidisciplinary Studies in Knowledge and Systems Science, 95--112. DOI: 10.4018/978-1-4666-3998-0.ch008
  • (2011) From Creative Ideas Generation to Real World Solutions. International Journal of Knowledge and Systems Science 2, 2, 31--48. DOI: 10.4018/jkss.2011040103



Published In

ACM Transactions on Asian Language Information Processing, Volume 5, Issue 4
December 2006, 148 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1236181

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 December 2006
      Published in TALIP Volume 5, Issue 4


      Author Tags

      1. Discriminative sequential learning
      2. association rule mining
      3. feature selection
      4. information extraction
      5. text segmentation
