
Improving discriminative sequential learning by discovering important association of statistics

Published: 01 December 2006

Abstract

Discriminative sequential learning models such as Conditional Random Fields (CRFs) have achieved significant success in areas such as natural language processing and information extraction. Their key advantage is the ability to capture various non-independent and overlapping features of the input. However, several unexpected pitfalls degrade these models' performance; they stem mainly from a high imbalance among classes, irregular phenomena, and potential ambiguity in the training data. This article presents a data-driven approach that handles such difficult data instances by discovering important conjunctions, or associations, of statistics hidden in the training data and emphasizing them; the discovered associations are then incorporated into the models. Experimental results on phrase chunking and named entity recognition using CRFs show a significant improvement in accuracy. Beyond the technical contribution, our approach also highlights a potential connection between association mining and statistical learning, offering an alternative strategy for enhancing learning performance with interesting and useful patterns discovered from large datasets.
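To give a concrete flavor of the idea, the following Python sketch mines associations of statistics in the spirit the abstract describes: it enumerates conjunctions of per-token features from labeled training data and keeps those that co-occur with a label above support and confidence thresholds, so they could later be added as conjunction features to a CRF. This is a minimal illustration, not the paper's exact algorithm; the function name, feature encoding, and thresholds are all assumptions.

```python
from itertools import combinations
from collections import Counter

def mine_feature_associations(dataset, min_support=2, min_conf=0.8, max_len=3):
    """Find feature conjunctions that strongly predict a label.
    `dataset` is a list of (feature_set, label) pairs, one per token.
    Returns (conjunction, label, confidence) triples."""
    assoc = []
    for k in range(1, max_len + 1):
        counts = Counter()   # conjunction -> total occurrences
        hits = Counter()     # (conjunction, label) -> co-occurrences
        for feats, label in dataset:
            # Enumerate every size-k conjunction of this token's features.
            for combo in combinations(sorted(feats), k):
                counts[combo] += 1
                hits[(combo, label)] += 1
        for (combo, label), n in hits.items():
            if counts[combo] >= min_support and n / counts[combo] >= min_conf:
                assoc.append((combo, label, n / counts[combo]))
    return assoc

# Toy data: tokens described by word/POS/context features with chunk labels
# (feature names are purely illustrative).
data = [
    ({"w=York", "pos=NNP", "prev=New"}, "I-LOC"),
    ({"w=York", "pos=NNP", "prev=New"}, "I-LOC"),
    ({"w=York", "pos=NNP", "prev=in"}, "B-LOC"),
    ({"w=new", "pos=JJ", "prev=a"}, "O"),
]
rules = mine_feature_associations(data)
```

For brevity the sketch enumerates all size-k combinations directly rather than pruning with the Apriori property (Agrawal and Srikant [1]) or an FP-tree (Han et al. [17]), which a practical implementation over a large corpus would need.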

References

[1]
Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). 487--499.
[2]
Altun, Y., Hofmann, T., and Johnson, M. 2002. Discriminative learning for label sequences via boosting. In Proceedings of Neural Information Processing Systems (NIPS).
[3]
Ando, R. and Zhang, T. 2005. A high-performance semi-supervised learning method for text chunking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL). 1--9.
[4]
Berger, A., Pietra, A. D., and Pietra, J. D. 1996. A maximum entropy approach to natural language processing. Comput. Linguist. 22, 1, 39--71.
[5]
Carreras, X. and Marquez, L. 2003. Phrase recognition by filtering and ranking with perceptrons. In Proceedings of the Recent Advances in Natural Language Processing (RANLP). 205--216.
[6]
Chen, S. F. and Rosenfeld, R. 1999. A Gaussian prior for smoothing maximum entropy models. Tech. Rep. CMU-CS-99-108. Carnegie Mellon University.
[7]
Chieu, H. L. and Ng, H. T. 2003. Named entity recognition with a maximum entropy approach. In Proceedings of the Conference on Natural Language Learning (CoNLL). 160--163.
[8]
Collins, M. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 1--8.
[9]
Daumé III, H. and Marcu, D. 2005. Learning as search optimization: Approximate large margin methods for structured prediction. In Proceedings of the International Conference on Machine Learning (ICML).
[10]
Dietterich, T. G. 2004. Training conditional random fields via gradient tree boosting. In Proceedings of the 21st International Conference on Machine Learning (ICML). 169--176.
[11]
Florian, R., Ittycheriah, A., Jing, H., and Zhang, T. 2003. Named entity recognition through classifier combination. In Proceedings of the Conference on Natural Language Learning (CoNLL). 168--171.
[12]
Freund, Y. and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and application to boosting. J. Comput. Syst. Sci. 55, 119--139.
[13]
Klein, D., Smarr, J., Nguyen, H., and Manning, C. D. 2003. Named entity recognition with character-level models. In Proceedings of the Conference on Natural Language Learning (CoNLL). 180--183.
[14]
Kristjansson, T., Culotta, A., Viola, P., and McCallum, A. 2004. Interactive information extraction with constrained conditional random fields. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI). 412--418.
[15]
Kudo, T. and Matsumoto, Y. 2001. Chunking with support vector machines. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL). 1--8.
[16]
Kumar, S. and Hebert, M. 2003. Discriminative random fields: a discriminative framework for contextual interaction in classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1150--1157.
[17]
Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the ACM International Conference on Management of Data (ACM SIGMOD). 1--12.
[18]
Han, J., Pei, J., Yin, Y., and Mao, R. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Disc. 8, 53--87.
[19]
He, X., Zemel, R. S., and Carreira-Perpinan, M. A. 2004. Multiscale conditional random fields for image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 695--702.
[20]
Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML). 282--289.
[21]
Li, W., Han, J., and Pei, J. 2001. Accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE International Conference on Data Mining (ICDM). 369--376.
[22]
Liu, D. and Nocedal, J. 1989. On the limited memory BFGS method for large-scale optimization. Math. Program. 45, 503--528.
[23]
Liu, B., Hsu, W., and Ma, Y. 1998. Integrating classification and association rule mining. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 80--86.
[24]
Liu, B. 1999. Finding interesting patterns using user expectations. IEEE Trans. Knowl. Data Eng. 11, 817--832.
[25]
Malouf, R. 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL). 1--7.
[26]
McCallum, A., Freitag, D., and Pereira, F. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the 17th International Conference on Machine Learning (ICML). 591--598.
[27]
McCallum, A. 2003. Efficiently inducing features of conditional random fields. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI). 403--410.
[28]
Padmanabhan, B. and Tuzhilin, A. 1998. A belief-driven method for discovering unexpected patterns. In Proceedings of International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 94--100.
[29]
Peng, F., Feng, F., and McCallum, A. 2004. Chinese segmentation and new word detection using conditional random fields. In Proceedings of the 20th International Conference on Computational Linguistics (COLING).
[30]
Pietra, S. D., Pietra, V. D., and Lafferty, J. 1997. Inducing features of random fields. IEEE Trans. Pattern Anal. Mach. Intell. 19, 4, 380--393.
[31]
Pinto, D., McCallum, A., Wei, X., and Croft, W. B. 2003. Table extraction using conditional random fields. In Proceedings of the 26th ACM International Conference on Information Retrieval (ACM SIGIR). 235--242.
[32]
Phan, X. H., Nguyen, L. M., Ho, T. B., and Horiguchi, S. 2005. Improving discriminative sequential learning with rare-but-important associations. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 304--313.
[33]
Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2, 257--286.
[34]
Ratnaparkhi, A. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[35]
Sha, F. and Pereira, F. 2003. Shallow parsing with conditional random fields. In Proceedings of Human Language Technology/The North American Chapter of the Association for Computational Linguistics (HLT/NAACL). 134--141.
[36]
Silberschatz, A. and Tuzhilin, A. 1996. What makes patterns interesting in knowledge discovery systems. IEEE Trans. Knowl. Data Eng. 8, 970--974.
[37]
Suzuki, E. 1997. Autonomous discovery of reliable exception rules. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 259--262.
[38]
Suzuki, E. and Shimura, M. 1996. Exceptional knowledge discovery in databases based on information theory. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). 295--298.
[39]
Torralba, A., Murphy, K. P., and Freeman, W. T. 2004. Contextual models for object detection using boosted random fields. In Proceedings of the Conference on Neural Information Processing Systems (NIPS).
[40]
Yeo, G. and Burge, C. B. 2003. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. In Proceedings of the Conference on Computational Molecular Biology. 322--331.
[41]
Zhang, T., Damerau, F., and Johnson, D. 2002. Text chunking based on a generalization of Winnow. J. Mach. Learn. Res. 2, 615--637.

Cited By

  • (2013) From Creative Ideas Generation to Real World Solutions. Multidisciplinary Studies in Knowledge and Systems Science, 95--112. DOI: 10.4018/978-1-4666-3998-0.ch008
  • (2011) From Creative Ideas Generation to Real World Solutions. International Journal of Knowledge and Systems Science 2, 2, 31--48. DOI: 10.4018/jkss.2011040103



Published In

ACM Transactions on Asian Language Information Processing, Volume 5, Issue 4
December 2006, 148 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1236181

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 December 2006
      Published in TALIP Volume 5, Issue 4


      Author Tags

      1. Discriminative sequential learning
      2. association rule mining
      3. feature selection
      4. information extraction
      5. text segmentation
