Reducing overfitting in process model induction
Source: ACM International Conference Proceeding Series, Vol. 119
Proceedings of the 22nd International Conference on Machine Learning
Bonn, Germany
Pages: 81-88
Year of Publication: 2005
ISBN: 1-59593-180-5
Authors
Will Bridewell, Stanford University, Stanford, CA
Narges Bani Asadi, Stanford University, Stanford, CA
Pat Langley, Stanford University, Stanford, CA
Ljupčo Todorovski, Jožef Stefan Institute, Ljubljana, Slovenia
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1102351.1102362

ABSTRACT

In this paper, we review the paradigm of inductive process modeling, which uses background knowledge about possible component processes to construct quantitative models of dynamical systems. We note that previous methods for this task tend to overfit the training data, which suggests ensemble learning as a likely response. However, such techniques combine models in ways that reduce comprehensibility, making their output much less accessible to domain scientists. As an alternative, we introduce a new approach that induces a set of process models from different samples of the training data and uses them to guide a final search through the space of model structures. Experiments with synthetic and natural data suggest this method reduces error and decreases the chance of including unnecessary processes in the model. We conclude by discussing related work and suggesting directions for additional research.
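
The abstract describes the approach only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the authors' algorithm. It induces one model structure per bootstrap sample of the training data, records how often each candidate process appears, and then runs a final search restricted to the frequently chosen processes. The candidate processes, their fixed parameters, the exhaustive search, and the frequency threshold used as the guidance rule are all hypothetical choices made for brevity; a real system would also fit process parameters.

```python
import itertools
import random

# Hypothetical candidate processes: each maps the state x to a contribution
# to dx/dt. Parameters are fixed here for brevity (an assumption).
CANDIDATES = {
    "growth": lambda x: 0.5 * x,
    "crowding": lambda x: -0.1 * x * x,
    "inflow": lambda x: 0.3,
}


def predict(structure, x):
    """Sum the contributions of the processes included in a structure."""
    return sum(f(x) for name, f in CANDIDATES.items() if name in structure)


def sse(structure, data):
    """Sum of squared errors of a structure on (x, dx/dt) pairs."""
    return sum((y - predict(structure, x)) ** 2 for x, y in data)


def search(allowed, data):
    """Exhaustively score every subset of the allowed processes.

    Subsets are generated smallest first, so ties favor simpler structures.
    """
    names = sorted(allowed)
    subsets = (
        frozenset(combo)
        for size in range(len(names) + 1)
        for combo in itertools.combinations(names, size)
    )
    return min(subsets, key=lambda s: sse(s, data))


def induce_with_bootstrap(data, n_models=25, threshold=0.5, seed=0):
    """Induce one model per bootstrap sample, then run a guided final search."""
    rng = random.Random(seed)
    counts = {name: 0 for name in CANDIDATES}
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        for name in search(CANDIDATES, sample):
            counts[name] += 1
    freqs = {name: count / n_models for name, count in counts.items()}
    # Guidance rule (an assumption): keep processes seen in enough models.
    allowed = {name for name, freq in freqs.items() if freq >= threshold}
    return search(allowed, data), freqs


# Usage: noisy observations generated from a two-process "true" structure.
noise = random.Random(1)
truth = {"growth", "crowding"}
xs = [0.5 * i for i in range(1, 21)]
noisy = [(x, predict(truth, x) + noise.gauss(0, 0.2)) for x in xs]
final, freqs = induce_with_bootstrap(noisy)
print(sorted(final), freqs)
```

Because the output is a single model structure rather than a weighted combination of models, it stays readable in terms of named processes, which is the comprehensibility concern the abstract raises about ensemble methods.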

