Reducing overfitting in process model induction
Source
ACM International Conference Proceeding Series; Vol. 119
Proceedings of the 22nd international conference on Machine learning
Bonn, Germany
Pages: 81 - 88
Year of Publication: 2005
ISBN: 1-59593-180-5
ABSTRACT
In this paper, we review the paradigm of inductive process modeling, which uses background knowledge about possible component processes to construct quantitative models of dynamical systems. We note that previous methods for this task tend to overfit the training data, which suggests ensemble learning as a likely response. However, such techniques combine models in ways that reduce comprehensibility, making their output much less accessible to domain scientists. As an alternative, we introduce a new approach that induces a set of process models from different samples of the training data and uses them to guide a final search through the space of model structures. Experiments with synthetic and natural data suggest this method reduces error and decreases the chance of including unnecessary processes in the model. We conclude by discussing related work and suggesting directions for additional research.
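The idea in the abstract of inducing models from different samples of the training data and letting them guide a final choice of structure can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it assumes bootstrap resampling, represents a model as a set of process names, and replaces the guided final search with a simple frequency threshold. The names `bootstrap_sample`, `process_frequencies`, `select_processes`, and the caller-supplied `induce` function are all hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence


def bootstrap_sample(data: Sequence, rng: random.Random) -> List:
    """Draw len(data) rows with replacement (assumed sampling scheme)."""
    return [rng.choice(data) for _ in data]


def process_frequencies(
    data: Sequence,
    induce: Callable[[List], Iterable[str]],
    n_samples: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Induce one model per sample and record how often each process appears.

    `induce` is a hypothetical learner that maps a data sample to the
    names of the processes in the model it found.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_samples):
        model = set(induce(bootstrap_sample(data, rng)))
        counts.update(model)  # each process counted once per model
    return {name: count / n_samples for name, count in counts.items()}


def select_processes(
    freqs: Dict[str, float], threshold: float = 0.5
) -> FrozenSet[str]:
    """Keep processes found in at least `threshold` of the sampled models.

    A stand-in for the final guided search: rarely selected processes
    are treated as likely artifacts of overfitting and dropped.
    """
    return frozenset(name for name, f in freqs.items() if f >= threshold)
```

The result is still a single set of processes rather than a combination of many models, which is the comprehensibility property the abstract emphasizes over standard ensemble methods.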