Modifying boosted trees to improve performance on task 1 of the 2006 KDD challenge cup
Source
ACM SIGKDD Explorations Newsletter
Volume 8, Issue 2 (December 2006)
Pages: 47-52
Year of Publication: 2006
ISSN: 1931-0145
ABSTRACT
Task 1 of the 2006 KDD Challenge Cup required classification of pulmonary embolisms (PEs) using variables derived from computed tomography angiography. We present our approach to the challenge and justification for our choices. We used boosted trees to perform the main classification task, but modified the algorithm to address idiosyncrasies of the scoring criteria. The two main modifications were: 1) changing the dependent variable in the training set to account for multiple PEs per patient, and 2) incorporating neighborhood information through augmentation of the set of predictor variables. Both of these resulted in measurable predictive improvement. In addition, we discuss a statistically based method for setting the classification threshold.
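
The abstract does not say how neighborhood information was computed, so the following is only a minimal sketch of what augmenting the predictor set could look like, not the authors' procedure. It assumes each candidate has a spatial position and a patient identifier, and appends two kinds of hypothetical columns: the number of same-patient candidates within a chosen radius, and the mean of those neighbors' original features. The function name, the `radius` parameter, and the choice of summaries are all illustrative assumptions.

```python
import math


def augment_with_neighborhood(features, positions, patient_ids, radius=10.0):
    """Append simple neighborhood summaries to each candidate's feature row.

    Illustrative assumption: a neighbor is another candidate from the same
    patient whose position lies within `radius` (Euclidean distance).
    Appended columns: neighbor count, then the mean of the neighbors'
    original features (zeros when there are no neighbors).
    """
    augmented = []
    for i, row in enumerate(features):
        # Indices of same-patient candidates close to candidate i.
        neighbors = [
            j
            for j in range(len(features))
            if j != i
            and patient_ids[j] == patient_ids[i]
            and math.dist(positions[i], positions[j]) <= radius
        ]
        if neighbors:
            means = [
                sum(features[j][k] for j in neighbors) / len(neighbors)
                for k in range(len(row))
            ]
        else:
            means = [0.0] * len(row)
        augmented.append(list(row) + [float(len(neighbors))] + means)
    return augmented


# Toy data: three candidates from patient 1, one from patient 2.
toy_features = [[1.0], [3.0], [5.0], [7.0]]
toy_positions = [(0, 0, 0), (1, 0, 0), (100, 0, 0), (0, 0, 0)]
toy_patients = [1, 1, 1, 2]
toy_augmented = augment_with_neighborhood(toy_features, toy_positions, toy_patients)
```

The augmented rows could then be passed to any boosted-tree learner in place of the original predictors. The radius and the summary statistics would need to be chosen against the actual data.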