Modifying boosted trees to improve performance on task 1 of the 2006 KDD challenge cup
Source: ACM SIGKDD Explorations Newsletter
Volume 8, Issue 2 (December 2006)
Pages: 47-52
Year of Publication: 2006
ISSN: 1931-0145
Authors
Robert M. Bell, AT&T Labs-Research, Florham Park, NJ
Patrick G. Haffner, AT&T Labs-Research, Middletown, NJ
Chris Volinsky, AT&T Labs-Research, Florham Park, NJ
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1233321.1233327

ABSTRACT

Task 1 of the 2006 KDD Challenge Cup required classification of pulmonary embolisms (PEs) using variables derived from computed tomography angiography. We present our approach to the challenge and justification for our choices. We used boosted trees to perform the main classification task, but modified the algorithm to address idiosyncrasies of the scoring criteria. The two main modifications were: 1) changing the dependent variable in the training set to account for multiple PEs per patient, and 2) incorporating neighborhood information through augmentation of the set of predictor variables. Both of these resulted in measurable predictive improvement. In addition, we discuss a statistically based method for setting the classification threshold.
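
The second modification, adding neighborhood information to the predictor set, can be illustrated with a small sketch. The abstract does not say how neighborhoods were defined, so everything below is an assumption made for illustration: the record keys (`patient`, `xyz`, `features`), the Euclidean radius, and the two added predictors (neighbor count and nearest-neighbor distance) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def augment_with_neighbors(candidates, radius=10.0):
    """Return copies of candidate records with two extra predictors.

    Each candidate is a dict with hypothetical keys:
      'patient'  - patient identifier
      'xyz'      - (x, y, z) location of the candidate
      'features' - list of numeric predictors
    """
    # Group candidates so that neighbors are only sought within one patient.
    by_patient = defaultdict(list)
    for c in candidates:
        by_patient[c["patient"]].append(c)

    augmented = []
    for c in candidates:
        # Distances to every other candidate from the same patient.
        dists = [
            dist(c["xyz"], other["xyz"])
            for other in by_patient[c["patient"]]
            if other is not c
        ]
        near = [d for d in dists if d <= radius]
        extra = [
            float(len(near)),                 # number of neighbors within radius
            min(dists) if dists else radius,  # nearest distance; radius if alone
        ]
        # Build a new record so the input list is left unmodified.
        augmented.append({**c, "features": c["features"] + extra})
    return augmented
```

The augmented feature lists could then be passed to any boosted-tree learner in place of the original predictors; which summaries of the neighborhood are actually useful is something the full paper, not this sketch, would have to answer.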


