Date of Award

1-1-2014

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Epidemiology and Biostatistics

Program

Biostatistics

Content Description

1 online resource (xi, 125 pages) : illustrations (some color)

Dissertation/Thesis Chair

Recai M Yucel

Committee Members

A. Gregory DiRienzo, Tao Lu

Keywords

Binary Classification, High-dimensional Microarray Data, Multiple Imputation, Random Forests, Single Imputation, Variable selection, Random graphs, Decision trees, Trees (Graph theory), Big data, Data mining

Subject Categories

Biostatistics | Computer Sciences | Statistics and Probability

Abstract

Binary classification plays an important role in many decision-making processes. Random forests build a strong ensemble classifier by combining weaker, de-correlated classification trees. The strength of the individual classification trees and the correlation among them are the key factors that determine the ensemble performance of random forests. We propose roughened random forests, a new set of tools that show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree, which further reduces the correlation among individual trees. This data modification process consists of two steps: artificially imposing missing data that are missing completely at random (MCAR), and then imputing the missing values.
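The data modification step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `roughen` and `roughened_datasets`, the `missing_rate` parameter, and the use of column-mean single imputation are assumptions made for illustration only (the keywords mention both single and multiple imputation, but the abstract does not specify a method). Each roughened copy would then be used to train one classification tree.

```python
import random


def roughen(rows, missing_rate, rng):
    """Return a roughened copy of `rows` (hypothetical helper).

    Step 1: mask each cell independently with probability `missing_rate`,
            which makes the imposed missingness MCAR.
    Step 2: fill masked cells with the mean of the observed values in the
            same column (mean imputation is an assumption of this sketch).
    """
    n_cols = len(rows[0])
    masked = [
        [None if rng.random() < missing_rate else value for value in row]
        for row in rows
    ]
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        if observed:
            col_means.append(sum(observed) / len(observed))
        else:
            # Whole column was masked: fall back to the original column mean.
            col_means.append(sum(row[j] for row in rows) / len(rows))
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in masked
    ]


def roughened_datasets(rows, n_trees, missing_rate=0.1, seed=0):
    """Build one independently roughened dataset per tree (hypothetical helper)."""
    rng = random.Random(seed)
    return [roughen(rows, missing_rate, rng) for _ in range(n_trees)]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for dataset in roughened_datasets(data, n_trees=3, missing_rate=0.3, seed=1):
        print(dataset)
```

Because every tree sees a slightly different version of the predictors, the trees are expected to be less correlated with one another, which is the effect the abstract attributes to roughening.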
