Date of Award

1-1-2011

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Information Science

Content Description

1 online resource (xiv, 191 pages) : illustrations.

Dissertation/Thesis Chair

Ozlem Uzuner

Committee Members

Jagdish Gangolly, George Berg

Keywords

assertion, classification, co-training, imbalanced data, machine learning, synonymy, Natural language processing (Computer science), Medical informatics, Medical records, Computational linguistics, Automatic speech recognition

Subject Categories

Bioinformatics | Library and Information Science

Abstract

In this dissertation we present three topics critical to the document-level classification of the narrative in medical reports: the use of preferred terminology in light of the presence of synonymous terms, the less-than-optimal performance of classification systems when presented with a non-uniform distribution of classes, and the problems associated with scarcity of labeled data when presented with an imbalance of classes in the data sets.
