Date of Award




Document Type


Degree Name

Doctor of Philosophy (PhD)


Department of Information Science

Content Description

1 online resource (ii, xii, 130 pages) : illustrations (some color)

Dissertation/Thesis Chair

Özlem Uzuner

Committee Members

Eliot Rich, Michele Filannino


Keywords

CRFs, information extraction, machine learning, named-entity recognition, NLP, word embeddings, Medical informatics, Data mining, Information retrieval, Natural language processing (Computer science)

Subject Categories

Bioinformatics | Computer Sciences | Library and Information Science


Abstract

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, this information is presented in an unstructured format, predominantly free text, which makes it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminology makes automatic extraction difficult. The lack of annotated data, sometimes in the less well-represented information categories and sometimes in all categories, further complicates this task.