Date of Award

1-1-2018

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Information Science

Content Description

1 online resource (ii, xii, 130 pages) : illustrations (some color)

Dissertation/Thesis Chair

Özlem Uzuner

Committee Members

Eliot Rich, Michele Filannino

Keywords

CRFs, information extraction, machine learning, named-entity recognition, NLP, word embeddings, Medical informatics, Data mining, Information retrieval, Natural language processing (Computer science)

Subject Categories

Bioinformatics | Computer Sciences | Library and Information Science

Abstract

Information extraction (IE) is a fundamental component of natural language processing (NLP) that enables a deeper understanding of text. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, this information is presented in an unstructured format, predominantly free text, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminology makes automatic extraction difficult. The lack of annotated data, sometimes in less well-represented information categories and sometimes in all categories, further complicates this task.
