Date of Award
1-1-2018
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Information Science
Content Description
1 online resource (ii, xii, 130 pages) : illustrations (some color)
Dissertation/Thesis Chair
Özlem Uzuner
Committee Members
Eliot Rich, Michele Filannino
Keywords
CRFs, information extraction, machine learning, named-entity recognition, NLP, word embeddings, Medical informatics, Data mining, Information retrieval, Natural language processing (Computer science)
Subject Categories
Bioinformatics | Computer Sciences | Library and Information Science
Abstract
Information extraction (IE) is a fundamental component of natural language processing (NLP) that enables a deeper understanding of texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases this information is presented in an unstructured format, predominantly free text, which makes it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminology makes automatic extraction difficult. The lack of annotated data, sometimes only in less well-represented information categories and sometimes in all categories, further complicates this task.
Recommended Citation
Tao, Mingzhe, "Clinical information extraction from unstructured free-texts" (2018). Legacy Theses & Dissertations (2009 - 2024). 2175.
https://scholarsarchive.library.albany.edu/legacy-etd/2175