ORCID

https://orcid.org/0000-0002-3748-7890

Date of Award

Spring 2026

Language

English

Embargo Period

5-1-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Computer Science

Program

Computer Science

First Advisor

Ming-Ching Chang

Committee Members

Xin Li, Emily Leckman-Westin, Paliath Narendran

Keywords

Graph Neural Networks, Graph Transformer, Self-Supervised Learning, Healthcare Claims Data, Representation Learning, Clinical Prediction

Subject Categories

Artificial Intelligence and Robotics | Data Science

Abstract

Healthcare data exhibit complex structures, including heterogeneous clinical entities, sparse observations, and longitudinal patient trajectories. Effectively modeling such data remains a fundamental challenge in computational healthcare research. Traditional machine learning approaches often rely on flat feature representations that fail to capture relationships among clinical events, which limits their ability to model complex healthcare processes. These challenges motivate structured learning frameworks that capture both the relational structure and the temporal dynamics of healthcare data. This dissertation develops a series of graph-based representation learning approaches, further extended with graph-transformer architectures, for modeling complex healthcare data. Such data can be represented as graphs in which nodes correspond to clinical entities and edges encode the relationships among them. This representation enables structured modeling of complex interactions, while attention-based graph-transformer models capture higher-order dependencies and temporal dynamics. From this perspective, the dissertation pursues several research directions. One direction focuses on modeling structured clinical data, where healthcare encounters are represented as graphs of diagnoses, medications, and demographic factors to support representation learning and downstream prediction tasks. A complementary line of work addresses pharmacovigilance using patient-generated health data, proposing a bi-submodular optimization (BSMO) framework to detect potential drug–drug interactions from online health forums. Although these approaches differ in methodology and data sources, they share a common emphasis on leveraging graph structures to uncover relationships in complex healthcare data. Within large-scale administrative claims settings, including Medicaid claims data, several graph-based learning frameworks are proposed to improve representation learning and predictive performance.
First, a graph convolutional matrix completion (GC-MC) approach is introduced to model latent comorbidity structures among diseases through graph-based link prediction, demonstrating how relational patterns among clinical conditions can be learned from sparse claims data. Building on encounter-level graph representations, Med-GCT introduces a hybrid graph–transformer architecture that integrates graph neural networks with attention mechanisms to learn expressive representations of healthcare encounters. To leverage large-scale unlabeled data, PreClaim-GCT proposes a self-supervised learning framework that pretrains graph-transformer models using masked clinical code reconstruction objectives. Finally, TrajMedGCT extends this framework to longitudinal patient trajectories by incorporating temporal dependencies across encounters, enabling more effective modeling of sequential healthcare processes. Empirical evaluations on large-scale Medicaid claims datasets show that the proposed models consistently outperform conventional machine learning and deep learning baselines, with gains of approximately 3–8 percentage points in AUC-PR across multiple tasks. Taken together, the work in this dissertation demonstrates that graph-based and graph-transformer representation learning provides a scalable and flexible approach for modeling complex healthcare data. By integrating structural relationships with longitudinal patient trajectories, the proposed approaches advance healthcare data modeling and support effective, data-driven clinical decision-making across healthcare domains.
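As a rough illustration of the encounter-graph and masked clinical code reconstruction ideas summarized above, the sketch below turns one encounter into a small graph and hides a fraction of its codes so that a model would have to predict them back. It is a minimal sketch under stated assumptions, not the PreClaim-GCT or Med-GCT implementation: connecting every pair of codes within an encounter, the `[MASK]` token, the masking rate, the example codes, and the helpers `encounter_graph` and `mask_encounter` are all hypothetical.

```python
import random
from itertools import combinations

MASK = "[MASK]"  # hypothetical placeholder token for hidden codes


def encounter_graph(codes):
    """Build an undirected graph for one encounter (hypothetical helper).

    Nodes are clinical codes (duplicates removed, order kept). As a
    simplifying assumption, every pair of codes in the encounter is linked.
    """
    nodes = list(dict.fromkeys(codes))
    edges = list(combinations(range(len(nodes)), 2))
    return nodes, edges


def mask_encounter(nodes, rate=0.15, rng=None):
    """Hide a fraction of node labels for a reconstruction objective.

    Returns the corrupted node list and a dict {node_index: original_code}
    that acts as the prediction target. The 0.15 default is an assumption.
    """
    if not nodes:
        return [], {}
    rng = rng or random.Random()
    n_mask = max(1, round(rate * len(nodes)))  # always mask at least one code
    positions = rng.sample(range(len(nodes)), n_mask)
    corrupted = list(nodes)
    targets = {}
    for i in positions:
        targets[i] = corrupted[i]
        corrupted[i] = MASK
    return corrupted, targets


if __name__ == "__main__":
    # Hypothetical encounter mixing diagnosis, medication, and demographic tokens.
    codes = ["DX:F32.9", "DX:E11.9", "RX:sertraline", "RX:metformin", "DEMO:age_40_49"]
    nodes, edges = encounter_graph(codes)
    corrupted, targets = mask_encounter(nodes, rate=0.2, rng=random.Random(0))
    print(len(edges), corrupted, targets)
```

In a full pretraining setup, a graph encoder would embed the corrupted graph and be trained to recover the entries in `targets`; the fully connected edge set here is only one simple choice, and richer edge definitions (for example, diagnosis-to-medication links) are equally plausible.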

License

Creative Commons Attribution 4.0 International License