ORCID
https://orcid.org/0000-0002-3748-7890
Date of Award
Spring 2026
Language
English
Embargo Period
5-1-2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Computer Science
Program
Computer Science
First Advisor
Ming-Ching Chang
Committee Members
Xin Li, Emily Leckman-Westin, Paliath Narendran
Keywords
Graph Neural Networks, Graph Transformer, Self-Supervised Learning, Healthcare Claims Data, Representation Learning, Clinical Prediction
Subject Categories
Artificial Intelligence and Robotics | Data Science
Abstract
Healthcare data exhibit complex structures, including heterogeneous clinical entities, sparse observations, and longitudinal patient trajectories. Effectively modeling such data remains a fundamental challenge in computational healthcare research. Traditional machine learning approaches often rely on flat feature representations that fail to capture relationships among clinical events, limiting their ability to model complex healthcare processes. These challenges motivate structured learning frameworks that capture both relational structure and temporal dynamics in healthcare data. This dissertation develops a series of graph-based representation learning approaches, extended through graph-transformer architectures, for modeling complex healthcare data. Such data can be represented as graphs, where nodes correspond to clinical entities and edges encode relationships. This representation enables structured modeling of complex interactions, while attention-based graph-transformer models capture higher-order dependencies and temporal dynamics. From this perspective, the dissertation explores multiple research directions. One direction focuses on modeling structured clinical data, where healthcare encounters are represented as graphs of diagnoses, medications, and demographic factors to support representation learning and downstream prediction tasks. A complementary line of work explores pharmacovigilance using patient-generated health data, where a bi-submodular optimization (BSMO) framework is proposed to detect potential drug–drug interactions from online health forums. While these approaches differ in methodology and data sources, they share a common emphasis on leveraging graph structures to uncover relationships in complex healthcare data. For large-scale administrative claims data, including Medicaid claims data, several graph-based learning frameworks are proposed to improve representation learning and predictive performance.
First, a graph convolutional matrix completion (GC-MC) approach is introduced to model latent comorbidity structures among diseases through graph-based link prediction, demonstrating how relational patterns among clinical conditions can be learned from sparse claims data. Building on encounter-level graph representations, Med-GCT introduces a hybrid graph-transformer architecture that integrates graph neural networks with attention mechanisms to learn expressive representations of healthcare encounters. To leverage large-scale unlabeled data, PreClaim-GCT proposes a self-supervised learning framework that pretrains graph-transformer models using masked clinical code reconstruction objectives. Finally, TrajMedGCT extends this framework to longitudinal patient trajectories by incorporating temporal dependencies across encounters, enabling more effective modeling of sequential healthcare processes. Empirical evaluations on large-scale Medicaid claims datasets demonstrate that the proposed models consistently improve predictive performance compared to conventional machine learning and deep learning baselines, achieving gains of approximately 3–8 percentage points in AUC-PR across multiple tasks. Overall, this dissertation demonstrates that graph-based and graph-transformer representation learning provides a scalable and flexible approach for modeling complex healthcare data. By integrating structural relationships with longitudinal patient trajectories, the proposed approaches advance healthcare data modeling and support effective, data-driven clinical decision-making across healthcare domains.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Wang, Rui, "GRAPH-BASED AND GRAPH-TRANSFORMER REPRESENTATION LEARNING FOR HEALTHCARE DATA" (2026). Electronic Theses & Dissertations (2024 - present). 429.
https://scholarsarchive.library.albany.edu/etd/429