"Modeling Feature and Instance Interactions in Longitudinal Data for Im" by Khandker Sadia Rahman

ORCID

https://orcid.org/0000-0002-6690-2299

Date of Award

Spring 2025

Language

English

Embargo Period

11-30-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Computer Science

Program

Computer Science

First Advisor

Charalampos Chelmis

Committee Members

Sherry Sahebi, Petko Bogdanov, Daphney-Stavroula Zois

Keywords

Predictive Modeling, Longitudinal Data, High-stakes domains, Impact-aligned predictive modeling

Abstract

Tabular data is a core representation format in machine learning and is widely used in high-stakes domains such as healthcare, finance, and social services. In many of these domains, data is collected longitudinally, capturing sequences of events and evolving feature values for the same entities over time. Despite the prevalence of such data, most machine learning approaches assume that features are mutually independent and that instances are independent and identically distributed (i.i.d.). These assumptions limit effectiveness in settings where features and entities exhibit complex interdependencies. Moreover, existing approaches frequently overlook a critical dimension: the real-world impact of the decisions informed by model predictions. This thesis addresses these limitations with the broader goal of improving model performance, not only in terms of alignment with ground truth labels but also in terms of improved outcomes. To achieve this, we propose four complementary models tailored for longitudinal tabular data: TRACE, PREVISE, REPLETE, and SECURE. TRACE models high-stakes systems as event transition networks and defines a novel similarity score over event sequences to support personalized event prediction. PREVISE constructs Bayesian networks that capture both service transitions and their links to exit, providing structured probabilistic modeling for high-stakes longitudinal systems. Building on these foundations, REPLETE captures feature interactions by modeling temporal patterns and functional relationships between events, and captures instance interactions through soft clustering of entities based on shared characteristics. These relationships are embedded into a unified optimization framework that learns low-dimensional representations, which are then used as input to predictive models.
REPLETE's generalizability and effectiveness are demonstrated across three domains: homeless service assignment, financial service recommendation, and ICU service prediction. Our experiments show that REPLETE performs strongly in the healthcare and social service domains and achieves performance comparable to the best-performing baseline in the financial domain. Finally, SECURE complements predictive modeling with decision-making aligned to real-world outcomes. It jointly models the likelihood of a favorable event outcome and the risk of undesirable recurrence using a tripartite event-outcome-recurrence graph, while enforcing a similarity-based neighbor constraint to ensure that recommendations are feasible. SECURE is evaluated on homeless service assignment, where its outcome-aligned recommendations outperform baseline approaches that focus solely on predictive accuracy. Together, the models introduced in this thesis represent a step forward in building predictive systems that are both accurate and socially impactful.
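As a rough illustration of the "event transition network" idea mentioned in the abstract, the sketch below builds a first-order transition network from event sequences by counting consecutive event pairs and normalizing the counts into probabilities. This is a generic construction and not the thesis's TRACE model; the function name `build_transition_network` and the event labels are hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict


def build_transition_network(sequences):
    """Return {source_event: {next_event: probability}} from event sequences.

    Generic first-order sketch (hypothetical helper, not the thesis's method):
    each edge weight is the empirical probability of moving from one event
    to the next, estimated from consecutive pairs in the input sequences.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each event with its immediate successor.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    network = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        network[src] = {dst: n / total for dst, n in dsts.items()}
    return network


# Hypothetical event labels, one list per entity, ordered in time.
sequences = [
    ["shelter", "housing", "exit"],
    ["shelter", "shelter", "exit"],
    ["outreach", "shelter", "housing"],
]
network = build_transition_network(sequences)
print(network["shelter"])
```

In this toy input, the outgoing probabilities from each event sum to 1, so the result can be read as a weighted directed graph over events.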

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Available for download on Sunday, November 30, 2025
