ORCID
https://orcid.org/0000-0002-6690-2299
Date of Award
Spring 2025
Language
English
Embargo Period
11-30-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Computer Science
Program
Computer Science
First Advisor
Charalampos Chelmis
Committee Members
Sherry Sahebi, Petko Bogdanov, Daphney-Stavroula Zois
Keywords
Predictive Modeling, Longitudinal Data, High-stakes domains, Impact-aligned predictive modeling
Abstract
Tabular data is a core representation format in machine learning and is widely used in high-stakes domains such as healthcare, finance, and social services. In many of these domains, data is collected longitudinally, capturing sequences of events and evolving feature values for the same entities over time. Despite the prevalence of such data, most machine learning approaches assume that features are mutually independent and that instances are independent and identically distributed (i.i.d.). These assumptions limit effectiveness in settings where features and entities exhibit complex interdependencies. Moreover, existing approaches frequently overlook a critical dimension: the real-world impact of the decisions informed by model predictions. This thesis addresses these limitations with the broader goal of improving model performance, measured not only by agreement with ground-truth labels but also by the real-world outcomes that predictions help achieve. To this end, we propose four complementary models tailored to longitudinal tabular data: TRACE, PREVISE, REPLETE, and SECURE. TRACE models high-stakes systems as event transition networks and defines a novel similarity score between event sequences to support personalized event prediction. PREVISE constructs Bayesian networks that capture both service transitions and their links to exit outcomes, providing structured probabilistic modeling for high-stakes longitudinal systems. Building on these foundations, REPLETE captures feature interactions by modeling temporal patterns and functional relationships between events. It simultaneously captures instance interactions through soft clustering of entities based on shared characteristics. These relationships are embedded into a unified optimization framework that learns low-dimensional representations, which then serve as input to predictive models.
REPLETE's generalizability and effectiveness are demonstrated across three domains: homeless service assignment, financial service recommendation, and ICU service prediction. Our experiments show that REPLETE performs strongly in the healthcare and social service domains and, in the financial domain, achieves performance comparable to the best-performing baseline. Finally, SECURE complements predictive modeling with decision-making aligned with real-world outcomes. It jointly models the likelihood of a favorable event outcome and the risk of undesirable recurrence using a tripartite event-outcome-recurrence graph. It also enforces a similarity-based neighbor constraint to ensure feasible recommendations. Evaluated in the context of homeless service assignment, SECURE provides outcome-aligned recommendations that outperform baseline approaches focused solely on predictive accuracy. Together, the models introduced in this thesis represent a step toward predictive systems that are both accurate and socially impactful.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Rahman, Khandker Sadia, "Modeling Feature and Instance Interactions in Longitudinal Data for Improved Performance" (2025). Electronic Theses & Dissertations (2024 - present). 208.
https://scholarsarchive.library.albany.edu/etd/208