Date of Award
8-1-2022
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Epidemiology and Biostatistics
Program
Biostatistics
Content Description
1 online resource (x, 128 pages) : illustrations (some color)
Dissertation/Thesis Chair
Recai Yucel
Committee Members
Yiming Ying, Elizabeth Vasquez
Keywords
Multiple imputation (Statistics), Missing observations (Statistics)
Subject Categories
Biostatistics | Statistics and Probability
Abstract
This dissertation focuses on the development of multiple imputation models and algorithms for high-dimensional data with variable selection structures. Leveraging the multivariate linear mixed-effects model with missing responses for clustered data, we incorporate variable selection routines using spike-and-slab priors within the Bayesian variable selection framework. Specific choices of these priors allow us to "force" variables of importance (e.g., design variables or variables known to play a role in the missingness mechanism) into the imputation models. Our ultimate goal is to improve computational speed by removing unnecessary variables. Markov chain Monte Carlo (MCMC) techniques are designed to sample from the implied posterior distributions of the model unknowns as well as the missing data. As a computationally efficient alternative, variational inference algorithms are also developed to overcome the computational burden of MCMC in sampling from the posterior predictive distribution of the missing data. Through a calibration-based rounding strategy, the continuous imputation model and algorithm can be readily extended to multiple imputation of categorical variables. We also compare two imputation strategies, sequential imputation and joint modeling imputation, in the presence of variable selection structures. We design a joint-modeling-based imputation framework using a nonparametric Bayesian Dirichlet process mixture model together with a sparse precision matrix model. Two computational algorithms are developed for this framework: a Gibbs sampler and a variational Bayesian inference algorithm. The proposed methodology applies to missing data that are either continuous or a mix of continuous and categorical.
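The idea of "forcing" variables through the choice of spike-and-slab prior can be illustrated with a minimal sketch. It assumes a continuous spike-and-slab prior in the style of stochastic search variable selection (SSVS): each coefficient is drawn from a narrow "spike" or a wide "slab" normal depending on a binary inclusion indicator. This may differ from the exact prior used in the dissertation, and the names `inclusion_probabilities`, `sample_gamma`, `tau_spike`, and `tau_slab` are hypothetical. Setting a variable's prior inclusion probability to 1 makes its conditional inclusion probability exactly 1 at every Gibbs iteration, so that variable can never be dropped from the imputation model.

```python
import numpy as np


def log_normal_pdf(x, sd):
    """Log density of N(0, sd^2) evaluated elementwise at x."""
    return -0.5 * (x / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)


def inclusion_probabilities(beta, prior_incl, tau_spike=0.01, tau_slab=1.0):
    """Conditional P(gamma_j = 1 | beta_j) under an assumed SSVS-style prior.

    beta_j | gamma_j = 1 ~ N(0, tau_slab^2)   (slab)
    beta_j | gamma_j = 0 ~ N(0, tau_spike^2)  (spike)
    gamma_j ~ Bernoulli(prior_incl[j]); prior_incl[j] = 1 forces variable j in.
    """
    beta = np.asarray(beta, dtype=float)
    prior_incl = np.asarray(prior_incl, dtype=float)
    # Clip to avoid log(0); forced (1) and excluded (0) entries are overwritten below.
    pi = np.clip(prior_incl, 1e-12, 1.0 - 1e-12)
    # Log posterior odds = log prior odds + log slab density - log spike density.
    log_odds = (np.log(pi) - np.log1p(-pi)
                + log_normal_pdf(beta, tau_slab)
                - log_normal_pdf(beta, tau_spike))
    prob = 1.0 / (1.0 + np.exp(-log_odds))
    prob = np.where(prior_incl >= 1.0, 1.0, prob)   # forced-in variables
    return np.where(prior_incl <= 0.0, 0.0, prob)   # forced-out variables


def sample_gamma(beta, prior_incl, rng, **kwargs):
    """One Gibbs draw of the inclusion indicators given the current coefficients."""
    p = inclusion_probabilities(beta, prior_incl, **kwargs)
    return (rng.random(p.shape) < p).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    beta = np.array([0.0, 0.003, 1.5])
    # First variable forced in (e.g., a design variable); others selected by the data.
    prior = np.array([1.0, 0.5, 0.5])
    print(inclusion_probabilities(beta, prior))
    print(sample_gamma(beta, prior, rng))
```

In a full sampler this indicator update would alternate with draws of the coefficients, the covariance parameters, and the missing values; only the indicator step is shown here. Working on the log-odds scale keeps the computation stable when one of the two normal densities underflows.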
Recommended Citation
Li, Qiushuang, "Multiple imputation in high-dimensional data with variable selection" (2022). Legacy Theses & Dissertations (2009 - 2024). 2957.
https://scholarsarchive.library.albany.edu/legacy-etd/2957