Date of Award




Document Type


Degree Name

Doctor of Philosophy (PhD)


Department of Epidemiology and Biostatistics



Content Description

1 online resource (x, 128 pages) : illustrations (some color)

Dissertation/Thesis Chair

Recai Yucel

Committee Members

Yiming Ying, Elizabeth Vasquez


Multiple imputation (Statistics), Missing observations (Statistics)

Subject Categories

Biostatistics | Statistics and Probability


This dissertation focuses on the development of multiple imputation models and algorithms for high-dimensional data with variable selection structures. Building on the multivariate linear mixed-effects model with missing responses for clustered data, we incorporate variable selection routines using spike-and-slab priors within the Bayesian variable selection framework. Specific choices of these priors allow us to "force" variables of importance (e.g., design variables or variables known to play a role in the missingness mechanism) into the imputation models. Our ultimate goal is to improve computational speed by removing unnecessary variables. Markov chain Monte Carlo techniques have been designed to sample from the implied posterior distributions for model unknowns as well as missing data. As a computationally efficient alternative, variational inference algorithms have also been developed to overcome the computational burden of Markov chain Monte Carlo when sampling from the posterior predictive distribution of the missing data. The continuous imputation model and algorithm can be easily modified through a calibration-based rounding strategy for multiple imputation of categorical variables. We also compare two imputation strategies, sequential imputation and joint modeling imputation, in the presence of variable selection structures. We design a joint-modeling-based imputation framework using a nonparametric Bayesian Dirichlet process mixture model together with a sparse precision matrix model. Two computational algorithms have been carefully designed: the Gibbs sampler and the variational Bayesian inference algorithm. The proposed methodology can be applied both to continuous missing data and to mixed continuous and categorical missing data.

Included in

Biostatistics Commons