ORCID
https://orcid.org/0000-0001-7477-2730
Date of Award
Summer 2025
Language
English
Embargo Period
7-30-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Educational and Counseling Psychology
Program
Educational Psychology and Methodology
First Advisor
Mariola Moeyaert
Second Advisor
Recai Yucel
Committee Members
Mariola Moeyaert, Recai Yucel, Kimberly Colvin
Keywords
missing data, multiple imputation, complex survey, highschool longitudinal survey, systematic review
Subject Categories
Biostatistics | Educational Methods | Educational Psychology | Longitudinal Data Analysis and Time Series | Statistical Methodology
Abstract
Missing data are a nearly universal problem in human subjects research, including education research. However, missing data are often neither reported nor addressed, despite guidelines from the APA style guide and the What Works Clearinghouse and guidance from prominent statisticians on the best methods to use. Prior research conducted in 2004 and 2014 found that most studies in the field of education do not report or address missing data. In addition, no study has looked specifically at how missing data are reported and addressed in complex surveys. The current study has two main objectives: first, to determine whether research using complex survey designs reports missing data, addresses them, and uses appropriate methods to handle them; and second, to assess the missing data methods used in the High School Longitudinal Study of 2009 (HSLS). These objectives were addressed in two separate papers. The first paper is a systematic review of missing data in complex survey designs. The second paper uses the HSLS to examine the appropriateness of a two-level mixed effects model for handling missing data. A random intercept model and a random intercept plus slope model were each fit using five different methods of handling missing data: complete case analysis, using the HSLS dataset as is, single imputation, single-level multiple imputation, and multi-level multiple imputation. Each of the imputation models was fit using both a semi-parametric and a parametric model. Additionally, sensitivity analyses of the multi-level multiple imputation models were conducted using the delta adjustment method. Overall, the first study found that most educational surveys using complex data (76%) did not report missing data; among the 24% that did report missing data, 54% addressed the issue in some manner. However, only two studies used a method that accounted for the clustered nature of the data, highlighting a gap in the application of advanced techniques for handling missing data in complex surveys. In the second study, the imputation models were largely similar. However, the intercepts for the complete case analysis and the NCES-provided dataset indicated a higher GPA than any of the imputation methods. In addition, the intercept for the single-level multiple imputation models had at least a 10% higher fraction of missing information than the multi-level multiple imputation models, suggesting that more sophisticated approaches can better account for missing data in a complex survey design.
License
This work is licensed under the University at Albany Standard Author Agreement.
Recommended Citation
Robertson, Thomas Wesley, "Complex Missing Data Problems in Education Surveys" (2025). Electronic Theses & Dissertations (2024 - present). 277.
https://scholarsarchive.library.albany.edu/etd/277
Included in
Biostatistics Commons, Educational Methods Commons, Educational Psychology Commons, Longitudinal Data Analysis and Time Series Commons, Statistical Methodology Commons