ORCID

https://orcid.org/0000-0001-7477-2730

Date of Award

Summer 2025

Language

English

Embargo Period

7-30-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

College/School/Department

Department of Educational and Counseling Psychology

Program

Educational Psychology and Methodology

First Advisor

Mariola Moeyaert

Second Advisor

Recai Yucel

Committee Members

Mariola Moeyaert, Recai Yucel, Kimberly Colvin

Keywords

missing data, multiple imputation, complex survey, high school longitudinal survey, systematic review

Subject Categories

Biostatistics | Educational Methods | Educational Psychology | Longitudinal Data Analysis and Time Series | Statistical Methodology

Abstract

Missing data are a nearly universal problem in human subjects research, including in education. However, missing data are often neither reported nor addressed, despite guidelines from the APA style guide and the What Works Clearinghouse, as well as guidance from prominent statisticians on the best methods to use. Prior research conducted in 2004 and 2014 found that most studies in the field of education do not report or address missing data. In addition, no study has looked specifically at how missing data are reported and addressed in complex surveys. The current study has two main objectives: first, to determine whether research using complex survey designs reports missing data, addresses them, and uses appropriate methods to handle them; and second, to assess the missing data methods used in the High School Longitudinal Study of 2009 (HSLS). These objectives were pursued in two separate papers. The first paper, a systematic review of missing data in complex survey designs, addresses the first objective. The second paper uses the HSLS to examine the appropriateness of a two-level mixed effects model for handling missing data. The random intercept and random intercept plus slope models were each fit using five different methods of handling missing data: complete case analysis, using the HSLS dataset as is, single imputation, single-level multiple imputation, and multi-level multiple imputation. Each of the imputation models was fit using both a semi-parametric and a parametric model. Additionally, sensitivity analyses of the multi-level multiple imputation models were conducted using the delta adjustment method. Overall, the first study found that most educational studies using complex survey data (76%) did not report missing data, while among the 24% that reported missing data, 54% addressed the issue in some manner. However, only two studies used a method that accounted for the clustered nature of the data, highlighting a gap in the application of advanced techniques for handling missing data in complex surveys. In the second study, results across the imputation models were largely similar. However, the intercepts for the complete case analysis and the NCES-provided dataset indicated a higher GPA than any of the imputation methods. In addition, the intercept for the single-level multiple imputation models had at least a 10% higher fraction of missing information than the multi-level multiple imputation models, suggesting that more sophisticated approaches can better account for missing data in a complex survey design.
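For readers unfamiliar with the fraction of missing information mentioned above, the following is a minimal sketch of how estimates from multiply imputed datasets are commonly pooled with Rubin's rules. It is not the dissertation's code. The function name `pool_rubin` and the input numbers are hypothetical, and the returned quantity is the proportion of total variance attributable to missingness (lambda), which is often used as a large-sample approximation of the fraction of missing information.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates of the parameter, one per imputed dataset
    variances: squared standard errors of those estimates
    Returns (pooled estimate, total variance, lambda).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate

    # Share of the total variance that is due to missing data.
    lam = (1 + 1 / m) * b / total
    return q_bar, total, lam


# Hypothetical intercept estimates from three imputed datasets (illustrative only).
pooled, total_var, lam = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var, lam)
```

A larger lambda means the estimates disagree more across imputed datasets relative to their sampling variance, which is the sense in which one imputation approach can carry more missing information than another.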

License

This work is licensed under the University at Albany Standard Author Agreement.
