Date of Award




Document Type


Degree Name

Doctor of Philosophy (PhD)


Department of Epidemiology and Biostatistics



Content Description

1 online resource (xii, 126 pages) : color illustrations.

Dissertation/Thesis Chair

Victoria Lazariu

Committee Members

Valerie Haley, David O Carpenter


Mixture distributions (Probability theory), Hospital utilization, Women, Hospitals, Generalized estimating equations

Subject Categories



In the United States (U.S.), childbirth is the most common reason for hospitalization, and the maternal mortality rate per 100,000 live births (2017-2018) is markedly higher in the U.S. (17.4) than in neighboring Canada (10), the United Kingdom (7), and Japan (5) (Trends in Maternal Mortality, 2000 to 2017: Estimates by WHO, UNICEF, UNFPA, World Bank Group and the United Nations Population Division). Given these data, the increased focus on addressing severe maternal morbidity and mortality to improve patient outcomes and reduce healthcare costs is well deserved. Women who experience severe maternal morbidity often have a longer delivery length of stay (LOS) and complications of varying severity. Patient and hospital characteristics influence LOS, but the right-skewness and heteroskedasticity of the LOS distribution for delivery hospitalizations make it difficult to model, because the assumptions of conventional parametric models (e.g., normality) are often violated. This dissertation presents a practical approach with new capabilities for improved modeling of delivery LOS. The longstanding debate regarding the appropriate LOS for delivery includes as evidence the benefits of early discharge to the mother’s physical and emotional health. However, early discharge can increase the risk of adverse events. In the U.S., most delivery stays last two or three days; women who remain inpatient for longer represent an important group that may have experienced severe maternal morbidity. Although a longer LOS may not be avoidable for women who have experienced complications, the concomitant costs and certain risks increase with each day in the hospital. To improve the quality of maternal care and the allocation of healthcare resources, and to reduce costs, it is important to determine patient and hospital risk factors for extended delivery LOS. However, the challenge of modeling must be overcome to provide meaningful insights regarding predictors of delivery LOS.
The strongly skewed distribution of LOS poses problems for modeling and analysis. Various methods and models, such as data transformations, have been examined for describing the LOS distribution, but they do not always fit the entire distribution satisfactorily. Finite mixture models have been shown to be beneficial, as they accommodate the skewed LOS distribution without the need to transform the data or arbitrarily define the longer LOS outliers. These models allow all observations to be used and can identify the proportion of women staying longer. Finite mixture models decompose the LOS distribution into multiple underlying subpopulations. For example, delivery hospitalizations can be composed of two subpopulations, one group staying shorter and another group staying longer. This allows significant factors to affect LOS differently in each subpopulation. The application of finite mixture models to delivery LOS is enhanced by evaluating the effects of placing predictor variables in different parts of the model specification. Gamma mixture models are advantageous for analyzing LOS. Unlike other approaches, these models adapt to the characteristics of the LOS distribution and allow the overall LOS distribution to be expressed as multiple subpopulations. Random effects are added to the Gamma mixture models to account for differences in clinical care between hospitals. There are four specifications of finite mixture regression models that differ in the placement of predictor variables and random effects. This new work evaluates the distinct model specifications and their impact on the predictor effects and LOS subpopulations. The methods were demonstrated with New York State hospital discharge data and showed that the models are robust to the placement of predictor variables and random effects. To extend this work and refine the variable selection process, a new practical variable selection approach for finite mixture models with clustered data is presented.
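To make the two-subpopulation idea concrete, the following is a minimal sketch of a two-component Gamma mixture without predictors or random effects. It evaluates the mixture density and the posterior probability that a stay of a given length belongs to the longer-stay component. The function names and all parameter values are hypothetical and chosen only for illustration; they are not estimates or specifications from the dissertation.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x > 0."""
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def mixture_pdf(x, weight_long, short, long_):
    """Two-component Gamma mixture density.

    short and long_ are (shape, scale) pairs; weight_long is the
    mixing proportion of the longer-stay component.
    """
    return ((1.0 - weight_long) * gamma_pdf(x, *short)
            + weight_long * gamma_pdf(x, *long_))


def prob_long_stay(x, weight_long, short, long_):
    """Posterior probability that a stay of x days is in the longer-stay component."""
    numerator = weight_long * gamma_pdf(x, *long_)
    return numerator / mixture_pdf(x, weight_long, short, long_)


# Hypothetical parameters for illustration only (assumption, not fitted values).
SHORT = (16.0, 0.15)   # component mean = 16 * 0.15 = 2.4 days
LONG = (4.0, 1.5)      # component mean = 4 * 1.5 = 6.0 days
WEIGHT_LONG = 0.10     # assumed share of longer stays

if __name__ == "__main__":
    for days in (2, 4, 8):
        print(days, round(prob_long_stay(days, WEIGHT_LONG, SHORT, LONG), 3))
```

In a mixture regression, the component means and the mixing proportion would additionally depend on patient and hospital predictors, which is where the placement of predictors and random effects comes into play.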
Marginal models fit using generalized estimating equations (GEE) are quasi-likelihood methods that account for clustered data and estimate population-average effects. Marginal models have practical benefits over finite mixture models with random effects, such as less mathematical complexity, faster computation, and an increased chance of model convergence. Variable selection using marginal models is performed to select predictor variables for inclusion in finite mixture regression models. Forward variable selection is compared under two criteria: the quasi-likelihood information criterion (QIC) and the score information criterion (SIC). A simulation study is conducted to evaluate the performance of the two criteria. This new approach is demonstrated using Washington State hospital delivery data. The algorithm for variable selection with SIC is expanded to allow for unequal cluster sizes, which are commonly encountered in healthcare settings. The method selected predictor variables that were confirmed in the literature to be associated with LOS. The robustness of the Gamma mixture regression models to potential measurement error in LOS is evaluated by introducing noise into the LOS distribution. This is important because administrative hospital discharge records, often used to study patient and hospital characteristics related to LOS, can have data quality issues specific to the LOS. To test model robustness, the observed LOS distribution in the Washington State hospital delivery data is altered by introducing different levels of noise in the LOS variable while holding all other variables constant. The four specifications of Gamma mixture regression models are fit to the datasets at each level of noise. The results are compared by level of noise across the different model specifications. Overall, the models predict very similar LOS ranges for each subpopulation when noise is introduced in the LOS distribution.
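The noise-introduction step of such a sensitivity analysis might look like the sketch below. The abstract does not state how the noise was generated, so the mechanism here (shifting a chosen fraction of records by one day up or down, with a floor of one day) is purely an assumption, and the function name `add_los_noise` is hypothetical. Only the LOS values are altered; all other variables would be left untouched.

```python
import random


def add_los_noise(los_days, noise_level, seed=0):
    """Return a copy of los_days with a fraction `noise_level` of records
    shifted by +1 or -1 day, never below 1 day.

    Assumed noise mechanism for illustration; not the dissertation's method.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)          # fixed seed for reproducible datasets
    noisy = list(los_days)             # copy so the observed data are unchanged
    n_perturb = round(noise_level * len(noisy))
    for i in rng.sample(range(len(noisy)), n_perturb):
        noisy[i] = max(1, noisy[i] + rng.choice((-1, 1)))
    return noisy


if __name__ == "__main__":
    observed = [2, 2, 3, 2, 4, 3, 2, 7, 3, 2]   # made-up LOS values in days
    for level in (0.0, 0.1, 0.3):
        print(level, add_los_noise(observed, level))
```

Each noisy copy would then be passed to the same model-fitting routine, so that predicted LOS ranges can be compared across noise levels.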
In summary, Gamma mixture regression models are shown to be advantageous compared to other methods in modeling and predicting LOS. A new variable selection approach for finite mixture models reduces computational intensity, allowing this approach to be implemented in practice. The sensitivity analysis shows that the overall LOS predictions are robust at various levels of noise. This demonstrates the usefulness of this approach to modeling LOS, yielding consistent and robust results.

Included in

Biostatistics Commons