Bayesian Imputation of Missing Covariates

Abstract

Missing values are a pervasive problem in almost all kinds of studies. In large cohort studies, the type of study most often conducted in epidemiology, missing observations in covariates pose a major challenge. Because measurements are taken in an uncontrolled environment, many covariates typically need to be considered as potential confounders in order to filter out the unwanted influence of environmental factors on the estimates of interest. Because so many variables are measured, and because measurement often relies on participants recalling and reporting detailed information, large proportions of missing data are common in these studies.

In light of the above, this thesis focuses on the analysis of incomplete cohort study data in which the missing values occur in the covariates.

We describe a fully Bayesian approach to analysing and imputing data in this setting, and discuss a number of naive and more sophisticated approaches to imputing such data using multiple imputation by chained equations (MICE). The fully Bayesian approach is illustrated in several applications from the field of epidemiology and is further extended to settings with time-varying covariates, in which additional challenges arise, such as the functional form of the association between outcome and covariate and potential endogeneity.

Moreover, the implementation of the fully Bayesian approach in the R package JointAI is described and illustrated by means of various examples.

Read the HTML version (created with bookdown)

Related

Publications

JointAI: Joint Analysis and Imputation of Incomplete Data in R

Bayesian imputation of time-varying covariates in linear mixed models

Dealing with missing covariates in epidemiologic studies: A comparison between multiple imputation and a full Bayesian approach