class: title-slide-custom count: false <img src = "graphics/Logo.png" width=200 style="margin-top:20px;"> <div style="font-size: 64pt; font-weight: bold; position: absolute; top: 30%;"> Working with Incomplete Data <div style="font-size: 30pt; font-weight: 700;"> When One-size-fits-all Does Not Fit </div> </div> <div id = "author"> <div style = "font-size: 20pt; font-weight: bold;">Nicole Erler</div> <div style = "font-size: 16pt;">Erasmus Medical Center</div> <div style = "font-size: 16pt;">Rotterdam, NL</div> </div> <div id="contact"> <i class="fas fa-envelope"></i> n.erler@erasmusmc.nl   <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>   <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>   <a href="https://nerler.com"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> --- class: disclosure count: false ## Disclosures <div style = "text-align: center; position: absolute; top: 50%; font-size: 30pt;"> Nothing to disclose. </div> <div class="my-footer"><span style = "color: white;"> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- layout: true <!-- <link href="https://unpkg.com/nord-highlightjs@0.1.0/dist/nord.css" rel="stylesheet" type="text/css" /> --> <link href="fontawesome-free-5.14.0-web/css/all.css" rel="stylesheet"> <div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- ## In the Beginning... <h3 style="text-align: right;">...there weren't any missing values.</h3> ??? In the beginning there weren't really enough missing values for it to be considered a problem. - - - - - -- .pull-left[ **In the 1960s/70s:**<br> Development of multiple imputation ] ??? But by the 1960s there was so much data missing in the US census that something had to be done about it. And, as a result, in the 1970s Donald Rubin came up with the idea of multiple imputation. - - - - - -- .pull-right[ **Also in the 1960s/70s:**<br> <img src = "graphics/computer.png" width = 320 style = "position: fixed; right: 150px; bottom: 60px;"> ] ??? And, what is important to keep in mind is that during that period, analysing data must have been quite a bit different from what it is today. Not every researcher had a computer, and when they did, computers had very basic statistical software, and no functionality to handle missing values, and analysts were not trained in missing data methodology. - - - - - -- <br> .turqdkbox-50[ ⇨ fix the missing data problem once (centrally)<br> ⇨ supply complete data to many analysts ] ??? And so a very central point of the solution to the missing data problem back then was that the data had to be analysed by researchers who only had very basic statistical tools at their disposal. --- ## Multiple Imputation * **uncertainty** about the missing value ??? The important issue in imputing missing values is that there is **uncertainty** about what the value would have been. And so we **can't just pick** one value and fill it in, because then we would just ignore this uncertainty. 
- - - - -- * some values **more likely** than others * relationship with **other** available **data** ??? Also: some values are going to be more likely than others, and usually there is a relationship between the variable that has missing values and the other data that we have collected. - - - - -- .pull-left[ **⇨ missing values have a distribution** <img src="figures/ImpDens.png", height = 250, style = "margin: auto; display: block;"> ] ??? So, in statistical terms, we can say that missing values have a distribution, and that we need a model to learn how the incomplete variable is related to the other data. - - - - -- .pull-right[ <br> .turqdkbox[ <span style="font-weight: bold;">Predictive distribution</span> of the missing values given the observed values. `$$p(x_{mis}\mid\text{everything else})$$` ] ] ??? And this means that we can impute the missing values by sampling from this distribution conditional on the other data. --- ## A Simple Example .gr-left[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\mathbf x_1\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> * `\(\mathbf y\)`: **response** * `\(\color{var(--turq)}{\mathbf x_1}\)`: **incomplete** covariate * `\(\mathbf x_2\)`, `\(\mathbf x_3\)`: **complete** covariates ] .gr-right[ **Predictive distribution:** `$$p(\color{var(--turq)}{\mathbf x_1} \mid \mathbf y, \mathbf x_2, \mathbf x_3, \boldsymbol\beta, \sigma)$$` <br> {{content}} ] ??? Let's look at a simple example. Imagine, we have the following dataset, where we have a completely observed response variable `\(y\)`, a variable `\(x_1\)` that is missing for patient `\(i\)`, and two other covariates that are completely observed. And so the the predictive distribution that we need to sample the imputed value from, would be the distribution of `\(x_1\)`, given the response `\(y\)`, the other covariates, and some parameters. - - - -- For example: * Fit a model to the cases with observed `\(\color{var(--turq)}{\mathbf x_1}\)`: `$$\color{var(--turq)}{\mathbf x_1} = \beta_0 + \beta_1 \mathbf y + \beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \boldsymbol\varepsilon$$` {{content}} ??? For example, we could think of this as fitting a regression model with `\(x_1\)` as the dependent variable, and `\(y\)` & the other covariates as independent variables. We can then fit this model to all those cases for which we have `\(x_1\)` observed,... 
- - - - -- * Estimate parameters `\(\boldsymbol{\hat\beta}, \hat\sigma\)`<br> ⇨ define distribution `\(p(\color{var(--turq)}{x_{i1}} \mid y_i, x_{i2}, x_{i3}, \boldsymbol{\hat\beta}, \hat\sigma)\)` ??? ... in order to estimate the parameters, and to learn how the distribution of `\(x_1\)` conditional on the other data looks like. And then we can use this information to specify the predictive distribution for the cases with missing `\(x_1\)` and sample imputed values from this distribution. --- ## Multiple Imputation <img src = "figures/MI.png", height = 480, style = "margin: auto; display: block;"> ??? The idea behind multiple imputation is that, using this principle, we sample imputed values and fill them into the original, incomplete data to create a completed dataset. And in order to take into account the uncertainty that we have about the missing values, we do this multiple times, so that we obtain multiple completed datasets. Because all the missing values have now been filled in, we can analyse each of these datasets separately with standard statistical techniques. To obtain overall results, the results from each of these analyses need to be combined in a way that takes into account both the uncertainty that we have about the estimates from each analysis, and the variation between these estimates. --- ## In Practice .three-cols[ <div style = "text-align: center; margin-bottom: 25px;"> <strong>Multivariate<br>Missingness</strong></div> <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\mathbf x_1\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] ??? In practice, we usually have missing values in multiple variables. 
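(Before moving on to that case: the single-covariate scheme we just saw could be sketched in R roughly as follows. This is only a toy illustration with a made-up data frame `dat`; a full implementation would also propagate the uncertainty about `\(\boldsymbol{\hat\beta}\)` and `\(\hat\sigma\)`, e.g. by drawing them from their approximate sampling distribution.)

```r
# Toy sketch: impute a single incomplete covariate x1 in a hypothetical
# data frame 'dat' with columns y, x1, x2, x3 (x1 contains NAs).

# fit the imputation model to the cases with observed x1
# (lm() drops the rows with missing x1 by default)
fit <- lm(x1 ~ y + x2 + x3, data = dat)

mis  <- is.na(dat$x1)                          # rows with missing x1
pred <- predict(fit, newdata = dat[mis, ])     # conditional means for those rows

# draw from the (approximate) predictive distribution rather than
# filling in the conditional mean itself
dat$x1[mis] <- rnorm(sum(mis), mean = pred, sd = summary(fit)$sigma)
```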
-- .three-cols60[ **Most common approach:**<br> <span style = "color: var(--turqdk); font-weight: bold;">MICE</span> <span style = "color: var(--lgrey);">(multivariate imputation by chained equations)</span><br> <span style = "color: var(--turqdk); font-weight: bold;">FCS</span> <span style = "color: var(--lgrey);">(fully conditional specification)</span> <div> \begin{alignat}{10} \color{var(--turq)}{\mathbf x_1} &= \beta_0 &+& \beta_1 \mathbf y &+& \beta_2 \color{var(--turq)}{\mathbf x_2} &+& \beta_3 \color{var(--turq)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--turq)}{\mathbf x_2} &= \alpha_0 &+& \alpha_1 \mathbf y &+& \alpha_2 \color{var(--turq)}{\mathbf x_1} &+& \alpha_3 \color{var(--turq)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--turq)}{\mathbf x_3} &= \theta_0 &+& \theta_1 \mathbf y &+& \theta_2 \color{var(--turq)}{\mathbf x_1} &+& \theta_3 \color{var(--turq)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat} </div> {{content}} ] ??? And the most common approach to imputation in this setting is MICE, short for **multivariate imputation by chained equations**, an approach that is also called **fully conditional specification**. The principle is an extension of what we've seen on the previous slides. We impute missing values using models that have all other data in their linear predictor. - - - -- <br> * iterative {{content}} ??? Because in these imputation models we now have incomplete covariates, we use an iterative algorithm. We start by randomly drawing starting values from the observed part of the data, and then we cycle through the incomplete variables and impute one at a time. - - - - - - -- * flexible model types ??? The models for the different variables can be specified according to the type of variable. Once we have imputed each missing value, we start again with the first variable, but now use the imputed values of the other variables instead of the starting values, and we do this a few times until the algorithm has converged. --- ## One-Size-Fits-All? In <i class="fab fa-r-project" style = "color: var(--blue);"></i>: ```r mice::mice(mydata) ``` ??? The MICE algorithm is available in most statistical programs. In R, it is part of the package called **mice**. And using this package, we could perform multiple imputation using just a single line of code. -- <br> Imputation strategy independent of * **type** of variables * **size** of the data * **analysis model** of interest ??? So it seems that MICE is an imputation strategy that works for * any **type of variable**: continuous, categorical, skewed, ..., because we can choose a different type of model for each incomplete variable * and it works for large or small datasets, both with respect to the number of variables and the number of observations in the data, * and is completely independent of the analysis that we want to perform. - - - - -- <img src = "graphics/one-size-fits-all.gif" width = "300" style = "position: absolute; bottom: 50px; right: 60px;"> <div style = "width: 60%;"> .turqdkbox[ ⇨ MICE / FCS "works" in all settings!? ] </div> ??? So it seems like MICE just works in all settings. One single approach that fits all of our missing data problems.
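Written out, this one-liner is only the first step of the impute, analyse, pool workflow described earlier, which might look roughly like this (the data set `mydata` and the analysis formula are placeholders):

```r
library("mice")

# impute: create m completed versions of the data
imp <- mice(mydata, m = 5, seed = 2020)

# analyse: fit the analysis model of interest on each completed dataset
fits <- with(imp, lm(y ~ x1 + x2 + x3))

# pool: combine the m sets of results using Rubin's rules
summary(pool(fits))
```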
**But does it really?** --- ## A Simple Example .pull-left[ **Implied Assumption:**<br> <span>Linear association</span> between `\(\color{var(--turq)}{\mathbf x_1}\)` and `\(\mathbf y\)`: `$$\color{var(--turq)}{\mathbf x_1} = \beta_0 + \bbox[#E5E5E5, 2pt]{\beta_1 \mathbf y} + \beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \boldsymbol\varepsilon$$` <img src="figures/linplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> ] ??? Let's go back to our simple example with missing values in just one covariate `\(x_1\)`. An assumption that we implicitly made during the imputation was that there is a linear association between the incompl. covariate and the outcome. - - - -- .pull-right[ <br> But what if `$$\mathbf y = \theta_0 + \bbox[#E5E5E5, 2pt]{\theta_1 \color{var(--turq)}{\mathbf x_1} + \theta_2 \color{var(--turq)}{\mathbf x_1}^2} + \theta_3 \mathbf x_2 + \theta_4 \mathbf x_3 + \boldsymbol\varepsilon$$` <img src="figures/qdrplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> ] ??? But what if we have a setting where the true association is non-linear, for example, quadratic? In that case our analysis model for the response `\(y\)` would also include the quadratic term `\(x_1^2\)`. --- ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--blue);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--turq);">imputation assumption</span>: linear ] .pull-right[ <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px; color: transparent;"> }⇨ </span> <span style = "color: transparent; font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias! </span> ] <img src="figures/impplot.png", height = 350, style = "margin: auto; display: block;"> ??? What happens when we have data with a non-linear association, but wrongly assume a linear association during imputation is that the imputed values will distort the true association between the incomplete variable and the response. --- count: false class: animated, fadeIn ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--blue);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--turq);">imputation assumption</span>: linear ] .pull-right[ <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> <span style = "color: var(--pink); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias!</span> ] <img src="figures/impplot2.png", height = 350, style = "margin: auto; display: block;"> ??? And this will introduce bias, even if we analyse the imputed data with the correct model. --- ## Time-to-Event Outcomes <br> **Proportional Hazards Model:** `$$h_i(t) = h_0(t) \exp(\color{var(--turq)}{x_i} \beta_x + \mathbf z_i^\top \boldsymbol \beta_z)$$` .pull-left[ * `\(\color{var(--turq)}{x_i}\)`: incomplete covariate * `\(\mathbf z_i\)`: vector of other covariates ] .pull-right[ <div style = "color: var(--lgrey);"> <ul> <li>\(h(t)\): hazard function</li> <li>\(h_0(t)\): baseline hazard</li> <li>\(\mathbf T\): observed event / censoring time</li> <li>\(\boldsymbol\delta\): event indicator</li> </ul> </div> ] ??? Another setting that we encounter in many applications is that we have a time-to-event outcome, and we want to model this outcome using a proportional hazards model such as the Cox model. 
To simplify the notation a bit, I assume here that we have * one incomplete covariate `\(x\)` * and some completely observed covariates `\(z\)`. For the rest we use the standard notation. The proportional hazards model is written with the hazard as the response, but to see the implication for imputation it is more convenient to look at the log-likelihood. --- ## Time-to-Event Outcomes **Log-likelihood** `$$\log p(\mathbf T, \boldsymbol \delta \mid \color{var(--turq)}{\mathbf x}, \mathbf z, \boldsymbol\beta) = \boldsymbol\delta (\log h_0(T) + \color{var(--turq)}{\mathbf x} \beta_x + \mathbf z \boldsymbol\beta_z) - \int_0^T h_0(s)\exp( \color{var(--turq)}{\mathbf x} \beta_x + \mathbf z \boldsymbol\beta_z)ds$$` ??? And what we can see here is that the response, the observed event or censoring time `\(T\)` and the event indicator, has a non-linear association with the incomplete variable `\(x\)`. -- <br> * Proportional hazards models imply **non-linear** associations ??? So, proportional hazards models imply a non-linear association, ... - - - -- * Imputation with a model `$$\color{var(--turq)}{\mathbf x} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol\delta + \theta_3 \mathbf z_1 + \ldots$$` is <span style = "color: var(--pink); font-weight: bold;">wrong!</span> ??? But the imputation model that we might naively use for the incomplete variable `\(x\)` would assume that `\(x\)` has a linear association with event time and indicator, and we would get biased results when we impute our data this way. <!-- --- --> <!-- ## Non-linear Associations --> <!-- The **correct predictive distribution** --> <!-- $$ p(\color{var(--turq)}{\mathbf x_{mis}} \mid \text{everything else})$$ --> <!-- may not have a closed form. --> <!-- <br> --> <!-- .turqdkbox[ --> <!-- <span style="font-size: 1.5rem;">⇨</span> --> <!-- We cannot easily specify the correct imputation model directly. --> <!-- ] --> <!-- ??? --> <!-- Both cases, the example where we had a quadratic effect, and the proportional --> <!-- hazards model demonstrate that there are settings where the specification of --> <!-- the correct predictive distribution of the incomplete data given everything else --> <!-- is not straightforward. --> <!-- And in many cases it does not even have a closed form, meaning, that we cannot --> <!-- specify the correct imputation model directly, for example by using a standard --> <!-- regression model. --> <!-- But imputation with MICE requires us to specify these imputation models directly, --> <!-- and usually as standard regression models. --> <!-- And so in these setting where the directly specified imputation models do not --> <!-- fit the correct distribution, we will end up with biased results.
--> --- ## Multi-level Data .gr-left2[ <img src="figures/trajectories_allb.png", height = 420, style = "margin: auto; display: block;"> ] .gr-right2[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\mathbf x_1\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> ] ??? 
Another setting in which specification of the predictive distribution for a incomplete variable is not straightforward: **multi-level setting.** For example: * a response variable `\(y\)`: measured repeatedly over time in the same patient * in a multi-center study, where we need to take into account that patients from the same hospital are more similar to each other than patients from different hospitals ⇨ data in long format<br> (multiple rows with information on the same patient "i") In this format:<br> it does not matter if we have unbalanced data (different number of measurements, taken at different time points) --- ## Multi-level Data .gr-left[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\mathbf x_1\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> ] .gr-right[ **(Linear) Mixed Model** `$$y_{ij} = \underset{\text{fixed effects}}{\underbrace{\mathbf x_{ij}^\top\boldsymbol\beta}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ij}^\top\mathbf b_i}} + \varepsilon_{ij}$$` <br> * **level-1** variables:<br>repeatedly measured / time-varying * **level-2** variables:<br>baseline / patient specific / time-constant ] ??? For analysis: ⇨ typically use a mixed model * takes into account that the repeated measurements for a patient are not independent by extending the standard linear regression model with random effects terms Our data can be related to different levels of the data hierarchy. 
In a longitudinal study, for example, we would have * level-1 variables, which are the repeatedly measured values or time-varying variables * and level-2 variables, which are for example patient characteristics that are time-constant --- ## Imputation in Multi-level Data .gr-left2[ If `\(\color{var(--turq)}{\mathbf x_1}\)` is a **level-1** variable: `$$\color{var(--turq}{x_{1ij}} = \underset{\color{var(--lgrey)}{\text{fixed effects}}}{\color{var(--lgrey)}{\underbrace{\color{#000000}{\theta_0 + \theta_1 y_{ij} + \theta_2 x_{2ij} + \theta_3 x_{3ij}}}}} + \underset{\color{var(--lgrey)}{\substack{\text{random}\\\text{effects}}}}{\color{var(--lgrey)}{\underbrace{\color{#000000}{\mathbf u_i \mathbf z_i(t)}}}} + \varepsilon_{ij}$$` ] .gr-right2[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\mathbf x_1\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> ] ??? If the incomplete variable `\(x_1\)` was a level-1 variable (i.e. time-varying) we could use a mixed model as the imputation model. - - - -- .raisebox150[ But if `\(\color{var(--turq)}{\mathbf x_1}\)` is a **level-2** variable? {{content}} ] ??? Things get interesting when we have missing values in a baseline covariate:<br> because the repeated values of a level-2 variable should not just be correlated, but **identical at different time points**. -- * The above would result in different `\(\color{var(--turq)}{\mathbf x_1}\)` over time. * So would a standard GLM (applied to long-format data). {{content}} ??? If we impute a level-2 variable with the mixed model shown here, the imputed values at different time points would not be identical. And when we use a standard GLM, like we would usually do for a time-constant variable, would then treat the rows that belong to the same patient as independent and again get values for `\(x_1\)` that vary between those rows. - - - - -- ** <span style="font-size:1.5rem;">⇨</span> Imputation in wide format?** ??? 
It seems that we have a problem to correctly impute level-2 variables when our data is in long format. So the question is, can we transform our dataset to wide format, so that we only have one row per patient, and have data from different time points as separate variables. --- ## Imputation in Wide Format <img src = "figures/p0_wideForm.png", height = 450, style = "margin: auto; display: block;"> ??? When we look at this example, it becomes quite clear that for very unbalanced data it is not possible to convert our data to wide format. --- count: false class: animated, fadeIn ## Imputation in Wide Format <img src = "figures/p0_wide_grid.png", height = 450, style = "margin: auto; display: block;"> ??? In the wide format we would have to put our data into a grid. For example, by creating time intervals. But with unbalanced data, patients have multiple measurements in some intervals, and no measurement in others. --- ## Imputation in Wide Format <img src="figures/p_second.png", height = 450, style = "margin: auto; display: block;"> ??? Or we could think about making a variable for the first observations, the second, and so on. But you can see here, where I have highlighted the first and second observations for each patient, that the observations are at very different time points so that they will probably have to be interpreted differently. - - - - For this reason, longitudinal variables are sometimes excluded from the imputation, or very simple summaries are used, like taking the first value or the mean over the repeated values. But when we do not fully include the longitudinal response and other longitudinal variables into the imputation model, we lose important information and could introduce considerable bias. --- ## Imputation of Missing Covariates Specifying the **correct imputation** model `$$p(\color{var(--turq)}{\mathbf x_{mis}} \mid \text{everything else})$$` directly is **not straightforward** for * GLMs with **non-linear associations** * **time-to-event** outcomes * **multi-level** settings <img src = "graphics/ONESIDE-300x250.png" style = "position: absolute; right: 60px; bottom: 150px;"> ??? So, in summary, we have seen that specifying the correct imputation model for the incomplete variables is not always straightforward, and in some settings even not possible, specifically in settings with non-linear associations, when we have a time-to-event outcome or in multi-level settings. - - - -- <br> .nord0box[ **<span style="font-size: 1.5rem;">⇨</span> We need another approach in these settings.** ] ??? But the "classic" FCS / MICE approach does require us to specify the imputation models directly, and so in these settings we do need alternative approaches. --- ## Joint Model Multiple Imputation **Idea:**<br> Approximate `\(p(\color{var(--turq)}{\mathbf x_{mis}} \mid \text{everything else})\)` with a known multivariate distribution.<br> <span style = "color:var(--lgrey);">(usually multivariate normal)</span> ??? One such alternative approach is joint model multiple imputation. This is not a new approach, actually, it was the approach suggested when multiple imputation was first developed. When there are missing values in multiple variables and the variables are of different type, for example continuous and binary, then the joint distribution does not have a closed form which makes it difficult to work with. The idea of joint model MI is to approximate this multivariate distribution with a known distribution that is easy to work with. 
And in practice this is often the multivariate normal distribution. - - - - -- <br> ⇨ each variable is assumed to be (latent) normally distributed <img src = "figures/JMMI.png" width = "1400" style = "display: block; margin-left: auto; margin-right: auto;"> ??? This means that we assume for each incomplete variable, that it is normally distributed, or has a latent normal distribution. So, even when a variable has a skewed distribution, we treat it as if it had a normal distribution. And for categorical variables we assume that there is an underlying, normally distributed variable, and when that underlying value is less than a certain cut-off, we observe a particular category, and when it is above the cut-off we observe the next category. --- ## Joint Model Multiple Imputation .gr-left[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th style = "color: var(--turq);">\(\mathbf x_1\)</th> <th style = "color: var(--turq);">\(\mathbf x_2\)</th> <th style = "color: var(--turq);">\(\mathbf x_3\)</th> <th style = "color: var(--turqdk);" colspan="3">\(\mathbf X_{obs}\)</th> </tr> <tr><td></td><td colspan = "7"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas 
fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> </tr> </table> ] ??? It is possible to use Joint Model MI also in simpler settings, but I will focus here on the multi-level setting. Say we have the following data situation, where we have missing values in 3 variables, `\(x_1\)` is time-varying, `\(x_2\)` and `\(x_3\)` are baseline covariates, and we have a bunch of other variables that are completely observed. - - - - -- .gr-right[ `\begin{align*} \boldsymbol y &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_y + \mathbf b_y \mathbf Z_y + \boldsymbol\varepsilon_y\\ \color{var(--turq)}{x_1} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_1 + \mathbf b_1 \mathbf Z_1 + \boldsymbol\varepsilon_1\\ \color{var(--turq)}{x_2} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_2 + \boldsymbol\varepsilon_2\\ \color{var(--turq)}{x_3} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_3 + \boldsymbol\varepsilon_3 \end{align*}` ] ??? In Joint Model MI we would then specify a linear mixed model for the longitudinal response `\(y\)` and the longitudinal incomplete variable `\(x_1\)`, and standard linear regression models for the other two incomplete variables. In these models, we only use the completely observed variables as covariates. 
--- count: false ## Joint Model Multiple Imputation .gr-left[ <table class="simpletable"> <tr> <th></th> <th>\(\mathbf y\)</th> <th style = "color: var(--turq);">\(\mathbf x_1\)</th> <th style = "color: var(--turq);">\(\mathbf x_2\)</th> <th style = "color: var(--turq);">\(\mathbf x_3\)</th> <th style = "color: var(--turqdk);" colspan="3">\(\mathbf X_{obs}\)</th> </tr> <tr><td></td><td colspan = "7"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--turq);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);"><i class = "fas fa-check"</i></td> <td style = "color: var(--turqdk);">\(\ldots\)</td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> <td style = "color: var(--turqdk);">\(\vdots\)</td> </tr> </table> ] .gr-right[ <svg width="35" height="190" style = "position: absolute; right: 318px; top: 170px;"> <rect width="100%" 
height="100%" rx = "3" style="fill:rgba(58,79,146,0.3);" /> </svg> <svg width="35" height="95" style = "position: absolute; right: 215px; top: 170px;"> <rect width="100%" height="100%" rx = "3" style="fill:rgba(58,79,146,0.3);" /> </svg> `\begin{align*} \boldsymbol y &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_y + \mathbf b_y \mathbf Z_y + \boldsymbol\varepsilon_y\\ \color{var(--turq)}{x_1} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_1 + \mathbf b_1 \mathbf Z_1 + \boldsymbol\varepsilon_1\\ \color{var(--turq)}{x_2} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_2 + \boldsymbol\varepsilon_2\\ \color{var(--turq)}{x_3} &= \color{var(--turqdk}{\mathbf X_{obs}^\top} \boldsymbol\theta_3 + \boldsymbol\varepsilon_3 \end{align*}` <br> $$ `\begin{pmatrix} \color{var(--blue)}{\mathbf b_y}\\ \color{var(--blue)}{\mathbf b_1}\\ \color{var(--blue)}{\mathbf \varepsilon_2}\\ \color{var(--blue)}{\mathbf \varepsilon_3} \end{pmatrix}` \sim N(\mathbf 0, \mathbf V) \qquad `\begin{pmatrix} \color{var(--blue)}{\mathbf \varepsilon_y}\\ \color{var(--blue)}{\mathbf \varepsilon_1} \end{pmatrix}` \sim N(\mathbf 0, \mathbf W) $$ ] ??? And to connect these models, to take into account that the imputed values are not independent from each other and from the response `\(y\)`, the random effects `\(b\)` and error terms `\(\varepsilon\)` are modelled together using multivariate normal distributions. One drawback, however, is that this connection between the models implies linear associations, so this approach is not appropriate if we had non-linear associations between incomplete covariates and the response. --- ## Bayesian Analysis of Incomplete Data **Imputation:** `$$p(\color{var(--turq)}{\mathbf x_{mis}} \mid \text{everything else})$$` ??? Another alternative to MICE is to perform a fully Bayesian analysis. The idea that missing values have a distribution, which I talked about in the beginning of this presentation, is essentially a Bayesian idea. And so it makes sense to think about analysing incomplete data in the Bayesian framework. When MI was developed 50 years ago the theoretical knowledge to use Bayesian methods for missing data was there, but because Bayesian models are often computationally intensive, they were just not feasible at the time. - - - -- **Bayesian Analysis** (of complete data): `$${\scriptsize\phantom{\text{(posterior distribution)}\qquad}} p(\boldsymbol\beta, \sigma \mid \text{data}) \qquad \scriptsize\color{grey}{\text{(posterior distribution)}}$$` ??? When we perform a sBayesian analysis we determine the posterior distribution of the unknown parameters, given the data. -- <br> **⇨ simultaneous analysis and imputation** `$$p(\color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma \mid \text{observed data})$$` ??? And so, in a setting where we have incomplete data, in Bayesian analysis we can combine the estimation of the parameters with the imputation of the missing values. --- ## Bayesian Analysis of Incomplete Data **Bayes Theorem:** `$$\underset{\text{posterior}}{\underbrace{p(\boldsymbol\beta, \sigma \mid \text{data})}}\propto \underset{\substack{\text{likelihood}\\\text{(analysis model)}}}{\underbrace{p(\text{data}\mid \boldsymbol\beta, \sigma)}}\;\;\underset{\text{prior}}{\underbrace{p(\boldsymbol\beta, \sigma)}}$$` ??? 
To obtain the posterior distribution, Bayes theorem is applied, which tells us that the posterior is proportional to the product of the likelihood of the data given the parameters, and our prior assumption about the parameters. - - - -- **For missing covariates:** `$$p(\color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma \mid \underset{\color{var(--lgrey)}{\mathbf y, \mathbf x_{obs}}}{\underbrace{\text{observed data}}})\propto p(\text{observed data}\mid \color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma)\;\;p(\color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma)$$` ??? In the case of missing covariate values, this formulation now slightly changes. We are interested in the posterior distribution of the parameters **AND** the missing values, conditional on the observed data. The observed data consists of the response variable `\(y\)` and the completely observed covariates. --- count: false class: animated, fadeIn ## Bayesian Analysis of Incomplete Data **Bayes Theorem:** `$$\underset{\text{posterior}}{\underbrace{p(\boldsymbol\beta, \sigma \mid \text{data})}}\propto \underset{\substack{\text{likelihood}\\\text{(analysis model)}}}{\underbrace{p(\text{data}\mid \boldsymbol\beta, \sigma)}}\;\;\underset{\text{prior}}{\underbrace{p(\boldsymbol\beta, \sigma)}}$$` **For missing covariates:** `$$p(\color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma \mid \underset{\color{var(--lgrey)}{\mathbf y, \mathbf x_{obs}}}{\underbrace{\text{observed data}}})\propto p(\text{observed data}\mid \color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma)\;\;\underset{\underset{\substack{\text{imputation}\\\text{part}}}{{p(\color{var(--turq)}{\mathbf x_{mis}} \mid \boldsymbol\beta, \sigma)}}\;\underset{\text{prior}}{{p(\boldsymbol\beta, \sigma)}}}{\underbrace{p(\color{var(--turq)}{\mathbf x_{mis}}, \boldsymbol\beta, \sigma)}}$$` ??? The last term here, the joint distribution of the missing values and parameters, can be split up into the distribution of the missing values given the parameters and the prior distribution of the parameters. --- ## Bayesian Analysis of Incomplete Data <div class = "container"> <div class = "box"> <div class = "box-row"> <div class = "box-cell" style = "background: var(--turqdk);color: white;">posterior<br>distribution</div> <div class = "box-cell">\(\propto\)</div> <div class = "box-cell" style = "background: var(--turqdk); color: white;">analysis<br>model</div> <div class = "box-cell" style = "background: var(--turqdk); color: white; border: solid 4px var(--turq);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--turqdk); color: white;">priors</div> </div> </div> </div> ??? This means that in the setting with incomplete covariates, in order to obtain the posterior distribution, we need to specify the analysis model, a model for the covariates, and prior distributions for all parameters. Compared to a Bayesian analysis of complete data, we have to additionally specify models for the incomplete covariates. - - - -- <br> ⇨ Numeric estimation via MCMC sampling ??? In most cases, the posterior distribution will not have a closed form and so we won't be able to derive it analytically. Instead, Markov Chain Monte Carlo methods are used to create a sample from the posterior distribution. The results from the Bayesian analysis are then presented as summary measures of this sample, usually the mean and the 2.5% and 97.5% quantiles, which form the 95% credible interval.
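For instance, if `draws` were a (hypothetical) matrix of MCMC samples, with one row per iteration and one column per parameter, these summaries could be computed along these lines:

```r
post_mean <- colMeans(draws)                                     # posterior means
post_cri  <- apply(draws, 2, quantile, probs = c(0.025, 0.975))  # 95% credible intervals
```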
Since, in the Bayesian framework, the result is given in terms of the probability distribution of the unknown parameters conditional on the data that was observed, these results have a more intuitive interpretation than frequentist results. -- <br> **For example:** $$\text{Survival:}\qquad \text{posterior} \propto p(\mathbf T, \boldsymbol\delta \mid \mathbf x, \boldsymbol\theta)\; p(\mathbf x\mid \boldsymbol\theta)\; p(\boldsymbol\theta) $$ ??? As an example of how the Bayesian model formulation looks like, I show the model structure, the elements that have to be specified, for a proportional hazards model with incomplete covariates. --- ## Bayesian Proportional Hazards Model **Analysis model** `$$p(\mathbf T, \boldsymbol\delta \mid \mathbf x, \boldsymbol\theta) = h(\mathbf T \mid \mathbf x, \boldsymbol\theta)^{\boldsymbol\delta} \exp\left\{-\int_0^{\mathbf T}h(s\mid \mathbf x, \boldsymbol\theta) ds\right\}$$` ??? The analysis model has the known formulation of the likelihood of a proportional hazards model, where the hazard consists of a population baseline hazard and a linear predictor of covariates. But contrary to the classic Cox model we cannot leave the baseline hazard unspecified. Typically we would model it flexibly, for example using splines. -- <br> **Covariate models:** <span style = "color: var(--lgrey); float:right;"> for `\(\mathbf x = (\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_{obs})\)` </span> `\begin{align*} p(\color{var(--turq)}{\mathbf x_1}, \color{var(--turq)}{\mathbf x_2}, \color{var(--turq)}{\mathbf x_3}, \mathbf x_{obs} \mid \boldsymbol\theta) = & p(\color{var(--turq)}{\mathbf x_1} \mid\color{var(--turq)}{\mathbf x_2}, \color{var(--turq)}{\mathbf x_3}, \mathbf x_{obs}, \boldsymbol\theta) & \color{var(--lgrey)}{\text{e.g., normal}}\\ & p(\color{var(--turq)}{\mathbf x_2} \mid \color{var(--turq)}{ \mathbf x_3}, \mathbf x_{obs}, \boldsymbol\theta)& \color{var(--lgrey)}{\text{e.g., binomial}}\\ & p(\color{var(--turq)}{\mathbf x_3} \mid \mathbf x_{obs}, \boldsymbol\theta) &\\ & \color{var(--lgrey)}{p(\mathbf x_{obs}\mid\boldsymbol\theta)} & \scriptsize \color{var(--lgrey)}{\text{(can be omitted)}} \end{align*}` ??? If we had 3 incomplete covariates and a bunch of complete covariates in this model, the covariate model part would look like this. Probability theory tells us that we can split the joint distribution for the incomplete covariates into a a sequence of univariate conditional distributions. This allows us to choose a different type of model per variable. And because the response is not part of this specification, it is no problem to use this in settings where we have complex outcomes. In MICE, we had to specify full conditional models for the incomplete variables, and this requires us to explicitly include the response into the linear predictor of the incomplete covariates. Here, we do not directly specify the imputation model but the joint distribution of the covariates, and this does not involve the response. Because the analysis model is included in our specification of the posterior distribution, and the imputed values are sampled from that posterior, but not from the models shown here on the slide, the response is taken into account in the imputation. And this is what makes this approach so well suited for settings with complex outcomes, that we could not easily include into the linear predictor of the models for the incomplete covariates. 
--- ## <i class="fab fa-r-project" style = "color: var(--blue);"></i> package JointAI ```syntax library("JointAI") mod <- coxph_imp(Surv(time, event) ~ x1 + x2 + x3 + x4 + x5, data = mydata, n.iter = 1000) ``` ??? The R package JointAI makes the use of this Bayesian approach feasible also for researchers with limited experience in Bayesian methods. The specification of the models is straightforward and very similar to how models are specified other R packages. Here, as an example, syntax for the proportional hazards model with five covariates. JointAI will automatically detect which of these variables are incomplete and specify the covariate model part, and the prior distributions. The user has additional options, which are not shown here, to specify the types of models used for the incomplete covariates, and to change the hyperparameters in the prior distributions. JointAI requires JAGS to be installed. JAGS is short for just another Gibbs sampler, and is a freely available software that performs Markov Chain Monte Carlo sampling with the help of the Gibbs sampler. -- <br> .pull-left[ **Also possible** * time-varying covariates * frailties/ recurrent events * joint model for longitudinal & survival data * ... ] .pull-right[ **Other model types:** * GLM * generalized linear mixed model * ordinal/multinomial (mixed) model * beta (mixed) model * ... ] ??? But you are not restricted to simple proportional hazards models, but several extensions are possible. -- Full documentation at [**https://nerler.github.io/JointAI/**](https://nerler.github.io/JointAI/) --- ## In Comparison <table class = "ppt" style = "margin-top: -20px;"> <tr> <th style = "text-align: center;"></th> <th style = "width: 25%;">MICE</th> <th style = "width: 30%;">Joint Model MI</th> <th>Bayesian Analysis</th> </tr> <tr> <td></td> <td colspan = "2" style = "text-align: center;"> <span style = "color: var(--turqdk); font-weight: bold;">separate</span> imputation & analysis</td> <td><span style = "color: var(--turqdk); font-weight: bold;">simultaneous</span> analysis & imputation</td> </tr> <tr> <td></td> <td colspan = "2" style = "text-align: center;"><span style = "color: var(--turqdk); font-weight: bold;">direct</span> specification of imputation model</td> <td><span style = "color: var(--turqdk); font-weight: bold;">indirect</span> specification of imputation model</td> </tr> <tr style = "padding-top: 0px;"> <td style = "text-align: center;"><i class="fas fa-thumbs-up fa-2x" style = "color: var(--turqdk)"></i></td> <td><ul> <li>simple settings</li> </ul></td> <td><ul> <li>simple settings</li> <li>multi-level data</li> </ul></td> <td style = "padding: 0px 5px;"><ul> <li>non-linear associations</li> <li>time-to-event outcomes</li> <li>multi-level data</li> <li>more complex analyses</li></ul></td> </tr> <tr> <td style = "text-align: center;"><i class="fas fa-thumbs-down fa-2x" style = "color: var(--turqdk)"></i></td> <td style = "padding: 0px 5px;"> <ul> <li>non-linear associations</li> <li>complex outcomes / data structure</li> </ul></td> <td style = "padding: 0px 5px;"><ul> <li>non-linear associations</li> <li>many incomplete variables / complex random effects</li> </ul> </td> <td><ul> <li> very large datasets</li> </ul></td> </tr> <tr style = "text-align: center;"> <td><i class="fab fa-r-project fa-2x" style = "color: var(--turqdk);"></i></td> <td><span style = "color: var(--turqdk); font-weight: bold;">mice</span></td> <td><span style = "color: var(--turqdk); font-weight: bold;">jomo</span></td> <td><span style = "color: 
var(--turqdk); font-weight: bold;">JointAI</span></td> </tr> </table> ??? MICE and Joint Model imputation are two different options to perform the imputation step in a multiple imputation procedure. This means that in both cases the imputation is completely separate from the analysis. This separation can be convenient when the same incomplete data is used in multiple analyses, but it also introduces the risk of having imputation models that are not compatible with the analysis, as for example when there is a non-linear association in the analysis model. In the Bayesian approach, analysis and imputation are combined. This combination assures that the imputation and analysis models do not contradict each other. It is, however, possible to extract the imputed values sampled in the Bayesian approach so that this method could also serve as the imputation step in a multiple imputation. In MICE and the Joint model imputation we specify the imputation models directly. This is what makes these approaches difficult to use when the incomplete variables do not just have simple linear associations with the other variables. In the Bayesian approach we specify the likelihood for the data, but the imputed values are sampled from the posterior distribution that is derived from the likelihood and the prior. With all the advanced sampling techniques available nowadays, this also works when the posterior does not have a closed form, and so this approach is well suited for settings with complex associations, while MICE is better suited for simpler settings. Joint model imputation can handle multi-level settings, but assumes linear associations between all sub-models. Because the Bayesian approach is more computationally intensive, it may be less well suited for very large datasets. All three approaches are available in R, as the R packages mice, jomo and JointAI. --- ## Extensions <i class="fab fa-r-project" style = "color: var(--blue);"></i> package **smcfcs** <span style = "color:var(--lgrey)">(substantive model compatible fully conditional specification)</span> * hybrid MICE / Bayes * time-to-event outcomes * non-linear associations ??? There are of course some extensions to the basic methodology that I have presented so far. One package that is specifically relevant in this context is the package smcfcs, short for **substantive model compatible fully conditional specification**. It uses a hybrid approach between mice and the Bayesian approach, to ensure valid imputations in settings with time-to-event outcomes and non-linear associations. -- <i class="fab fa-r-project" style = "color: var(--blue);"></i> package **jomo** * hybrid Joint model MI / Bayes * time-to-event outcomes * non-linear associations ??? And also the package jomo combines the classic joint model imputation with the Bayesian approach, in order to ensure imputations that are compatible with an analysis model, and to impute missing covariates in survival models. -- See also: <a href = "https://CRAN.R-project.org/view=MissingData"> <strong>https://cran.r-project.org/web/views/MissingData.html</strong></a> ??? There are a lot more packages available that either extend the mice package or implement imputation in some form. A good place to get an overview of what is available is the CRAN task view on Missing Data. --- ## Finding the Right Fit!
<img src = "figures/arrow.png" width = "90%" style="display: block; margin: auto"> <div class="imgcontainer"> <img src="graphics/dog-1121623_1920.jpg"> <img src="graphics/shelf-with-blue-jeans.jpg"> <img src="graphics/Tailor.jpg"> </div> ??? One-size-fits-all can sometimes be appropriate, but mostly for scarfs and things like that. In statistical analysis, one-size-fits-all just does not work. Every model or method makes assumptions. And when we just blindly apply a method because it is the standard that everyone uses, we will likely violate some of these assumptions and get biased results. Often, it is possible to work with off-the-shelf methods, meaning, methods that are readily available in software. But there is a variety of models and methods we can choose from, and we need to study them to be able to find the one that fits our data best. And even then, we may need to adapt the methods to really fit our data, by using the more advanced options provided in the software. But the more complex our data and the analysis model of interest are, the fewer options we have, and in certain settings, we may have to use a completely custom tailored approach. And we should not forget, multiple imputation was developed 50 years ago. The data that we collect and the models that we use to analyse that data have gotten more and more complex since then. And so even though multiple imputation as a method is not wrong or bad, it just may not be a good fit for our research projects today. Luckily, also the computational power has tremendously increased over the last decades, and so it is now feasible to use more complex techniques for handling missing data. --- ## Reality Check Solving a missing data problem adequately in **just one line is an illusion!** ```r mice::mice(mydata) ``` ??? And talking about complex. Being able to solve a missing data problem adequately in just one line unfortunately is an illusion! -- <br> **In real life:** <div class="imgrow" style = "background-color:var(--turqdk);"> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_01.jpg"> <img src="graphics/syntax/syntax_Page_02.jpg"> <!-- </div> --> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_03.jpg"> <img src="graphics/syntax/syntax_Page_04.jpg"> <!-- </div> --> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_05.jpg"> <img src="graphics/syntax/syntax_Page_06.jpg"> <!-- </div> --> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_07.jpg"> <img src="graphics/syntax/syntax_Page_08.jpg"> <!-- </div> --> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_09.jpg"> <img src="graphics/syntax/syntax_Page_10.jpg"> <!-- </div> --> <!-- <div class="column6"> --> <img src="graphics/syntax/syntax_Page_11.jpg"> <img src="graphics/syntax/syntax_Page_12.jpg"> <!-- </div> --> </div> ??? In reality, setting up the imputation can take quite a bit of time. Here is an example of syntax that we used for the imputation with mice in an observational cohort study. And this syntax has more than 600 lines of code. 
--- class: the-end background-image: url(graphics/ColouredBackground.jpg) background-position: center background-size: contain layout: false count: false --- class: center, inverse, the-end layout: false count: false <div class="thanks">Thanks!</div> <img src = "graphics/Logo.png" width=200 style="display: block; margin: auto; position:relative; top: 420px;"> <div id="contact"> <i class="fas fa-envelope"></i> n.erler@erasmusmc.nl   <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>   <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>   <a href="https://nerler.com"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> <!-- <script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script> --> <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"> </script> <script> // Get the button that opens the modal var btn = document.querySelectorAll("button.modal-button"); // All page modals var modals = document.querySelectorAll('.modal'); // Get the <span> element that closes the modal var spans = document.getElementsByClassName("close"); // When the user clicks the button, open the modal for (var i = 0; i < btn.length; i++) { btn[i].onclick = function(e) { e.preventDefault(); modal = document.querySelector(e.target.getAttribute("href")); modal.style.display = "block"; } } // When the user clicks on <span> (x), close the modal for (var i = 0; i < spans.length; i++) { spans[i].onclick = function() { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } // When the user clicks anywhere outside of the modal, close it window.onclick = function(event) { if (event.target.classList.contains('modal')) { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } </script>