Preface

R packages

In this practical, a number of R packages are used. If any of them are not installed you may be able to follow the practical but will not be able to run all of the code. The packages used (with versions that were used to generate the solutions) are:

  • R version 3.6.1 (2019-07-05)
  • mice (version: 3.6.0)
  • survival (version: 2.44.1.1; note: in versions 2.44-1 and 2.44-1.1 confint() returns NA)
  • JointAI (version: 0.6.0)
  • ggplot2 (version: 3.2.1)

Dataset

Data & Model of interest

In this practical we will work with a different subset of the PBC data, omitting the longitudinal structure and focusing on the survival component.

To get the pbcdat data, load the file pbcdat.RData. You can download it here. To load this dataset, you can use the command file.choose() which opens the explorer and allows you to navigate to the location of the downloaded file on your computer. If you know the path to the file, you can also use load("<path>/<pbcdat.RData").

The variables contained in this subset (pbcdat) are:
time number of years between inclusion and death, transplantion, or end of follow-up
status status at time (censored, transplant, dead)
age patient’s age at intake
sex patient’s sex
platelet platelet count
chol serum cholesterol
stage histologic stage of disease

The variables in pbcdat dataset have the following distributions:

The missing data pattern is

We are interested to determine predictor variables for patient survival, using the following Cox proportional hazards model:

coxph(Surv(time, status == 'dead') ~ platelet + age + sex + chol + stage)

Imputation with mice

As we have seen in the lecture, the mice package provides the function nelsonaalen() that calculates the Nelson-Aalen estimator with which the cumulative Hazard can be approximated.

Task 1

  • Calculate the Nelson-Aalen estimate for patient survival in the pbc data and
  • perform the usual setup steps for imputation using mice().

Note:
nelsonaalen() does not accept the status == 'dead' specification of the event, hence, we need to create a new event indicator event.


Solution 1

pbcdat$event <- pbcdat$status == 'dead'
pbcdat$na <- nelsonaalen(data = pbcdat, timevar = "time",  statusvar = "event")

micesurv0 <- mice(pbcdat, maxit = 0)
micesurvmeth <- micesurv0$meth
micesurvpred <- micesurv0$pred

micesurvmeth[c("chol")] <- "midastouch"
micesurvmeth[c("platelet")] <- "norm"

micesurvpred[, "time"] <- 0

# check the method and predictorMatrix
micesurvmeth
##         time       status     platelet          age          sex         chol        stage 
##           ""           ""       "norm"           ""           "" "midastouch"       "polr" 
##        event           na 
##           ""           ""
micesurvpred
##          time status platelet age sex chol stage event na
## time        0      1        1   1   1    1     1     1  1
## status      0      0        1   1   1    1     1     1  1
## platelet    0      1        0   1   1    1     1     1  1
## age         0      1        1   0   1    1     1     1  1
## sex         0      1        1   1   0    1     1     1  1
## chol        0      1        1   1   1    0     1     1  1
## stage       0      1        1   1   1    1     0     1  1
## event       0      1        1   1   1    1     1     0  1
## na          0      1        1   1   1    1     1     1  0

Task 2

Run the imputation and analyse the imputed data.

Solution 2

## Warning: Unknown or uninitialised column: 'df.residual'.
## Warning in pool.fitlist(getfit(object), dfcom = dfcom): Large sample assumed.

Imputation with JointAI

Analysis of a Cox proportional hazards model using JointAI works analogous to the function coxph() from packages survival.

Task

  • Fit the Cox model given above using the function coxph_imp() (and 500 iterations).
  • Check the traceplot to confirm convergence.

Solution

library(survival)
JointAIcox <- coxph_imp(Surv(time = time, event) ~ platelet + 
                          age + sex + chol + stage, data = pbcdat,
                        n.iter = 500, seed = 2019)
traceplot(JointAIcox)

The mixing of the parameters for stage is not great. We would have to increase the number of iterations and/or the number of chains to get better results.

In the following section we will see a trick that can sometimes help to improve problems with convergence or mixing.

Additional excercise JointAI

For larger datasets with many different event times the Cox model implemented in JointAI can become quite slow. This is because it has to use the counting process notation which requires a loop over all event times in each iteration of the MCMC sampler.

A parametric survival model, e.g. assuming a Weibull distribution (see Section 4.2 of the slides of EP03: Biostatistical Methods II: Survival Analysis), is often faster. This is implemented in the function survreg_imp().

Task 1

  • Fit a Weibull survival model using the model structure of the Cox model.
  • Investigate convergence using the traceplot() and the Gelman-Rubin criterion (GR_crit()).

Solution 1

JointAIsurv <- survreg_imp(Surv(time = time, event) ~ platelet + 
                             age + sex + chol + stage, data = pbcdat,
                           n.iter = 500, seed = 2019)

traceplot(JointAIsurv)
GR_crit(JointAIsurv)

## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## (Intercept)       1.34       1.91
## platelet          1.01       1.04
## age               1.02       1.07
## sexf              1.03       1.05
## chol              1.02       1.07
## stage2            2.07       3.91
## stage3            2.00       4.03
## stage4            2.12       4.26
## shape_time        1.01       1.04
## 
## Multivariate psrf
## 
## 1.78

Both the traceplot and the Gelman-Rubin criterion show that the parameters for stage don’t converge well. There are clear patterns visible in the plots and the upper limit of the confidence intverval of the Gelman-Rubin criterion is much larger than one.

Task 2

In some cases, convergence of coefficients for categorical variables can be improved by changing the reference category. Especially when the categories are very unbalanced, convergence it better when the largest (or a large) category is chosen as the reference.

The plot of the distribution of the variables in the pbcdat data at the beginning of this practical shows that there are few patients with stage 1.
  • Re-run the Weibull survival model with the most frequent category of stage as reference (using the argument refcats).

Check out the help file and the vignette on Model Specification for more details on how to use the argument refcats.

  • Then look at the traceplot() and GR_crit() to see if the change in reference category improved convergence.

Solution 2

JointAIsurv2 <- survreg_imp(Surv(time = time, event) ~ platelet + 
                              age + sex + chol + stage, data = pbcdat,
                            n.iter = 500, seed = 2019, refcats = 'largest')

traceplot(JointAIsurv2)
GR_crit(JointAIsurv2)

## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## (Intercept)       1.02       1.08
## platelet          1.04       1.12
## age               1.02       1.06
## sexm              1.01       1.04
## chol              1.02       1.06
## stage1            1.14       1.41
## stage2            1.15       1.47
## stage3            1.05       1.16
## shape_time        1.00       1.01
## 
## Multivariate psrf
## 
## 1.12

The traceplots look a lot better and the Gelman-Rubin criterion has improved. Nevertheless, more iterations or chaines would be necessary to obtain more reliable results.

 

© Nicole Erler