Chapter 5 Statistical analysis plan

Since a key principle of IDA is not to touch the research questions, the research aim and statistical analysis plan need to be in place before IDA commences. IDA may lead to an update or refinement of the analysis plan. To demonstrate the workflow and content of IDA, we created a hypothetical research aim and a corresponding statistical analysis plan.

Hypothetical research aim for IDA: Develop a multivariable model for early death (death within 28 days from injury) using nine independent variables of mixed type (continuous, categorical, semicontinuous) with the primary aim of prediction and a secondary aim of describing the association of each variable with the outcome.

The assumed analysis aim is in line with the prediction model presented by Perel et al, BMJ 2012, supplement available at.

5.1 Outcome variable

Early death, i.e. in-hospital death within 28 days from injury (binary variable)

5.2 Statistical methods

Logistic regression will be used to model early death by the following independent variables (measured at randomisation) deemed important to predict early death.

Demographic measurements:

  • Age (age, years)
  • Sex (sex, male or female)

Physiological measurements:

  • Systolic blood pressure (sbp, mmHg)
  • Heart rate (hr, 1/min)
  • Respiratory rate (rr, 1/min)
  • Glasgow coma score (gcs, points)
  • Central capillary refill time (cc, seconds)

Characteristics of injury measurements:

  • Time since injury (injurytime, hours)
  • Type of injury (injurytype, ‘blunt’, ‘penetrating’ or ‘blunt and penetrating’)

Restricted cubic splines with 3 degrees of freedom and knots at their default positions will be used for continuous variables. Because the final prediction model should be parsimonious enough to be easy to apply, backward elimination at a significance level of \(\alpha=0.05\) will be used to remove non-significant effects. Finally, the nonlinear representation of each continuous variable will be tested against a linear representation at \(\alpha=0.05\). If the nonlinear effect adds no value, the model will be refitted with a linear effect for that variable.
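The modelling steps just described can be sketched in base R. This is only an illustration under stated assumptions, not the analysis code of this project: the data are simulated, `splines::ns()` stands in for a restricted cubic spline routine (its default knot positions may differ from those of other implementations), and likelihood ratio tests are assumed for both the elimination step and the nonlinearity test, which the plan does not specify.

```r
library(splines)  # ships with R; ns() builds a natural (restricted) cubic spline basis

set.seed(2012)
n <- 2000
# Simulated stand-in data, NOT the CRASH-2 data (all values are invented)
sim <- data.frame(
  age = round(runif(n, 16, 90)),
  sbp = round(rnorm(n, 100, 25)),
  hr  = round(rnorm(n, 100, 20))
)
lp <- -3 + 0.03 * sim$age + 0.0008 * (sim$sbp - 110)^2  # hr has no true effect
sim$earlydeath <- rbinom(n, 1, plogis(lp))

alpha <- 0.05

# Full model: each continuous variable enters as a spline with 3 degrees of freedom
fit <- glm(earlydeath ~ ns(age, df = 3) + ns(sbp, df = 3) + ns(hr, df = 3),
           family = binomial, data = sim)

# Backward elimination: repeatedly drop the term with the largest p-value above alpha
# (assumption: likelihood ratio tests from drop1())
repeat {
  tab <- drop1(fit, test = "Chisq")
  p <- tab[["Pr(>Chi)"]][-1]            # first row is "<none>"
  if (length(p) == 0 || max(p) <= alpha) break
  worst <- rownames(tab)[-1][which.max(p)]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
selected <- fit

# Nonlinearity check: compare each retained spline term with a linear term
nonlin_p <- c()
for (v in c("age", "sbp", "hr")) {
  spline_term <- sprintf("ns(%s, df = 3)", v)
  if (!spline_term %in% attr(terms(fit), "term.labels")) next
  linear_fit <- update(fit, as.formula(sprintf(". ~ . - %s + %s", spline_term, v)))
  lrt <- anova(linear_fit, fit, test = "Chisq")  # 2 df: spline (3 df) vs linear (1 df)
  nonlin_p[v] <- lrt[["Pr(>Chi)"]][2]
  if (nonlin_p[v] > alpha) fit <- linear_fit     # refit with a linear effect
}

formula(fit)  # final model after elimination and linearity checks
```

Because a linear function lies in the space spanned by a natural cubic spline basis plus intercept, the linear model is nested in the spline model, which is what makes the comparison in the last loop a valid nested test.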

5.3 Remarks

  • Regarding type of injury, the original paper describes its treatment in the model as follows: ‘Type of injury had three categories – penetrating, blunt, or blunt and penetrating – but we analysed it as “penetrating” or “blunt and penetrating.”’ It is not clear from that description what happened to the ‘blunt’ group. (I assume it was collapsed with ‘blunt and penetrating’.) Note (MH): we will consider all three categories and then review the recommendations for the final analysis.

  • The original paper describes the modeling approach as follows: ‘We used a backward step-wise approach. Firstly, we included all potential prognostic factors and interaction terms that users considered plausible. These interactions included all potential predictors with type of injury, time since injury, and age. We then removed, one at a time, terms for which we found no strong evidence of an association, judged according to the P values (<0.05) from the Wald test.’ This would mean they tested at least 24 interaction terms, each possibly using several degrees of freedom! In the final model, only an interaction of Glasgow coma score and type of injury was included.

5.4 Preparations

The outcome variable, early death (i.e., death within 28 days from injury), must be computed from the time span between the date of death and the date of randomisation using the following logic:

  • transform ddeath and trandomised into an interpretable date format and then compute the difference
  • interpret missing values (i.e. NAs) as ‘not died within study period, at least not within 28 days’
  • if patients died after 28 days, treat as alive

This can be derived using the following code logic:

## NOTE: This is for demonstration purposes; this code is not run here.
## The derivation was executed earlier.

a_crash2$time2death <-
  as.numeric(as.Date(a_crash2$ddeath) - as.Date(a_crash2$trandomised))

# Known date of death: 1 if death occurred within 28 days, otherwise 0
# (+ 0 transforms TRUE/FALSE to 1/0)
a_crash2$earlydeath[!is.na(a_crash2$time2death)] <-
  (a_crash2$time2death[!is.na(a_crash2$time2death)] <= 28) + 0

# NA in time2death means alive at day 28
a_crash2$earlydeath[is.na(a_crash2$time2death)] <- 0
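
To see how the three rules behave, here is a minimal sketch on a made-up data frame of four patients. The dates are invented for illustration; only the column names ddeath and trandomised follow the code above, and `ifelse()` is used here as a compact alternative to the two-step assignment.

```r
# Hypothetical toy data: four patients randomised on the same day
toy <- data.frame(
  trandomised = c("2008-01-01", "2008-01-01", "2008-01-01", "2008-01-01"),
  ddeath      = c("2008-01-06", "2008-02-10", NA,           "2008-01-29"),
  stringsAsFactors = FALSE
)

# Days between randomisation and death (NA if no date of death was recorded)
toy$time2death <- as.numeric(as.Date(toy$ddeath) - as.Date(toy$trandomised))

# NA -> 0 (alive at day 28); death after day 28 -> 0; death by day 28 -> 1
toy$earlydeath <- ifelse(is.na(toy$time2death), 0L,
                         as.integer(toy$time2death <= 28))

toy[, c("time2death", "earlydeath")]
# time2death: 5, 40, NA, 28
# earlydeath: 1,  0,  0,  1
```

The fourth patient shows that a death on exactly day 28 counts as an early death, because the comparison is `<= 28`.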

We also display the marginal distribution of the derived outcome variable.

a_crash2 %>%
  dplyr::select(earlydeath) %>%
  gtsummary::tbl_summary()
Characteristic                      N = 20,207¹
Death within 28 days from injury    3,076 (15%)
¹ n (%)

The number of deaths computed in the data set coincides with the number reported in Perel et al, BMJ 2012.

5.5 Sources

Data obtained from http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets

To download the data set, click the link to data set

5.5.1 Data dictionary

The data dictionary can be found LINK

5.6 References

CRASH-2 Collaborators. Effects of tranexamic acid on death, vascular occlusive events, and blood transfusion in trauma patients with significant haemorrhage (CRASH-2): a randomised, placebo-controlled trial. Lancet 2010;376:23-32.

Perel P, Prieto-Merino D, Shakur H, Clayton T, Lecky F, Bouamra O, Russell R, Faulkner M, Steyerberg EW, Roberts I. Predicting early death in patients with traumatic bleeding: development and validation of prognostic model. BMJ 2012;345:e5166.