Chapter 14 Introduction to Bacteremia

To demonstrate the workflow and content of IDA, we created a hypothetical research aim and corresponding statistical analysis plan, which is described in more detail in the section Bact_SAP.Rmd.

The hypothetical research aim for IDA is to develop a multivariable diagnostic model for bacteremia using 49 continuous laboratory blood parameters, age and gender. The primary aim is prediction; the secondary aim is to describe the association of each variable with the outcome (‘explaining’ the multivariable model).

A diagnostic prediction model was developed based on this data set and validated in “A Risk Prediction Model for Screening Bacteremic Patients: A Cross Sectional Study” Ratzinger et al, PLoS One 2014. The assumed research aim is in line with this diagnostic prediction model.

14.1 Dataset Description

Ratzinger et al (2014) performed a diagnostic study on using age, sex and 49 laboratory variables to diagnose the bacteremia status of a blood sample with a multivariable model. Between January 2006 and December 2010, patients with clinically suspected bacteremia were included if blood culture analysis was requested by the responsible physician and blood was sampled for assessment of haematology and biochemistry. The data consist of 14,691 observations from different patients.

Our version of this data was slightly modified compared to the original version, and this modified version was cleared by the Medical University of Vienna for public use (DC 2019-0054). Variable names have been kept as they were (partly German abbreviations). A data dictionary is available in the misc folder of the project directory (‘bacteremia-DataDictionary.csv’).

In the original paper describing the study (Ratzinger et al, PLoS One 2014), a machine learning approach was taken to diagnose a positive status of blood culture. The true status was determined for all blood samples by blood culture analysis, which is the gold standard. Here we will make use of a multivariable logistic regression model.
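As a rough illustration of what a multivariable logistic regression looks like in R, the sketch below fits `glm()` with a binomial family to simulated data. The variable names (`age`, `wbc`, `sex`, `bc`), the coefficients used for simulation and the sample size are all assumptions made up for this example; they are not taken from the bacteremia data or from the statistical analysis plan.

```r
# Simulated toy data, for illustration only (NOT the bacteremia data).
set.seed(1)
n <- 500
sim <- data.frame(
  age = round(runif(n, 18, 90)),
  wbc = rlnorm(n, meanlog = 2, sdlog = 0.5),
  sex = factor(sample(c("male", "female"), n, replace = TRUE))
)

# Assumed data-generating model: log-odds linear in age and wbc.
lin_pred <- -3 + 0.02 * sim$age + 0.05 * sim$wbc
sim$bc <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Multivariable logistic regression with three predictors.
fit <- glm(bc ~ age + wbc + sex, data = sim, family = binomial)

# Predicted probabilities of a positive outcome for each observation.
pred_prob <- predict(fit, type = "response")
summary(fit)$coefficients
```

The coefficient table serves the explanatory aim (association of each variable with the outcome), while `pred_prob` serves the predictive aim.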

14.2 Bacteremia dataset contents

14.2.1 Source dataset

We refer to the dataset available in this repository as the source dataset.

The listing below displays the contents of the source dataset, which is stored in the data folder of the project directory.


Data frame: bact

14691 observations and 53 variables, maximum # NAs: 7114

Name          Storage     NAs
ID            integer       0
sex           integer       0
Alter         integer       0
MCV           double       42
HGB           double       41
HCT           double       42
PLT           integer      42
MCH           double       42
MCHC          double       42
RDW           double       56
MPV           double      702
LYM           double      262
MONO          double      246
EOS           double      135
BASO          double      146
NT            integer    2467
APTT          double     2549
FIB           integer    2567
NA.           integer    1282
K             double     2008
CA            double     1276
PHOS          double     1242
MG            double     1869
KREA          double      159
BUN           double      172
HS            double     3061
GBIL          double     1441
TP            double     1583
ALB           double     1676
AMY           integer    3913
PAMY          integer    7114
LIP           integer    3699
CHE           double     2447
AP            integer    1400
ASAT          integer    1154
ALAT          integer     987
GGT           integer    1262
LDH           integer    1714
CK            integer    2080
GLU           integer    4192
TRIG          integer    5061
CHOL          integer    5045
CRP           double      155
BASOR         double      732
EOSR          double      732
LYMR          double      732
MONOR         double      732
NEU           double      728
NEUR          double      732
PDW           double     1102
RBC           double      461
WBC           double      462
BloodCulture  character     0
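
The listing above has the layout printed by `Hmisc::contents()` (Hmisc is attached in the session info), although the exact call is not shown in this section. As a minimal base-R sketch of the same idea, the hypothetical helper `storage_summary()` below tabulates the storage mode and the number of missing values per variable. The toy data frame is made up for illustration and is not the bacteremia data.

```r
# Hypothetical helper (not part of the original project): one row per
# variable with its storage mode and its number of missing values.
storage_summary <- function(df) {
  data.frame(
    Name    = names(df),
    Storage = unname(vapply(df, typeof, character(1))),
    NAs     = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data for illustration only.
toy <- data.frame(
  ID           = 1:4,
  MCV          = c(88.1, NA, 91.5, 79.9),
  BloodCulture = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

toy_summary <- storage_summary(toy)
print(toy_summary)

# Largest number of missing values in any single variable.
max_nas <- max(toy_summary$NAs)
```

Applied to the source dataset, the same summary would give the per-variable counts shown in the listing, with `max_nas` corresponding to the "maximum # NAs" entry in its header.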

14.2.2 Updated analysis dataset

Additional meta-data, taken from the data dictionary, is added to all variables of the source dataset. We then write this annotated dataset back to the data folder.
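
The annotation step can be sketched as follows. The size messages shown later in this section match the format printed by `Hmisc::upData()`, but the actual call is not shown here, so the sketch uses only base R attributes and runs without extra packages. The helper `annotate_from_dictionary()` and the dictionary column names (`variable`, `label`, `unit`) are assumptions for illustration; the layout of ‘bacteremia-DataDictionary.csv’ may differ. The data values are made up.

```r
# Hypothetical helper: attach "label" and "units" attributes to every
# column of `df` that has an entry in the dictionary `dict`.
annotate_from_dictionary <- function(df, dict) {
  for (i in seq_len(nrow(dict))) {
    v <- dict$variable[i]
    if (v %in% names(df)) {
      attr(df[[v]], "label") <- dict$label[i]
      attr(df[[v]], "units") <- dict$unit[i]
    }
  }
  df
}

# Toy dictionary (column names are assumed) and toy data (values made up).
toy_dict <- data.frame(
  variable = c("Alter", "HGB"),
  label    = c("Patient Age", "Haemoglobin"),
  unit     = c("years", "G/L"),
  stringsAsFactors = FALSE
)
toy <- data.frame(Alter = c(61L, 45L), HGB = c(11.2, 13.4))

toy_annotated <- annotate_from_dictionary(toy, toy_dict)
attributes(toy_annotated$HGB)
```

Storing the meta-data as attributes keeps labels and units attached to each column, so later summaries and plots can read them directly from the data.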

At this stage, we could select the variables of interest to take into the IDA phase by dropping the variables that we do not check in IDA.

As a cross-check, we display the contents again to confirm that the additional meta-data has been added, and then write the changes to the data folder in the file “data/a_bact.rda”.
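
A write-and-verify step of this kind could look like the sketch below. It uses a temporary file as a stand-in for “data/a_bact.rda” and a made-up toy data frame, so the object and file names here are assumptions and not the project's own.

```r
# Toy stand-in for the annotated dataset (values made up).
a_toy <- data.frame(ID = 1:3, BC = c(0L, 1L, 0L))

# Temporary file used here instead of "data/a_bact.rda".
out_file <- file.path(tempdir(), "a_toy.rda")
save(a_toy, file = out_file)

# Reload into a separate environment and compare with the original.
check_env <- new.env()
loaded <- load(out_file, envir = check_env)
roundtrip_ok <- identical(check_env$a_toy, a_toy)
```

Loading into a separate environment avoids overwriting the object in the workspace, so the comparison really is between the in-memory and the stored version.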

Input object size: 5119632 bytes; 53 variables 14691 observations
New object size: 5159904 bytes; 53 variables 14691 observations
Input object size: 5277552 bytes; 54 variables 14691 observations
New object size: 5219544 bytes; 54 variables 14691 observations


Data frame: a_bact

14691 observations and 54 variables, maximum # NAs: 7114

Name          Labels                                     Units             Class      Storage     NAs
ID            Patient Identification                     1-14691           integer    integer       0
sex           Patient Sex                                1=male, 2=female  integer    integer       0
Alter         Patient Age                                years             integer    integer       0
MCV           Mean corpuscular volume                    pg                numeric    double       42
HGB           Haemoglobin                                G/L               numeric    double       41
HCT           Haematocrit                                %                 numeric    double       42
PLT           Blood platelets                            G/L               integer    integer      42
MCH           Mean corpuscular hemoglobin                fl                numeric    double       42
MCHC          Mean corpuscular hemoglobin concentration  g/dl              numeric    double       42
RDW           Red blood cell distribution width          %                 numeric    double       56
MPV           Mean platelet volume                       fl                numeric    double      702
LYM           Lymphocytes                                G/L               numeric    double      262
MONO          Monocytes                                  G/L               numeric    double      246
EOS           Eosinophils                                G/L               numeric    double      135
BASO          Basophiles                                 G/L               numeric    double      146
NT            Normotest                                  %                 integer    integer    2467
APTT          Activated partial thromboplastin time      sec               numeric    double     2549
FIB           Fibrinogen                                 mg/dl             integer    integer    2567
NA.           Sodium                                     mmol/L            integer    integer    1282
K             Potassium                                  mmol/L            numeric    double     2008
CA            Calcium                                    mmol/L            numeric    double     1276
PHOS          Phosphate                                  mmol/L            numeric    double     1242
MG            Magnesium                                  mmol/L            numeric    double     1869
KREA          Creatinine                                 mg/dl             numeric    double      159
BUN           Blood urea nitrogen                        mg/dl             numeric    double      172
HS            Uric acid                                  mg/dl             numeric    double     3061
GBIL          Bilirubin                                  mg/dl             numeric    double     1441
TP            Total protein                              G/L               numeric    double     1583
ALB           Albumin                                    G/L               numeric    double     1676
AMY           Amylase                                    U/L               integer    integer    3913
PAMY          Pancreas amylase                           U/L               integer    integer    7114
LIP           Lipases                                    U/L               integer    integer    3699
CHE           Cholinesterase                             kU/L              numeric    double     2447
AP            Alkaline phosphatase                       U/L               integer    integer    1400
ASAT          Aspartate transaminase                     U/L               integer    integer    1154
ALAT          Alanin transaminase                        U/L               integer    integer     987
GGT           Gamma-glutamyl transpeptidase              G/L               integer    integer    1262
LDH           Lactate dehydrogenase                      U/L               integer    integer    1714
CK            Creatinine kinases                         U/L               integer    integer    2080
GLU           Glucoses                                   mg/dl             integer    integer    4192
TRIG          Triclyceride                               mg/dl             integer    integer    5061
CHOL          Cholesterol                                mg/dl             integer    integer    5045
CRP           C-reactive protein                         mg/dl             numeric    double      155
BASOR         Basophile ratio                            %                 numeric    double      732
EOSR          Eosinophil ratio                           %                 numeric    double      732
LYMR          Lymphocyte ratio                           % (mg/dl)         numeric    double      732
MONOR         Monocyte ratio                             %                 numeric    double      732
NEU           Neutrophiles                               G/L               numeric    double      728
NEUR          Neutrophile ratio                          %                 numeric    double      732
PDW           Platelet distribution width                %                 numeric    double     1102
RBC           Red blood count                            T/L               numeric    double      461
WBC           White blood count                          G/L               numeric    double      462
BloodCulture  Blood culture result for bacteremia        no, yes           character  character     0
BC            bacteremia                                 0/1               integer    integer       0
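
The last row of the listing shows one variable, BC, that is not in the source dataset. Its label and 0/1 coding suggest that it is an integer recoding of BloodCulture. The derivation itself is not shown in this section, so the following is an assumed sketch in which 1 corresponds to "yes"; the toy values are made up.

```r
# Toy data for illustration only.
toy <- data.frame(
  BloodCulture = c("no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Assumed coding: 1 = "yes" (bacteremia), 0 = "no".
toy$BC <- as.integer(toy$BloodCulture == "yes")

# Cross-tabulate old against new variable; all counts should fall
# into the ("no", 0) and ("yes", 1) cells.
bc_check <- table(toy$BloodCulture, toy$BC)
print(bc_check)
```

A cross-tabulation like `bc_check` is a quick way to confirm that a recoded variable agrees with its source before the original is dropped or ignored.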

14.3 Section session info

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Austria.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Hmisc_4.6-0     Formula_1.2-4   survival_3.2-13 lattice_0.20-45
##  [5] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4    
##  [9] readr_2.1.2     tidyr_1.2.0     tibble_3.1.6    ggplot2_3.3.5  
## [13] tidyverse_1.3.1 here_1.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2          sass_0.4.1          jsonlite_1.8.0     
##  [4] splines_4.1.3       modelr_0.1.8        bslib_0.3.1        
##  [7] assertthat_0.2.1    latticeExtra_0.6-29 cellranger_1.1.0   
## [10] yaml_2.3.5          pillar_1.7.0        backports_1.4.1    
## [13] glue_1.6.2          digest_0.6.29       checkmate_2.0.0    
## [16] RColorBrewer_1.1-2  rvest_1.0.2         colorspace_2.0-3   
## [19] htmltools_0.5.2     Matrix_1.4-0        pkgconfig_2.0.3    
## [22] broom_0.7.12        haven_2.4.3         bookdown_0.25      
## [25] scales_1.1.1        jpeg_0.1-9          tzdb_0.2.0         
## [28] htmlTable_2.4.0     generics_0.1.2      ellipsis_0.3.2     
## [31] withr_2.5.0         nnet_7.3-17         cli_3.2.0          
## [34] magrittr_2.0.2      crayon_1.5.1        readxl_1.3.1       
## [37] evaluate_0.15       fs_1.5.2            fansi_1.0.3        
## [40] xml2_1.3.3          foreign_0.8-82      data.table_1.14.2  
## [43] tools_4.1.3         hms_1.1.1           lifecycle_1.0.1    
## [46] munsell_0.5.0       reprex_2.0.1        cluster_2.1.2      
## [49] compiler_4.1.3      jquerylib_0.1.4     rlang_1.0.2        
## [52] grid_4.1.3          rstudioapi_0.13     htmlwidgets_1.5.4  
## [55] base64enc_0.1-3     rmarkdown_2.13      gtable_0.3.0       
## [58] DBI_1.1.2           R6_2.5.1            gridExtra_2.3      
## [61] lubridate_1.8.0     knitr_1.38          fastmap_1.1.0      
## [64] utf8_1.2.2          rprojroot_2.0.2     stringi_1.7.6      
## [67] Rcpp_1.0.8.3        vctrs_0.3.8         rpart_4.1.16       
## [70] png_0.1-7           dbplyr_2.1.1        tidyselect_1.1.2   
## [73] xfun_0.30