Appendix E — Univariate distribution checks

This section reports a series of univariate summary checks of the bacteremia dataset.

E.1 U2: Descriptive summaries

E.1.1 U2: Remaining predictors

We present a visual summary.

And a descriptive summary:

remaining_predictors Descriptives

38 Variables   14691 Observations

MCV: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14649       42      506        1    88.35    6.992     78.2     81.1     84.7     88.3     92.0     95.9     99.0
lowest : 51 52.6 54.9 56.3 57.5 , highest: 121 121.8 124.6 127.9 128.7
HGB: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14650       41      157        1    11.57    2.558      8.2      8.8      9.9     11.4     13.2     14.6     15.4
lowest : 3 3.1 3.5 3.9 4.1 , highest: 19.5 20.5 20.7 20.8 21
HCT: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14649       42      404        1    34.48    7.316     24.6     26.4     29.8     34.3     39.1     42.9     44.8
lowest : 0 0.1 0.2 9.7 9.8 , highest: 61.4 61.9 63.2 65.3 66.6
MCH: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14649       42      232        1    29.58    2.693     25.3     26.7     28.4     29.7     31.0     32.4     33.4
lowest : 14.9 15.6 15.9 16 16.5 , highest: 42 42.3 42.4 42.5 47.4
MCHC: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14649       42      124    0.999    33.47    1.546     31.1     31.7     32.6     33.5     34.4     35.2     35.6
lowest : 23.7 24.4 24.8 25.1 26.1 , highest: 38.3 38.4 38.9 39.3 43.5
RDW: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14635       56      173        1       15    2.385     12.4     12.7     13.4     14.5     16.0     18.0     19.5
lowest : 10.6 11.1 11.2 11.3 11.4 , highest: 28.6 28.9 29.1 29.7 31.8
MPV: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13989      702       71    0.999    10.38    1.132      8.9      9.2      9.7     10.3     11.0     11.7     12.2
lowest : 7.3 7.7 7.8 7.9 8 , highest: 14.2 14.3 14.6 14.8 15
LYM: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14429      262      114    0.998    1.366    1.162      0.2      0.4      0.7      1.0      1.6      2.1      2.6
lowest : 0 0.1 0.2 0.3 0.4 , highest: 149.9 357.5 366.8 375.1 578.1
MONO: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14445      246       67    0.996   0.8527   0.5965      0.1      0.3      0.5      0.8      1.1      1.5      1.8
lowest : 0 0.1 0.2 0.3 0.4 , highest: 13.9 14.6 16.2 17.3 20.4
EOS: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14556      135       36    0.867   0.1148   0.1585      0.0      0.0      0.0      0.1      0.1      0.3      0.4
lowest : 0 0.1 0.2 0.3 0.4 , highest: 3.8 5.3 9.6 11.5 15.8
BASO: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14545      146       18    0.337  0.01725  0.03111      0.0      0.0      0.0      0.0      0.0      0.1      0.1
 Value      0.000 0.065 0.195 0.260 0.390 0.455 0.585 0.650 0.780 0.845 0.975 1.040
 Frequency  12671  1636   109    59    31    14     6     7     1     2     1     2
 Proportion 0.871 0.112 0.007 0.004 0.002 0.001 0.000 0.000 0.000 0.000 0.000 0.000
                                               
 Value      1.170 1.300 1.365 1.495 2.145 6.500
 Frequency      1     1     1     1     1     1
 Proportion 0.000 0.000 0.000 0.000 0.000 0.000 
For the frequency table, variable is rounded to the nearest 0.065
NT: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12224     2467      149        1    83.22    30.56       35       48       67       83      101      118      128
lowest : 4 5 6 7 8 , highest: 148 149 150 151 152
APTT: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12142     2549      631        1    40.06    9.533     30.1     31.4     34.1     37.7     42.7     49.9     56.6
lowest : 21.4 21.6 23.4 23.5 23.6 , highest: 160.7 163 168.7 171.6 176.1
SODIUM: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13409     1282       58    0.994    137.2    5.034      129      132      135      137      140      142      144
lowest : 106 108 109 110 112 , highest: 161 165 166 168 170
CA: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13415     1276      185        1    2.214   0.2213     1.89     1.96     2.09     2.22     2.35     2.45     2.51
lowest : 1.03 1.15 1.18 1.2 1.23 , highest: 3.84 3.88 3.96 4.18 4.4
PHOS: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13449     1242      306        1    1.048   0.3993     0.55     0.64     0.81     0.99     1.20     1.47     1.74
lowest : 0.3 0.31 0.32 0.33 0.34 , highest: 4.36 4.43 4.53 5.48 6.22
MG: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12822     1869      146    0.999   0.8136   0.1609     0.59     0.64     0.72     0.81     0.89     0.98     1.06
lowest : 0.2 0.21 0.22 0.26 0.28 , highest: 1.83 1.88 1.96 2.07 2.22
HS: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    11630     3061      169        1    5.413    2.625      2.2      2.7      3.7      5.0      6.6      8.5     10.0
lowest : 1.3 1.4 1.5 1.6 1.7 , highest: 19.8 20.2 22.2 22.3 22.7
GBIL: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13250     1441      885        1    1.406    1.477     0.33     0.39     0.53     0.77     1.23     2.34     3.96
lowest : 0.11 0.12 0.13 0.14 0.15 , highest: 42.82 43.83 45.1 51.72 51.77
TP: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13108     1583      649        1     64.9    12.97    45.20    49.47    56.90    65.70    73.30    78.80    82.00
lowest : 29.9 30 30.3 30.5 30.6 , highest: 107.8 108.1 108.7 112.8 120.9
ALB: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13015     1676      401        1    33.42    8.513     21.3     23.6     27.9     33.6     39.1     43.2     45.2
lowest : 10 10.2 10.5 10.6 10.7 , highest: 52.9 53.2 53.7 54 55.7
AMY: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    10778     3913      488        1    90.83    100.5       18       23       33       49       76      125      187
lowest : 8 9 10 11 12 , highest: 4984 5248 40372 43970 56146
PAMY: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
     7577     7114      280    0.999    41.66    47.28        7        9       14       22       36       64       97
lowest : 1 2 3 4 5 , highest: 1673 2083 2116 3066 38369
LIP: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    10992     3699      444        1    63.82    89.88        6        8       14       23       40       79      135
lowest : 0 1 2 3 4 , highest: 11469 15843 18560 22339 45991
CHE: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12244     2447      997        1     4.79    2.378     1.70     2.17     3.15     4.60     6.22     7.65     8.49
lowest : 0.98 0.99 1 1.01 1.02 , highest: 12.39 12.55 12.97 13.32 13.89
AP: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13291     1400      672        1    118.8    91.51       42       49       63       84      123      206      302
lowest : 11 14 15 16 17 , highest: 1980 2132 2549 2596 2995
LDH: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12977     1714     1137        1    331.2    240.9      136      152      187      239      332      508      724
lowest : 39 46 54 55 56 , highest: 10473 10784 10822 11246 13906
CK: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    12611     2080     1506        1      385    615.4       18       25       42       80      184      577     1155
lowest : 8 9 10 11 12 , highest: 60799 63011 82180 83880 98801
GLU: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    10499     4192      389        1    126.4     48.3       78       85       97      113      138      177      216
lowest : 19 22 23 26 28 , highest: 843 848 890 1349 1403
TRIG: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
     9630     5061      538        1    141.7    90.33       54       64       83      115      165      241      307
lowest : 14 15 16 20 22 , highest: 1796 2247 2662 2918 5440
CHOL: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
     9646     5045      339        1    150.8    59.23       74       89      113      145      182      219      243
lowest : 25 26 27 28 29 , highest: 646 662 676 710 1104
BASOR: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
    13959      732      419    0.322    0.145   0.2679   0.0000   0.0000   0.0000 
      .50      .75      .90      .95 
   0.0000   0.0000   0.5501   1.0526  
lowest : 0 0.13587 0.15456 0.165289 0.181818 , highest: 11.1111 15.2174 16.6667 18.4211 23.6559
EOSR: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
    13959      732      927    0.891    1.297    1.825   0.0000   0.0000   0.0000 
      .50      .75      .90      .95 
   0.5882   1.7857   3.4900   5.0000  
lowest : 0 0.183486 0.20284 0.217865 0.218818 , highest: 39.1753 46.6019 46.9027 50 73.4884
LYMR: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13959      732     3121        1    14.61    11.87    2.752    4.000    6.757   11.340   18.182   27.869   36.620
lowest : 0 0.321543 0.44843 0.460829 0.463679 , highest: 97.2414 97.4194 98 99.1848 100
MONOR: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13959      732     2334        1    8.793      5.4    2.000    3.390    5.634    8.000   10.870   14.141   17.021
lowest : 0 0.274725 0.341297 0.344828 0.456621 , highest: 68.5446 69.2308 70.3704 72.7273 100
NEUR: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13959      732     3850        1    75.15     15.6    47.42    57.88    69.23    78.33    85.32    90.13    92.63
lowest : 0 1.48483 1.93548 1.96078 2.41379 , highest: 99.1228 99.1667 99.4764 99.4845 100
PDW: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    13589     1102      167        1    12.29    2.375      9.3      9.8     10.8     12.0     13.4     15.1     16.4
lowest : 6.6 6.8 6.9 7 7.1 , highest: 24.1 24.7 24.9 25.2 25.3
RBC: Parameter analysis value (Numeric)
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    14230      461       65    0.999    3.936   0.8772      2.7      2.9      3.4      3.9      4.5      4.9      5.2
lowest : 1 1.1 1.2 1.3 1.4 , highest: 7.2 7.4 7.6 7.7 8.2
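
To make the column meanings concrete, the following Python sketch shows one way the per-variable statistics (n, missing, distinct, Info, Mean, Gmd, the quantiles, and the five lowest and highest distinct values) could be computed. It is not the code that produced this appendix, whose layout resembles the output of `describe` in the R package Hmisc. The function names are hypothetical, missing values are assumed to be encoded as `None`, quantiles are assumed to use linear interpolation (R's default "type 7"), and the Info formula is an assumed definition that happens to reproduce the value of about 0.337 reported for BASO when applied to its frequency table.

```python
from collections import Counter
from math import floor


def quantile_type7(sorted_x, p):
    """Linear-interpolation quantile (R's default 'type 7') of a sorted list."""
    h = (len(sorted_x) - 1) * p
    lo = floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def gini_mean_difference(sorted_x):
    """Mean absolute difference over all pairs of observations (Gmd)."""
    n = len(sorted_x)
    if n < 2:
        return float("nan")
    # With sorted data the pairwise sum reduces to a weighted sum of ranks.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(sorted_x, start=1))
    return 2.0 * weighted / (n * (n - 1))


def relative_info(values):
    """Assumed Info definition: (1 - sum of cubed tie proportions) / (1 - 1/n^2)."""
    n = len(values)
    if n < 2:
        return float("nan")
    cubes = sum((count / n) ** 3 for count in Counter(values).values())
    return (1.0 - cubes) / (1.0 - 1.0 / n**2)


def describe_numeric(raw, probs=(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)):
    """Summarise one numeric variable; None marks a missing value (assumption)."""
    x = sorted(v for v in raw if v is not None)
    if not x:
        raise ValueError("no non-missing observations")
    distinct = sorted(set(x))
    return {
        "n": len(x),
        "missing": len(raw) - len(x),
        "distinct": len(distinct),
        "Info": relative_info(x),
        "Mean": sum(x) / len(x),
        "Gmd": gini_mean_difference(x),
        "quantiles": {p: quantile_type7(x, p) for p in probs},
        "lowest": distinct[:5],
        "highest": distinct[-5:],
    }


# Toy input, not taken from the bacteremia data.
toy = [1.0, 2.0, 2.0, 3.0, None, 4.0]
summary = describe_numeric(toy)
print(summary["n"], summary["missing"], summary["distinct"])
```

For every variable above, n plus missing equals the 14691 observations, which gives a quick consistency check on any reimplementation.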

E.1.2 Full descriptive summaries

For brevity, only a limited number of statistics are presented for U2. A full set of descriptive summaries, computed according to the specifications in the IDA plan, is available as a data set. These summary statistics can be viewed and analysed in the file data/results/U2-descriptive-stats.csv.
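
As a minimal sketch of how the exported summaries might be inspected, the Python snippet below reads the CSV file into a list of rows. The column layout of the file is not documented in this appendix, so the code makes no assumption about column names. The function name `load_descriptive_stats` is hypothetical, and the path is assumed to be relative to the project root.

```python
import csv
from pathlib import Path

# Path quoted in the text; assumed to be relative to the project root.
STATS_PATH = Path("data/results/U2-descriptive-stats.csv")


def load_descriptive_stats(path=STATS_PATH):
    """Read the summary file into a list of dicts, one per CSV row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if STATS_PATH.exists():
    rows = load_descriptive_stats()
    columns = list(rows[0].keys()) if rows else []
    print(f"{len(rows)} rows; columns: {columns}")
else:
    print(f"{STATS_PATH} not found; run from the project root")
```

All values are returned as strings, so numeric columns need an explicit conversion before further analysis.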