Chapter 17 Univariate distribution checks

This section reports a series of univariate summary checks of the bacteremia dataset.

17.1 Data set overview

Using the Hmisc describe function, we provide an overview of the data set. The descriptive report also provides histograms of continuous variables. For ease of scanning the information, we group the report by measurement type.
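To make the column headers of the report concrete, the following base R sketch recomputes n, missing, distinct, Info, Mean, Gmd and the quantiles for a single variable. The helper name describe_sketch is hypothetical, and the Info formula is an assumption about how Hmisc derives that column, so this is only an approximation of what describe reports. It is applied to a vector rebuilt from the sex frequencies listed in Section 17.1.1 (8536 and 6155).

```r
# Hypothetical helper: recomputes describe-style summary columns in base R.
describe_sketch <- function(x) {
  n_missing <- sum(is.na(x))
  x <- sort(x[!is.na(x)])
  n <- length(x)
  # Gini's mean difference: mean absolute difference over all pairs of values
  gmd <- 2 * sum((2 * seq_len(n) - n - 1) * x) / (n * (n - 1))
  # Assumed Info formula: 1 when there are no ties, smaller with heavy ties
  p <- as.numeric(table(x)) / n
  info <- (1 - sum(p^3)) / (1 - 1 / n^2)
  q <- quantile(x, c(.05, .10, .25, .50, .75, .90, .95), names = FALSE)
  names(q) <- c(".05", ".10", ".25", ".50", ".75", ".90", ".95")
  c(n = n, missing = n_missing, distinct = length(unique(x)),
    Info = info, Mean = mean(x), Gmd = gmd, q)
}

# Rebuild the sex variable from its reported frequency table
sex <- rep(c(1, 2), times = c(8536, 6155))
out <- describe_sketch(sex)
print(round(out, 4))
```

Rounded to the precision of the report, the sketch gives Info 0.73, Mean 1.419 and Gmd 0.4869 for sex, which agrees with the values listed for that variable.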

17.1.1 Demographic variables

Demographic variables

2 Variables   14691 Observations

Alter: Patient Age years
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14691        0       85        1    56.17    20.78       24       29       43       58       70       79       84
lowest : 16 17 18 19 20 , highest: 96 97 98 99 101
sex: Patient Sex 1=male, 2=female
       n  missing distinct     Info     Mean      Gmd
   14691        0        2     0.73    1.419   0.4869
 Value          1     2
 Frequency   8536  6155
 Proportion 0.581 0.419
 

17.1.2 Pivotal variables and very important predictors

Pivotal variables and VIPs

6 Variables   14691 Observations

WBC: White blood count G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14229      462     2710        1    11.23    7.602     2.66     4.26     6.63     9.60    13.53    18.22    22.27
lowest : 0.00 0.01 0.02 0.03 0.04 , highest: 365.30 383.74 387.73 433.83 604.47
Alter: Patient Age years
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14691        0       85        1    56.17    20.78       24       29       43       58       70       79       84
lowest : 16 17 18 19 20 , highest: 96 97 98 99 101
BUN: Blood urea nitrogen mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14519      172      947        1    22.66    16.92      7.1      8.6     11.6     16.6     26.9     44.8     60.8
lowest : 2.5 2.7 2.8 2.9 3.0 , highest: 160.6 171.3 171.9 176.0 184.8
KREA: Creatinine mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14532      159      674        1    1.329   0.8518    0.620    0.690    0.810    1.000    1.350    2.160    3.144
lowest : 0.26 0.27 0.28 0.29 0.30 , highest: 15.24 15.40 15.67 16.64 20.75
NEU: Neutrophiles G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13963      728      374        1    8.367    5.776     1.60     2.70     4.60     7.30    10.80    15.08    18.40
lowest : 0.0 0.1 0.2 0.3 0.4 , highest: 54.0 56.4 63.7 71.6 83.8
PLT: Blood platelets G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14649       42      718        1      220    130.1       50       81      140      204      277      369      445
lowest : 0 1 2 3 4 , highest: 1068 1211 1321 1639 2092

17.1.6 Remaining variables

Remaining variables

29 Variables   14691 Observations

MCV: Mean corpuscular volume pg
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14649       42      506        1    88.35    6.992     78.2     81.1     84.7     88.3     92.0     95.9     99.0
lowest : 51.0 52.6 54.9 56.3 57.5 , highest: 121.0 121.8 124.6 127.9 128.7
HGB: Haemoglobin G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14650       41      157        1    11.57    2.558      8.2      8.8      9.9     11.4     13.2     14.6     15.4
lowest : 3.0 3.1 3.5 3.9 4.1 , highest: 19.5 20.5 20.7 20.8 21.0
HCT: Haematocrit %
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14649       42      404        1    34.48    7.316     24.6     26.4     29.8     34.3     39.1     42.9     44.8
lowest : 0.0 0.1 0.2 9.7 9.8 , highest: 61.4 61.9 63.2 65.3 66.6
MCH: Mean corpuscular hemoglobin fl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14649       42      232        1    29.58    2.693     25.3     26.7     28.4     29.7     31.0     32.4     33.4
lowest : 14.9 15.6 15.9 16.0 16.5 , highest: 42.0 42.3 42.4 42.5 47.4
MCHC: Mean corpuscular hemoglobin concentration g/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14649       42      124    0.999    33.47    1.546     31.1     31.7     32.6     33.5     34.4     35.2     35.6
lowest : 23.7 24.4 24.8 25.1 26.1 , highest: 38.3 38.4 38.9 39.3 43.5
RDW: Red blood cell distribution width %
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14635       56      173        1       15    2.385     12.4     12.7     13.4     14.5     16.0     18.0     19.5
lowest : 10.6 11.1 11.2 11.3 11.4 , highest: 28.6 28.9 29.1 29.7 31.8
MPV: Mean platelet volume fl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13989      702       71    0.999    10.38    1.132      8.9      9.2      9.7     10.3     11.0     11.7     12.2
lowest : 7.3 7.7 7.8 7.9 8.0 , highest: 14.2 14.3 14.6 14.8 15.0
NT: Normotest %
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12224     2467      149        1    83.22    30.56       35       48       67       83      101      118      128
lowest : 4 5 6 7 8 , highest: 148 149 150 151 152
APTT: Activated partial thromboplastin time sec
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12142     2549      631        1    40.06    9.533     30.1     31.4     34.1     37.7     42.7     49.9     56.6
lowest : 21.4 21.6 23.4 23.5 23.6 , highest: 160.7 163.0 168.7 171.6 176.1
NA.: Sodium mmol/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13409     1282       58    0.994    137.2    5.034      129      132      135      137      140      142      144
lowest : 106 108 109 110 112 , highest: 161 165 166 168 170
CA: Calcium mmol/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13415     1276      185        1    2.214   0.2213     1.89     1.96     2.09     2.22     2.35     2.45     2.51
lowest : 1.03 1.15 1.18 1.20 1.23 , highest: 3.84 3.88 3.96 4.18 4.40
PHOS: Phosphate mmol/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13449     1242      306        1    1.048   0.3993     0.55     0.64     0.81     0.99     1.20     1.47     1.74
lowest : 0.30 0.31 0.32 0.33 0.34 , highest: 4.36 4.43 4.53 5.48 6.22
MG: Magnesium mmol/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12822     1869      146    0.999   0.8136   0.1609     0.59     0.64     0.72     0.81     0.89     0.98     1.06
lowest : 0.20 0.21 0.22 0.26 0.28 , highest: 1.83 1.88 1.96 2.07 2.22
HS: Uric acid mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   11630     3061      169        1    5.413    2.625      2.2      2.7      3.7      5.0      6.6      8.5     10.0
lowest : 1.3 1.4 1.5 1.6 1.7 , highest: 19.8 20.2 22.2 22.3 22.7
GBIL: Bilirubin mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13250     1441      885        1    1.406    1.477     0.33     0.39     0.53     0.77     1.23     2.34     3.96
lowest : 0.11 0.12 0.13 0.14 0.15 , highest: 42.82 43.83 45.10 51.72 51.77
TP: Total protein G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13108     1583      649        1     64.9    12.97    45.20    49.47    56.90    65.70    73.30    78.80    82.00
lowest : 29.9 30.0 30.3 30.5 30.6 , highest: 107.8 108.1 108.7 112.8 120.9
ALB: Albumin G/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13015     1676      401        1    33.42    8.513     21.3     23.6     27.9     33.6     39.1     43.2     45.2
lowest : 10.0 10.2 10.5 10.6 10.7 , highest: 52.9 53.2 53.7 54.0 55.7
AMY: Amylase U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   10778     3913      488        1    90.83    100.5       18       23       33       49       76      125      187
lowest : 8 9 10 11 12 , highest: 4984 5248 40372 43970 56146
 Value          0   500  1000  1500  2000  2500  4000  4500  5000 40500 44000 56000
 Frequency  10432   268    39    14    12     4     2     2     2     1     1     1
 Proportion 0.968 0.025 0.004 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 
For the frequency table, variable is rounded to the nearest 500
PAMY: Pancreas amylase U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    7577     7114      280    0.999    41.66    47.28        7        9       14       22       36       64       97
lowest : 1 2 3 4 5 , highest: 1673 2083 2116 3066 38369
 Value          0   500  1000  1500  2000  3000 38500
 Frequency   7495    65     7     6     2     1     1
 Proportion 0.989 0.009 0.001 0.001 0.000 0.000 0.000
 
For the frequency table, variable is rounded to the nearest 500
LIP: Lipases U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   10992     3699      444        1    63.82    89.88        6        8       14       23       40       79      135
lowest : 0 1 2 3 4 , highest: 11469 15843 18560 22339 45991
CHE: Cholinesterase kU/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12244     2447      997        1     4.79    2.378     1.70     2.17     3.15     4.60     6.22     7.65     8.49
lowest : 0.98 0.99 1.00 1.01 1.02 , highest: 12.39 12.55 12.97 13.32 13.89
AP: Alkaline phosphatase U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13291     1400      672        1    118.8    91.51       42       49       63       84      123      206      302
lowest : 11 14 15 16 17 , highest: 1980 2132 2549 2596 2995
LDH: Lactate dehydrogenase U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12977     1714     1137        1    331.2    240.9      136      152      187      239      332      508      724
lowest : 39 46 54 55 56 , highest: 10473 10784 10822 11246 13906
CK: Creatinine kinases U/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   12611     2080     1506        1      385    615.4       18       25       42       80      184      577     1155
lowest : 8 9 10 11 12 , highest: 60799 63011 82180 83880 98801
GLU: Glucoses mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   10499     4192      389        1    126.4     48.3       78       85       97      113      138      177      216
lowest : 19 22 23 26 28 , highest: 843 848 890 1349 1403
TRIG: Triclyceride mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    9630     5061      538        1    141.7    90.33       54       64       83      115      165      241      307
lowest : 14 15 16 20 22 , highest: 1796 2247 2662 2918 5440
CHOL: Cholesterol mg/dl
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
    9646     5045      339        1    150.8    59.23       74       89      113      145      182      219      243
lowest : 25 26 27 28 29 , highest: 646 662 676 710 1104
PDW: Platelet distribution width %
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   13589     1102      167        1    12.29    2.375      9.3      9.8     10.8     12.0     13.4     15.1     16.4
lowest : 6.6 6.8 6.9 7.0 7.1 , highest: 24.1 24.7 24.9 25.2 25.3
RBC: Red blood count T/L
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
   14230      461       65    0.999    3.936   0.8772      2.7      2.9      3.4      3.9      4.5      4.9      5.2
lowest : 1.0 1.1 1.2 1.3 1.4 , highest: 7.2 7.4 7.6 7.7 8.2

17.2 Categorical variables

We now provide a closer visual examination of the categorical predictors.

17.3 Continuous variables

17.3.1 Suggested transformations

Next, we investigate whether transforming continuous variables may benefit further analyses by reducing the disproportionate impact of highly influential points, including in multivariate summaries. For this purpose we use the function ida_trans, which optimises the parameter sigma of the pseudo-logarithm. The optimisation targets the highest possible linear correlation of the transformed values with normal deviates. If no transformation improves on the untransformed variable, none is suggested, which appears as NA in the output below.
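The idea can be sketched in a few lines of base R. This is not the implementation of ida_trans: the pseudo-logarithm below is assumed to take the same form as in scales::pseudo_log_trans, the helper names (pseudo_log_sketch, qq_cor, suggest_sigma) are hypothetical, and a simple grid search stands in for whatever optimiser ida_trans actually uses.

```r
# Assumed form of the pseudo-logarithm: roughly linear near zero,
# roughly log_base(x / sigma) for x much larger than sigma.
pseudo_log_sketch <- function(x, sigma = 1, base = 10) {
  asinh(x / (2 * sigma)) / log(base)
}

# Correlation of the sorted values with normal quantiles (normal Q-Q correlation)
qq_cor <- function(x) {
  x <- sort(x[!is.na(x)])
  cor(x, qnorm(ppoints(length(x))))
}

# Grid search over sigma; returns NA if no candidate beats the untransformed data
suggest_sigma <- function(x, sigmas = 2^seq(-10, 10, by = 0.5)) {
  scores <- vapply(sigmas,
                   function(s) qq_cor(pseudo_log_sketch(x, sigma = s)),
                   numeric(1))
  best <- which.max(scores)
  if (scores[best] > qq_cor(x)) sigmas[best] else NA_real_
}

# Simulated right-skewed variable (not taken from the bacteremia data)
set.seed(1)
skewed <- rlnorm(500, meanlog = 3)
sigma_hat <- suggest_sigma(skewed)
qq_cor(skewed)                                        # before transformation
qq_cor(pseudo_log_sketch(skewed, sigma = sigma_hat))  # after transformation
```

Returning NA when nothing improves on the raw variable mirrors the convention used in the results: variables with NA keep their original scale.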

variables<- c("Alter", pivotal_vars, vip_vars, leuko_related_vars, leuko_ratio_vars, kidney_related_vars, acute_related_vars, remaining_vars)
unique.variables <- unique(variables)
#variables<- c("Alter", pivotal_vars, vip_vars)

res<-sapply(unique.variables, function(X) ida_trans(b_bact[,X])$const) #takes long, calculate once, and save?

res
##      Alter        WBC        BUN       KREA        NEU        PLT        EOS 
##         NA 2.51471408 0.03198339 0.03193846 2.30783002         NA 0.11873740 
##       BASO        LYM       MONO       NEUR       EOSR      BASOR       LYMR 
## 0.12980073 0.17957981 0.26505156         NA 0.42874860 0.17300902 1.77333947 
##      MONOR          K       eGFR   BUN_KREA        FIB        CRP       ASAT 
## 3.00040692 0.02677349         NA 0.03194543         NA         NA 0.03185536 
##       ALAT        GGT        MCV        HGB        HCT        MCH       MCHC 
## 1.02570312 0.03185702         NA         NA         NA         NA         NA 
##        RDW        MPV         NT       APTT        NA.         CA       PHOS 
##         NA         NA         NA 0.03047767         NA         NA 0.14534462 
##         MG         HS       GBIL         TP        ALB        AMY       PAMY 
##         NA         NA 0.03306450         NA         NA 0.02970893 0.03005131 
##        LIP        CHE         AP        LDH         CK        GLU       TRIG 
## 1.02558160         NA 0.02888640 0.02191602 0.02786388 0.01875994 0.02911146 
##       CHOL        PDW        RBC 
##         NA         NA         NA

Register transformed variables in the data set:

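for(j in seq_along(unique.variables)){
  # a sigma of NA means that no transformation was suggested for this variable
  if(!is.na(res[j])){
    newname <- paste("t_",unique.variables[j],sep="")
    newlabel <- paste("pseudo-log of",label(b_bact)[unique.variables[j]])
    names(newlabel)<-newname
    x<-pseudo_log(b_bact[[unique.variables[j]]], sigma=res[j], base=10)
    label(x)<-newlabel
    b_bact[[newname]] <- x
    # the result of upData() is not assigned: the call only prints the size report below,
    # the label itself has already been set via label(x)
    upData(b_bact, labels=newlabel)
  }
}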
## Input object size:    5575040 bytes;  57 variables    14691 observations
## New object size: 5574816 bytes;  57 variables    14691 observations
## Input object size:    5693696 bytes;  58 variables    14691 observations
## New object size: 5693472 bytes;  58 variables    14691 observations
## Input object size:    5812336 bytes;  59 variables    14691 observations
## New object size: 5812112 bytes;  59 variables    14691 observations
## Input object size:    5930976 bytes;  60 variables    14691 observations
## New object size: 5930752 bytes;  60 variables    14691 observations
## Input object size:    6049616 bytes;  61 variables    14691 observations
## New object size: 6049392 bytes;  61 variables    14691 observations
## Input object size:    6168256 bytes;  62 variables    14691 observations
## New object size: 6168032 bytes;  62 variables    14691 observations
## Input object size:    6286896 bytes;  63 variables    14691 observations
## New object size: 6286672 bytes;  63 variables    14691 observations
## Input object size:    6405536 bytes;  64 variables    14691 observations
## New object size: 6405312 bytes;  64 variables    14691 observations
## Input object size:    6524176 bytes;  65 variables    14691 observations
## New object size: 6523952 bytes;  65 variables    14691 observations
## Input object size:    6642816 bytes;  66 variables    14691 observations
## New object size: 6642592 bytes;  66 variables    14691 observations
## Input object size:    6761464 bytes;  67 variables    14691 observations
## New object size: 6761240 bytes;  67 variables    14691 observations
## Input object size:    6880104 bytes;  68 variables    14691 observations
## New object size: 6879880 bytes;  68 variables    14691 observations
## Input object size:    6998744 bytes;  69 variables    14691 observations
## New object size: 6998520 bytes;  69 variables    14691 observations
## Input object size:    7117408 bytes;  70 variables    14691 observations
## New object size: 7117176 bytes;  70 variables    14691 observations
## Input object size:    7236064 bytes;  71 variables    14691 observations
## New object size: 7235840 bytes;  71 variables    14691 observations
## Input object size:    7354720 bytes;  72 variables    14691 observations
## New object size: 7354496 bytes;  72 variables    14691 observations
## Input object size:    7473376 bytes;  73 variables    14691 observations
## New object size: 7473152 bytes;  73 variables    14691 observations
## Input object size:    7592048 bytes;  74 variables    14691 observations
## New object size: 7591824 bytes;  74 variables    14691 observations
## Input object size:    7710688 bytes;  75 variables    14691 observations
## New object size: 7710464 bytes;  75 variables    14691 observations
## Input object size:    7829328 bytes;  76 variables    14691 observations
## New object size: 7829104 bytes;  76 variables    14691 observations
## Input object size:    7947968 bytes;  77 variables    14691 observations
## New object size: 7947744 bytes;  77 variables    14691 observations
## Input object size:    8066608 bytes;  78 variables    14691 observations
## New object size: 8066384 bytes;  78 variables    14691 observations
## Input object size:    8185248 bytes;  79 variables    14691 observations
## New object size: 8185024 bytes;  79 variables    14691 observations
## Input object size:    8303904 bytes;  80 variables    14691 observations
## New object size: 8303680 bytes;  80 variables    14691 observations
## Input object size:    8422560 bytes;  81 variables    14691 observations
## New object size: 8422336 bytes;  81 variables    14691 observations
## Input object size:    8541216 bytes;  82 variables    14691 observations
## New object size: 8540992 bytes;  82 variables    14691 observations
## Input object size:    8659856 bytes;  83 variables    14691 observations
## New object size: 8659632 bytes;  83 variables    14691 observations
## Input object size:    8778496 bytes;  84 variables    14691 observations
## New object size: 8778272 bytes;  84 variables    14691 observations
sigma_values <- res


c_bact <- b_bact

# update variable lists - generate a second list with transformed variables replacing the originals

bact_transformed <- bact_variables

for(j in 1:length(bact_variables)){
  for(jj in 1:length(bact_variables[[j]])){
      if(!is.na(res[bact_variables[[j]][jj]])) bact_transformed[[j]][jj] <- paste("t_", bact_variables[[j]][jj], sep="")
  }
}
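The renaming rule can be tried on a small stand-in example. The list and sigma values below are made up for illustration (toy_variables and toy_sigma are hypothetical names, not objects from the analysis). The sketch uses lapply instead of nested loops but applies the same rule: a variable name receives the prefix t_ whenever a sigma was suggested for it.

```r
# Made-up variable groups and sigma values, for illustration only
toy_variables <- list(demographic = c("Alter"),
                      vip = c("WBC", "PLT", "BUN"))
toy_sigma <- c(Alter = NA, WBC = 2.5, PLT = NA, BUN = 0.03)

# Replace each name by its transformed counterpart where a sigma exists
toy_transformed <- lapply(toy_variables, function(v) {
  has_sigma <- !is.na(toy_sigma[v])
  unname(ifelse(has_sigma, paste0("t_", v), v))
})

print(toy_transformed)
```

Indexing the sigma vector by name, as the original loop does, keeps the lookup independent of the order in which variables appear in each group.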

17.3.2 Univariate distributions of the original variables and the suggested transformations

for(j in 1:length(unique.variables)){
  print(ida_plot_univar(b_bact, unique.variables[j], sigma=res[j], n_bars=100))
#  if(!is.na(res[j])){
#    print(ida_plot_univar(b_bact, paste("t_",variables[j],sep="")))
#  }
}
## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 95 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 162 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 3483 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 6249 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 56 rows containing missing values (geom_point).

## Warning: Removed 233 rows containing missing values (geom_point).

## Warning: Removed 76 rows containing missing values (geom_point).

## Warning: Removed 3325 rows containing missing values (geom_point).

## Warning: Removed 6233 rows containing missing values (geom_point).

## Warning: Removed 92 rows containing missing values (geom_point).

## Warning: Removed 204 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 57 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 12 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 27 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 17 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

save(list=c("c_bact", "bact_variables", "sigma_values", "bact_transformed"), 
     file=here::here("data", "bact_env_c.rda"))

17.3.3 Univariate distributions of the original variables only, without the suggested transformations

for(j in 1:length(unique.variables)){
  print(ida_plot_univar(b_bact, unique.variables[j], sigma=res[j], n_bars=100, transform = FALSE))
#  if(!is.na(res[j])){
#    print(ida_plot_univar(b_bact, paste("t_",variables[j],sep="")))
#  }
}
## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 59 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3407 rows containing missing values (geom_point).

## Warning: Removed 6332 rows containing missing values (geom_point).

## Warning: Removed 54 rows containing missing values (geom_point).

## Warning: Removed 188 rows containing missing values (geom_point).

## Warning: Removed 82 rows containing missing values (geom_point).

## Warning: Removed 3333 rows containing missing values (geom_point).

## Warning: Removed 6181 rows containing missing values (geom_point).

## Warning: Removed 68 rows containing missing values (geom_point).

## Warning: Removed 177 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 46 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 28 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 18 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

17.3.4 Comparison of univariate distributions with and without pseudo-log transformation

The comparison is only shown for variables where a transformation is suggested.

for(j in 1:length(unique.variables)){
#  print(ida_plot_univar_orig_vs_trans(b_bact, unique.variables[j], sigma=res[j], n_bars=100))
 if(!is.na(res[j])){
   print(ida_plot_univar_orig_vs_trans(b_bact, unique.variables[j], sigma=res[j], n_bars=100))
 }
}
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 91 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 8 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).
## Warning: Removed 188 rows containing missing values (geom_point).

## Warning: Removed 3437 rows containing missing values (geom_point).
## Warning: Removed 3417 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 6396 rows containing missing values (geom_point).
## Warning: Removed 6306 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 57 rows containing missing values (geom_point).
## Warning: Removed 57 rows containing missing values (geom_point).

## Warning: Removed 188 rows containing missing values (geom_point).
## Warning: Removed 222 rows containing missing values (geom_point).

## Warning: Removed 3361 rows containing missing values (geom_point).
## Warning: Removed 3381 rows containing missing values (geom_point).

## Warning: Removed 5993 rows containing missing values (geom_point).
## Warning: Removed 6175 rows containing missing values (geom_point).

## Warning: Removed 76 rows containing missing values (geom_point).
## Warning: Removed 87 rows containing missing values (geom_point).

## Warning: Removed 193 rows containing missing values (geom_point).
## Warning: Removed 204 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 32 rows containing missing values (geom_point).
## Warning: Removed 22 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).

17.4 Section session info

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Austria.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Hmisc_4.6-0     Formula_1.2-4   survival_3.2-13 lattice_0.20-45
##  [5] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4    
##  [9] readr_2.1.2     tidyr_1.2.0     tibble_3.1.6    ggplot2_3.3.5  
## [13] tidyverse_1.3.1 here_1.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2            lubridate_1.8.0     RColorBrewer_1.1-2 
##  [4] httr_1.4.2          rprojroot_2.0.2     tools_4.1.3        
##  [7] backports_1.4.1     bslib_0.3.1         utf8_1.2.2         
## [10] R6_2.5.1            rpart_4.1.16        DBI_1.1.2          
## [13] colorspace_2.0-3    nnet_7.3-17         withr_2.5.0        
## [16] tidyselect_1.1.2    gridExtra_2.3       compiler_4.1.3     
## [19] cli_3.2.0           rvest_1.0.2         htmlTable_2.4.0    
## [22] xml2_1.3.3          labeling_0.4.2      bookdown_0.25      
## [25] sass_0.4.1          scales_1.1.1        checkmate_2.0.0    
## [28] digest_0.6.29       foreign_0.8-82      rmarkdown_2.13     
## [31] base64enc_0.1-3     jpeg_0.1-9          pkgconfig_2.0.3    
## [34] htmltools_0.5.2     highr_0.9           dbplyr_2.1.1       
## [37] fastmap_1.1.0       htmlwidgets_1.5.4   rlang_1.0.2        
## [40] readxl_1.3.1        rstudioapi_0.13     jquerylib_0.1.4    
## [43] generics_0.1.2      farver_2.1.0        jsonlite_1.8.0     
## [46] magrittr_2.0.2      patchwork_1.1.1     Matrix_1.4-0       
## [49] Rcpp_1.0.8.3        munsell_0.5.0       fansi_1.0.3        
## [52] lifecycle_1.0.1     stringi_1.7.6       yaml_2.3.5         
## [55] grid_4.1.3          crayon_1.5.1        haven_2.4.3        
## [58] splines_4.1.3       hms_1.1.1           knitr_1.38         
## [61] pillar_1.7.0        reprex_2.0.1        glue_1.6.2         
## [64] evaluate_0.15       latticeExtra_0.6-29 data.table_1.14.2  
## [67] modelr_0.1.8        png_0.1-7           vctrs_0.3.8        
## [70] tzdb_0.2.0          cellranger_1.1.0    gtable_0.3.0       
## [73] assertthat_0.2.1    xfun_0.30           broom_0.7.12       
## [76] cluster_2.1.2       ellipsis_0.3.2