IDA report
Overview
1
IDA Framework
2
Scope of the regression analyses for the examples
3
Data screening and possible actions
3.1
Univariate distributions
3.2
Bivariate distributions
3.3
Missing values
I CRASH-2
4
Introduction to CRASH-2
4.1
CRASH-2 Description
4.2
CRASH-2 dataset contents
4.2.1
Source dataset
4.2.2
Updated analysis dataset
4.3
Section session info
5
Statistical analysis plan
5.1
Outcome variable
5.2
Statistical methods
5.3
Remarks
5.4
Preparations
5.5
Sources
5.5.1
Data dictionary
5.6
References
6
Missing data
6.1
Per variable missingness
6.2
Missingness patterns over variables
6.3
(In)complete cases
6.4
Section session info
7
Univariate distribution checks
7.1
Data set overview
7.1.1
Demographic variables
7.1.2
Physiological measurements
7.1.3
Characteristics of injury
7.2
Categorical variables
7.2.1
Categorical ordinal plots
7.3
Continuous variables
7.3.1
Age
7.3.2
Blood pressure
7.3.3
Respiratory rate
7.3.4
Heart rate
7.3.5
Central capillary refill time
7.3.6
Hours since injury
7.4
Section session info
8
Multivariate distributions
8.1
Overview
8.1.1
Variable correlation
8.1.2
Variable clustering
8.1.3
Variable redundancy
8.2
Summary reports by sex
8.2.1
Overall
8.2.2
Distribution of age by sex
8.2.3
Distribution of systolic blood pressure by sex
8.2.4
Distribution of heart rate by sex
8.2.5
Distribution of respiratory rate by sex
8.2.6
Distribution of central capillary refill time by sex
8.2.7
Distribution of hours since injury by sex
8.2.8
Distribution of Glasgow coma score by sex
8.2.9
Distribution of injury type by sex
8.3
Summary reports by age
8.3.1
Distribution of systolic blood pressure by age categories
8.3.2
Distribution of heart rate by age categories
8.3.3
Distribution of respiratory rate by age categories
8.3.4
Distribution of central capillary refill time by age categories
8.3.5
WIP: multivariate scatter plots
8.3.6
WIP: Scatter plots with a third or fourth variable
8.4
Summary reports by Glasgow coma score
8.4.1
Distribution of age by Glasgow coma score
8.4.2
Distribution of systolic blood pressure by Glasgow coma score
8.4.3
Distribution of heart rate by Glasgow coma score
8.4.4
Distribution of respiratory rate by Glasgow coma score
8.4.5
Distribution of central capillary refill time by Glasgow coma score
8.5
Section session info
National Health and Nutrition Examination Survey (NHANES)
9
Introduction to NHANES
9.1
NHANES Dataset Description
9.1.1
Variables
9.2
NHANES dataset contents
9.2.1
Source dataset
9.2.2
Data dictionary
9.3
References
9.4
Section session info
10
Initial data analysis plan (IDAP)
10.1
Initial data analysis strategy
10.1.1
IDA domain: missing values
10.1.2
IDA domain: univariate distributions
10.1.3
IDA domain: multivariate system of variables
11
Missing data
11.1
Per variable missingness
11.2
Variable summaries for complete vs incomplete cases
11.3
Missingness patterns over variables
11.4
(In)complete cases
11.5
Section session info
12
Univariate distribution checks
12.1
Data set overview
12.1.1
Demographic and lifestyle variables
12.1.2
Physiological measurements
12.1.3
Comorbidities
12.1.4
Physical activity variables
12.2
Categorical variables
12.3
Continuous variables
12.3.1
Age
12.3.2
Blood pressure
12.3.3
Body mass index
12.3.4
Outcome: time of moderate or vigorous physical activity and related variables
12.4
Section session info
13
Multivariate distributions
13.1
Overview
13.1.1
Variable correlation
13.1.2
Variable clustering
13.1.3
Variable redundancy
13.2
Summary reports by pivotal covariates age and gender
13.2.1
Distribution of age by gender
13.3
Summary report by age group and gender
13.3.1
Summary report by gender
13.3.2
Summary report by age group for men
13.3.3
Summary report by age group for women
13.4
Continuous variables by age and gender
13.4.1
Distribution of systolic blood pressure
13.4.2
Distribution of BMI
13.4.3
Distribution of wear time
13.5
Physical activity data (outcome)
13.5.1
Distribution of MVPA
13.5.2
Distribution of MVPA and Total log activity count by time of day
13.6
Section session info
Bacteremia
14
Introduction to Bacteremia
14.1
Dataset Description
14.2
Bacteremia dataset contents
14.2.1
Source dataset
14.2.2
Updated analysis dataset
14.3
Section session info
15
IDA plan
15.1
Preparations
15.2
IDA domains
15.2.1
IDA domain: missing values
15.2.2
IDA domain: univariate distributions
15.2.3
IDA domain: multivariate system of variables
15.3
Sources
15.3.1
Data dictionary
15.4
References
16
Missing data
16.1
Per variable missingness
16.2
Missingness patterns over variables
16.3
Section session info
17
Univariate distribution checks
17.1
Data set overview
17.1.1
Demographic variables
17.1.2
Pivotal variables and very important predictors
17.1.3
Further variables related to leukocyte types and leukocyte ratios
17.1.4
Kidney function related variables
17.1.5
Acute phase reaction related variables
17.1.6
Remaining variables
17.2
Categorical variables
17.3
Continuous variables
17.3.1
Suggested transformations
17.3.2
Univariate distributions of the original variables and the suggested transformations
17.3.3
Univariate distributions of the original variables only, without the suggested transformations
17.3.4
Comparison of univariate distributions with and without pseudo-log transformation
17.4
Section session info
18
Multivariate distributions
18.1
Overview
18.1.1
Variable correlation
18.1.2
Distribution of age by sex
18.1.3
Distribution of leukocytes by age, coloured by sex
18.1.4
Plot all variables vs. WBC in age/sex groups
18.1.5
Plot all variables vs. WBC in age/sex groups: loess curves only
18.2
Variable redundancy
18.2.1
Redundancy among very important predictors
18.2.2
Redundancy among leukocyte-related variables
18.2.3
Redundancy among all potential predictors
18.3
Section session info
Regression without regrets