Chapter 13 Multivariate distributions

This section reports a series of multivariate summaries of the NHANES dataset.

13.1 Overview

13.1.1 Variable correlation

Below we further examine the association between age and systolic blood pressure stratified by pivotal variables.

Next, a heat map depicts the differences between the Spearman and Pearson correlation coefficients.

Correlations of the physical activity variables (outcome)
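
As a minimal sketch of how a Spearman-versus-Pearson comparison can be computed, the following R code builds both correlation matrices and plots their difference. It uses the built-in mtcars data as a stand-in (an assumption: the NHANES variables are not reproduced here), base graphics rather than whichever plotting package the report used, and illustrative object names.

```r
# Stand-in data: a few numeric columns of the built-in mtcars dataset
# (an assumption; the report itself uses NHANES variables).
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Pairwise deletion keeps every pair of variables with complete observations
r_pearson  <- cor(vars, method = "pearson",  use = "pairwise.complete.obs")
r_spearman <- cor(vars, method = "spearman", use = "pairwise.complete.obs")

# Positive entries: the rank-based association is stronger than the linear one
r_diff <- r_spearman - r_pearson
print(round(r_diff, 2))

# Base-graphics heat map of the differences, without reordering or dendrograms
heatmap(r_diff, Rowv = NA, Colv = NA, scale = "none", symm = TRUE,
        main = "Spearman minus Pearson")
```

Large absolute differences point to pairs whose relationship is monotone but not linear, or which are affected by outliers.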

13.1.2 Variable clustering

Variable clustering is used for assessing collinearity and redundancy, and for separating variables into clusters that can each be scored as a single variable, thus reducing the dimension of the data.
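
The idea can be sketched in base R as hierarchical clustering on a similarity matrix of squared Spearman correlations. This is only an illustration under assumptions: mtcars stands in for the NHANES predictors, the similarity measure and linkage are choices made for the sketch, and the 0.5 threshold is hypothetical.

```r
# Stand-in data (assumption): numeric columns from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]

# Similarity between variables: squared Spearman rank correlation
rho2 <- cor(vars, method = "spearman", use = "pairwise.complete.obs")^2

# Cluster the variables (not the observations) on the dissimilarity 1 - rho^2
hc <- hclust(as.dist(1 - rho2), method = "complete")
plot(hc, main = "Variable clustering (1 - squared Spearman rho)",
     xlab = "", sub = "")

# Hypothetical threshold: with complete linkage, every pair of variables
# inside a cluster then has a squared rho of at least 0.5
clusters <- cutree(hc, h = 0.5)
print(clusters)
```

Variables that share a cluster label are candidates for being combined into a single score.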

13.1.3 Variable redundancy

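Redundancy analysis of the predictor variables: each variable is predicted from all the other variables, and it is flagged as redundant if the resulting R² exceeds the cutoff (0.9 here).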

## 
## Redundancy Analysis
## 
## Hmisc::redun(formula = ~age + gender + bmi + sys + lbxtc + lbdhdd + 
##     smokecigs + diabetes + chf + cancer + stroke, data = a_nhanes)
## 
## n: 5467  p: 11   nk: 3 
## 
## Number of NAs:    505 
## Frequencies of Missing Values Due to Each Variable
##       age    gender       bmi       sys     lbxtc    lbdhdd smokecigs  diabetes 
##         0         0        44       274       230       230         2         0 
##       chf    cancer    stroke 
##         0         0         0 
## 
## 
## Transformation of target variables forced to be linear
## 
## R-squared cutoff: 0.9    Type: ordinary 
## 
## R^2 with which each variable can be predicted from all other variables:
## 
##       age    gender       bmi       sys     lbxtc    lbdhdd smokecigs  diabetes 
##     0.330     0.202     0.153     0.209     0.061     0.247     0.099     0.102 
##       chf    cancer    stroke 
##     0.080     0.079     0.057 
## 
## No redundant variables
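
To illustrate the quantity reported above, here is a simplified sketch that computes, for each variable, the R² obtained when it is regressed on all the others. It is not a reimplementation of `Hmisc::redun`: it uses purely linear terms in `lm`, performs a single pass without stepwise removal, and runs on the built-in mtcars data as a stand-in (all assumptions made for the sketch).

```r
# Stand-in data (assumption): numeric columns from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]

# For each variable, fit a linear model on all remaining variables
# and keep the R-squared of that fit
r2_from_others <- sapply(names(vars), function(v) {
  fit <- lm(reformulate(setdiff(names(vars), v), response = v), data = vars)
  summary(fit)$r.squared
})
print(round(r2_from_others, 3))

# Same cutoff as in the report output; variables above it would be
# candidates for removal
cutoff <- 0.9
redundant <- names(r2_from_others)[r2_from_others > cutoff]
print(redundant)
```

In the NHANES output above, the largest R² is 0.330 (age), far below the 0.9 cutoff, which is why no variable is reported as redundant.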

13.2 Summary reports by pivotal covariates age and gender

13.2.1 Distribution of age by gender

Figure 7.4: Distribution of age by gender

13.3 Summary report by age group and gender

13.3.1 Summary report by gender

Baseline characteristics by gender.

                                N     Male (N=2935)      Female (N=3037)
age (years)                     5972  42.1 54.3 68.4     41.2 53.0 66.2
                                      55.4 ± 15.3        54.3 ± 15.3
body mass index (kg/m²)         5928  24.99 27.85 31.24  24.44 28.32 33.33
                                      28.59 ± 5.69       29.52 ± 7.03
education level                 5968
  Less than high school               0.29 856/2932      0.27 827/3036
  High school                         0.24 710/2932      0.24 738/3036
  More than high school               0.47 1366/2932     0.48 1471/3036
Systolic blood pressure (mmHg)  5698  115.0 125.0 137.0  111.0 123.0 139.0
                                      127.7 ± 18.3       127.1 ± 22.4
Total cholesterol (mg/dl)       5742  172.0 198.0 224.0  178.0 205.0 231.0
                                      200.4 ± 41.6       207.6 ± 42.7
HDL cholesterol (mg/dl)         5742  40.0 46.0 56.0     48.0 58.0 70.0
                                      49.0 ± 13.8        60.1 ± 17.1
smoking status                  5970
  Never                               0.38 1119/2934     0.59 1792/3036
  Former                              0.36 1061/2934     0.23 698/3036
  Current                             0.26 754/2934      0.18 546/3036
alcohol consumption             5577
  1                                   0.60 1645/2755     0.51 1445/2822
  2                                   0.31 860/2755      0.44 1238/2822
  3                                   0.09 250/2755      0.05 139/2822
diabetes: Yes                   5972  0.13 383/2935      0.12 375/3037
congestive heart failure: Yes   5972  0.05 138/2935      0.03 95/3037
cancer: Yes                     5972  0.10 280/2935      0.11 333/3037
stroke: Yes                     5972  0.04 122/2935      0.04 116/3037

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. Categorical variables are shown as the proportion followed by numerator/denominator. N is the number of non-missing values.
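
The summaries in such a table (quartiles, mean ± SD, and within-group proportions) can be sketched in base R as follows. The data frame is simulated for illustration only, and `summarise_cont` is a hypothetical helper name; the report's own table was produced by other means.

```r
# Simulated illustration data (assumption, not NHANES values)
set.seed(1)
d <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, 28.6, 5.7), rnorm(50, 29.5, 7.0)),
  smoker = sample(c("Never", "Former", "Current"), 100, replace = TRUE)
)
d$bmi[c(3, 77)] <- NA  # one missing value per group

# Hypothetical helper: non-missing N, quartiles, mean and SD of one variable
summarise_cont <- function(x) {
  c(n = sum(!is.na(x)),
    quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE),
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE))
}

# Continuous variable: one column of summaries per gender
cont <- sapply(split(d$bmi, d$gender), summarise_cont)
print(round(cont, 2))

# Categorical variable: proportions within each gender (columns sum to 1)
prop <- prop.table(table(d$smoker, d$gender), margin = 2)
print(round(prop, 2))
```

Reporting the non-missing N next to each summary, as the table does, makes the differing denominators visible.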

13.3.2 Summary report by age group for men

Baseline characteristics by age group for men.

                                N     30-44 (N=916)      45-59 (N=794)      60-74 (N=845)      75+ (N=380)
body mass index (kg/m²)         2919  24.90 27.91 31.49  25.18 27.96 31.31  25.49 28.16 31.79  24.20 26.65 29.72
                                      28.96 ± 6.82       28.65 ± 5.28       28.80 ± 5.11       27.14 ± 4.33
education level                 2932
  Less than high school               0.24 220/916       0.21 168/793       0.37 311/844       0.41 157/379
  High school                         0.26 236/916       0.25 201/793       0.23 194/844       0.21 79/379
  More than high school               0.50 460/916       0.53 424/793       0.40 339/844       0.38 143/379
Systolic blood pressure (mmHg)  2824  112.0 119.0 129.0  115.0 123.0 134.0  120.0 131.0 145.0  120.0 133.0 147.0
                                      120.9 ± 12.5       126.0 ± 17.1       133.3 ± 19.8       135.5 ± 22.2
Total cholesterol (mg/dl)       2838  177.0 199.0 229.8  178.0 204.0 230.0  169.0 194.0 222.0  159.0 185.0 211.8
                                      204.2 ± 42.6       206.3 ± 41.1       196.9 ± 41.0       187.2 ± 37.6
HDL cholesterol (mg/dl)         2838  39.0 45.0 54.0     40.0 47.0 57.0     40.0 46.0 57.0     41.0 47.0 57.0
                                      47.7 ± 13.6        49.5 ± 14.1        49.2 ± 13.5        50.7 ± 14.3
smoking status                  2934
  Never                               0.50 454/916       0.39 308/794       0.28 234/845       0.32 123/379
  Former                              0.19 177/916       0.28 224/794       0.52 436/845       0.59 224/379
  Current                             0.31 285/916       0.33 262/794       0.21 175/845       0.08 32/379
alcohol consumption             2755
  1                                   0.68 567/839       0.64 474/746       0.53 429/811       0.49 175/359
  2                                   0.21 178/839       0.26 196/746       0.39 316/811       0.47 170/359
  3                                   0.11 94/839        0.10 76/746        0.08 66/811        0.04 14/359
diabetes: Yes                   2935  0.05 46/916        0.10 80/794        0.23 194/845       0.17 63/380
congestive heart failure: Yes   2935  0.01 5/916         0.03 23/794        0.08 69/845        0.11 41/380
cancer: Yes                     2935  0.02 15/916        0.04 35/794        0.15 127/845       0.27 103/380
stroke: Yes                     2935  0.00 4/916         0.02 14/794        0.07 59/845        0.12 45/380

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. Categorical variables are shown as the proportion followed by numerator/denominator. N is the number of non-missing values.
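
For the age strata used in these tables, a grouping variable can be derived with `cut`. The sketch below assumes left-closed intervals starting at 30, 45, 60 and 75 years, which matches the group labels; the exact cut points used in the report are not shown, and the example ages are made up.

```r
# Made-up example ages (assumption), including values at the group boundaries
age <- c(30, 44.9, 45, 59, 60, 74.5, 75, 80)

# Left-closed intervals [30,45), [45,60), [60,75), [75,Inf)
age_group <- cut(age,
                 breaks = c(30, 45, 60, 75, Inf),
                 right  = FALSE,
                 labels = c("30-44", "45-59", "60-74", "75+"))

# Number of participants per age group
print(table(age_group))
```

With `right = FALSE`, a participant aged exactly 45 falls into the 45-59 group rather than the 30-44 group.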

13.3.3 Summary report by age group for women

Baseline characteristics by age group for women.

                                N     30-44 (N=1006)     45-59 (N=838)      60-74 (N=844)      75+ (N=349)
body mass index (kg/m²)         3009  24.06 28.11 33.33  24.66 29.09 34.90  24.92 28.76 33.36  23.59 26.97 30.39
                                      29.29 ± 7.10       30.40 ± 7.58       29.81 ± 6.77       27.33 ± 5.36
education level                 3036
  Less than high school               0.21 215/1006      0.20 170/838       0.35 297/844       0.42 145/348
  High school                         0.20 200/1006      0.24 204/838       0.27 230/844       0.30 104/348
  More than high school               0.59 591/1006      0.55 464/838       0.38 317/844       0.28 99/348
Systolic blood pressure (mmHg)  2874  104.0 111.0 120.0  113.0 124.0 136.0  121.0 134.0 150.0  131.2 142.5 158.8
                                      113.1 ± 13.2       125.9 ± 19.4       137.0 ± 22.2       146.0 ± 24.7
Total cholesterol (mg/dl)       2904  171.0 196.0 223.8  181.0 207.0 231.5  186.0 212.0 239.0  179.0 204.0 232.0
                                      199.8 ± 42.8       209.6 ± 41.9       214.9 ± 42.7       207.6 ± 41.0
HDL cholesterol (mg/dl)         2904  47.0 57.0 69.0     47.0 58.0 70.0     49.0 58.0 69.0     49.0 60.0 73.8
                                      59.6 ± 17.2        60.3 ± 17.4        59.8 ± 16.4        62.1 ± 17.4
smoking status                  3036
  Never                               0.65 655/1006      0.55 459/837       0.55 466/844       0.61 212/349
  Former                              0.14 137/1006      0.23 191/837       0.30 255/844       0.33 115/349
  Current                             0.21 214/1006      0.22 187/837       0.15 123/844       0.06 22/349
alcohol consumption             2822
  1                                   0.63 568/908       0.56 433/776       0.41 332/811       0.34 112/327
  2                                   0.33 299/908       0.36 283/776       0.55 450/811       0.63 206/327
  3                                   0.05 41/908        0.08 60/776        0.04 29/811        0.03 9/327
diabetes: Yes                   3037  0.03 30/1006       0.12 104/838       0.21 178/844       0.18 63/349
congestive heart failure: Yes   3037  0.01 8/1006        0.02 16/838        0.05 40/844        0.09 31/349
cancer: Yes                     3037  0.04 36/1006       0.10 84/838        0.14 121/844       0.26 92/349
stroke: Yes                     3037  0.01 12/1006       0.03 24/838        0.06 47/844        0.09 33/349

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents the mean ± 1 SD. Categorical variables are shown as the proportion followed by numerator/denominator. N is the number of non-missing values.

13.4 Continuous variables by age and gender

13.4.1 Distribution of systolic blood pressure

13.4.2 Distribution of BMI

As already noted in the univariate distributions, there is one participant with an unusually high BMI value (130.2). This may be a data entry error (the intended value could have been bmi=30.2).
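
A simple way to flag such values for review is sketched below. The BMI vector is made up apart from the value 130.2 mentioned above, and the plausibility limit of 100 is a hypothetical choice for illustration, not a rule taken from the report.

```r
# Made-up BMI values (assumption); 130.2 is the suspicious value from the text
bmi <- c(22.4, 27.9, 130.2, 31.0, NA, 24.7)

# Hypothetical plausibility limit used only for this illustration
upper_limit <- 100

# Positions of non-missing values above the limit
suspect <- which(!is.na(bmi) & bmi > upper_limit)
print(bmi[suspect])

# For a sensitivity analysis, set flagged values to missing
# rather than guessing the intended value
bmi_checked <- replace(bmi, suspect, NA)
print(bmi_checked)
```

Setting the value to missing avoids silently replacing it with a guess such as 30.2, which cannot be confirmed from the data alone.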

13.4.3 Distribution of wear time

13.5 Physical activity data (outcome)

13.5.1 Distribution of MVPA

13.5.2 Distribution of MVPA and Total log activity count by time of day

13.6 Section session info

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Austria.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gridExtra_2.3    naniar_0.6.1     ggcorrplot_0.1.3 corrplot_0.92   
##  [5] gtsummary_1.5.2  Hmisc_4.6-0      Formula_1.2-4    survival_3.2-13 
##  [9] lattice_0.20-45  plotly_4.10.0    forcats_0.5.1    stringr_1.4.0   
## [13] dplyr_1.0.8      purrr_0.3.4      readr_2.1.2      tidyr_1.2.0     
## [17] tibble_3.1.6     ggplot2_3.3.5    tidyverse_1.3.1  here_1.0.1      
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-155        fs_1.5.2            lubridate_1.8.0    
##  [4] RColorBrewer_1.1-2  httr_1.4.2          rprojroot_2.0.2    
##  [7] tools_4.1.3         backports_1.4.1     bslib_0.3.1        
## [10] utf8_1.2.2          R6_2.5.1            rpart_4.1.16       
## [13] mgcv_1.8-39         DBI_1.1.2           lazyeval_0.2.2     
## [16] colorspace_2.0-3    nnet_7.3-17         withr_2.5.0        
## [19] tidyselect_1.1.2    compiler_4.1.3      cli_3.2.0          
## [22] rvest_1.0.2         gt_0.4.0            htmlTable_2.4.0    
## [25] xml2_1.3.3          labeling_0.4.2      bookdown_0.25      
## [28] sass_0.4.1          checkmate_2.0.0     scales_1.1.1       
## [31] digest_0.6.29       foreign_0.8-82      rmarkdown_2.13     
## [34] base64enc_0.1-3     jpeg_0.1-9          pkgconfig_2.0.3    
## [37] htmltools_0.5.2     highr_0.9           dbplyr_2.1.1       
## [40] fastmap_1.1.0       htmlwidgets_1.5.4   rlang_1.0.2        
## [43] readxl_1.3.1        rstudioapi_0.13     farver_2.1.0       
## [46] jquerylib_0.1.4     generics_0.1.2      jsonlite_1.8.0     
## [49] crosstalk_1.2.0     magrittr_2.0.2      Matrix_1.4-0       
## [52] Rcpp_1.0.8.3        munsell_0.5.0       fansi_1.0.3        
## [55] visdat_0.5.3        lifecycle_1.0.1     stringi_1.7.6      
## [58] yaml_2.3.5          plyr_1.8.7          grid_4.1.3         
## [61] crayon_1.5.1        haven_2.4.3         splines_4.1.3      
## [64] hms_1.1.1           knitr_1.38          pillar_1.7.0       
## [67] reshape2_1.4.4      reprex_2.0.1        glue_1.6.2         
## [70] evaluate_0.15       latticeExtra_0.6-29 broom.helpers_1.6.0
## [73] data.table_1.14.2   modelr_0.1.8        vctrs_0.3.8        
## [76] png_0.1-7           tzdb_0.2.0          cellranger_1.1.0   
## [79] gtable_0.3.0        assertthat_0.2.1    xfun_0.30          
## [82] broom_0.7.12        viridisLite_0.4.0   cluster_2.1.2      
## [85] ellipsis_0.3.2