Chapter 12 Univariate distribution checks

This section reports a series of univariate summary checks of the NHANES dataset.

12.1 Data set overview

Using the Hmisc describe function, we provide an overview of the data set. The descriptive report also provides histograms of continuous variables. For ease of scanning the information, we group the report by measurement type.
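As a rough sketch of how such a grouped report can be produced, the following R code simulates a small stand-in data frame and calls `describe()` once per variable group. The data frame `nhanes_toy`, its simulated values, and the `var_groups` list are illustrative assumptions, not the analysis data or the code behind this report; only the variable names are taken from the report. The call is skipped if the Hmisc package is not installed.

```r
# Toy stand-in for the analysis data (simulated values, hypothetical object name).
set.seed(1)
n <- 200
nhanes_toy <- data.frame(
  ageyears = runif(n, 30, 85),
  gender   = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  bmi      = c(rnorm(n - 5, mean = 29, sd = 6), rep(NA, 5)),
  diabetes = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.87, 0.13)))
)

# Variables grouped by measurement type (the grouping itself is an assumption).
var_groups <- list(
  "Demographic and lifestyle variables" = c("ageyears", "gender", "bmi"),
  "Comorbidities"                       = c("diabetes")
)

# One descriptive report per group; the list name is used as the report title.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  for (title in names(var_groups)) {
    print(Hmisc::describe(nhanes_toy[, var_groups[[title]], drop = FALSE],
                          descript = title))
  }
}
```

Keeping the groups in a named list makes it easy to add or move a variable without touching the reporting loop.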

12.1.1 Demographic and lifestyle variables

Demographic and lifestyle variables

6 Variables   5972 Observations

ageyears
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0      660        1    54.87    17.64    32.17    34.58    41.65
      .50      .75      .90      .95
    53.75    67.25    76.50    80.83
lowest : 30.00000 30.08333 30.16667 30.25000 30.33333 , highest: 84.58333 84.66667 84.75000 84.83333 84.91667

gender
        n  missing distinct
     5972        0        2
 Value        Male Female
 Frequency    2935   3037
 Proportion  0.491  0.509
 

educationadult: education level
        n  missing distinct
     5968        4        3
 Value      Less than high school           High school More than high school
 Frequency                   1683                  1448                  2837
 Proportion                 0.282                 0.243                 0.475
 

bmi: body mass index kg/m2
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5928       44     2161        1    29.07    6.804    20.78    22.12    24.73
      .50      .75      .90      .95
    28.08    32.23    37.07    40.80
lowest : 13.36 14.65 14.70 15.91 15.92 , highest: 62.50 62.77 63.42 63.87 130.21

smokecigs: smoking status
        n  missing distinct
     5970        2        3
 Value        Never  Former Current
 Frequency     2911    1759    1300
 Proportion   0.488   0.295   0.218
 

alcohol: alcohol consumption
        n  missing distinct     Info     Mean      Gmd
     5577      395        3    0.776    1.516    0.624
 Value          1     2     3
 Frequency   3090  2098   389
 Proportion 0.554 0.376 0.070
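
The Info, Mean and Gmd columns can be checked by hand from a frequency table. The sketch below does this in base R for the alcohol frequencies printed above. It assumes that Info is the relative information measure (1 - sum(p^3)) / (1 - 1/n^2) and that Gmd is Gini's mean difference (the mean absolute difference between all pairs of observations), which is our reading of the Hmisc documentation. The object names are illustrative.

```r
# Frequencies of the three alcohol categories as printed in the report.
value <- c(1, 2, 3)
freq  <- c(3090, 2098, 389)
n     <- sum(freq)              # number of non-missing observations
p     <- freq / n               # category proportions

# Mean of the numeric category codes.
mean_code <- sum(value * freq) / n

# Relative information: close to 1 for a continuous variable, smaller with heavy ties.
info <- (1 - sum(p^3)) / (1 - 1 / n^2)

# Gini's mean difference: average |x_i - x_j| over all ordered pairs i != j.
pair_counts <- outer(freq, freq)                # number of pairs per combination of codes
abs_diff    <- abs(outer(value, value, "-"))    # absolute difference between the codes
gmd <- sum(pair_counts * abs_diff) / (n * (n - 1))

print(round(c(n = n, Mean = mean_code, Info = info, Gmd = gmd), 3))
```

Rounded to three decimals this gives Mean 1.516, Info 0.776 and Gmd 0.624, in agreement with the reported values.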
 

12.1.2 Physiological measurements

Lab measurements

3 Variables   5972 Observations

sys: Systolic blood pressure mmHg
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5698      274      137        1    127.4    22.26    100.0    105.0    113.0
      .50      .75      .90      .95
    124.0    138.0    154.0    166.1
lowest : 73 80 81 83 85 , highest: 226 230 238 256 270
lbxtc: Total cholesterol mg/dl
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5742      230      264        1    204.1     46.4      143      155      175
      .50      .75      .90      .95
      201      228      258      277
lowest : 82 83 85 92 94 , highest: 431 440 458 539 650

lbdhdd: HDL cholesterol mg/dl
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5742      230      109        1    54.64    17.91       33       37       43
      .50      .75      .90      .95
       52       64       76       85
lowest : 17 22 23 24 25 , highest: 146 151 152 154 188

12.1.3 Comorbidities

Comorbidities

4 Variables   5972 Observations

diabetes
        n  missing distinct
     5972        0        2
 Value         No   Yes
 Frequency   5214   758
 Proportion 0.873 0.127
 

chf: congestive heart failure
        n  missing distinct
     5972        0        2
 Value         No   Yes
 Frequency   5739   233
 Proportion 0.961 0.039
 

cancer
        n  missing distinct
     5972        0        2
 Value         No   Yes
 Frequency   5359   613
 Proportion 0.897 0.103
 

stroke
        n  missing distinct
     5972        0        2
 Value        No  Yes
 Frequency  5734  238
 Proportion 0.96 0.04
 

12.1.4 Physical activity variables

Physical activity

16 Variables   5972 Observations

tac: total activity counts per day
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5965        1   244811   143738    69233    94872   150571
      .50      .75      .90      .95
   223572   314224   417410   486450
lowest : 8263.000 8931.833 12123.000 14642.000 15656.000
highest: 981517.167 986261.000 986593.800 1097823.500 1122542.600

tlac: total log activity count (log(1+activity))
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5969        1     2900    873.5     1613     1900     2385
      .50      .75      .90      .95
     2911     3431     3877     4164
lowest : 313.0835 364.4561 400.8157 429.9288 466.0362
highest: 5436.1548 5492.5395 5588.3401 5655.4680 6122.6779
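
To make the two summary measures concrete, here is a minimal sketch of how they could be derived from minute-level accelerometer counts. It assumes that tac is the sum of the counts over a day and that tlac is the sum of log(1 + count) over the same minutes, as the variable label suggests. The vector `counts` is made-up toy data, and the report does not state whether the values are averaged over several wear days.

```r
# Toy minute-level activity counts for one day (1440 minutes, simulated).
set.seed(42)
counts <- rpois(1440, lambda = 150) * rbinom(1440, size = 1, prob = 0.6)

# Total activity count: plain sum of the minute-level counts.
tac <- sum(counts)

# Total log activity count: log(1 + count) per minute, then summed.
# log1p(x) computes log(1 + x); minutes with a count of 0 contribute 0.
tlac <- sum(log1p(counts))

print(c(tac = tac, tlac = tlac))
```

Because the logarithm is applied minute by minute before summing, tlac is less dominated by a few very active minutes than tac.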

mvpa: Moderate or vigorous physical activity minutes
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     1163        1    19.19     20.9    0.800    1.429    4.000
      .50      .75      .90      .95
   12.000   26.762   46.000   59.921
lowest : 0.0000000 0.1428571 0.1666667 0.2000000 0.2500000
highest: 180.8333333 186.2000000 194.8000000 208.5000000 249.0000000

wt: total accelerometer wear time minutes
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     3613        1    866.1    139.8    684.3    721.0    782.9
      .50      .75      .90      .95
    852.1    922.0   1000.6   1111.5
lowest : 600.000 601.500 602.000 603.000 604.000 , highest: 1425.286 1426.250 1426.286 1426.857 1440.000

tlac.1: total log activity count 12:00AM-2:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     2656    0.829    30.92    51.83     0.00     0.00     0.00
      .50      .75      .90      .95
     0.00    24.38    94.43   169.25
lowest : 0.0000000 0.1569446 0.1831020 0.2299197 0.2559656
highest: 597.3808309 620.0469233 674.1677375 709.3300116 719.0239316

tlac.2: total log activity count 2:00AM-4:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     1770    0.653    19.09    34.47     0.00     0.00     0.00
      .50      .75      .90      .95
     0.00     2.91    51.83   110.64
lowest : 0.00000000 0.09902103 0.11552453 0.15694461 0.23104906
highest: 586.34967162 611.00545824 617.44773130 737.25383394 775.42871350

tlac.3: total log activity count 4:00AM-6:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     2834    0.855    43.29    70.78     0.00     0.00     0.00
      .50      .75      .90      .95
     0.00    38.74   147.59   248.43
lowest : 0.0000000 0.1155245 0.1386294 0.2299197 0.2682397
highest: 679.1484297 697.1093552 704.5766819 719.3198459 769.6014301

tlac.4: total log activity count 6:00AM-8:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5285    0.998      177    178.6     0.00     0.00    36.94
      .50      .75      .90      .95
   137.34   282.09   416.35   496.25
lowest : 0.0000000 0.2299197 0.3465736 0.6148132 0.6839274
highest: 774.8811640 792.6938042 822.1482092 832.9933042 857.9018816

tlac.5: total log activity count 8:00AM-10:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5834        1    339.3    191.7    39.52   102.56   221.28
      .50      .75      .90      .95
   346.74   460.18   552.17   610.19
lowest : 0.0000000 0.2310491 0.7250248 0.8652549 1.0357837
highest: 812.0225306 812.8675420 813.2942210 824.5800445 888.1759271

tlac.6: total log activity count 10:00AM-12:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5931        1    407.7    163.6    150.4    218.6    316.2
      .50      .75      .90      .95
    415.0    506.7    589.9    634.9
lowest : 0.0000000 0.6986213 2.6001909 4.5903937 5.7234361
highest: 807.7712473 808.7247458 811.5701740 884.1169241 892.0314653

tlac.7: total log activity count 12:00PM-2:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5947        1      418    146.9    192.1    250.4    337.6
      .50      .75      .90      .95
    423.5    507.2    581.3    623.7
lowest : 0.000000 1.734669 2.704424 5.605670 6.387910
highest: 788.370472 796.082067 813.380498 821.733575 885.445891

tlac.8: total log activity count 2:00PM-4:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5954        1    411.7    147.8    192.1    243.1    323.6
      .50      .75      .90      .95
    414.3    501.7    577.5    619.9
lowest : 0.000000 1.974752 3.096473 4.094345 5.772020
highest: 792.683985 837.042353 846.553847 877.212734 904.872351

tlac.9: total log activity count 4:00PM-6:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5955        1      397    140.3    185.4    234.8    316.4
      .50      .75      .90      .95
    401.8    483.5    553.6    591.4
lowest : 0.000000 2.957040 3.401197 4.148165 5.084134
highest: 771.497952 783.128869 801.039991 809.429425 822.294800

tlac.10: total log activity count 6:00PM-8:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5932        1    337.6    151.3    114.1    165.5    246.6
      .50      .75      .90      .95
    339.5    433.1    504.4    548.9
lowest : 0.000000 1.311822 1.353699 1.753975 3.459493
highest: 778.168243 778.774433 802.020060 851.421446 860.123328

tlac.11: total log activity count 8:00PM-10:00PM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     5786        1    223.2    158.2    10.22    42.32   116.77
      .50      .75      .90      .95
   212.72   315.90   411.75   471.84
lowest : 0.0000000 0.6229449 0.6708919 1.0233141 1.0525597
highest: 724.9040071 753.8848070 821.4989318 826.3463412 839.8942777

tlac.12: total log activity count 10:00PM-12:00AM
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     5972        0     4943    0.995    95.37    114.3    0.000    0.000    6.693
      .50      .75      .90      .95
   55.438  141.863  251.308  328.945
lowest : 0.00000000 0.09902103 0.17328680 0.27798716 0.41291025
highest: 683.58618305 698.46723961 702.66304648 707.15487443 733.61717206
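
As a simple plausibility check on the two-hour window variables, the sketch below sums their reported means and compares the result with the reported mean of tlac. It assumes that tlac.1 to tlac.12 partition the day, so that they add up to tlac for each participant; the means are copied from the summaries above.

```r
# Reported means of tlac.1 ... tlac.12 (two-hour windows, midnight to midnight).
window_means <- c(30.92, 19.09, 43.29, 177, 339.3, 407.7,
                  418, 411.7, 397, 337.6, 223.2, 95.37)

# Reported mean of the daily total log activity count.
tlac_mean <- 2900

# If the windows partition the day, the window means should sum to the tlac mean,
# up to rounding in the printed values.
sum_of_windows <- sum(window_means)
print(sum_of_windows)
print(abs(sum_of_windows - tlac_mean))
```

The window means sum to about 2900.2, which is consistent with the reported tlac mean of 2900.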

12.2 Categorical variables

We now provide a closer visual examination of the categorical predictors.
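
One simple way to examine a categorical predictor visually is a bar chart of its category proportions. The following base R sketch uses the smoking status frequencies from the overview in Section 12.1.1; the plotting choices are illustrative and not necessarily those used for the figures in this report.

```r
# Smoking status frequencies as reported in the data set overview.
smoke_freq <- c(Never = 2911, Former = 1759, Current = 1300)

# Proportions among participants with non-missing smoking status.
smoke_prop <- smoke_freq / sum(smoke_freq)
print(round(smoke_prop, 3))

# When run as a script, draw to a null device so no Rplots.pdf is written.
if (!interactive()) pdf(NULL)

barplot(smoke_prop,
        ylim = c(0, 1),
        ylab = "Proportion",
        main = "smokecigs: smoking status")

if (!interactive()) invisible(dev.off())
```

Plotting proportions on a fixed 0 to 1 axis keeps charts for different categorical variables directly comparable.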

12.3 Continuous variables

This section provides a closer visual examination of the continuous predictors and the outcome variable.

There is evidence of potentially influential points in some of the distributions; these are explored further with targeted summaries. More detailed univariate summaries for the variables of interest are also provided below.

12.3.1 Age

Figure 12.1: Distribution of age

12.3.2 Blood pressure

Figure 12.2: Distribution of SBP

12.3.3 Body mass index

Figure 12.3: Distribution of body mass index

One participant has an unusually high BMI value (130.2 kg/m2). This may be a data entry error, for example an intended value of 30.2.
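
A targeted check of this kind can be sketched as follows, using only the five lowest and five highest BMI values listed in the overview in Section 12.1.1. The rule used here (flag the maximum when it exceeds 1.5 times the second-largest value) is an arbitrary illustrative threshold, not a rule taken from the analysis.

```r
# Five lowest and five highest BMI values (kg/m2) from the data set overview.
bmi_extremes <- c(13.36, 14.65, 14.70, 15.91, 15.92,
                  62.50, 62.77, 63.42, 63.87, 130.21)

# Compare the largest value with the second largest.
sorted_desc <- sort(bmi_extremes, decreasing = TRUE)
gap_ratio   <- sorted_desc[1] / sorted_desc[2]

# Illustrative rule: flag the maximum if it is more than 1.5 times the runner-up.
flag_ratio <- 1.5
suspect <- if (gap_ratio > flag_ratio) sorted_desc[1] else numeric(0)

print(round(gap_ratio, 2))
print(suspect)
```

The flagged value is roughly twice the next highest value, which is what makes it stand out in the distribution.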

12.4 Section session info

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Austria.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Hmisc_4.6-0     Formula_1.2-4   survival_3.2-13 lattice_0.20-45
##  [5] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4    
##  [9] readr_2.1.2     tidyr_1.2.0     tibble_3.1.6    ggplot2_3.3.5  
## [13] tidyverse_1.3.1 here_1.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2            lubridate_1.8.0     RColorBrewer_1.1-2 
##  [4] httr_1.4.2          rprojroot_2.0.2     tools_4.1.3        
##  [7] backports_1.4.1     bslib_0.3.1         utf8_1.2.2         
## [10] R6_2.5.1            rpart_4.1.16        DBI_1.1.2          
## [13] colorspace_2.0-3    nnet_7.3-17         withr_2.5.0        
## [16] tidyselect_1.1.2    gridExtra_2.3       compiler_4.1.3     
## [19] cli_3.2.0           rvest_1.0.2         htmlTable_2.4.0    
## [22] xml2_1.3.3          labeling_0.4.2      bookdown_0.25      
## [25] sass_0.4.1          scales_1.1.1        checkmate_2.0.0    
## [28] digest_0.6.29       foreign_0.8-82      rmarkdown_2.13     
## [31] base64enc_0.1-3     jpeg_0.1-9          pkgconfig_2.0.3    
## [34] htmltools_0.5.2     highr_0.9           dbplyr_2.1.1       
## [37] fastmap_1.1.0       htmlwidgets_1.5.4   rlang_1.0.2        
## [40] readxl_1.3.1        rstudioapi_0.13     jquerylib_0.1.4    
## [43] generics_0.1.2      farver_2.1.0        jsonlite_1.8.0     
## [46] magrittr_2.0.2      patchwork_1.1.1     Matrix_1.4-0       
## [49] Rcpp_1.0.8.3        munsell_0.5.0       fansi_1.0.3        
## [52] lifecycle_1.0.1     stringi_1.7.6       yaml_2.3.5         
## [55] grid_4.1.3          crayon_1.5.1        haven_2.4.3        
## [58] splines_4.1.3       hms_1.1.1           knitr_1.38         
## [61] pillar_1.7.0        reprex_2.0.1        glue_1.6.2         
## [64] evaluate_0.15       latticeExtra_0.6-29 data.table_1.14.2  
## [67] modelr_0.1.8        png_0.1-7           vctrs_0.3.8        
## [70] tzdb_0.2.0          cellranger_1.1.0    gtable_0.3.0       
## [73] assertthat_0.2.1    xfun_0.30           broom_0.7.12       
## [76] cluster_2.1.2       ellipsis_0.3.2
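
A listing like the one above is what R's `sessionInfo()` returns. A minimal sketch for recording it at the end of a section is shown here; the object name `si` is illustrative.

```r
# Capture the R version, platform, and attached/loaded packages for this session.
si <- sessionInfo()
print(si)

# Individual pieces can be pulled out when only part of the information is needed.
print(si$R.version$version.string)
```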