Chapter 9 Introduction to NHANES

To demonstrate the workflow and content of IDA, we created a hypothetical research aim and corresponding statistical analysis plan, which is described in more detail in the section nhanes_IDAP.

Hypothetical research aim for IDA
The aim is to develop a multivariable model for MVPA (minutes of moderate/vigorous physical activity).

MVPA can be analyzed on two scales. On the untransformed scale, the model examines factors that distinguish very active participants, who spend large amounts of time on MVPA, from others. On the logarithmic scale, the model distinguishes participants according to percentage changes in MVPA, which de-emphasizes extreme values.
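The difference between the two scales can be illustrated with a few made-up MVPA values (these are not NHANES data). The sketch below shows that doubling MVPA gives the same difference on the log scale regardless of the starting level, whereas absolute differences are dominated by the most active participants. Values of zero would need separate handling before taking logarithms; how that is done is not covered here.

```r
# Hypothetical MVPA values in minutes (illustration only, not NHANES data)
mvpa <- c(5, 10, 100, 200)

# Untransformed scale: absolute differences between neighbouring values
diff_raw <- diff(mvpa)

# Log scale: differences correspond to ratios, i.e. percentage changes
diff_log <- diff(log(mvpa))

# Doubling from 5 to 10 and from 100 to 200 gives the same log difference
same_log_step <- isTRUE(all.equal(diff_log[1], diff_log[3]))

print(diff_raw)
print(diff_log)
print(same_log_step)
```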

9.1 NHANES Dataset Description

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey examines a nationally representative sample of non-institutionalized US civilians using a multistage probability sampling design that considers geographical area and minority representation. Sample weights are generated to create nationally representative estimates for the US population and subgroups defined by age, sex, and race/ethnicity. NHANES collects data on various health and behavior indicators, including physical activity and self‐reported diagnosis of prevalent health conditions such as diabetes mellitus, coronary artery disease, stroke, and cancer. Further information is available on the CDC NHANES website.

Physical activity was measured with a waist‐worn uniaxial accelerometer (AM‐7164; ActiGraph) for up to 7 days. Participants were asked to wear the device while awake except when swimming or bathing. Data were cleaned according to calibration specifications, and nonwear time was defined as an interval of at least 60 consecutive minutes of zero activity intensity counts. Days with fewer than 10 hours of wear time were excluded, and participants with at least 1 valid day of accelerometer data were included in the analysis. Mean counts per minute were calculated by dividing the sum of activity counts for a valid day by the number of minutes of wear time in that day, across all valid days (Troiano 2008).
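The nonwear and valid-day rules described above can be sketched in base R as follows. This is a minimal illustration under simplifying assumptions, not the processing code used for the dataset: the function names `flag_nonwear` and `summarise_day` are hypothetical, the input is assumed to be one day of minute-level counts without missing values, and no calibration checks are applied.

```r
# Flag nonwear minutes: runs of at least `min_run` consecutive zero counts
flag_nonwear <- function(counts, min_run = 60) {
  runs <- rle(counts == 0)
  is_nonwear_run <- runs$values & runs$lengths >= min_run
  rep(is_nonwear_run, times = runs$lengths)
}

# Summarise one day of minute-level counts
summarise_day <- function(counts, min_wear_hours = 10) {
  wear <- !flag_nonwear(counts)
  wear_min <- sum(wear)
  list(
    wear_minutes = wear_min,
    # A day is valid if it has at least 10 hours of wear time
    valid = wear_min >= min_wear_hours * 60,
    # Counts per minute of wear time for this day
    counts_per_minute = if (wear_min > 0) sum(counts[wear]) / wear_min else NA_real_
  )
}

# Hypothetical day: 8 hours of zeros (device off), then 16 hours of activity
day <- c(rep(0, 480), rep(100, 960))
res <- summarise_day(day)
str(res)
```

A participant-level mean would then be obtained by averaging `counts_per_minute` over that participant's valid days.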

Moderate or vigorous intensity was based on count thresholds. Time spent in such activities was determined by summing minutes in a day where the count met the criterion for that intensity (Troiano 2008).
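As a rough sketch, the daily minutes of moderate or vigorous activity can be obtained by counting the wear-time minutes above a count threshold. The cut-off of 2020 counts per minute is the one given in the outcome variable definition in Section 9.1.1.1. The function name `mvpa_minutes` and the example values are hypothetical, and treating a count of exactly 2020 as below the threshold is an assumption.

```r
# Count minutes whose activity count exceeds the MVPA cut-off.
# `wear` is a logical vector marking wear-time minutes (all TRUE by default).
# Whether a count of exactly `cutoff` qualifies is an assumption here.
mvpa_minutes <- function(counts, wear = rep(TRUE, length(counts)), cutoff = 2020) {
  sum(counts[wear] > cutoff)
}

# Hypothetical day: 8 hours off, 15 hours of light activity, 1 hour above cut-off
day <- c(rep(0, 480), rep(500, 900), rep(2500, 60))
mvpa_day <- mvpa_minutes(day)
print(mvpa_day)
```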

The NHANES 2003–2004 and 2005–2006 cycles include a total of 14,631 participants with accelerometry data. Participants aged 30 to 85 at the time they wore the accelerometer are included here. Other inclusion criteria follow the choices made for the prediction model of 5-year mortality presented by Smirnova et al. (2020). Data preparation was based on “Organizing and Analyzing the Activity Data in NHANES” (Leroux et al. 2019). High-quality processed activity data combined with mortality and demographic information can be downloaded and used in R with code from Andrew Leroux (https://andrew-leroux.github.io/rnhanesdata/articles).

9.1.1 Variables

9.1.1.1 Outcome variable

MVPA: total minutes of moderate/vigorous physical activity, defined as more than 2020 counts per minute (mvpa, minutes)

9.1.1.2 Sociodemographic variables

  • age at examination (i.e. when participants wore the device) (age, years)
  • gender (gender, “Male” and “Female”)
  • race/ethnicity (non-Hispanic “White”, non-Hispanic “Black”, “Mexican American”, and “Other”)
  • education (“Less than high school”, “High school” (high school graduate/general educational development [GED]), “More than high school” (some college, and college graduate)) (educationadult)
  • Person Months of Follow-up from MEC/Exam Date (permth.exm) (follow-up time in this cohort in years = permth.exm/12)

9.1.1.3 Health and behavior variables

  • smoking status (Current, Former [those reporting quitting within the previous 6 months], and Never) (smokecigs)
  • alcohol consumption (drinkstatus) (Non-Drinker, Moderate Drinker, Heavy Drinker, Missing alcohol)
  • body mass index (bmi, kg/m2)
  • obesity (bmi.cat, No-Yes)
  • diabetes (diabetes)
  • congestive heart failure (chf, No-Yes)
  • cancer (cancer, No-Yes)
  • stroke (stroke, No-Yes)
  • average systolic blood pressure using the 4 measurements per participant (sys, mmHg)
  • Total cholesterol (lbxtc, mg/dL)
  • HDL cholesterol (lbdhdd, mg/dL)

9.1.1.4 Physical activity data

Because the minute-level accelerometer-derived physical activity data are large, the following summary measures are calculated:

  • total activity counts per day (TAC/d)
  • total log activity count (TLAC, log(1+activity))
  • total minutes of moderate/vigorous physical activity (MVPA)
  • total accelerometer wear time (WT)
  • total log activity count summary measures (tlac.1, tlac.2, …, tlac.12) in each 2-hr window, i.e. 12AM-2AM, 2AM-4AM, 4AM-6AM, etc.
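The summary measures listed above can be sketched for a single day of minute-level data. The example uses simulated counts, not NHANES data, and assumes that the log activity count is obtained by applying log(1 + count) to each minute and then summing; the chapter does not spell out this detail, so treat it as an assumption.

```r
# Simulated minute-level counts for one day (1440 minutes, not NHANES data)
set.seed(1)
counts <- rpois(1440, lambda = 50)

# Total activity count for the day
tac_total <- sum(counts)

# Log-transform each minute; log1p(x) computes log(1 + x)
log_counts <- log1p(counts)

# Assign each minute to one of twelve 2-hour windows (120 minutes each),
# starting with 12AM-2AM
window <- rep(1:12, each = 120)

# Window totals named tlac.1, ..., tlac.12, and the daily total
tlac_window <- vapply(split(log_counts, window), sum, numeric(1))
names(tlac_window) <- paste0("tlac.", 1:12)
tlac_total <- sum(log_counts)

print(round(tlac_window, 1))
```

By construction, the twelve window totals add up to the daily total.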

9.2 NHANES dataset contents

9.2.1 Source dataset

We refer to the dataset that is available online as the source dataset.

9.2.2 Data dictionary

Additional metadata (units, labels) are added to the original source dataset. The modified dataset is then written back to the data folder.
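The code used for this step is not shown in the chapter. Hmisc, which appears in the session info, provides `upData()` for attaching labels and units; the sketch below instead uses only base R attributes so that it runs without extra packages. The data values, the object name `dat`, and the output file name are hypothetical, while the labels and units are taken from the variable descriptions in this chapter.

```r
# Small stand-in data frame (hypothetical values, not the NHANES data)
dat <- data.frame(age = c(45, 62), bmi = c(24.1, 31.7), mvpa = c(12.5, 3.0))

# Metadata taken from the variable descriptions in this chapter
var_labels <- c(age = "age", bmi = "body mass index",
                mvpa = "Moderate or vigorous physical activity")
var_units <- c(age = "years", bmi = "kg/m2", mvpa = "minutes")

# Attach label and units as attributes on each column
for (v in names(dat)) {
  attr(dat[[v]], "label") <- var_labels[[v]]
  attr(dat[[v]], "units") <- var_units[[v]]
}

# Write the annotated data frame to disk (hypothetical file name)
out_file <- file.path(tempdir(), "a_nhanes_example.rds")
saveRDS(dat, out_file)

# The attributes survive the round trip
dat2 <- readRDS(out_file)
print(attr(dat2$bmi, "units"))
```

Saving as RDS keeps the attributes intact; a plain CSV export would drop them.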

Input object size: 1196032 bytes; 31 variables 5972 observations
New object size: 1196032 bytes; 31 variables 5972 observations


Data frame: a_nhanes

5972 observations and 31 variables, maximum # NAs: 395

Name | Labels | Units | Levels | Class | Storage | NAs
seqn | respondent sequence number | | | integer | integer | 0
age | age | years | | numeric | double | 0
gender | gender | | 2 | | integer | 0
educationadult | education level | | 3 | | integer | 4
smokecigs | smoking status | | 3 | | integer | 2
drinkstatus | alcohol consumption | | 4 | | integer | 0
alcohol | alcohol consumption | | | integer | integer | 395
bmi | body mass index | kg/m2 | | numeric | double | 44
diabetes | diabetes | | 2 | | integer | 0
chf | congestive heart failure | | 2 | | integer | 0
cancer | cancer | | 2 | | integer | 0
stroke | stroke | | 2 | | integer | 0
sys | Systolic blood pressure | mg/dl | | integer | integer | 274
lbxtc | Total cholesterol | mg/dl | | integer | integer | 230
lbdhdd | HDL cholesterol | mg/dl | | integer | integer | 230
tac | total activity counts per day | | | numeric | double | 0
tlac | total log activity count (log(1+activity)) | | | numeric | double | 0
mvpa | Moderate or vigorous physical activity | minutes | | numeric | double | 0
wt | total accelerometer wear time | minutes | | numeric | double | 0
tlac.1 | total log actvity count 12:00AM-2:00AM | | | numeric | double | 0
tlac.2 | total log actvity count 2:00AM-4:00AM | | | numeric | double | 0
tlac.3 | total log actvity count 4:00AM-6:00AM | | | numeric | double | 0
tlac.4 | total log actvity count 6:00AM-8:00AM | | | numeric | double | 0
tlac.5 | total log actvity count 8:00AM-10:00AM | | | numeric | double | 0
tlac.6 | total log actvity count 10:00AM-12:00PM | | | numeric | double | 0
tlac.7 | total log actvity count 12:00PM-2:00PM | | | numeric | double | 0
tlac.8 | total log actvity count 2:00PM-4:00PM | | | numeric | double | 0
tlac.9 | total log actvity count 4:00PM-6:00PM | | | numeric | double | 0
tlac.10 | total log actvity count 6:00PM-8:00PM | | | numeric | double | 0
tlac.11 | total log actvity count 8:00PM-10:00PM | | | numeric | double | 0
tlac.12 | total log actvity count 10:00PM-12:00AM | | | numeric | double | 0

Variable | Levels
gender | Male, Female
educationadult | Less than high school, High school, More than high school
smokecigs | Never, Former, Current
drinkstatus | Moderate Drinker, Non-Drinker, Heavy Drinker, Missing alcohol
diabetes, chf, cancer, stroke | No, Yes
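The number of missing values per variable, as reported in the NAs column of the data dictionary, can be checked with a few lines of base R. The sketch below uses a small made-up data frame rather than the NHANES data, so the counts are illustrative only.

```r
# Small stand-in data frame with some missing values (not the NHANES data)
dat <- data.frame(
  bmi = c(24.1, NA, 31.7, 28.0),
  sys = c(120, 135, NA, NA),
  age = c(45, 62, 38, 71)
)

# Missing values per variable
na_counts <- colSums(is.na(dat))

# Largest number of missing values in any variable
max_na <- max(na_counts)

# Variables with at least one missing value
print(na_counts[na_counts > 0])
print(max_na)
```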

9.3 References

Troiano RP, Berrigan D, Dodd KW, Mâsse LC, Tilert T, McDowell M. Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008 Jan;40(1):181-8. doi: 10.1249/mss.0b013e31815a51b3. PMID: 18091006.

Leroux A, Di J, Smirnova E, Mcguffey E, Cao Q, Bayatmokhtari E, Tabacu L, Zipunnikov V, Urbanek JK, Crainiceanu C. Organizing and Analyzing the Activity Data in NHANES. Stat Biosci 11, 262–287 (2019). https://doi.org/10.1007/s12561-018-09229-9

Smirnova E, Leroux A, Tabacu L, Zipunnikov V, Crainiceanu C, Urbanek JK. The Predictive Performance of Objective Measures of Physical Activity Derived From Accelerometry Data for 5-Year All-Cause Mortality in Older Adults: National Health and Nutritional Examination Survey 2003–2006, The Journals of Gerontology: Series A, Volume 75, Issue 9, September 2020, Pages 1779–1785, https://doi.org/10.1093/gerona/glz193

9.4 Section session info

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Austria.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Hmisc_4.6-0     Formula_1.2-4   survival_3.2-13 lattice_0.20-45
##  [5] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4    
##  [9] readr_2.1.2     tidyr_1.2.0     tibble_3.1.6    ggplot2_3.3.5  
## [13] tidyverse_1.3.1 here_1.0.1     
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2          sass_0.4.1          jsonlite_1.8.0     
##  [4] splines_4.1.3       modelr_0.1.8        bslib_0.3.1        
##  [7] assertthat_0.2.1    latticeExtra_0.6-29 cellranger_1.1.0   
## [10] yaml_2.3.5          pillar_1.7.0        backports_1.4.1    
## [13] glue_1.6.2          digest_0.6.29       checkmate_2.0.0    
## [16] RColorBrewer_1.1-2  rvest_1.0.2         colorspace_2.0-3   
## [19] htmltools_0.5.2     Matrix_1.4-0        pkgconfig_2.0.3    
## [22] broom_0.7.12        haven_2.4.3         bookdown_0.25      
## [25] scales_1.1.1        jpeg_0.1-9          tzdb_0.2.0         
## [28] htmlTable_2.4.0     generics_0.1.2      ellipsis_0.3.2     
## [31] withr_2.5.0         nnet_7.3-17         cli_3.2.0          
## [34] magrittr_2.0.2      crayon_1.5.1        readxl_1.3.1       
## [37] evaluate_0.15       fs_1.5.2            fansi_1.0.3        
## [40] xml2_1.3.3          foreign_0.8-82      data.table_1.14.2  
## [43] tools_4.1.3         hms_1.1.1           lifecycle_1.0.1    
## [46] munsell_0.5.0       reprex_2.0.1        cluster_2.1.2      
## [49] compiler_4.1.3      jquerylib_0.1.4     rlang_1.0.2        
## [52] grid_4.1.3          rstudioapi_0.13     htmlwidgets_1.5.4  
## [55] base64enc_0.1-3     rmarkdown_2.13      gtable_0.3.0       
## [58] DBI_1.1.2           R6_2.5.1            gridExtra_2.3      
## [61] lubridate_1.8.0     knitr_1.38          fastmap_1.1.0      
## [64] utf8_1.2.2          rprojroot_2.0.2     stringi_1.7.6      
## [67] Rcpp_1.0.8.3        vctrs_0.3.8         rpart_4.1.16       
## [70] png_0.1-7           dbplyr_2.1.1        tidyselect_1.1.2   
## [73] xfun_0.30