Chapter 9 Introduction to NHANES
To demonstrate the workflow and content of IDA, we created a hypothetical research aim and corresponding statistical analysis plan, which is described in more detail in the section nhanes_IDAP.
The hypothetical research aim for the IDA is to develop a multivariable model for MVPA (minutes of moderate/vigorous physical activity).
MVPA can be analysed on two scales. On the untransformed scale, the model examines factors that distinguish very active participants, who spend large amounts of time on MVPA, from others. On the logarithmic scale, the model distinguishes participants according to percentage changes in MVPA, which de-emphasizes extreme values.
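The difference between the two scales can be illustrated with a minimal sketch. The MVPA values below are hypothetical and strictly positive; how zero minutes of MVPA would be handled before taking logarithms is not specified here and is left out of the example.

```r
# Hypothetical MVPA values (minutes) for four participants; not NHANES data.
mvpa <- c(5, 10, 50, 100)

# Untransformed scale: successive differences are absolute minutes,
# so the step from 50 to 100 dominates the step from 5 to 10.
diff_raw <- diff(mvpa)

# Logarithmic scale: successive differences are log ratios, so a doubling
# contributes the same amount whether it is 5 -> 10 or 50 -> 100.
diff_log <- diff(log(mvpa))

# Back-transforming a log difference recovers the multiplicative change.
ratio <- exp(diff_log)

print(diff_raw)
print(ratio)
```

On the raw scale the most active participants drive the differences, whereas on the log scale equal relative changes are treated alike.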
9.1 NHANES Dataset Description
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey examines a nationally representative sample of non-institutionalized US civilians using a multistage probability sampling design that considers geographical area and minority representation. Sample weights are generated to create nationally representative estimates for the US population and subgroups defined by age, sex, and race/ethnicity. NHANES collects data on various health and behavior indicators, including physical activity and self‐reported diagnosis of prevalent health conditions such as diabetes mellitus, coronary artery disease, stroke, and cancer. Further details are available on the CDC NHANES website.
Physical activity was measured with a waist‐worn uniaxial accelerometer (AM‐7164; ActiGraph) for up to 7 days. Participants were asked to wear the device while awake, except when swimming or bathing. Data were cleaned according to calibration specifications, and nonwear time was defined as an interval of at least 60 consecutive minutes of zero activity intensity counts. Days with fewer than 10 hours of wear time were excluded, and participants with at least 1 valid day of accelerometer data were included in the analysis. Mean counts per minute were calculated by dividing the sum of activity counts for a valid day by the number of minutes of wear time in that day, across all valid days (Troiano 2008).
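The nonwear and valid-day rules described above can be sketched for a single day as follows. This is a simplified illustration on made-up counts, assuming a vector of 1440 minute-level counts with no missing values. The function names `flag_nonwear` and `summarise_day` are hypothetical, and the sketch implements only the rule as worded here (runs of exact zeros), not necessarily every detail of the published algorithm.

```r
# Flag nonwear minutes: runs of at least `min_run` consecutive zero counts.
flag_nonwear <- function(counts, min_run = 60) {
  runs <- rle(counts == 0)
  # A run counts as nonwear only if it is all zeros AND long enough.
  nonwear_run <- runs$values & runs$lengths >= min_run
  rep(nonwear_run, times = runs$lengths)
}

# Summarise one day: wear time, validity (>= 10 hours), mean counts per minute.
summarise_day <- function(counts, min_wear_minutes = 600) {
  wear <- !flag_nonwear(counts)
  wear_minutes <- sum(wear)
  list(
    wear_minutes = wear_minutes,
    valid_day    = wear_minutes >= min_wear_minutes,
    mean_cpm     = if (wear_minutes > 0) sum(counts[wear]) / wear_minutes else NA_real_
  )
}

# Made-up day: 8 hours of zeros (device off), then activity at 100 counts
# per minute interrupted by a 30-minute stretch of zeros (too short to be nonwear).
counts <- c(rep(0, 480), rep(100, 300), rep(0, 30), rep(100, 630))
day <- summarise_day(counts)
str(day)
```

In this toy day the 30-minute zero stretch still counts as wear time, so it lowers the mean counts per minute instead of being excluded.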
Moderate or vigorous intensity was based on count thresholds. Time spent in such activities was determined by summing minutes in a day where the count met the criterion for that intensity (Troiano 2008).
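As a minimal sketch of this summation, the function below counts the minutes in one day whose activity count exceeds a threshold. The default of 2020 counts per minute is the cutoff given in the outcome definition in this chapter. The strict inequality follows the wording "more than 2020"; whether the boundary value itself is included is an assumption here. The function name `mvpa_minutes` and the data are hypothetical.

```r
# Count minutes of moderate/vigorous activity in one day.
# `wear` marks minutes of wear time; by default every minute is treated as worn.
mvpa_minutes <- function(counts, wear = rep(TRUE, length(counts)), threshold = 2020) {
  sum(counts > threshold & wear)
}

# Made-up day: 600 minutes at 0, 800 at 500, 10 exactly at the threshold,
# and 30 minutes clearly above it.
counts <- c(rep(0, 600), rep(500, 800), rep(2020, 10), rep(3000, 30))
mvpa_day <- mvpa_minutes(counts)
print(mvpa_day)
```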
The NHANES 2003–2004 and 2005–2006 cycles have a total of 14,631 participants with accelerometry data. Participants aged 30 to 85 at the time they wore the accelerometer are included. Other inclusion criteria are in line with the choices for the prediction model of 5-year mortality presented by Smirnova et al, J Gerontol A Biol Sci Med Sci 2020. The preparation of the data was based on “Organizing and Analyzing the Activity Data in NHANES”, Leroux et al, Statistics in Biosciences 2019. High-quality processed activity data combined with mortality and demographic information can be downloaded and used in R with code from Andrew Leroux (https://andrew-leroux.github.io/rnhanesdata/articles).
9.1.1 Variables
9.1.1.1 Outcome variable
MVPA (mvpa, minutes): total minutes of moderate/vigorous physical activity, which is defined as more than 2020 counts per minute.
9.1.1.2 Sociodemographic variables
- age at examination, i.e. when participants wore the device (age, years)
- gender (gender; “Male” and “Female”)
- race/ethnicity (non-Hispanic “White”, non-Hispanic “Black”, “Mexican American”, and “Other”)
- education (educationadult; “Less than high school”, “High school” (high school graduate/general educational development [GED]), “More than high school” (some college, and college graduate))
- person-months of follow-up from MEC/exam date (permth.exm); follow-up time in this cohort in years = permth.exm/12
9.1.1.3 Health and behavior variables
- smoking status (smokecigs; Current, Former [those reporting quitting within the previous 6 months], and Never)
- alcohol consumption (drinkstatus; Non-Drinker, Moderate Drinker, Heavy Drinker, Missing alcohol)
- BMI (bmi, kg/m2)
- obesity (bmi.cat, No-Yes)
- diabetes (diabetes)
- congestive heart failure (chf, No-Yes)
- cancer (cancer, No-Yes)
- stroke (stroke, No-Yes)
- average systolic blood pressure using the 4 measurements per participant (sys, mmHg)
- total cholesterol (lbxtc, mg/dL)
- HDL cholesterol (lbdhdd, mg/dL)
9.1.1.4 Physical activity data
Summary measures are calculated due to the large size of minute level accelerometer-derived physical activity data.
- total activity counts per day (TAC/d)
- total log activity count (TLAC, log(1+TAC))
- total minutes of moderate/vigorous physical activity (MVPA)
- total accelerometer wear time (WT)
- total log activity count summary measures (tlac.1, tlac.2, …, tlac.12) in each 2-hr window, i.e. 12AM-2AM, 2AM-4AM, 4AM-6AM, etc.
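The summary measures listed above can be sketched for one day of minute-level data. The counts are made up, and the sketch assumes that the total log activity count is the sum over minutes of log(1 + count), with the twelve window measures obtained by summing within consecutive 120-minute blocks starting at midnight. That reading of the notation is an assumption, not a statement of how the source data were processed.

```r
# Made-up minute-level counts for one day (1440 minutes), deterministic.
counts <- rep(c(0, 10, 100), length.out = 1440)

# Total activity count for the day.
tac <- sum(counts)

# Total log activity count, assumed here to be the sum of log(1 + count).
log_counts <- log(1 + counts)
tlac <- sum(log_counts)

# Twelve 2-hour windows: minutes 1-120 are 12AM-2AM, 121-240 are 2AM-4AM, etc.
window <- rep(1:12, each = 120)
tlac_windows <- tapply(log_counts, window, sum)
names(tlac_windows) <- paste0("tlac.", 1:12)

print(round(tlac_windows, 1))
```

Under this definition the twelve window values add up to the daily total, which gives a simple consistency check on derived data.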
9.2 NHANES dataset contents
9.2.1 Source dataset
We refer to the source data set as the dataset available online here
9.2.2 Data dictionary
Additional meta-data is added to the original source data set. We write this new modified data set back to the data folder after adding additional meta-data (units, labels).
Input object size: 1196032 bytes; 31 variables 5972 observations
New object size: 1196032 bytes; 31 variables 5972 observations
Data frame: a_nhanes
5972 observations and 31 variables, maximum # NAs: 395

Name | Labels | Units | Levels | Class | Storage | NAs |
---|---|---|---|---|---|---|
seqn | respondent sequence number | | | integer | integer | 0 |
age | age | years | | numeric | double | 0 |
gender | gender | | 2 | | integer | 0 |
educationadult | education level | | 3 | | integer | 4 |
smokecigs | smoking status | | 3 | | integer | 2 |
drinkstatus | alcohol consumption | | 4 | | integer | 0 |
alcohol | alcohol consumption | | | integer | integer | 395 |
bmi | body mass index | kg/m2 | | numeric | double | 44 |
diabetes | diabetes | | 2 | | integer | 0 |
chf | congestive heart failure | | 2 | | integer | 0 |
cancer | cancer | | 2 | | integer | 0 |
stroke | stroke | | 2 | | integer | 0 |
sys | Systolic blood pressure | mg/dl | | integer | integer | 274 |
lbxtc | Total cholesterol | mg/dl | | integer | integer | 230 |
lbdhdd | HDL cholesterol | mg/dl | | integer | integer | 230 |
tac | total activity counts per day | | | numeric | double | 0 |
tlac | total log activity count (log(1+activity)) | | | numeric | double | 0 |
mvpa | Moderate or vigorous physical activity | minutes | | numeric | double | 0 |
wt | total accelerometer wear time | minutes | | numeric | double | 0 |
tlac.1 | total log actvity count 12:00AM-2:00AM | | | numeric | double | 0 |
tlac.2 | total log actvity count 2:00AM-4:00AM | | | numeric | double | 0 |
tlac.3 | total log actvity count 4:00AM-6:00AM | | | numeric | double | 0 |
tlac.4 | total log actvity count 6:00AM-8:00AM | | | numeric | double | 0 |
tlac.5 | total log actvity count 8:00AM-10:00AM | | | numeric | double | 0 |
tlac.6 | total log actvity count 10:00AM-12:00PM | | | numeric | double | 0 |
tlac.7 | total log actvity count 12:00PM-2:00PM | | | numeric | double | 0 |
tlac.8 | total log actvity count 2:00PM-4:00PM | | | numeric | double | 0 |
tlac.9 | total log actvity count 4:00PM-6:00PM | | | numeric | double | 0 |
tlac.10 | total log actvity count 6:00PM-8:00PM | | | numeric | double | 0 |
tlac.11 | total log actvity count 8:00PM-10:00PM | | | numeric | double | 0 |
tlac.12 | total log actvity count 10:00PM-12:00AM | | | numeric | double | 0 |

Variable | Levels |
---|---|
gender | Male |
 | Female |
educationadult | Less than high school |
 | High school |
 | More than high school |
smokecigs | Never |
 | Former |
 | Current |
drinkstatus | Moderate Drinker |
 | Non-Drinker |
 | Heavy Drinker |
 | Missing alcohol |
diabetes, chf, cancer, stroke | No |
 | Yes |
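
The step of attaching labels and units to a data set, as described in this section, can be sketched with the Hmisc package, which appears in the session info below. The data frame `toy` and its values are made up, and the call is only an illustration of `upData()` and `contents()`, not the code used to produce the dictionary above.

```r
library(Hmisc)

# Toy data frame standing in for the source data (values are made up).
toy <- data.frame(
  age    = c(45, 62, 71),
  bmi    = c(24.1, 31.5, NA),
  gender = factor(c("Male", "Female", "Female"), levels = c("Male", "Female"))
)

# Attach variable labels and units; upData() also reports object sizes.
toy_labelled <- upData(
  toy,
  labels = c(age = "age", bmi = "body mass index", gender = "gender"),
  units  = c(age = "years", bmi = "kg/m2")
)

# Print a data dictionary: labels, units, levels, storage and NA counts.
contents(toy_labelled)
```

Storing labels and units as attributes of the variables keeps the metadata with the data when the object is saved and reloaded in R.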
9.3 References
Troiano RP, Berrigan D, Dodd KW, Mâsse LC, Tilert T, McDowell M. Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008 Jan;40(1):181-8. doi: 10.1249/mss.0b013e31815a51b3. PMID: 18091006.
Leroux A, Di J, Smirnova E, Mcguffey E, Cao Q, Bayatmokhtari E, Tabacu L, Zipunnikov V, Urbanek JK, Crainiceanu C. Organizing and Analyzing the Activity Data in NHANES. Stat Biosci 11, 262–287 (2019). https://doi.org/10.1007/s12561-018-09229-9
Smirnova E, Leroux A, Tabacu L, Zipunnikov V, Crainiceanu C, Urbanek JK. The Predictive Performance of Objective Measures of Physical Activity Derived From Accelerometry Data for 5-Year All-Cause Mortality in Older Adults: National Health and Nutritional Examination Survey 2003–2006, The Journals of Gerontology: Series A, Volume 75, Issue 9, September 2020, Pages 1779–1785, https://doi.org/10.1093/gerona/glz193
9.4 Section session info
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Austria.1252 LC_CTYPE=English_Austria.1252
## [3] LC_MONETARY=English_Austria.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Austria.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Hmisc_4.6-0 Formula_1.2-4 survival_3.2-13 lattice_0.20-45
## [5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
## [9] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5
## [13] tidyverse_1.3.1 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 sass_0.4.1 jsonlite_1.8.0
## [4] splines_4.1.3 modelr_0.1.8 bslib_0.3.1
## [7] assertthat_0.2.1 latticeExtra_0.6-29 cellranger_1.1.0
## [10] yaml_2.3.5 pillar_1.7.0 backports_1.4.1
## [13] glue_1.6.2 digest_0.6.29 checkmate_2.0.0
## [16] RColorBrewer_1.1-2 rvest_1.0.2 colorspace_2.0-3
## [19] htmltools_0.5.2 Matrix_1.4-0 pkgconfig_2.0.3
## [22] broom_0.7.12 haven_2.4.3 bookdown_0.25
## [25] scales_1.1.1 jpeg_0.1-9 tzdb_0.2.0
## [28] htmlTable_2.4.0 generics_0.1.2 ellipsis_0.3.2
## [31] withr_2.5.0 nnet_7.3-17 cli_3.2.0
## [34] magrittr_2.0.2 crayon_1.5.1 readxl_1.3.1
## [37] evaluate_0.15 fs_1.5.2 fansi_1.0.3
## [40] xml2_1.3.3 foreign_0.8-82 data.table_1.14.2
## [43] tools_4.1.3 hms_1.1.1 lifecycle_1.0.1
## [46] munsell_0.5.0 reprex_2.0.1 cluster_2.1.2
## [49] compiler_4.1.3 jquerylib_0.1.4 rlang_1.0.2
## [52] grid_4.1.3 rstudioapi_0.13 htmlwidgets_1.5.4
## [55] base64enc_0.1-3 rmarkdown_2.13 gtable_0.3.0
## [58] DBI_1.1.2 R6_2.5.1 gridExtra_2.3
## [61] lubridate_1.8.0 knitr_1.38 fastmap_1.1.0
## [64] utf8_1.2.2 rprojroot_2.0.2 stringi_1.7.6
## [67] Rcpp_1.0.8.3 vctrs_0.3.8 rpart_4.1.16
## [70] png_0.1-7 dbplyr_2.1.1 tidyselect_1.1.2
## [73] xfun_0.30