Demographic Trends in Recreational Substance Use

Research Question

How does recreational substance use vary across age and sex?

A person’s relationship with substance use is dynamic and may change throughout their life. Researchers are often interested in exploring how different groups of people use recreational drugs. Recently, there have been increased concerns about the interaction of substance use and mental health issues for youth1. Other reports have described increased binge drinking and marijuana use among middle aged adults2 and substance use disorders among older adults3. While men are more likely to use substances than women4, stigmatization contributes to women being less likely to connect with substance use treatment resources5. Overall, understanding how different groups of people interact with recreational drugs helps medical and public health professionals better support these impacted groups.

In this analysis, I am interested in further exploring how substance use varies across age and sex in the United States. To do so, I will be using survey data from the National Survey on Drug Use and Health (NSDUH). The intended audience of this analysis is anyone interested in public health and in trends related to substance use.

Data Source

The Substance Abuse and Mental Health Services Administration (SAMHSA) administers the NSDUH. The agency focuses on behavioral health and was established by congress in 1992. As noted in the SAMHSA’s Strategic Plan6, the agency’s top five priorities are:

  • Preventing Overdose

  • Enhancing Access to Suicide Prevention and Crisis Care

  • Promoting Resilience and Emotional Health for Children, Youth and Families

  • Integrating Behavioral and Physical Health Care

  • Strengthening the Behavioral Health Workforce

I used data from the 2021 National Survey of Drug Use and Health (NSDUH). The NSDUH has been conducted by the U.S. government since 1971. This survey explores questions related to substance use, mental health, and other health concerns. I downloaded the data via this link. The associated data dictionary can be found here.

Data Wrangling

Throughout this analysis, I used data collected about consumption of cigarettes, cigars, alcohol, marijuana, cocaine, heroin, hallucinogens, inhalants, methamphetamine, and inhalants. While I focused on these recreational drugs in this analysis, the NSDUH explores different types of substances, including prescription medications, as well as specific substances within a class of drugs, such as LSD.

I primarily used tidyverse functions to clean and transform my data. I have added comments to the included code to describe my steps.

Code
# download the haven package to use the stata file
library(haven)
# download the tidyverse package
library(tidyverse)

# importing the raw dataset
nsduh_2021_raw <- read_dta("NSDUH_2021.DTA")

# creating a vector of columns I want to keep from the raw dataset
keep_columns <- c("AGE3", "irsex", "cigever", "cigtry", "ircigrc", "cigarevr", "cigartry", "ircgrrc", "alcever", "alctry", "iralcrc", "mjever", "mjage", "irmjrc", "cocever", "cocage", "ircocrc", "herever", "herage", "irherrc", "hallucevr", "hallucage", "irhallucrec", "methamevr", "methamage", "irmethamrec", "inhalever", "inhalage", "irinhalrec")

# creating a vector of the variables that I need to be transformed to factors
factors <- c("age", "sex", "cig_ever", "cigar_ever", "alc_ever", "mj_ever", "coc_ever", "her_ever", "halluc_ever", "meth_ever", "inhal_ever", "cig_lastyear", "cigar_lastyear", "alc_lastyear", "mj_lastyear", "coc_lastyear", "her_lastyear", "halluc_lastyear", "meth_lastyear", "inhal_lastyear")

# cleaning the raw data to create new dataset
nsduh_2021_v1 <- nsduh_2021_raw %>%
# selecting desired variables
  select(keep_columns) %>%
# create numerical variable for age groups
  mutate(agenum = case_when(
      AGE3 == 1 ~ 12.5,
      AGE3 == 2 ~ 14.5,
      AGE3 == 3 ~ 16.5,
      AGE3 == 4 ~ 19,
      AGE3 == 5 ~ 22,
      AGE3 == 6 ~ 24.5,
      AGE3 == 7 ~ 27.5,
      AGE3 == 8 ~ 32,
      AGE3 == 9 ~ 42,
      AGE3 == 10 ~ 57,
      AGE3 == 11 ~ 70,
      TRUE ~ NA
    ),
    # create categorical variable for age groups
    age = case_when(
    AGE3 == 1 ~ "12 to 13",
    AGE3 == 2 ~ "14 to 15",
    AGE3 == 3 ~ "16 to 17",
    AGE3 == 4 ~ "18 to 20",
    AGE3 == 5 ~ "21 to 23",
    AGE3 == 6 ~ "24 to 25",
    AGE3 == 7 ~ "26 to 29",
    AGE3 == 8 ~ "30 to 34",
    AGE3 == 9 ~ "35 to 49",
    AGE3 == 10 ~ "50 to 64",
    AGE3 == 11 ~ "65+",
    TRUE ~ NA),
# create categorical variable for sex
    sex = case_when(
      irsex == 1 ~ "Male",
      irsex == 2 ~ "Female",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used cigarettes
    cig_ever = case_when(
      cigever == 1 ~ "Yes",
      cigever == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used cigars
    cigar_ever = case_when(
      cigarevr == 1 ~ "Yes",
      cigarevr == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used alcohol
    alc_ever = case_when(
      alcever == 1 ~ "Yes",
      alcever == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used marijuana
    mj_ever = case_when(
      mjever == 1 ~ "Yes",
      mjever == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used cocaine
    coc_ever = case_when(
      cocever == 1 ~ "Yes",
      cocever == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used heroin
    her_ever = case_when(
      herever == 1 ~ "Yes",
      herever == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used hallucinogens
    halluc_ever = case_when(
      hallucevr == 1 ~ "Yes",
      hallucevr == 91 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used methamphetamine
    meth_ever = case_when(
      methamevr == 1 ~ "Yes",
      methamevr == 2 ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent has ever used inhalants
    inhal_ever = case_when(
      inhalever == 1 ~ "Yes",
      inhalever == 91 ~ "No",
      TRUE ~ NA),
# create numeric variable for age when first tried cigarettes
    cig_tryage = ifelse(
      cigtry >= 1 & cigtry <= 100, cigtry, NA),
# create numeric variable for age when first tried cigars
    cigar_tryage = ifelse(
      cigartry >= 1 & cigartry <= 100, cigartry, NA),
# create numeric variable for age when first tried alcohol
    alc_tryage = ifelse(
      alctry >= 1 & alctry <= 100, alctry, NA),
# create numeric variable for age when first tried cocaine
    coc_tryage = ifelse(
      cocage >= 1 & cocage <= 100, cocage, NA),
# create numeric variable for age when first tried hallucinogens
    halluc_tryage = ifelse(
      hallucage >= 1 & hallucage <= 100, hallucage, NA),
# create numeric variable for age when first tried heroin
    her_tryage = ifelse(
      herage >= 1 & herage <= 100, herage, NA),
# create numeric variable for age when first tried methamphetamine
    meth_tryage = ifelse(
      methamage >= 1 & methamage <= 100, methamage, NA),
# create numeric variable for age when first tried marijuana
    mj_tryage = ifelse(
      mjage >= 1 & mjage <= 100, mjage, NA),
# create numeric variable for age when first tried inhalants
    inhal_tryage = ifelse(
      inhalage >= 1 & inhalage <= 100, inhalage, NA),
# create categorical variable for whether respondent had cigarettes in last year
    cig_lastyear = case_when(
      ircigrc %in% c(1, 2) ~ "Yes",
      ircigrc %in% c(3, 4, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had cigars in last year
    cigar_lastyear = case_when(
      ircgrrc %in% c(1, 2) ~ "Yes",
      ircgrrc %in% c(3, 4, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had alcohol in last year
    alc_lastyear = case_when(
      iralcrc %in% c(1, 2) ~ "Yes",
      iralcrc %in% c(3, 4, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had cocaine in last year
    coc_lastyear = case_when(
      ircocrc %in% c(1, 2) ~ "Yes",
      ircocrc %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had hallucinogens in last year
    halluc_lastyear = case_when(
      irhallucrec %in% c(1, 2) ~ "Yes",
      irhallucrec %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had heroin in last year
    her_lastyear = case_when(
      irherrc %in% c(1, 2) ~ "Yes",
      irherrc %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had methamphetamine in last year
    meth_lastyear = case_when(
      irmethamrec %in% c(1, 2) ~ "Yes",
      irmethamrec %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had marijuana in last year
    mj_lastyear = case_when(
      irmjrc %in% c(1, 2) ~ "Yes",
      irmjrc %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# create categorical variable for whether respondent had inhalants in last year
    inhal_lastyear = case_when(
      irinhalrec %in% c(1, 2) ~ "Yes",
      irinhalrec %in% c(3, 9) ~ "No",
      TRUE ~ NA),
# transforms listed varialbes to factors
    across(factors, as.factor)
    )

Analysis and Results

Substances Ever Used Across Age

How does any lifetime drug use vary across age?

I used survey data asking respondents whether they had ever used a substance in their life. Fewer people have tried substances traditionally thought of as “harder drugs,” like heroin and meth, than legal substances, like alcohol and cigarettes. In older age groups, there are wider gaps in exposure across the different substances. In addition, there are trends related to certain substances being more popular among certain age groups. Inhalants are more popular relative to other substances in younger age groups. In the 30 to 34 age group and those younger, marijuana is more likely to have been tried than cigarettes, but this trend is reversed in older age groups.

Note about Age Groups

Please note that these age categories are not of equal size. These are the same age categories used in the NSDUH survey. The dashed lines connecting the points are meant to help illustrate trends across these age groups, but not to imply continuous age data.

Code
# loading a package for other color options
library(paletteer)

substance_ever <- nsduh_2021_v1 %>%
  filter(across(c(cig_ever, cigar_ever, alc_ever, mj_ever, coc_ever, her_ever, halluc_ever, meth_ever, inhal_ever), ~ !is.na(.))) %>%
  pivot_longer(cols = c(cig_ever, cigar_ever, alc_ever, mj_ever, coc_ever, her_ever, halluc_ever, meth_ever, inhal_ever), names_to = "Substance", values_to = "Ever_Tried") %>%
  group_by(age, Substance, Ever_Tried) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(age, Substance) %>%
  mutate(Percentage = sum(Count[Ever_Tried == "Yes"]) / sum(Count) * 100)

substance_ever$Substance <- factor(substance_ever$Substance)

ggplot(substance_ever, aes(x = age, y = Percentage, color = Substance, group = Substance)) +
  geom_point(size = 2.5) +
  geom_path(linetype = 2) +
  labs(title = "Percentage of Respondents Who Have Ever Tried Substance",
       subtitle = "This visualization is color coded by substance and compared across age group. In the NSDUH, \nage is recorded as a categorical variables using a set of age groups, which vary in the amount of \nyears included. The age group interval lengths are narrower in adolescence as compared to \nyounger and older years.",
       x = "Age Group (years)",
       y = "Percentage of Respondents (%)",
       caption = "Source: NSDUH 2021") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  +
  scale_colour_manual(values = paletteer::paletteer_d("ggprism::stained_glass"),
                      labels = c(
    cig_ever = "Cigarettes",
    cigar_ever = "Cigars",
    alc_ever = "Alcohol",
    mj_ever = "Marijuana",
    coc_ever = "Cocaine",
    her_ever = "Heroin",
    halluc_ever = "Hallucinogens",
    meth_ever = "Meth",
    inhal_ever = "Inhalants"
  ))

Substances Ever Used across Sex

How does any lifetime drug use vary across age and sex?

This analysis uses the same survey questions as the previous section, though the responses are grouped by sex, which was recorded as a binary variable in the survey data. In most cases, the percentage of male participants who have experimented with these substances in greater than the percentage of female participants. This gender gap in lifetime substance use experience appears to be larger in older age groups.

Warning about Y-axis Scaling

Please note that the scales on the y-axis are not identical across the faceted plots for the next two visualizations. Because some substances are used much more widely than other substances, I chose to adjust the y-axis’ scale in order to highlight the use differences across sex.

Code
substance_ever_sex <- nsduh_2021_v1 %>%
  filter(across(c(cig_ever, cigar_ever, alc_ever, mj_ever, coc_ever, her_ever, halluc_ever, meth_ever, inhal_ever), ~ !is.na(.))) %>%
  pivot_longer(cols = c(cig_ever, cigar_ever, alc_ever, mj_ever, coc_ever, her_ever, halluc_ever, meth_ever, inhal_ever), names_to = "Substance", values_to = "Ever_Tried") %>%
  group_by(age, sex, Substance, Ever_Tried) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(age, sex, Substance) %>%
  mutate(Percentage = sum(Count[Ever_Tried == "Yes"]) / sum(Count) * 100,
         substance_label = case_when(
           Substance == "cig_ever" ~ "Cigarettes",
           Substance == "cigar_ever" ~ "Cigars",
           Substance == "alc_ever" ~ "Alcohol",
           Substance == "mj_ever" ~ "Marijuana",
           Substance == "coc_ever" ~ "Cocaine",
           Substance == "her_ever" ~ "Heroin",
           Substance == "halluc_ever" ~ "Hallucinogen",
           Substance == "meth_ever" ~ "Methamphetamine",
           Substance == "inhal_ever" ~ "Inhalants",
           TRUE ~ NA
         ))

substance_ever_sex$Substance <- factor(substance_ever_sex$Substance)

ggplot(substance_ever_sex, aes(x = age, y = Percentage, color = sex, group = sex)) +
  geom_point() +
  geom_path() +
  labs(title = "Percentage of Respondents Who Have Ever Tried Substance",
       subtitle = "This visualization is faceted by substance and color coded by sex. The y-axis scales are not \nidentical and are fit to reflect the scale most appropriate for that specific substance.",
       x = "Age Group (years)",
       y = "Percentage of Respondents (%)",
       color = "Sex:",
       caption = "Source: NSDUH 2021") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) +
facet_wrap(~ substance_label, scales = "free_y", ncol = 3) +
scale_color_manual(values = c("Female" = "purple", "Male" = "darkgreen")) +
theme(legend.position = "top")

Recent Substance Use Across Age and Sex

How does recent drug use vary across age and sex?

In this section, I used survey data about more recent substance use, as compared to any lifetime use. I chose to define “recent use” as happening in the last 12 months across all substances. Similar to the responses related to any lifetime use in my earlier visualization, male respondents are overall more likely to use these substances than female participants.

Code
proportion_lastyear <- nsduh_2021_v1 %>%
  pivot_longer(cols = c(cig_lastyear, cigar_lastyear, alc_lastyear, mj_lastyear, coc_lastyear, her_lastyear, halluc_lastyear, meth_lastyear, inhal_lastyear), names_to = "Substance", values_to = "Last_Year") %>%
  filter(!is.na(Last_Year)) %>%
  group_by(age, sex, Substance, Last_Year) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(age, sex, Substance) %>%
  mutate(Percentage = sum(Count[Last_Year == "Yes"]) / sum(Count) * 100,
         substance_label = case_when(
           Substance == "cig_lastyear" ~ "Cigarettes",
           Substance == "cigar_lastyear" ~ "Cigars",
           Substance == "alc_lastyear" ~ "Alcohol",
           Substance == "mj_lastyear" ~ "Marijuana",
           Substance == "coc_lastyear" ~ "Cocaine",
           Substance == "her_lastyear" ~ "Heroin",
           Substance == "halluc_lastyear" ~ "Hallucinogen",
           Substance == "meth_lastyear" ~ "Methamphetamine",
           Substance == "inhal_lastyear" ~ "Inhalants",
           TRUE ~ NA
         ))

ggplot(proportion_lastyear, aes(x = age, y = Percentage, fill = sex, group = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Percentage of Respondents with Recent Substance Use",
    subtitle = "Recent substance use is defined as within the last 12 months. This visualization is faceted by \nsubstance and color coded by sex. The y-axis scales are not identical and are fit to reflect the \nscale most appropriate for that specific substance.",
    x = "Age Group (years)",
    y = "Percentage of Respondents (%)",
    fill = "Sex:",
    caption = "Source: NSDUH 2021") +
  theme_minimal() +
scale_fill_manual(values = c("Female" = "purple", "Male" = "darkgreen")) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) +
  facet_wrap(~ substance_label, scales = "free_y", ncol = 3) +
theme(legend.position = "top") 

Age of Substance Use Initiation

When are people most likely to try a drug for the first time?

To explore this question, I used survey data asking respondents about the age of their first use of a substance. While this data was numeric with integers for age, I chose to instead create age group categories. For nearly all of the substances, the age group 15-19 is the most common time when people use a drug for the first time. In general, substances generally considered “harder drugs” are initiated at older ages than drugs that do not have this same connotation.

Code
median_Substance_order <- nsduh_2021_v1 %>%
  pivot_longer(cols = c(cig_tryage, cigar_tryage, alc_tryage, mj_tryage, coc_tryage, her_tryage, halluc_tryage, meth_tryage, inhal_tryage), names_to = "Substance", values_to = "Try_Age") %>%
  group_by(Substance) %>%
  summarise(Median_Try_Age = median(Try_Age, na.rm = TRUE)) %>%
  arrange(Median_Try_Age) %>%
  pull(Substance)

custom_labels <- c(
  "cig_tryage" = "Cigarettes",
  "cigar_tryage" = "Cigars",
  "alc_tryage" = "Alcohol",
  "mj_tryage" = "Marijuana",
  "coc_tryage" = "Cocaine",
  "her_tryage" = "Heroin",
  "halluc_tryage" = "Hallucinogens",
  "meth_tryage" = "Meth",
  "inhal_tryage" = "Inhalants"
)

proportion_data <- nsduh_2021_v1 %>%
  pivot_longer(cols = c(cig_tryage, cigar_tryage, alc_tryage, mj_tryage, coc_tryage, her_tryage, halluc_tryage, meth_tryage, inhal_tryage), names_to = "Substance", values_to = "Try_Age") %>%
  filter(!is.na(Try_Age)) %>%
  mutate(Try_Age_Group = case_when(
    Try_Age >= 0 & Try_Age < 5 ~ "0-4",
    Try_Age >= 5 & Try_Age < 10 ~ "5-9",
    Try_Age >= 10 & Try_Age < 15 ~ "10-14",
    Try_Age >= 15 & Try_Age < 20 ~ "15-19",
    Try_Age >= 20 & Try_Age < 25 ~ "20-24",
    Try_Age >= 25 & Try_Age < 30 ~ "25-29",
    Try_Age >= 30 & Try_Age < 35 ~ "30-34",
    Try_Age >= 35 & Try_Age < 40 ~ "35-39",
    Try_Age >= 40 & Try_Age < 45 ~ "40-44",
    Try_Age >= 45 & Try_Age < 50 ~ "45-49",
    Try_Age >= 50 & Try_Age < 55 ~ "50-54",
    Try_Age >= 55 ~ "55+",
    TRUE ~ "Other"
  )) %>%
  mutate(Try_Age_Group = factor(Try_Age_Group, levels = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55+", "Other"))) %>%
  group_by(Substance, Try_Age_Group) %>%
  count() %>%
  ungroup() %>%
  group_by(Substance) %>%
  mutate(proportion = n / sum(n) * 100) %>%
  ungroup()

ggplot(proportion_data, aes(x = Try_Age_Group, y = factor(Substance, levels = median_Substance_order), fill = proportion)) +
  geom_tile(width = 0.9, height = 0.9, color = "white") +
  geom_text(aes(label = sprintf("%.1f%%", proportion)), vjust = 1, color = "white", size = 2.5) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  theme_minimal() +
  labs(
    title = "Age When Respondent First Used Substance",
    subtitle = "Substances are organized by median age of substance use initiation, with greater \nmedian age at the top of the plot. The percentage on each tile reflects the percentage \nof respondents who have ever used that substance and first used it during that age \nrange.",
    x = "Age of Substance Use Initiation (years)",
    y = "Substance",
    fill = "Percentage \nof Respondents (%)",
    caption = "Source: NSDUH 2021"
  ) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_discrete(labels = custom_labels) 

Summary

As represented in the NSDUH survey data, substance use varies widely across age and sex. There are gender and age gaps in the number of respondents who have ever used and recently used these substances. Some of these trends are unique to specific substances. In general, respondents were most likely to have first tried a substance in their later teenage years. One possible limitation of this analysis is the use of survey data, which relies on self-reported substance use. This analysis is only the tip of the iceberg in exploring demographic differences in substance use. I am so excited to continue learning about epidemiology and statistical analyses so that I will become better equipped to continue exploring these research questions.

Functions used

dyplr: filter, select, summarise, mutate, group_by, ungroup, arrange, case_when

tidyr: pivot_longer

ggplot2: geom_point, geom_path, geom_bar, geom_tile, geom_text

References

1.
Panchal, N., Rudowitz, R. & Cox, C. Recent Trends in Mental Health and Substance Use Concerns Among Adolescents. Kaiser Family Foundation (2022).
2.
3.
4.
5.
Krug, M. Women less likely to seek substance use treatment due to stigma, logistics Penn State University. https://www.psu.edu/news/social-science-research-institute/story/women-less-likely-seek-substance-use-treatment-due-stigma/ (2023).
6.
SAMHSA’s 2023-2026 Strategic Plan. Substance Abuse and Mental Health Services Administration (2023).