Identify Enrollment Periods of BPH Ever Users

The cohort identified in find_treated.Rmd is everyone who ever took one of the study drugs (TZ, DZ, AZ, tamsulosin, or a 5ARI). We want to impose three additional restrictions: male sex, age >= 40 at index date, and at least one year of lookback to identify new users.

Jacob Simmering, PhD (University of Iowa)

This file was compiled on 2022-05-27 16:28:31 by jsimmeri on argon-lc-g14-35.hpc.

We had previously computed the ever users in find_rx_events.Rmd. We use that pre-computed data to reduce the number of enrollments and demographic rows that we need to return.

# read_rds(), tbl(), and the dplyr verbs used below come from the tidyverse
library(tidyverse)

ever_users <- read_rds("/Shared/lss_jsimmeri_backup/data/pdd/bph_rx/ever_users.rds")

Pulling Demographics

Demographic data for Truven enrollees are stored in /Shared/Statepi_Marketscan/databases/Truven/truven_demographics.db. We start by creating a connection to the database.

db_demo <- DBI::dbConnect(RSQLite::SQLite(),
                          "/Shared/Statepi_Marketscan/databases/Truven/truven_demographics.db")

Then we pull back the rows matching one of our ever users. To reduce unobserved heterogeneity, we restrict the cohort to male users by also requiring female == 0.

demographics <- tbl(db_demo, "demographics") %>%
  filter(female == 0) %>%
  filter(enrolid %in% local(ever_users$enrolid)) %>%
  select(enrolid, dobyr) %>%
  collect()  # run the query and return the rows as a local tibble

And close the connection.

DBI::dbDisconnect(db_demo)
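The connect, filter, collect, and disconnect pattern used here can be tried without access to the Truven databases by pointing it at an in-memory SQLite database. This is only a minimal sketch: the toy table, its values, and the `toy_users` vector are invented for illustration, and it assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed.

```r
library(dplyr)

# In-memory SQLite database standing in for truven_demographics.db (assumption).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical demographics table; all values are invented for illustration.
DBI::dbWriteTable(con, "demographics", data.frame(
  enrolid = c(1, 2, 3, 4),
  female = c(0, 1, 0, 0),
  dobyr = c(1950, 1955, 1960, 1965)
))

# Hypothetical stand-in for ever_users$enrolid.
toy_users <- c(1, 2, 3)

toy_demographics <- tbl(con, "demographics") %>%
  filter(female == 0) %>%
  # local() evaluates toy_users in R so its values are inlined into the SQL
  filter(enrolid %in% local(toy_users)) %>%
  select(enrolid, dobyr) %>%
  collect()  # execute the query and return a local tibble

DBI::dbDisconnect(con)
```

Until `collect()` is called, the pipeline is only a lazy query description and no rows are pulled into R, which is why the filters are placed before it.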


Pulling Enrollments

Enrollment data is stored in /Shared/Statepi_Marketscan/databases/Truven/truven_enrollments_corrected.db. Connect to that database:

db_enrollment <- DBI::dbConnect(RSQLite::SQLite(),
                                "/Shared/Statepi_Marketscan/databases/Truven/truven_enrollments_corrected.db")

Because we are interested in medication exposures, we require prescription drug coverage by filtering on rx == 1. We also only need enrollments for the male ever users, so we filter on demographics$enrolid.

enrollments <- tbl(db_enrollment, "truven_enrollments") %>%
  filter(rx == 1) %>%
  filter(enrolid %in% local(demographics$enrolid)) %>%
  select(enrolid, start_date, end_date) %>%
  collect()  # run the query and return the rows as a local tibble

And close the connection:

DBI::dbDisconnect(db_enrollment)


The resulting enrollments tibble has one row per enrollment period. A new row starts whenever any of the other enrollment variables changes (e.g., MSA or state). We are not interested in those variables, so we want to “merge” or “collapse” these periods when they are contiguous.

We do this by sorting the data by enrolid and then start_date. We then compare each row to the prior row to see whether the enrolid value has changed or whether the current enrollment period starts more than one day after the prior period ends (i.e., the two periods are non-contiguous). If either condition is true, we start a new period. The first row has no prior row, so its comparison is NA; we treat it as the start of a new period.

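collapsed_enrollments <- enrollments %>%
    arrange(enrolid, start_date) %>%
    mutate(
      # TRUE when this row belongs to a different enrollee than the prior row
      break_enrolid = enrolid != lag(enrolid),
      # TRUE when there is a gap between the prior end and this start
      break_time = start_date > lag(end_date + 1)
    ) %>%
    mutate(
      break_any = break_enrolid | break_time,
      # the first row has no prior row (NA); treat it as a new period
      break_any = replace_na(break_any, TRUE)
    )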

We then calculate the cumulative sum of the break_any variable to make a period ID. We group by enrolid and period, which uniquely identify a collection of contiguous enrollment phases, and find the earliest and latest date in that collection.

collapsed_enrollments <- collapsed_enrollments %>%
    mutate(
      # each break starts a new period, so the running count is a period ID
      period = cumsum(break_any)
    ) %>%
    group_by(enrolid, period) %>%
    summarize(
      start_date = min(start_date),
      end_date = max(end_date),
      .groups = "drop"
    ) %>%
    select(enrolid, start_date, end_date)

The resulting tibble has one row per contiguous enrollment period. If an enrollee has several non-contiguous enrollment periods, that enrollee will have multiple rows.
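To make the collapsing logic concrete, here is a small worked example on invented data. It is a sketch under assumptions: the toy values are hypothetical, and the dates are stored as R Date objects, which may differ from how the Truven tables encode them. Enrollee 1 has two back-to-back periods followed by a gap, and enrollee 2 has a single period.

```r
library(dplyr)
library(tidyr)

# Hypothetical enrollment rows; values are invented for illustration.
toy_enrollments <- tibble(
  enrolid = c(1, 1, 1, 2),
  start_date = as.Date(c("2010-01-01", "2010-07-01", "2011-03-01", "2010-01-01")),
  end_date = as.Date(c("2010-06-30", "2010-12-31", "2011-12-31", "2010-12-31"))
)

toy_collapsed <- toy_enrollments %>%
  arrange(enrolid, start_date) %>%
  mutate(
    break_enrolid = enrolid != lag(enrolid),
    break_time = start_date > lag(end_date + 1),
    # NA only occurs on the very first row, which always opens a period
    break_any = replace_na(break_enrolid | break_time, TRUE),
    period = cumsum(break_any)
  ) %>%
  group_by(enrolid, period) %>%
  summarize(
    start_date = min(start_date),
    end_date = max(end_date),
    .groups = "drop"
  ) %>%
  select(enrolid, start_date, end_date)

print(toy_collapsed)
```

The first two rows for enrollee 1 merge into a single period from 2010-01-01 to 2010-12-31 because the second starts the day after the first ends. The third row begins after a two-month gap and stays separate, so the result has three rows: two for enrollee 1 and one for enrollee 2.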

Write Data

Save the data we extracted for later use.

write_rds(demographics, "/Shared/lss_jsimmeri_backup/data/pdd/bph_rx/demographics.rds")
write_rds(collapsed_enrollments, "/Shared/lss_jsimmeri_backup/data/pdd/bph_rx/enrollments.rds")
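The two saved tables carry what is needed for the age and lookback restrictions described at the top of this file, although those restrictions are not applied here. The sketch below shows one way a later step might use them. It is not the author's implementation: the `index_dates_toy` table and every value in it are hypothetical, "one year" is taken to mean 365 days, and age is approximated as index year minus birth year because only dobyr is available.

```r
library(dplyr)

# Hypothetical inputs; names and values are invented for illustration.
demographics_toy <- tibble(enrolid = c(1, 2, 3), dobyr = c(1950, 1980, 1945))
enrollments_toy <- tibble(
  enrolid = c(1, 2, 3),
  start_date = as.Date(c("2008-01-01", "2008-01-01", "2010-06-01")),
  end_date = as.Date(c("2012-12-31", "2012-12-31", "2012-12-31"))
)
index_dates_toy <- tibble(
  enrolid = c(1, 2, 3),
  index_date = as.Date(c("2010-03-15", "2010-03-15", "2010-09-01"))
)

eligible <- index_dates_toy %>%
  inner_join(demographics_toy, by = "enrolid") %>%
  inner_join(enrollments_toy, by = "enrolid") %>%
  # keep the contiguous enrollment period that contains the index date
  filter(index_date >= start_date, index_date <= end_date) %>%
  mutate(
    # approximate age: calendar year of index minus birth year
    age_at_index = as.integer(format(index_date, "%Y")) - dobyr,
    # days of continuous enrollment observed before the index date
    lookback_days = as.numeric(index_date - start_date)
  ) %>%
  filter(age_at_index >= 40, lookback_days >= 365)
```

In this toy data only enrollee 1 remains: enrollee 2 is under 40 at the index date, and enrollee 3 has only about three months of enrollment before it. Collapsing contiguous periods first matters here, since otherwise a change in MSA or state would reset the apparent lookback.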

Session Info

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
[5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
[9] tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0  xfun_0.21         haven_2.3.1      
 [4] colorspace_2.0-0  vctrs_0.3.6       generics_0.1.0   
 [7] htmltools_0.5.1.1 yaml_2.2.1        blob_1.2.1       
[10] rlang_0.4.10      pillar_1.4.7      withr_2.4.1      
[13] glue_1.4.2        DBI_1.1.1         bit64_4.0.5      
[16] dbplyr_2.1.0      modelr_0.1.8      readxl_1.3.1     
[19] lifecycle_1.0.0   munsell_0.5.0     gtable_0.3.0     
[22] cellranger_1.1.0  rvest_0.3.6       memoise_2.0.0    
[25] evaluate_0.14     knitr_1.31        fastmap_1.1.0    
[28] ps_1.5.0          fansi_0.4.2       broom_0.7.4      
[31] Rcpp_1.0.6        backports_1.2.1   scales_1.1.1     
[34] cachem_1.0.4      jsonlite_1.7.2    bit_4.0.4        
[37] fs_1.5.0          distill_1.2       hms_1.0.0        
[40] digest_0.6.27     stringi_1.5.3     grid_4.0.4       
[43] cli_2.3.0         tools_4.0.4       magrittr_2.0.1   
[46] RSQLite_2.2.3     crayon_1.4.1      pkgconfig_2.0.3  
[49] downlit_0.2.1     ellipsis_0.3.1    xml2_1.3.2       
[52] reprex_1.0.0      lubridate_1.7.9.2 assertthat_0.2.1 
[55] rmarkdown_2.6     httr_1.4.2        rstudioapi_0.13  
[58] R6_2.5.0          compiler_4.0.4