Balance Assessment and Survival Analysis | Tamsulosin vs 5ARI

Published

May 8, 2022

DOI

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:51:26 by jsimmeri on argon-lc-f14-25.hpc.

The script analysis_functions.R provides:

  1. some functions for summarizing the cohort and building tables that are shared across scripts
  2. code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
  3. sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS
source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just tamsulosin and 5ARI.

model_data <- model_data %>%
  filter(drug %in% c("5ari", "tamsulosin")) %>%
  mutate(treatment = drug == "tamsulosin")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tam_5ari.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group

Characteristic

5ari, N = 79,133

tamsulosin, N = 429,741

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

62 (54, 70)

62 (56, 71)

-0.097

-0.105, -0.089

Medication Index Date

2,010.0 (2,008.0, 2,013.0)

2,012.0 (2,008.0, 2,014.0)

-0.273

-0.281, -0.266

Duration of Enrollment Time

Lookback Time (Years)

3.1 (1.8, 5.6)

3.6 (2.0, 6.3)

-0.176

-0.184, -0.168

Follow Up Time (Years)

2.23 (0.98, 4.43)

1.90 (0.86, 3.78)

0.138

0.130, 0.145

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.00)

0.00 (0.00, 0.21)

-0.181

-0.188, -0.173

Outpatient Events Per Year

9 (5, 16)

10 (5, 18)

-0.114

-0.122, -0.107

Mean Number of DX Codes Per Outpatient Event

1.40 (1.20, 1.87)

1.56 (1.27, 2.09)

-0.243

-0.251, -0.236

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.07)

0.04 (0.02, 0.08)

-0.157

-0.165, -0.150

mean_ndx_out_rate

13 (7, 24)

16 (9, 29)

-0.157

-0.165, -0.150

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

1,135 (1.4%)

11,147 (2.6%)

-0.076

-0.083, -0.068

Anemia

9,051 (11.4%)

66,743 (15.5%)

-0.115

-0.123, -0.107

Blood Loss

886 (1.1%)

6,996 (1.6%)

-0.041

-0.049, -0.034

Heart Failure

5,190 (6.6%)

41,392 (9.6%)

-0.107

-0.114, -0.099

Coagulopathy

2,302 (2.9%)

18,433 (4.3%)

-0.070

-0.077, -0.062

Depression

5,380 (6.8%)

42,310 (9.8%)

-0.105

-0.112, -0.097

Diabetes Without Complications

15,703 (19.8%)

118,841 (27.7%)

-0.177

-0.185, -0.170

Diabetes With Complications

5,009 (6.3%)

45,465 (10.6%)

-0.142

-0.150, -0.135

Drug Abuse

568 (0.7%)

6,533 (1.5%)

-0.068

-0.076, -0.061

Fluid or Electrolyte Disorders

5,812 (7.3%)

54,203 (12.6%)

-0.164

-0.171, -0.156

HIV

546 (0.7%)

1,426 (0.3%)

0.058

0.050, 0.065

HTN Without Complications

41,769 (52.8%)

266,446 (62.0%)

-0.189

-0.197, -0.181

HTN With Complications

7,828 (9.9%)

55,606 (12.9%)

-0.092

-0.100, -0.085

Hypothyroidism

6,927 (8.8%)

42,431 (9.9%)

-0.038

-0.045, -0.030

Liver Disease

2,454 (3.1%)

23,871 (5.6%)

-0.111

-0.118, -0.103

Lymphoma

829 (1.0%)

6,226 (1.4%)

-0.034

-0.042, -0.027

Metastatic Cancer

1,090 (1.4%)

10,529 (2.5%)

-0.072

-0.079, -0.064

Other Neurologic Disorders

4,419 (5.6%)

39,518 (9.2%)

-0.129

-0.136, -0.121

Obesity

4,462 (5.6%)

47,431 (11.0%)

-0.179

-0.186, -0.171

Paralysis

960 (1.2%)

10,106 (2.4%)

-0.078

-0.086, -0.071

Pulmonary Hypertension

1,521 (1.9%)

12,901 (3.0%)

-0.065

-0.073, -0.058

Psychoses

4,098 (5.2%)

31,075 (7.2%)

-0.081

-0.089, -0.073

Peptic Ulcer Disease

183 (0.2%)

1,672 (0.4%)

-0.026

-0.034, -0.019

COPD

13,975 (17.7%)

102,194 (23.8%)

-0.146

-0.154, -0.138

Peripheral Vascular Disease

8,260 (10.4%)

62,941 (14.6%)

-0.121

-0.129, -0.114

Renal Failure

3,431 (4.3%)

30,407 (7.1%)

-0.110

-0.118, -0.102

Rheumatoid Arthritis

2,785 (3.5%)

21,620 (5.0%)

-0.071

-0.078, -0.063

Solid Tumor

10,371 (13.1%)

72,259 (16.8%)

-0.101

-0.108, -0.093

Valvular Disease

9,097 (11.5%)

60,458 (14.1%)

-0.075

-0.083, -0.067

Weight Loss

2,219 (2.8%)

19,310 (4.5%)

-0.084

-0.092, -0.076

Hypotension

Orthostatic Hypotension

748 (0.9%)

5,000 (1.2%)

-0.021

-0.028, -0.013

Other Hypotension

1,451 (1.8%)

12,550 (2.9%)

-0.066

-0.074, -0.059

Prostate Specific Antigen

PSA Measurement Taken

48,034 (60.7%)

247,089 (57.5%)

0.065

0.057, 0.072

Diagnosis of Abnormal PSA

23,538 (29.7%)

69,520 (16.2%)

0.354

0.389, 0.389

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

1,936 (2.4%)

13,557 (3.2%)

-0.041

-0.049, -0.034

Uroflow Study Performed

5,556 (7.0%)

27,946 (6.5%)

0.021

0.013, 0.028

Cystometrogram Performed

524 (0.7%)

3,304 (0.8%)

-0.012

-0.020, -0.005

Diagnosis of BPH

35,618 (45.0%)

179,924 (41.9%)

0.064

0.056, 0.071

Other Comorbidities

Diagnosis of Anxiety

5,406 (6.8%)

38,428 (8.9%)

-0.075

-0.083, -0.068

Diagnosis of Erectile Dysfunction

7,153 (9.0%)

44,895 (10.4%)

-0.046

-0.054, -0.039

1Cohen's D

2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group

Characteristic

5ari, N = 78,747

tamsulosin, N = 78,747

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

62 (54, 70)

62 (54, 71)

-0.002

-0.011, 0.008

Medication Index Date

2,010.0 (2,008.0, 2,013.0)

2,010.0 (2,008.0, 2,013.0)

-0.010

-0.020, 0.000

Duration of Enrollment Time

Lookback Time (Years)

3.15 (1.84, 5.57)

3.31 (1.92, 5.78)

-0.059

-0.069, -0.050

Follow Up Time (Years)

2.22 (0.98, 4.41)

2.23 (0.98, 4.41)

0.000

-0.009, 0.010

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.00)

0.00 (0.00, 0.00)

-0.016

-0.026, -0.006

Outpatient Events Per Year

9 (5, 16)

9 (5, 16)

-0.012

-0.022, -0.002

Mean Number of DX Codes Per Outpatient Event

1.41 (1.20, 1.87)

1.41 (1.20, 1.88)

-0.006

-0.015, 0.004

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.07)

0.04 (0.02, 0.07)

-0.015

-0.025, -0.005

mean_ndx_out_rate

13 (7, 24)

13 (7, 24)

-0.015

-0.025, -0.005

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

1,135 (1.4%)

1,197 (1.5%)

-0.007

-0.016, 0.003

Anemia

9,030 (11.5%)

9,143 (11.6%)

-0.004

-0.014, 0.005

Blood Loss

886 (1.1%)

888 (1.1%)

0.000

-0.010, 0.010

Heart Failure

5,180 (6.6%)

5,327 (6.8%)

-0.007

-0.017, 0.002

Coagulopathy

2,296 (2.9%)

2,388 (3.0%)

-0.007

-0.017, 0.003

Depression

5,370 (6.8%)

5,417 (6.9%)

-0.002

-0.012, 0.008

Diabetes Without Complications

15,690 (19.9%)

16,026 (20.4%)

-0.011

-0.021, -0.001

Diabetes With Complications

5,004 (6.4%)

5,302 (6.7%)

-0.015

-0.025, -0.005

Drug Abuse

567 (0.7%)

669 (0.8%)

-0.015

-0.025, -0.005

Fluid or Electrolyte Disorders

5,808 (7.4%)

6,022 (7.6%)

-0.010

-0.020, 0.000

HIV

520 (0.7%)

522 (0.7%)

0.000

-0.010, 0.010

HTN Without Complications

41,657 (52.9%)

41,763 (53.0%)

-0.003

-0.013, 0.007

HTN With Complications

7,807 (9.9%)

8,004 (10.2%)

-0.008

-0.018, 0.002

Hypothyroidism

6,900 (8.8%)

7,077 (9.0%)

-0.008

-0.018, 0.002

Liver Disease

2,452 (3.1%)

2,567 (3.3%)

-0.008

-0.018, 0.002

Lymphoma

825 (1.0%)

874 (1.1%)

-0.006

-0.016, 0.004

Metastatic Cancer

1,087 (1.4%)

1,165 (1.5%)

-0.008

-0.018, 0.002

Other Neurologic Disorders

4,415 (5.6%)

4,593 (5.8%)

-0.010

-0.020, 0.000

Obesity

4,461 (5.7%)

4,701 (6.0%)

-0.013

-0.023, -0.003

Paralysis

959 (1.2%)

1,005 (1.3%)

-0.005

-0.015, 0.005

Pulmonary Hypertension

1,520 (1.9%)

1,658 (2.1%)

-0.012

-0.022, -0.003

Psychoses

4,086 (5.2%)

4,166 (5.3%)

-0.005

-0.014, 0.005

Peptic Ulcer Disease

183 (0.2%)

210 (0.3%)

-0.007

-0.017, 0.003

COPD

13,956 (17.7%)

14,204 (18.0%)

-0.008

-0.018, 0.002

Peripheral Vascular Disease

8,241 (10.5%)

8,520 (10.8%)

-0.011

-0.021, -0.002

Renal Failure

3,429 (4.4%)

3,546 (4.5%)

-0.007

-0.017, 0.003

Rheumatoid Arthritis

2,783 (3.5%)

2,887 (3.7%)

-0.007

-0.017, 0.003

Solid Tumor

10,368 (13.2%)

11,007 (14.0%)

-0.024

-0.034, -0.014

Valvular Disease

9,068 (11.5%)

9,265 (11.8%)

-0.008

-0.018, 0.002

Weight Loss

2,212 (2.8%)

2,292 (2.9%)

-0.006

-0.016, 0.004

Hypotension

Orthostatic Hypotension

745 (0.9%)

766 (1.0%)

-0.003

-0.013, 0.007

Other Hypotension

1,450 (1.8%)

1,460 (1.9%)

-0.001

-0.011, 0.009

Prostate Specific Antigen

PSA Measurement Taken

47,756 (60.6%)

47,798 (60.7%)

-0.001

-0.011, 0.009

Diagnosis of Abnormal PSA

23,236 (29.5%)

23,361 (29.7%)

-0.003

-0.013, 0.006

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

1,934 (2.5%)

1,944 (2.5%)

-0.001

-0.011, 0.009

Uroflow Study Performed

5,524 (7.0%)

5,752 (7.3%)

-0.011

-0.021, -0.001

Cystometrogram Performed

524 (0.7%)

572 (0.7%)

-0.007

-0.017, 0.003

Diagnosis of BPH

35,390 (44.9%)

35,467 (45.0%)

-0.002

-0.012, 0.008

Other Comorbidities

Diagnosis of Anxiety

5,381 (6.8%)

5,491 (7.0%)

-0.006

-0.015, 0.004

Diagnosis of Erectile Dysfunction

7,124 (9.0%)

7,168 (9.1%)

-0.002

-0.012, 0.008

1Cohen's D

2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tam_5ari.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE  79133      765      973     44.57        54
treatment=TRUE  429741     4867     4659      9.31        54

 Chisq= 54  on 1 degrees of freedom, p= 2e-13 

The Cox regression pegs the protection at HR at 1.32 (1.22, 1.43).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 1.33 1.23, 1.44 <0.001

1 HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results:

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()
drug cases time incidence
5ari 757 241922.9 31.29096
tamsulosin 846 241542.2 35.02494

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=tamsulosin", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "Tamsulosin")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()
time year 5ARI Tamsulosin
365 1 58387 58426
730 2 41709 41987
1095 3 30354 30486
1460 4 22122 22078
1825 5 16032 16102
2190 6 11475 11533
2555 7 8076 8111
2920 8 5471 5463
3285 9 3584 3542
3650 10 2238 2204
4015 11 1316 1301
4380 12 643 658
4745 13 328 327
5110 14 163 161
5475 15 43 48
5840 16 2 1

A log rank test confirms that the curves remain different but only with p = 0.05.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 78747      757      802      2.55       5.1
treatment=1 78747      846      801      2.55       5.1

 Chisq= 5.1  on 1 degrees of freedom, p= 0.02 

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic 0.5% Percentile 1.0% Percentile 1.5% Percentile 2.0% Percentile
drug
5ari 557 (497, 649) 1,175 (1,067, 1,279) 1,778 (1,666, 1,943) 2,385 (2,183, 2,694)
tamsulosin 530 (472, 581) 1,089 (980, 1,203) 1,665 (1,519, 1,822) 2,160 (1,995, 2,365)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 1.12 1.01, 1.23 0.024

1 HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 1.10 (95% CI: 1.00, 1.22; p = 0.0528).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

Unadjusted

Propensity Score Matched

Characteristic

HR1

95% CI1

p-value

HR1

95% CI1

p-value

as.numeric(treatment)

1.33

1.23, 1.44

<0.001

1.12

1.01, 1.23

0.024

1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals
                      chisq df     p
as.numeric(treatment)  4.94  1 0.026
GLOBAL                 4.94  1 0.026
with(scf_residuals, cor.test(x, y))

    Pearson's product-moment correlation

data:  x and y
t = 2.2253, df = 1601, p-value = 0.0262
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.006587255 0.104205887
sample estimates:
       cor 
0.05552927 
tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is a weak correlation that is not statistically significant when checking the Schoenfeld residuals. It is unlikely to be a serious threat to our estimates.

Footnotes