Balance Assessment and Survival Analysis | Tz/Dz/Az vs Tamsulosin

Published

May 8, 2022

DOI

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:48:08 by jsimmeri on argon-lc-f14-25.hpc.

The script analysis_functions.R provides:

  1. some functions for summarizing the cohort and building tables that are shared across scripts
  2. code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
  3. sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS
source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and tamsulosin.

model_data <- model_data %>%
  filter(drug %in% c("tamsulosin", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_tam.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group

Characteristic

tamsulosin, N = 429,741

tz/dz/az, N = 124,905

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

62 (56, 71)

62 (55, 70)

0.051

0.045, 0.058

Medication Index Date

2,012.0 (2,008.0, 2,014.0)

2,009.0 (2,006.0, 2,012.0)

0.481

0.529, 0.529

Duration of Enrollment Time

Lookback Time (Years)

3.6 (2.0, 6.3)

2.9 (1.7, 5.2)

0.242

0.266, 0.266

Follow Up Time (Years)

1.90 (0.86, 3.78)

1.96 (0.93, 4.19)

-0.085

-0.092, -0.079

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.21)

0.00 (0.00, 0.12)

0.080

0.073, 0.086

Outpatient Events Per Year

10 (5, 18)

9 (5, 16)

0.065

0.059, 0.071

Mean Number of DX Codes Per Outpatient Event

1.56 (1.27, 2.09)

1.37 (1.18, 1.83)

0.279

0.307, 0.307

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.08)

0.04 (0.02, 0.07)

0.129

0.122, 0.135

mean_ndx_out_rate

16 (9, 29)

14 (7, 25)

0.129

0.122, 0.135

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

11,147 (2.6%)

2,652 (2.1%)

0.030

0.024, 0.037

Anemia

66,743 (15.5%)

16,683 (13.4%)

0.061

0.055, 0.067

Blood Loss

6,996 (1.6%)

1,482 (1.2%)

0.036

0.030, 0.042

Heart Failure

41,392 (9.6%)

9,957 (8.0%)

0.057

0.051, 0.064

Coagulopathy

18,433 (4.3%)

3,572 (2.9%)

0.073

0.067, 0.080

Depression

42,310 (9.8%)

8,769 (7.0%)

0.098

0.091, 0.104

Diabetes Without Complications

118,841 (27.7%)

36,097 (28.9%)

-0.028

-0.034, -0.021

Diabetes With Complications

45,465 (10.6%)

14,154 (11.3%)

-0.024

-0.031, -0.018

Drug Abuse

6,533 (1.5%)

1,311 (1.0%)

0.040

0.034, 0.046

Fluid or Electrolyte Disorders

54,203 (12.6%)

13,535 (10.8%)

0.054

0.048, 0.061

HIV

1,426 (0.3%)

481 (0.4%)

-0.009

-0.015, -0.003

HTN Without Complications

266,446 (62.0%)

86,651 (69.4%)

-0.154

-0.160, -0.147

HTN With Complications

55,606 (12.9%)

20,483 (16.4%)

-0.101

-0.107, -0.094

Hypothyroidism

42,431 (9.9%)

9,649 (7.7%)

0.074

0.067, 0.080

Liver Disease

23,871 (5.6%)

4,791 (3.8%)

0.078

0.071, 0.084

Lymphoma

6,226 (1.4%)

1,291 (1.0%)

0.036

0.030, 0.042

Metastatic Cancer

10,529 (2.5%)

1,525 (1.2%)

0.084

0.078, 0.091

Other Neurologic Disorders

39,518 (9.2%)

7,231 (5.8%)

0.123

0.116, 0.129

Obesity

47,431 (11.0%)

11,396 (9.1%)

0.062

0.056, 0.068

Paralysis

10,106 (2.4%)

2,017 (1.6%)

0.050

0.044, 0.057

Pulmonary Hypertension

12,901 (3.0%)

2,402 (1.9%)

0.066

0.060, 0.072

Psychoses

31,075 (7.2%)

6,706 (5.4%)

0.074

0.068, 0.080

Peptic Ulcer Disease

1,672 (0.4%)

293 (0.2%)

0.026

0.020, 0.032

COPD

102,194 (23.8%)

23,707 (19.0%)

0.115

0.108, 0.121

Peripheral Vascular Disease

62,941 (14.6%)

14,995 (12.0%)

0.076

0.070, 0.082

Renal Failure

30,407 (7.1%)

11,908 (9.5%)

-0.093

-0.099, -0.086

Rheumatoid Arthritis

21,620 (5.0%)

4,669 (3.7%)

0.061

0.055, 0.067

Solid Tumor

72,259 (16.8%)

13,665 (10.9%)

0.163

0.143, 0.188

Valvular Disease

60,458 (14.1%)

13,805 (11.1%)

0.089

0.082, 0.095

Weight Loss

19,310 (4.5%)

3,334 (2.7%)

0.092

0.086, 0.099

Hypotension

Orthostatic Hypotension

5,000 (1.2%)

898 (0.7%)

0.043

0.037, 0.050

Other Hypotension

12,550 (2.9%)

2,129 (1.7%)

0.076

0.069, 0.082

Prostate Specific Antigen

PSA Measurement Taken

247,089 (57.5%)

71,601 (57.3%)

0.003

-0.003, 0.010

Diagnosis of Abnormal PSA

69,520 (16.2%)

13,950 (11.2%)

0.140

0.134, 0.147

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

13,557 (3.2%)

2,908 (2.3%)

0.049

0.042, 0.055

Uroflow Study Performed

27,946 (6.5%)

7,554 (6.0%)

0.019

0.012, 0.025

Cystometrogram Performed

3,304 (0.8%)

892 (0.7%)

0.006

0.000, 0.013

Diagnosis of BPH

179,924 (41.9%)

41,958 (33.6%)

0.169

0.149, 0.196

Other Comorbidities

Diagnosis of Anxiety

38,428 (8.9%)

7,700 (6.2%)

0.101

0.094, 0.107

Diagnosis of Erectile Dysfunction

44,895 (10.4%)

11,152 (8.9%)

0.050

0.044, 0.057

1Cohen's D

2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group

Characteristic

tamsulosin, N = 119,944

tz/dz/az, N = 119,944

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

62 (55, 70)

62 (55, 70)

0.015

0.007, 0.023

Medication Index Date

2,009.0 (2,006.0, 2,012.0)

2,009.0 (2,006.0, 2,012.0)

0.011

0.003, 0.019

Duration of Enrollment Time

Lookback Time (Years)

3.19 (1.88, 5.50)

2.94 (1.76, 5.30)

0.061

0.052, 0.069

Follow Up Time (Years)

2.07 (0.92, 4.29)

2.05 (0.93, 4.29)

-0.001

-0.009, 0.007

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.14)

0.00 (0.00, 0.13)

0.005

-0.003, 0.013

Outpatient Events Per Year

9 (5, 16)

9 (5, 16)

0.008

0.000, 0.016

Mean Number of DX Codes Per Outpatient Event

1.39 (1.19, 1.87)

1.38 (1.19, 1.86)

0.006

-0.002, 0.014

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.07)

0.04 (0.02, 0.07)

0.010

0.002, 0.018

mean_ndx_out_rate

14 (7, 25)

14 (7, 25)

0.010

0.002, 0.018

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

2,684 (2.2%)

2,599 (2.2%)

0.005

-0.003, 0.013

Anemia

16,334 (13.6%)

16,209 (13.5%)

0.003

-0.005, 0.011

Blood Loss

1,494 (1.2%)

1,456 (1.2%)

0.003

-0.005, 0.011

Heart Failure

9,925 (8.3%)

9,705 (8.1%)

0.007

-0.001, 0.015

Coagulopathy

3,549 (3.0%)

3,506 (2.9%)

0.002

-0.006, 0.010

Depression

8,731 (7.3%)

8,660 (7.2%)

0.002

-0.006, 0.010

Diabetes Without Complications

34,932 (29.1%)

34,735 (29.0%)

0.004

-0.004, 0.012

Diabetes With Complications

13,786 (11.5%)

13,616 (11.4%)

0.004

-0.004, 0.012

Drug Abuse

1,300 (1.1%)

1,299 (1.1%)

0.000

-0.008, 0.008

Fluid or Electrolyte Disorders

13,302 (11.1%)

13,181 (11.0%)

0.003

-0.005, 0.011

HIV

493 (0.4%)

467 (0.4%)

0.003

-0.005, 0.011

HTN Without Complications

82,991 (69.2%)

82,654 (68.9%)

0.006

-0.002, 0.014

HTN With Complications

19,857 (16.6%)

19,442 (16.2%)

0.009

0.001, 0.017

Hypothyroidism

9,647 (8.0%)

9,449 (7.9%)

0.006

-0.002, 0.014

Liver Disease

4,807 (4.0%)

4,737 (3.9%)

0.003

-0.005, 0.011

Lymphoma

1,296 (1.1%)

1,266 (1.1%)

0.002

-0.006, 0.010

Metastatic Cancer

1,542 (1.3%)

1,514 (1.3%)

0.002

-0.006, 0.010

Other Neurologic Disorders

7,305 (6.1%)

7,159 (6.0%)

0.005

-0.003, 0.013

Obesity

11,270 (9.4%)

11,260 (9.4%)

0.000

-0.008, 0.008

Paralysis

2,013 (1.7%)

1,985 (1.7%)

0.002

-0.006, 0.010

Pulmonary Hypertension

2,480 (2.1%)

2,386 (2.0%)

0.006

-0.002, 0.014

Psychoses

6,816 (5.7%)

6,637 (5.5%)

0.006

-0.002, 0.014

Peptic Ulcer Disease

290 (0.2%)

290 (0.2%)

0.000

-0.008, 0.008

COPD

23,561 (19.6%)

23,283 (19.4%)

0.006

-0.002, 0.014

Peripheral Vascular Disease

14,877 (12.4%)

14,648 (12.2%)

0.006

-0.002, 0.014

Renal Failure

11,382 (9.5%)

11,288 (9.4%)

0.003

-0.005, 0.011

Rheumatoid Arthritis

4,615 (3.8%)

4,589 (3.8%)

0.001

-0.007, 0.009

Solid Tumor

13,662 (11.4%)

13,507 (11.3%)

0.004

-0.004, 0.012

Valvular Disease

13,851 (11.5%)

13,586 (11.3%)

0.007

-0.001, 0.015

Weight Loss

3,380 (2.8%)

3,293 (2.7%)

0.004

-0.004, 0.012

Hypotension

Orthostatic Hypotension

957 (0.8%)

893 (0.7%)

0.006

-0.002, 0.014

Other Hypotension

2,160 (1.8%)

2,110 (1.8%)

0.003

-0.005, 0.011

Prostate Specific Antigen

PSA Measurement Taken

69,252 (57.7%)

68,586 (57.2%)

0.011

0.003, 0.019

Diagnosis of Abnormal PSA

14,040 (11.7%)

13,746 (11.5%)

0.008

0.000, 0.016

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

3,061 (2.6%)

2,891 (2.4%)

0.009

0.001, 0.017

Uroflow Study Performed

7,682 (6.4%)

7,428 (6.2%)

0.009

0.001, 0.017

Cystometrogram Performed

910 (0.8%)

886 (0.7%)

0.002

-0.006, 0.010

Diagnosis of BPH

41,863 (34.9%)

41,127 (34.3%)

0.013

0.005, 0.021

Other Comorbidities

Diagnosis of Anxiety

7,737 (6.5%)

7,628 (6.4%)

0.004

-0.004, 0.012

Diagnosis of Erectile Dysfunction

11,200 (9.3%)

10,965 (9.1%)

0.007

-0.001, 0.015

1Cohen's D

2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_tam.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 429741     4867     4476      34.2       143
treatment=TRUE  124905     1022     1413     108.3       143

 Chisq= 143  on 1 degrees of freedom, p= <2e-16 

The Cox regression pegs the protection at HR at 0.67 (0.62, 0.71).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.66 0.62, 0.71 <0.001

1 HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results.

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()
drug cases time incidence
tamsulosin 1391 358504.4 38.80008
tz/dz/az 991 359705.2 27.55034

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=tamsulosin", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=tamsulosin", "Tamsulosin", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()
time year Tamsulosin TZ/DZ/AZ
365 1 86903 87067
730 2 60751 60535
1095 3 44291 44272
1460 4 32414 32461
1825 5 23443 23524
2190 6 16894 16944
2555 7 12166 12186
2920 8 8306 8332
3285 9 5535 5554
3650 10 3484 3568
4015 11 2124 2146
4380 12 1133 1121
4745 13 629 646
5110 14 304 317
5475 15 121 117
5840 16 2 3

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 119944     1391     1189      34.3      68.5
treatment=1 119944      991     1193      34.2      68.5

 Chisq= 68.5  on 1 degrees of freedom, p= <2e-16 

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic 0.5% Percentile 1.0% Percentile 1.5% Percentile 2.0% Percentile
drug
tamsulosin 442 (412, 496) 958 (890, 1,015) 1,463 (1,380, 1,584) 1,996 (1,891, 2,156)
tz/dz/az 595 (547, 651) 1,317 (1,237, 1,481) 2,100 (1,968, 2,264) 2,768 (2,578, 3,030)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.71 0.65, 0.77 <0.001

1 HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.71 (0.66, 0.78; p < 0.001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

Unadjusted

Propensity Score Matched

Characteristic

HR1

95% CI1

p-value

HR1

95% CI1

p-value

as.numeric(treatment)

0.66

0.62, 0.71

<0.001

0.71

0.65, 0.77

<0.001

1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals
                      chisq df    p
as.numeric(treatment)  1.98  1 0.16
GLOBAL                 1.98  1 0.16
with(scf_residuals, cor.test(x, y))

    Pearson's product-moment correlation

data:  x and y
t = -1.4089, df = 2380, p-value = 0.159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06894934  0.01130818
sample estimates:
       cor 
-0.0288671 
tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is statistically significant correlation with the Schoenfeld residuals; however, we are likely over-powered to detect a deviation. Looking at the estimated correlation (-0.059), we think it is unlikely that this has a meaningful impact on our estimated hazard ratio.

Footnotes