Balance Assessment and Survival Analysis | Tz/Dz/Az vs 5ARI

Published

May 8, 2022

DOI

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:50:05 by jsimmeri on argon-lc-f14-25.hpc. The script analysis_functions.R provides:

  1. some functions for summarizing the cohort and building tables that are shared across scripts
  2. code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
  3. sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS
source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and 5ARI.

model_data <- model_data %>%
  filter(drug %in% c("5ari", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_5ari.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group

Characteristic

5ari, N = 79,133

tz/dz/az, N = 124,905

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

62 (54, 70)

62 (55, 70)

-0.047

-0.055, -0.038

Medication Index Date

2,010.0 (2,008.0, 2,013.0)

2,009.0 (2,006.0, 2,012.0)

0.220

0.194, 0.254

Duration of Enrollment Time

Lookback Time (Years)

3.15 (1.83, 5.57)

2.85 (1.71, 5.18)

0.073

0.064, 0.082

Follow Up Time (Years)

2.23 (0.98, 4.43)

1.96 (0.93, 4.19)

0.049

0.041, 0.058

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.00)

0.00 (0.00, 0.12)

-0.114

-0.123, -0.105

Outpatient Events Per Year

9 (5, 16)

9 (5, 16)

-0.051

-0.060, -0.042

Mean Number of DX Codes Per Outpatient Event

1.40 (1.20, 1.87)

1.37 (1.18, 1.83)

0.037

0.029, 0.046

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.07)

0.04 (0.02, 0.07)

-0.031

-0.040, -0.022

mean_ndx_out_rate

13 (7, 24)

14 (7, 25)

-0.031

-0.040, -0.022

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

1,135 (1.4%)

2,652 (2.1%)

-0.051

-0.060, -0.042

Anemia

9,051 (11.4%)

16,683 (13.4%)

-0.058

-0.067, -0.049

Blood Loss

886 (1.1%)

1,482 (1.2%)

-0.006

-0.015, 0.003

Heart Failure

5,190 (6.6%)

9,957 (8.0%)

-0.054

-0.063, -0.045

Coagulopathy

2,302 (2.9%)

3,572 (2.9%)

0.003

-0.006, 0.012

Depression

5,380 (6.8%)

8,769 (7.0%)

-0.009

-0.018, 0.000

Diabetes Without Complications

15,703 (19.8%)

36,097 (28.9%)

-0.209

-0.218, -0.200

Diabetes With Complications

5,009 (6.3%)

14,154 (11.3%)

-0.172

-0.181, -0.163

Drug Abuse

568 (0.7%)

1,311 (1.0%)

-0.035

-0.044, -0.026

Fluid or Electrolyte Disorders

5,812 (7.3%)

13,535 (10.8%)

-0.119

-0.128, -0.110

HIV

546 (0.7%)

481 (0.4%)

0.043

0.034, 0.052

HTN Without Complications

41,769 (52.8%)

86,651 (69.4%)

-0.348

-0.357, -0.339

HTN With Complications

7,828 (9.9%)

20,483 (16.4%)

-0.189

-0.198, -0.180

Hypothyroidism

6,927 (8.8%)

9,649 (7.7%)

0.038

0.029, 0.047

Liver Disease

2,454 (3.1%)

4,791 (3.8%)

-0.040

-0.049, -0.031

Lymphoma

829 (1.0%)

1,291 (1.0%)

0.001

-0.008, 0.010

Metastatic Cancer

1,090 (1.4%)

1,525 (1.2%)

0.014

0.005, 0.023

Other Neurologic Disorders

4,419 (5.6%)

7,231 (5.8%)

-0.009

-0.018, 0.000

Obesity

4,462 (5.6%)

11,396 (9.1%)

-0.130

-0.139, -0.122

Paralysis

960 (1.2%)

2,017 (1.6%)

-0.034

-0.042, -0.025

Pulmonary Hypertension

1,521 (1.9%)

2,402 (1.9%)

0.000

-0.009, 0.009

Psychoses

4,098 (5.2%)

6,706 (5.4%)

-0.008

-0.017, 0.000

Peptic Ulcer Disease

183 (0.2%)

293 (0.2%)

-0.001

-0.010, 0.008

COPD

13,975 (17.7%)

23,707 (19.0%)

-0.034

-0.043, -0.025

Peripheral Vascular Disease

8,260 (10.4%)

14,995 (12.0%)

-0.049

-0.058, -0.040

Renal Failure

3,431 (4.3%)

11,908 (9.5%)

-0.198

-0.207, -0.189

Rheumatoid Arthritis

2,785 (3.5%)

4,669 (3.7%)

-0.012

-0.021, -0.003

Solid Tumor

10,371 (13.1%)

13,665 (10.9%)

0.067

0.058, 0.076

Valvular Disease

9,097 (11.5%)

13,805 (11.1%)

0.014

0.005, 0.023

Weight Loss

2,219 (2.8%)

3,334 (2.7%)

0.008

-0.001, 0.017

Hypotension

Orthostatic Hypotension

748 (0.9%)

898 (0.7%)

0.025

0.016, 0.034

Other Hypotension

1,451 (1.8%)

2,129 (1.7%)

0.010

0.001, 0.019

Prostate Specific Antigen

PSA Measurement Taken

48,034 (60.7%)

71,601 (57.3%)

0.069

0.060, 0.077

Diagnosis of Abnormal PSA

23,538 (29.7%)

13,950 (11.2%)

0.493

0.543, 0.543

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

1,936 (2.4%)

2,908 (2.3%)

0.008

-0.001, 0.017

Uroflow Study Performed

5,556 (7.0%)

7,554 (6.0%)

0.040

0.031, 0.049

Cystometrogram Performed

524 (0.7%)

892 (0.7%)

-0.006

-0.015, 0.003

Diagnosis of BPH

35,618 (45.0%)

41,958 (33.6%)

0.237

0.208, 0.273

Other Comorbidities

Diagnosis of Anxiety

5,406 (6.8%)

7,700 (6.2%)

0.027

0.018, 0.036

Diagnosis of Erectile Dysfunction

7,153 (9.0%)

11,152 (8.9%)

0.004

-0.005, 0.013

1Cohen's D

2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group

Characteristic

5ari, N = 64,558

tz/dz/az, N = 64,558

Difference1

95% CI12

Age and Medication Start Time

Age at Index Date

61 (54, 70)

61 (54, 70)

0.005

-0.006, 0.016

Medication Index Date

2,010.0 (2,007.0, 2,013.0)

2,010.0 (2,007.0, 2,012.0)

0.032

0.021, 0.043

Duration of Enrollment Time

Lookback Time (Years)

3.10 (1.82, 5.48)

3.02 (1.80, 5.43)

0.012

0.001, 0.023

Follow Up Time (Years)

2.16 (0.95, 4.37)

2.15 (0.94, 4.37)

0.001

-0.010, 0.012

Rates of Inpatient and Outpatient Encounters

Inpatient Events Per Year

0.00 (0.00, 0.00)

0.00 (0.00, 0.00)

0.001

-0.010, 0.011

Outpatient Events Per Year

9 (5, 16)

9 (5, 16)

0.002

-0.008, 0.013

Mean Number of DX Codes Per Outpatient Event

1.40 (1.19, 1.88)

1.39 (1.19, 1.86)

0.019

0.008, 0.030

Outpatient DX Code Incidence Per Year

0.04 (0.02, 0.07)

0.04 (0.02, 0.06)

0.008

-0.002, 0.019

mean_ndx_out_rate

13 (7, 24)

13 (7, 24)

0.008

-0.002, 0.019

Elixhauser/AHRQ Comorbidity Flags

Alcohol Abuse

1,050 (1.6%)

1,053 (1.6%)

0.000

-0.011, 0.011

Anemia

7,560 (11.7%)

7,490 (11.6%)

0.003

-0.008, 0.014

Blood Loss

724 (1.1%)

733 (1.1%)

-0.001

-0.012, 0.010

Heart Failure

4,400 (6.8%)

4,335 (6.7%)

0.004

-0.007, 0.015

Coagulopathy

1,866 (2.9%)

1,850 (2.9%)

0.001

-0.009, 0.012

Depression

4,526 (7.0%)

4,598 (7.1%)

-0.004

-0.015, 0.007

Diabetes Without Complications

13,925 (21.6%)

13,976 (21.6%)

-0.002

-0.013, 0.009

Diabetes With Complications

4,584 (7.1%)

4,582 (7.1%)

0.000

-0.011, 0.011

Drug Abuse

524 (0.8%)

515 (0.8%)

0.002

-0.009, 0.012

Fluid or Electrolyte Disorders

5,097 (7.9%)

5,072 (7.9%)

0.001

-0.009, 0.012

HIV

371 (0.6%)

356 (0.6%)

0.003

-0.008, 0.014

HTN Without Complications

36,318 (56.3%)

36,374 (56.3%)

-0.002

-0.013, 0.009

HTN With Complications

7,050 (10.9%)

6,969 (10.8%)

0.004

-0.007, 0.015

Hypothyroidism

5,508 (8.5%)

5,412 (8.4%)

0.005

-0.006, 0.016

Liver Disease

2,111 (3.3%)

2,162 (3.3%)

-0.004

-0.015, 0.006

Lymphoma

661 (1.0%)

686 (1.1%)

-0.004

-0.015, 0.007

Metastatic Cancer

873 (1.4%)

893 (1.4%)

-0.003

-0.014, 0.008

Other Neurologic Disorders

3,663 (5.7%)

3,665 (5.7%)

0.000

-0.011, 0.011

Obesity

4,109 (6.4%)

4,035 (6.3%)

0.005

-0.006, 0.016

Paralysis

851 (1.3%)

839 (1.3%)

0.002

-0.009, 0.013

Pulmonary Hypertension

1,234 (1.9%)

1,241 (1.9%)

-0.001

-0.012, 0.010

Psychoses

3,418 (5.3%)

3,494 (5.4%)

-0.005

-0.016, 0.006

Peptic Ulcer Disease

153 (0.2%)

147 (0.2%)

0.002

-0.009, 0.013

COPD

11,603 (18.0%)

11,709 (18.1%)

-0.004

-0.015, 0.007

Peripheral Vascular Disease

6,894 (10.7%)

6,879 (10.7%)

0.001

-0.010, 0.012

Renal Failure

3,246 (5.0%)

3,130 (4.8%)

0.008

-0.003, 0.019

Rheumatoid Arthritis

2,345 (3.6%)

2,344 (3.6%)

0.000

-0.011, 0.011

Solid Tumor

7,988 (12.4%)

7,987 (12.4%)

0.000

-0.011, 0.011

Valvular Disease

7,290 (11.3%)

7,323 (11.3%)

-0.002

-0.013, 0.009

Weight Loss

1,807 (2.8%)

1,784 (2.8%)

0.002

-0.009, 0.013

Hypotension

Orthostatic Hypotension

566 (0.9%)

547 (0.8%)

0.003

-0.008, 0.014

Other Hypotension

1,158 (1.8%)

1,154 (1.8%)

0.000

-0.010, 0.011

Prostate Specific Antigen

PSA Measurement Taken

38,395 (59.5%)

38,094 (59.0%)

0.009

-0.001, 0.020

Diagnosis of Abnormal PSA

13,277 (20.6%)

12,868 (19.9%)

0.016

0.005, 0.027

Bladder Function and Urinary Flow

Slow Urinary Stream Diagnosis

1,621 (2.5%)

1,688 (2.6%)

-0.007

-0.017, 0.004

Uroflow Study Performed

4,459 (6.9%)

4,715 (7.3%)

-0.015

-0.026, -0.005

Cystometrogram Performed

452 (0.7%)

502 (0.8%)

-0.009

-0.020, 0.002

Diagnosis of BPH

26,978 (41.8%)

27,429 (42.5%)

-0.014

-0.025, -0.003

Other Comorbidities

Diagnosis of Anxiety

4,375 (6.8%)

4,369 (6.8%)

0.000

-0.011, 0.011

Diagnosis of Erectile Dysfunction

5,860 (9.1%)

5,930 (9.2%)

-0.004

-0.015, 0.007

1Cohen's D

2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_5ari.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE  79133      765      711      4.13      6.86
treatment=TRUE  124905     1022     1076      2.73      6.86

 Chisq= 6.9  on 1 degrees of freedom, p= 0.009 

The Cox regression pegs the protection at HR at 0.88 (0.80, 0.97).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.88 0.80, 0.97 0.009

1 HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results:

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()
drug cases time incidence
5ari 620 196284.6 31.58680
tz/dz/az 521 196256.5 26.54689

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=5ari", 
    "5ARI", "TZ/DZ/AZ")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()
time year 5ARI TZ/DZ/AZ
365 1 47223 47125
730 2 33639 33552
1095 3 24454 24446
1460 4 17905 17858
1825 5 12985 12986
2190 6 9315 9341
2555 7 6648 6645
2920 8 4504 4493
3285 9 2993 2975
3650 10 1865 1877
4015 11 1089 1072
4380 12 536 511
4745 13 266 270
5110 14 115 127
5475 15 34 35
5840 16 2 2

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 64558      620      571      4.28      8.57
treatment=1 64558      521      570      4.28      8.57

 Chisq= 8.6  on 1 degrees of freedom, p= 0.003 

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic 0.5% Percentile 1.0% Percentile 1.5% Percentile 2.0% Percentile
drug
5ari 556 (495, 649) 1,160 (1,064, 1,307) 1,746 (1,616, 1,943) 2,385 (2,183, 2,702)
tz/dz/az 640 (569, 691) 1,342 (1,188, 1,562) 2,140 (1,995, 2,369) 2,815 (2,575, 3,352)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.84 0.75, 0.94 0.003

1 HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.81 (0.72, 0.92; p < 0.0001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

Unadjusted

Propensity Score Matched

Characteristic

HR1

95% CI1

p-value

HR1

95% CI1

p-value

as.numeric(treatment)

0.88

0.80, 0.97

0.009

0.84

0.75, 0.94

0.003

1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals
                        chisq df    p
as.numeric(treatment) 0.00724  1 0.93
GLOBAL                0.00724  1 0.93
with(scf_residuals, cor.test(x, y))

    Pearson's product-moment correlation

data:  x and y
t = 0.085003, df = 1139, p-value = 0.9323
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05552426  0.06054465
sample estimates:
        cor 
0.002518679 
tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is a no meaningful or significant correlation in the Schoenfeld residuals.

Footnotes