Balance Assessment and Survival Analysis | Tz/Dz/Az vs Tamsulosin

Jacob Simmering, PhD https://jacobsimmering.com (University of Iowa)https://uiowa.edu
2022-05-09
library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:48:08 by jsimmeri on argon-lc-f14-25.hpc.

The script analysis_functions.R provides:

  1. some functions for summarizing the cohort and building tables that are shared across scripts
  2. code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
  3. sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS
source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and tamsulosin.

model_data <- model_data %>%
  filter(drug %in% c("tamsulosin", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_tam.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

After Matching

describe_cohort(matched_model_data)

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_tam.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 429741     4867     4476      34.2       143
treatment=TRUE  124905     1022     1413     108.3       143

 Chisq= 143  on 1 degrees of freedom, p= <2e-16 

The Cox regression pegs the protection at HR at 0.67 (0.62, 0.71).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.66 0.62, 0.71 <0.001

1 HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results.

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()
drug cases time incidence
tamsulosin 1391 358504.4 38.80008
tz/dz/az 991 359705.2 27.55034

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=tamsulosin", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=tamsulosin", "Tamsulosin", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()
time year Tamsulosin TZ/DZ/AZ
365 1 86903 87067
730 2 60751 60535
1095 3 44291 44272
1460 4 32414 32461
1825 5 23443 23524
2190 6 16894 16944
2555 7 12166 12186
2920 8 8306 8332
3285 9 5535 5554
3650 10 3484 3568
4015 11 2124 2146
4380 12 1133 1121
4745 13 629 646
5110 14 304 317
5475 15 121 117
5840 16 2 3

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 119944     1391     1189      34.3      68.5
treatment=1 119944      991     1193      34.2      68.5

 Chisq= 68.5  on 1 degrees of freedom, p= <2e-16 

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic 0.5% Percentile 1.0% Percentile 1.5% Percentile 2.0% Percentile
drug
tamsulosin 442 (412, 496) 958 (890, 1,015) 1,463 (1,380, 1,584) 1,996 (1,891, 2,156)
tz/dz/az 595 (547, 651) 1,317 (1,237, 1,481) 2,100 (1,968, 2,264) 2,768 (2,578, 3,030)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.71 0.65, 0.77 <0.001

1 HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.71 (0.66, 0.78; p < 0.001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals
                      chisq df    p
as.numeric(treatment)  1.98  1 0.16
GLOBAL                 1.98  1 0.16
with(scf_residuals, cor.test(x, y))

    Pearson's product-moment correlation

data:  x and y
t = -1.4089, df = 2380, p-value = 0.159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06894934  0.01130818
sample estimates:
       cor 
-0.0288671 
tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is statistically significant correlation with the Schoenfeld residuals; however, we are likely over-powered to detect a deviation. Looking at the estimated correlation (-0.059), we think it is unlikely that this has a meaningful impact on our estimated hazard ratio.