Balance Assessment and Survival Analysis | Tz/Dz/Az vs 5ARI

Jacob Simmering, PhD https://jacobsimmering.com (University of Iowa)https://uiowa.edu
2022-05-09
library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:50:05 by jsimmeri on argon-lc-f14-25.hpc. The script analysis_functions.R provides:

  1. some functions for summarizing the cohort and building tables that are shared across scripts
  2. code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
  3. sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS
source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and 5ARI.

model_data <- model_data %>%
  filter(drug %in% c("5ari", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_5ari.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

After Matching

describe_cohort(matched_model_data)

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_5ari.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE  79133      765      711      4.13      6.86
treatment=TRUE  124905     1022     1076      2.73      6.86

 Chisq= 6.9  on 1 degrees of freedom, p= 0.009 

The Cox regression pegs the protection at HR at 0.88 (0.80, 0.97).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.88 0.80, 0.97 0.009

1 HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results:

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()
drug cases time incidence
5ari 620 196284.6 31.58680
tz/dz/az 521 196256.5 26.54689

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=5ari", 
    "5ARI", "TZ/DZ/AZ")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()
time year 5ARI TZ/DZ/AZ
365 1 47223 47125
730 2 33639 33552
1095 3 24454 24446
1460 4 17905 17858
1825 5 12985 12986
2190 6 9315 9341
2555 7 6648 6645
2920 8 4504 4493
3285 9 2993 2975
3650 10 1865 1877
4015 11 1089 1072
4380 12 536 511
4745 13 266 270
5110 14 115 127
5475 15 34 35
5840 16 2 2

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 64558      620      571      4.28      8.57
treatment=1 64558      521      570      4.28      8.57

 Chisq= 8.6  on 1 degrees of freedom, p= 0.003 

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic 0.5% Percentile 1.0% Percentile 1.5% Percentile 2.0% Percentile
drug
5ari 556 (495, 649) 1,160 (1,064, 1,307) 1,746 (1,616, 1,943) 2,385 (2,183, 2,702)
tz/dz/az 640 (569, 691) 1,342 (1,188, 1,562) 2,140 (1,995, 2,369) 2,815 (2,575, 3,352)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
as.numeric(treatment) 0.84 0.75, 0.94 0.003

1 HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.81 (0.72, 0.92; p < 0.0001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals
                        chisq df    p
as.numeric(treatment) 0.00724  1 0.93
GLOBAL                0.00724  1 0.93
with(scf_residuals, cor.test(x, y))

    Pearson's product-moment correlation

data:  x and y
t = 0.085003, df = 1139, p-value = 0.9323
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05552426  0.06054465
sample estimates:
        cor 
0.002518679 
tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is a no meaningful or significant correlation in the Schoenfeld residuals.