Balance Assessment and Survival Analysis

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:50:05 by jsimmeri on argon-lc-f14-25.hpc. The script analysis_functions.R provides:

some functions for summarizing the cohort and building tables that are shared across scripts
code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS

source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and 5ARI.

model_data <- model_data %>%
  filter(drug %in% c("5ari", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_5ari.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group	Characteristic	5ari, N = 79,133	tz/dz/az, N = 124,905	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	62 (54, 70)	62 (55, 70)	-0.047	-0.055, -0.038
	Medication Index Date	2,010.0 (2,008.0, 2,013.0)	2,009.0 (2,006.0, 2,012.0)	0.220	0.194, 0.254
Duration of Enrollment Time	Lookback Time (Years)	3.15 (1.83, 5.57)	2.85 (1.71, 5.18)	0.073	0.064, 0.082
	Follow Up Time (Years)	2.23 (0.98, 4.43)	1.96 (0.93, 4.19)	0.049	0.041, 0.058
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.00)	0.00 (0.00, 0.12)	-0.114	-0.123, -0.105
	Outpatient Events Per Year	9 (5, 16)	9 (5, 16)	-0.051	-0.060, -0.042
	Mean Number of DX Codes Per Outpatient Event	1.40 (1.20, 1.87)	1.37 (1.18, 1.83)	0.037	0.029, 0.046
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.07)	0.04 (0.02, 0.07)	-0.031	-0.040, -0.022
	mean_ndx_out_rate	13 (7, 24)	14 (7, 25)	-0.031	-0.040, -0.022
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	1,135 (1.4%)	2,652 (2.1%)	-0.051	-0.060, -0.042
	Anemia	9,051 (11.4%)	16,683 (13.4%)	-0.058	-0.067, -0.049
	Blood Loss	886 (1.1%)	1,482 (1.2%)	-0.006	-0.015, 0.003
	Heart Failure	5,190 (6.6%)	9,957 (8.0%)	-0.054	-0.063, -0.045
	Coagulopathy	2,302 (2.9%)	3,572 (2.9%)	0.003	-0.006, 0.012
	Depression	5,380 (6.8%)	8,769 (7.0%)	-0.009	-0.018, 0.000
	Diabetes Without Complications	15,703 (19.8%)	36,097 (28.9%)	-0.209	-0.218, -0.200
	Diabetes With Complications	5,009 (6.3%)	14,154 (11.3%)	-0.172	-0.181, -0.163
	Drug Abuse	568 (0.7%)	1,311 (1.0%)	-0.035	-0.044, -0.026
	Fluid or Electrolyte Disorders	5,812 (7.3%)	13,535 (10.8%)	-0.119	-0.128, -0.110
	HIV	546 (0.7%)	481 (0.4%)	0.043	0.034, 0.052
	HTN Without Complications	41,769 (52.8%)	86,651 (69.4%)	-0.348	-0.357, -0.339
	HTN With Complications	7,828 (9.9%)	20,483 (16.4%)	-0.189	-0.198, -0.180
	Hypothyroidism	6,927 (8.8%)	9,649 (7.7%)	0.038	0.029, 0.047
	Liver Disease	2,454 (3.1%)	4,791 (3.8%)	-0.040	-0.049, -0.031
	Lymphoma	829 (1.0%)	1,291 (1.0%)	0.001	-0.008, 0.010
	Metastatic Cancer	1,090 (1.4%)	1,525 (1.2%)	0.014	0.005, 0.023
	Other Neurologic Disorders	4,419 (5.6%)	7,231 (5.8%)	-0.009	-0.018, 0.000
	Obesity	4,462 (5.6%)	11,396 (9.1%)	-0.130	-0.139, -0.122
	Paralysis	960 (1.2%)	2,017 (1.6%)	-0.034	-0.042, -0.025
	Pulmonary Hypertension	1,521 (1.9%)	2,402 (1.9%)	0.000	-0.009, 0.009
	Psychoses	4,098 (5.2%)	6,706 (5.4%)	-0.008	-0.017, 0.000
	Peptic Ulcer Disease	183 (0.2%)	293 (0.2%)	-0.001	-0.010, 0.008
	COPD	13,975 (17.7%)	23,707 (19.0%)	-0.034	-0.043, -0.025
	Peripheral Vascular Disease	8,260 (10.4%)	14,995 (12.0%)	-0.049	-0.058, -0.040
	Renal Failure	3,431 (4.3%)	11,908 (9.5%)	-0.198	-0.207, -0.189
	Rheumatoid Arthritis	2,785 (3.5%)	4,669 (3.7%)	-0.012	-0.021, -0.003
	Solid Tumor	10,371 (13.1%)	13,665 (10.9%)	0.067	0.058, 0.076
	Valvular Disease	9,097 (11.5%)	13,805 (11.1%)	0.014	0.005, 0.023
	Weight Loss	2,219 (2.8%)	3,334 (2.7%)	0.008	-0.001, 0.017
Hypotension	Orthostatic Hypotension	748 (0.9%)	898 (0.7%)	0.025	0.016, 0.034
	Other Hypotension	1,451 (1.8%)	2,129 (1.7%)	0.010	0.001, 0.019
Prostate Specific Antigen	PSA Measurement Taken	48,034 (60.7%)	71,601 (57.3%)	0.069	0.060, 0.077
	Diagnosis of Abnormal PSA	23,538 (29.7%)	13,950 (11.2%)	0.493	0.543, 0.543
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	1,936 (2.4%)	2,908 (2.3%)	0.008	-0.001, 0.017
	Uroflow Study Performed	5,556 (7.0%)	7,554 (6.0%)	0.040	0.031, 0.049
	Cystometrogram Performed	524 (0.7%)	892 (0.7%)	-0.006	-0.015, 0.003
	Diagnosis of BPH	35,618 (45.0%)	41,958 (33.6%)	0.237	0.208, 0.273
Other Comorbidities	Diagnosis of Anxiety	5,406 (6.8%)	7,700 (6.2%)	0.027	0.018, 0.036
	Diagnosis of Erectile Dysfunction	7,153 (9.0%)	11,152 (8.9%)	0.004	-0.005, 0.013
1Cohen's D
2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group	Characteristic	5ari, N = 64,558	tz/dz/az, N = 64,558	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	61 (54, 70)	61 (54, 70)	0.005	-0.006, 0.016
	Medication Index Date	2,010.0 (2,007.0, 2,013.0)	2,010.0 (2,007.0, 2,012.0)	0.032	0.021, 0.043
Duration of Enrollment Time	Lookback Time (Years)	3.10 (1.82, 5.48)	3.02 (1.80, 5.43)	0.012	0.001, 0.023
	Follow Up Time (Years)	2.16 (0.95, 4.37)	2.15 (0.94, 4.37)	0.001	-0.010, 0.012
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.00)	0.00 (0.00, 0.00)	0.001	-0.010, 0.011
	Outpatient Events Per Year	9 (5, 16)	9 (5, 16)	0.002	-0.008, 0.013
	Mean Number of DX Codes Per Outpatient Event	1.40 (1.19, 1.88)	1.39 (1.19, 1.86)	0.019	0.008, 0.030
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.07)	0.04 (0.02, 0.06)	0.008	-0.002, 0.019
	mean_ndx_out_rate	13 (7, 24)	13 (7, 24)	0.008	-0.002, 0.019
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	1,050 (1.6%)	1,053 (1.6%)	0.000	-0.011, 0.011
	Anemia	7,560 (11.7%)	7,490 (11.6%)	0.003	-0.008, 0.014
	Blood Loss	724 (1.1%)	733 (1.1%)	-0.001	-0.012, 0.010
	Heart Failure	4,400 (6.8%)	4,335 (6.7%)	0.004	-0.007, 0.015
	Coagulopathy	1,866 (2.9%)	1,850 (2.9%)	0.001	-0.009, 0.012
	Depression	4,526 (7.0%)	4,598 (7.1%)	-0.004	-0.015, 0.007
	Diabetes Without Complications	13,925 (21.6%)	13,976 (21.6%)	-0.002	-0.013, 0.009
	Diabetes With Complications	4,584 (7.1%)	4,582 (7.1%)	0.000	-0.011, 0.011
	Drug Abuse	524 (0.8%)	515 (0.8%)	0.002	-0.009, 0.012
	Fluid or Electrolyte Disorders	5,097 (7.9%)	5,072 (7.9%)	0.001	-0.009, 0.012
	HIV	371 (0.6%)	356 (0.6%)	0.003	-0.008, 0.014
	HTN Without Complications	36,318 (56.3%)	36,374 (56.3%)	-0.002	-0.013, 0.009
	HTN With Complications	7,050 (10.9%)	6,969 (10.8%)	0.004	-0.007, 0.015
	Hypothyroidism	5,508 (8.5%)	5,412 (8.4%)	0.005	-0.006, 0.016
	Liver Disease	2,111 (3.3%)	2,162 (3.3%)	-0.004	-0.015, 0.006
	Lymphoma	661 (1.0%)	686 (1.1%)	-0.004	-0.015, 0.007
	Metastatic Cancer	873 (1.4%)	893 (1.4%)	-0.003	-0.014, 0.008
	Other Neurologic Disorders	3,663 (5.7%)	3,665 (5.7%)	0.000	-0.011, 0.011
	Obesity	4,109 (6.4%)	4,035 (6.3%)	0.005	-0.006, 0.016
	Paralysis	851 (1.3%)	839 (1.3%)	0.002	-0.009, 0.013
	Pulmonary Hypertension	1,234 (1.9%)	1,241 (1.9%)	-0.001	-0.012, 0.010
	Psychoses	3,418 (5.3%)	3,494 (5.4%)	-0.005	-0.016, 0.006
	Peptic Ulcer Disease	153 (0.2%)	147 (0.2%)	0.002	-0.009, 0.013
	COPD	11,603 (18.0%)	11,709 (18.1%)	-0.004	-0.015, 0.007
	Peripheral Vascular Disease	6,894 (10.7%)	6,879 (10.7%)	0.001	-0.010, 0.012
	Renal Failure	3,246 (5.0%)	3,130 (4.8%)	0.008	-0.003, 0.019
	Rheumatoid Arthritis	2,345 (3.6%)	2,344 (3.6%)	0.000	-0.011, 0.011
	Solid Tumor	7,988 (12.4%)	7,987 (12.4%)	0.000	-0.011, 0.011
	Valvular Disease	7,290 (11.3%)	7,323 (11.3%)	-0.002	-0.013, 0.009
	Weight Loss	1,807 (2.8%)	1,784 (2.8%)	0.002	-0.009, 0.013
Hypotension	Orthostatic Hypotension	566 (0.9%)	547 (0.8%)	0.003	-0.008, 0.014
	Other Hypotension	1,158 (1.8%)	1,154 (1.8%)	0.000	-0.010, 0.011
Prostate Specific Antigen	PSA Measurement Taken	38,395 (59.5%)	38,094 (59.0%)	0.009	-0.001, 0.020
	Diagnosis of Abnormal PSA	13,277 (20.6%)	12,868 (19.9%)	0.016	0.005, 0.027
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	1,621 (2.5%)	1,688 (2.6%)	-0.007	-0.017, 0.004
	Uroflow Study Performed	4,459 (6.9%)	4,715 (7.3%)	-0.015	-0.026, -0.005
	Cystometrogram Performed	452 (0.7%)	502 (0.8%)	-0.009	-0.020, 0.002
	Diagnosis of BPH	26,978 (41.8%)	27,429 (42.5%)	-0.014	-0.025, -0.003
Other Comorbidities	Diagnosis of Anxiety	4,375 (6.8%)	4,369 (6.8%)	0.000	-0.011, 0.011
	Diagnosis of Erectile Dysfunction	5,860 (9.1%)	5,930 (9.2%)	-0.004	-0.015, 0.007
1Cohen's D
2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_5ari.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE  79133      765      711      4.13      6.86
treatment=TRUE  124905     1022     1076      2.73      6.86

 Chisq= 6.9  on 1 degrees of freedom, p= 0.009

The Cox regression pegs the protection at HR at 0.88 (0.80, 0.97).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	0.88	0.80, 0.97	0.009
¹ HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results:

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()

drug	cases	time	incidence
5ari	620	196284.6	31.58680
tz/dz/az	521	196256.5	26.54689

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=5ari", 
    "5ARI", "TZ/DZ/AZ")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()

time	year	5ARI	TZ/DZ/AZ
365	1	47223	47125
730	2	33639	33552
1095	3	24454	24446
1460	4	17905	17858
1825	5	12985	12986
2190	6	9315	9341
2555	7	6648	6645
2920	8	4504	4493
3285	9	2993	2975
3650	10	1865	1877
4015	11	1089	1072
4380	12	536	511
4745	13	266	270
5110	14	115	127
5475	15	34	35
5840	16	2	2

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 64558      620      571      4.28      8.57
treatment=1 64558      521      570      4.28      8.57

 Chisq= 8.6  on 1 degrees of freedom, p= 0.003

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)

Characteristic	0.5% Percentile	1.0% Percentile	1.5% Percentile	2.0% Percentile
drug
5ari	556 (495, 649)	1,160 (1,064, 1,307)	1,746 (1,616, 1,943)	2,385 (2,183, 2,702)
tz/dz/az	640 (569, 691)	1,342 (1,188, 1,562)	2,140 (1,995, 2,369)	2,815 (2,575, 3,352)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	0.84	0.75, 0.94	0.003
¹ HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.81 (0.72, 0.92; p < 0.0001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

	Unadjusted	Propensity Score Matched
as.numeric(treatment)	0.88	0.80, 0.97	0.009	0.84	0.75, 0.94	0.003
1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals

                        chisq df    p
as.numeric(treatment) 0.00724  1 0.93
GLOBAL                0.00724  1 0.93

with(scf_residuals, cor.test(x, y))


    Pearson's product-moment correlation

data:  x and y
t = 0.085003, df = 1139, p-value = 0.9323
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05552426  0.06054465
sample estimates:
        cor 
0.002518679

tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is a no meaningful or significant correlation in the Schoenfeld residuals.

Balance Assessment and Survival Analysis | Tz/Dz/Az vs 5ARI

Data Load

Unadjusted Data

Matches

Balance and Cohort Description

Before Matching

After Matching

Propensity Score Overlap

Survival Analysis

Before Matching

With Propensity Score Matching

Fit Summaries

Diagnostics