Balance Assessment and Survival Analysis

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:48:08 by jsimmeri on argon-lc-f14-25.hpc.

The script analysis_functions.R provides:

some functions for summarizing the cohort and building tables that are shared across scripts
code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS

source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and tamsulosin.

model_data <- model_data %>%
  filter(drug %in% c("tamsulosin", "tz/dz/az")) %>%
  mutate(treatment = drug == "tz/dz/az")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tz_tam.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group	Characteristic	tamsulosin, N = 429,741	tz/dz/az, N = 124,905	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	62 (56, 71)	62 (55, 70)	0.051	0.045, 0.058
	Medication Index Date	2,012.0 (2,008.0, 2,014.0)	2,009.0 (2,006.0, 2,012.0)	0.481	0.529, 0.529
Duration of Enrollment Time	Lookback Time (Years)	3.6 (2.0, 6.3)	2.9 (1.7, 5.2)	0.242	0.266, 0.266
	Follow Up Time (Years)	1.90 (0.86, 3.78)	1.96 (0.93, 4.19)	-0.085	-0.092, -0.079
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.21)	0.00 (0.00, 0.12)	0.080	0.073, 0.086
	Outpatient Events Per Year	10 (5, 18)	9 (5, 16)	0.065	0.059, 0.071
	Mean Number of DX Codes Per Outpatient Event	1.56 (1.27, 2.09)	1.37 (1.18, 1.83)	0.279	0.307, 0.307
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.08)	0.04 (0.02, 0.07)	0.129	0.122, 0.135
	mean_ndx_out_rate	16 (9, 29)	14 (7, 25)	0.129	0.122, 0.135
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	11,147 (2.6%)	2,652 (2.1%)	0.030	0.024, 0.037
	Anemia	66,743 (15.5%)	16,683 (13.4%)	0.061	0.055, 0.067
	Blood Loss	6,996 (1.6%)	1,482 (1.2%)	0.036	0.030, 0.042
	Heart Failure	41,392 (9.6%)	9,957 (8.0%)	0.057	0.051, 0.064
	Coagulopathy	18,433 (4.3%)	3,572 (2.9%)	0.073	0.067, 0.080
	Depression	42,310 (9.8%)	8,769 (7.0%)	0.098	0.091, 0.104
	Diabetes Without Complications	118,841 (27.7%)	36,097 (28.9%)	-0.028	-0.034, -0.021
	Diabetes With Complications	45,465 (10.6%)	14,154 (11.3%)	-0.024	-0.031, -0.018
	Drug Abuse	6,533 (1.5%)	1,311 (1.0%)	0.040	0.034, 0.046
	Fluid or Electrolyte Disorders	54,203 (12.6%)	13,535 (10.8%)	0.054	0.048, 0.061
	HIV	1,426 (0.3%)	481 (0.4%)	-0.009	-0.015, -0.003
	HTN Without Complications	266,446 (62.0%)	86,651 (69.4%)	-0.154	-0.160, -0.147
	HTN With Complications	55,606 (12.9%)	20,483 (16.4%)	-0.101	-0.107, -0.094
	Hypothyroidism	42,431 (9.9%)	9,649 (7.7%)	0.074	0.067, 0.080
	Liver Disease	23,871 (5.6%)	4,791 (3.8%)	0.078	0.071, 0.084
	Lymphoma	6,226 (1.4%)	1,291 (1.0%)	0.036	0.030, 0.042
	Metastatic Cancer	10,529 (2.5%)	1,525 (1.2%)	0.084	0.078, 0.091
	Other Neurologic Disorders	39,518 (9.2%)	7,231 (5.8%)	0.123	0.116, 0.129
	Obesity	47,431 (11.0%)	11,396 (9.1%)	0.062	0.056, 0.068
	Paralysis	10,106 (2.4%)	2,017 (1.6%)	0.050	0.044, 0.057
	Pulmonary Hypertension	12,901 (3.0%)	2,402 (1.9%)	0.066	0.060, 0.072
	Psychoses	31,075 (7.2%)	6,706 (5.4%)	0.074	0.068, 0.080
	Peptic Ulcer Disease	1,672 (0.4%)	293 (0.2%)	0.026	0.020, 0.032
	COPD	102,194 (23.8%)	23,707 (19.0%)	0.115	0.108, 0.121
	Peripheral Vascular Disease	62,941 (14.6%)	14,995 (12.0%)	0.076	0.070, 0.082
	Renal Failure	30,407 (7.1%)	11,908 (9.5%)	-0.093	-0.099, -0.086
	Rheumatoid Arthritis	21,620 (5.0%)	4,669 (3.7%)	0.061	0.055, 0.067
	Solid Tumor	72,259 (16.8%)	13,665 (10.9%)	0.163	0.143, 0.188
	Valvular Disease	60,458 (14.1%)	13,805 (11.1%)	0.089	0.082, 0.095
	Weight Loss	19,310 (4.5%)	3,334 (2.7%)	0.092	0.086, 0.099
Hypotension	Orthostatic Hypotension	5,000 (1.2%)	898 (0.7%)	0.043	0.037, 0.050
	Other Hypotension	12,550 (2.9%)	2,129 (1.7%)	0.076	0.069, 0.082
Prostate Specific Antigen	PSA Measurement Taken	247,089 (57.5%)	71,601 (57.3%)	0.003	-0.003, 0.010
	Diagnosis of Abnormal PSA	69,520 (16.2%)	13,950 (11.2%)	0.140	0.134, 0.147
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	13,557 (3.2%)	2,908 (2.3%)	0.049	0.042, 0.055
	Uroflow Study Performed	27,946 (6.5%)	7,554 (6.0%)	0.019	0.012, 0.025
	Cystometrogram Performed	3,304 (0.8%)	892 (0.7%)	0.006	0.000, 0.013
	Diagnosis of BPH	179,924 (41.9%)	41,958 (33.6%)	0.169	0.149, 0.196
Other Comorbidities	Diagnosis of Anxiety	38,428 (8.9%)	7,700 (6.2%)	0.101	0.094, 0.107
	Diagnosis of Erectile Dysfunction	44,895 (10.4%)	11,152 (8.9%)	0.050	0.044, 0.057
1Cohen's D
2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group	Characteristic	tamsulosin, N = 119,944	tz/dz/az, N = 119,944	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	62 (55, 70)	62 (55, 70)	0.015	0.007, 0.023
	Medication Index Date	2,009.0 (2,006.0, 2,012.0)	2,009.0 (2,006.0, 2,012.0)	0.011	0.003, 0.019
Duration of Enrollment Time	Lookback Time (Years)	3.19 (1.88, 5.50)	2.94 (1.76, 5.30)	0.061	0.052, 0.069
	Follow Up Time (Years)	2.07 (0.92, 4.29)	2.05 (0.93, 4.29)	-0.001	-0.009, 0.007
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.14)	0.00 (0.00, 0.13)	0.005	-0.003, 0.013
	Outpatient Events Per Year	9 (5, 16)	9 (5, 16)	0.008	0.000, 0.016
	Mean Number of DX Codes Per Outpatient Event	1.39 (1.19, 1.87)	1.38 (1.19, 1.86)	0.006	-0.002, 0.014
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.07)	0.04 (0.02, 0.07)	0.010	0.002, 0.018
	mean_ndx_out_rate	14 (7, 25)	14 (7, 25)	0.010	0.002, 0.018
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	2,684 (2.2%)	2,599 (2.2%)	0.005	-0.003, 0.013
	Anemia	16,334 (13.6%)	16,209 (13.5%)	0.003	-0.005, 0.011
	Blood Loss	1,494 (1.2%)	1,456 (1.2%)	0.003	-0.005, 0.011
	Heart Failure	9,925 (8.3%)	9,705 (8.1%)	0.007	-0.001, 0.015
	Coagulopathy	3,549 (3.0%)	3,506 (2.9%)	0.002	-0.006, 0.010
	Depression	8,731 (7.3%)	8,660 (7.2%)	0.002	-0.006, 0.010
	Diabetes Without Complications	34,932 (29.1%)	34,735 (29.0%)	0.004	-0.004, 0.012
	Diabetes With Complications	13,786 (11.5%)	13,616 (11.4%)	0.004	-0.004, 0.012
	Drug Abuse	1,300 (1.1%)	1,299 (1.1%)	0.000	-0.008, 0.008
	Fluid or Electrolyte Disorders	13,302 (11.1%)	13,181 (11.0%)	0.003	-0.005, 0.011
	HIV	493 (0.4%)	467 (0.4%)	0.003	-0.005, 0.011
	HTN Without Complications	82,991 (69.2%)	82,654 (68.9%)	0.006	-0.002, 0.014
	HTN With Complications	19,857 (16.6%)	19,442 (16.2%)	0.009	0.001, 0.017
	Hypothyroidism	9,647 (8.0%)	9,449 (7.9%)	0.006	-0.002, 0.014
	Liver Disease	4,807 (4.0%)	4,737 (3.9%)	0.003	-0.005, 0.011
	Lymphoma	1,296 (1.1%)	1,266 (1.1%)	0.002	-0.006, 0.010
	Metastatic Cancer	1,542 (1.3%)	1,514 (1.3%)	0.002	-0.006, 0.010
	Other Neurologic Disorders	7,305 (6.1%)	7,159 (6.0%)	0.005	-0.003, 0.013
	Obesity	11,270 (9.4%)	11,260 (9.4%)	0.000	-0.008, 0.008
	Paralysis	2,013 (1.7%)	1,985 (1.7%)	0.002	-0.006, 0.010
	Pulmonary Hypertension	2,480 (2.1%)	2,386 (2.0%)	0.006	-0.002, 0.014
	Psychoses	6,816 (5.7%)	6,637 (5.5%)	0.006	-0.002, 0.014
	Peptic Ulcer Disease	290 (0.2%)	290 (0.2%)	0.000	-0.008, 0.008
	COPD	23,561 (19.6%)	23,283 (19.4%)	0.006	-0.002, 0.014
	Peripheral Vascular Disease	14,877 (12.4%)	14,648 (12.2%)	0.006	-0.002, 0.014
	Renal Failure	11,382 (9.5%)	11,288 (9.4%)	0.003	-0.005, 0.011
	Rheumatoid Arthritis	4,615 (3.8%)	4,589 (3.8%)	0.001	-0.007, 0.009
	Solid Tumor	13,662 (11.4%)	13,507 (11.3%)	0.004	-0.004, 0.012
	Valvular Disease	13,851 (11.5%)	13,586 (11.3%)	0.007	-0.001, 0.015
	Weight Loss	3,380 (2.8%)	3,293 (2.7%)	0.004	-0.004, 0.012
Hypotension	Orthostatic Hypotension	957 (0.8%)	893 (0.7%)	0.006	-0.002, 0.014
	Other Hypotension	2,160 (1.8%)	2,110 (1.8%)	0.003	-0.005, 0.011
Prostate Specific Antigen	PSA Measurement Taken	69,252 (57.7%)	68,586 (57.2%)	0.011	0.003, 0.019
	Diagnosis of Abnormal PSA	14,040 (11.7%)	13,746 (11.5%)	0.008	0.000, 0.016
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	3,061 (2.6%)	2,891 (2.4%)	0.009	0.001, 0.017
	Uroflow Study Performed	7,682 (6.4%)	7,428 (6.2%)	0.009	0.001, 0.017
	Cystometrogram Performed	910 (0.8%)	886 (0.7%)	0.002	-0.006, 0.010
	Diagnosis of BPH	41,863 (34.9%)	41,127 (34.3%)	0.013	0.005, 0.021
Other Comorbidities	Diagnosis of Anxiety	7,737 (6.5%)	7,628 (6.4%)	0.004	-0.004, 0.012
	Diagnosis of Erectile Dysfunction	11,200 (9.3%)	10,965 (9.1%)	0.007	-0.001, 0.015
1Cohen's D
2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_tam.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 429741     4867     4476      34.2       143
treatment=TRUE  124905     1022     1413     108.3       143

 Chisq= 143  on 1 degrees of freedom, p= <2e-16

The Cox regression pegs the protection at HR at 0.67 (0.62, 0.71).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	0.66	0.62, 0.71	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results.

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()

drug	cases	time	incidence
tamsulosin	1391	358504.4	38.80008
tz/dz/az	991	359705.2	27.55034

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=tamsulosin", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=tamsulosin", "Tamsulosin", "TZ/DZ/AZ")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()

time	year	Tamsulosin	TZ/DZ/AZ
365	1	86903	87067
730	2	60751	60535
1095	3	44291	44272
1460	4	32414	32461
1825	5	23443	23524
2190	6	16894	16944
2555	7	12166	12186
2920	8	8306	8332
3285	9	5535	5554
3650	10	3484	3568
4015	11	2124	2146
4380	12	1133	1121
4745	13	629	646
5110	14	304	317
5475	15	121	117
5840	16	2	3

A log rank test confirms that the curves remain different.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 119944     1391     1189      34.3      68.5
treatment=1 119944      991     1193      34.2      68.5

 Chisq= 68.5  on 1 degrees of freedom, p= <2e-16

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)

Characteristic	0.5% Percentile	1.0% Percentile	1.5% Percentile	2.0% Percentile
drug
tamsulosin	442 (412, 496)	958 (890, 1,015)	1,463 (1,380, 1,584)	1,996 (1,891, 2,156)
tz/dz/az	595 (547, 651)	1,317 (1,237, 1,481)	2,100 (1,968, 2,264)	2,768 (2,578, 3,030)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	0.71	0.65, 0.77	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 0.71 (0.66, 0.78; p < 0.001).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

	Unadjusted			Propensity Score Matched
Characteristic	HR1	95% CI1	p-value	HR1	95% CI1	p-value
as.numeric(treatment)	0.66	0.62, 0.71	<0.001	0.71	0.65, 0.77	<0.001
1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals

                      chisq df    p
as.numeric(treatment)  1.98  1 0.16
GLOBAL                 1.98  1 0.16

with(scf_residuals, cor.test(x, y))


    Pearson's product-moment correlation

data:  x and y
t = -1.4089, df = 2380, p-value = 0.159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06894934  0.01130818
sample estimates:
       cor 
-0.0288671

tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is statistically significant correlation with the Schoenfeld residuals; however, we are likely over-powered to detect a deviation. Looking at the estimated correlation (-0.059), we think it is unlikely that this has a meaningful impact on our estimated hazard ratio.

Balance Assessment and Survival Analysis | Tz/Dz/Az vs Tamsulosin

Author

Affiliation

Published

DOI

Data Load

Unadjusted Data

Matches

Balance and Cohort Description

Before Matching

After Matching

Propensity Score Overlap

Survival Analysis

Before Matching

With Propensity Score Matching

Fit Summaries

Diagnostics

Footnotes