Balance Assessment and Survival Analysis

library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)

This file was compiled on 2022-05-09 14:51:26 by jsimmeri on argon-lc-f14-25.hpc.

The script analysis_functions.R provides:

some functions for summarizing the cohort and building tables that are shared across scripts
code to load the entire collection out outcomes data (which it returns as model_data to the workspace)
sets root_dir according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOS

source("~/projects/pd-preprint/analysis/analysis_functions.R")

Data Load

Unadjusted Data

The tibble model_data is loaded by analysis_functions from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data contains data on all users of the three drugs. We want to reduce that here to just tamsulosin and 5ARI.

model_data <- model_data %>%
  filter(drug %in% c("5ari", "tamsulosin")) %>%
  mutate(treatment = drug == "tamsulosin")

Matches

We also want to load our matching results:

matches <- read_rds(glue::glue("{root_dir}/matches/tam_5ari.rds"))

And using the matches, build matched versions of the tamsulosin versus 5ARI model data:

matched_model_data <- model_data %>%
  select(-treatment) %>%
  inner_join(matches,
    by = "enrolid")

Balance and Cohort Description

Before Matching

describe_cohort(model_data)

Group	Characteristic	5ari, N = 79,133	tamsulosin, N = 429,741	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	62 (54, 70)	62 (56, 71)	-0.097	-0.105, -0.089
	Medication Index Date	2,010.0 (2,008.0, 2,013.0)	2,012.0 (2,008.0, 2,014.0)	-0.273	-0.281, -0.266
Duration of Enrollment Time	Lookback Time (Years)	3.1 (1.8, 5.6)	3.6 (2.0, 6.3)	-0.176	-0.184, -0.168
	Follow Up Time (Years)	2.23 (0.98, 4.43)	1.90 (0.86, 3.78)	0.138	0.130, 0.145
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.00)	0.00 (0.00, 0.21)	-0.181	-0.188, -0.173
	Outpatient Events Per Year	9 (5, 16)	10 (5, 18)	-0.114	-0.122, -0.107
	Mean Number of DX Codes Per Outpatient Event	1.40 (1.20, 1.87)	1.56 (1.27, 2.09)	-0.243	-0.251, -0.236
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.07)	0.04 (0.02, 0.08)	-0.157	-0.165, -0.150
	mean_ndx_out_rate	13 (7, 24)	16 (9, 29)	-0.157	-0.165, -0.150
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	1,135 (1.4%)	11,147 (2.6%)	-0.076	-0.083, -0.068
	Anemia	9,051 (11.4%)	66,743 (15.5%)	-0.115	-0.123, -0.107
	Blood Loss	886 (1.1%)	6,996 (1.6%)	-0.041	-0.049, -0.034
	Heart Failure	5,190 (6.6%)	41,392 (9.6%)	-0.107	-0.114, -0.099
	Coagulopathy	2,302 (2.9%)	18,433 (4.3%)	-0.070	-0.077, -0.062
	Depression	5,380 (6.8%)	42,310 (9.8%)	-0.105	-0.112, -0.097
	Diabetes Without Complications	15,703 (19.8%)	118,841 (27.7%)	-0.177	-0.185, -0.170
	Diabetes With Complications	5,009 (6.3%)	45,465 (10.6%)	-0.142	-0.150, -0.135
	Drug Abuse	568 (0.7%)	6,533 (1.5%)	-0.068	-0.076, -0.061
	Fluid or Electrolyte Disorders	5,812 (7.3%)	54,203 (12.6%)	-0.164	-0.171, -0.156
	HIV	546 (0.7%)	1,426 (0.3%)	0.058	0.050, 0.065
	HTN Without Complications	41,769 (52.8%)	266,446 (62.0%)	-0.189	-0.197, -0.181
	HTN With Complications	7,828 (9.9%)	55,606 (12.9%)	-0.092	-0.100, -0.085
	Hypothyroidism	6,927 (8.8%)	42,431 (9.9%)	-0.038	-0.045, -0.030
	Liver Disease	2,454 (3.1%)	23,871 (5.6%)	-0.111	-0.118, -0.103
	Lymphoma	829 (1.0%)	6,226 (1.4%)	-0.034	-0.042, -0.027
	Metastatic Cancer	1,090 (1.4%)	10,529 (2.5%)	-0.072	-0.079, -0.064
	Other Neurologic Disorders	4,419 (5.6%)	39,518 (9.2%)	-0.129	-0.136, -0.121
	Obesity	4,462 (5.6%)	47,431 (11.0%)	-0.179	-0.186, -0.171
	Paralysis	960 (1.2%)	10,106 (2.4%)	-0.078	-0.086, -0.071
	Pulmonary Hypertension	1,521 (1.9%)	12,901 (3.0%)	-0.065	-0.073, -0.058
	Psychoses	4,098 (5.2%)	31,075 (7.2%)	-0.081	-0.089, -0.073
	Peptic Ulcer Disease	183 (0.2%)	1,672 (0.4%)	-0.026	-0.034, -0.019
	COPD	13,975 (17.7%)	102,194 (23.8%)	-0.146	-0.154, -0.138
	Peripheral Vascular Disease	8,260 (10.4%)	62,941 (14.6%)	-0.121	-0.129, -0.114
	Renal Failure	3,431 (4.3%)	30,407 (7.1%)	-0.110	-0.118, -0.102
	Rheumatoid Arthritis	2,785 (3.5%)	21,620 (5.0%)	-0.071	-0.078, -0.063
	Solid Tumor	10,371 (13.1%)	72,259 (16.8%)	-0.101	-0.108, -0.093
	Valvular Disease	9,097 (11.5%)	60,458 (14.1%)	-0.075	-0.083, -0.067
	Weight Loss	2,219 (2.8%)	19,310 (4.5%)	-0.084	-0.092, -0.076
Hypotension	Orthostatic Hypotension	748 (0.9%)	5,000 (1.2%)	-0.021	-0.028, -0.013
	Other Hypotension	1,451 (1.8%)	12,550 (2.9%)	-0.066	-0.074, -0.059
Prostate Specific Antigen	PSA Measurement Taken	48,034 (60.7%)	247,089 (57.5%)	0.065	0.057, 0.072
	Diagnosis of Abnormal PSA	23,538 (29.7%)	69,520 (16.2%)	0.354	0.389, 0.389
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	1,936 (2.4%)	13,557 (3.2%)	-0.041	-0.049, -0.034
	Uroflow Study Performed	5,556 (7.0%)	27,946 (6.5%)	0.021	0.013, 0.028
	Cystometrogram Performed	524 (0.7%)	3,304 (0.8%)	-0.012	-0.020, -0.005
	Diagnosis of BPH	35,618 (45.0%)	179,924 (41.9%)	0.064	0.056, 0.071
Other Comorbidities	Diagnosis of Anxiety	5,406 (6.8%)	38,428 (8.9%)	-0.075	-0.083, -0.068
	Diagnosis of Erectile Dysfunction	7,153 (9.0%)	44,895 (10.4%)	-0.046	-0.054, -0.039
1Cohen's D
2CI = Confidence Interval

After Matching

describe_cohort(matched_model_data)

Group	Characteristic	5ari, N = 78,747	tamsulosin, N = 78,747	Difference1	95% CI12
Age and Medication Start Time	Age at Index Date	62 (54, 70)	62 (54, 71)	-0.002	-0.011, 0.008
	Medication Index Date	2,010.0 (2,008.0, 2,013.0)	2,010.0 (2,008.0, 2,013.0)	-0.010	-0.020, 0.000
Duration of Enrollment Time	Lookback Time (Years)	3.15 (1.84, 5.57)	3.31 (1.92, 5.78)	-0.059	-0.069, -0.050
	Follow Up Time (Years)	2.22 (0.98, 4.41)	2.23 (0.98, 4.41)	0.000	-0.009, 0.010
Rates of Inpatient and Outpatient Encounters	Inpatient Events Per Year	0.00 (0.00, 0.00)	0.00 (0.00, 0.00)	-0.016	-0.026, -0.006
	Outpatient Events Per Year	9 (5, 16)	9 (5, 16)	-0.012	-0.022, -0.002
	Mean Number of DX Codes Per Outpatient Event	1.41 (1.20, 1.87)	1.41 (1.20, 1.88)	-0.006	-0.015, 0.004
	Outpatient DX Code Incidence Per Year	0.04 (0.02, 0.07)	0.04 (0.02, 0.07)	-0.015	-0.025, -0.005
	mean_ndx_out_rate	13 (7, 24)	13 (7, 24)	-0.015	-0.025, -0.005
Elixhauser/AHRQ Comorbidity Flags	Alcohol Abuse	1,135 (1.4%)	1,197 (1.5%)	-0.007	-0.016, 0.003
	Anemia	9,030 (11.5%)	9,143 (11.6%)	-0.004	-0.014, 0.005
	Blood Loss	886 (1.1%)	888 (1.1%)	0.000	-0.010, 0.010
	Heart Failure	5,180 (6.6%)	5,327 (6.8%)	-0.007	-0.017, 0.002
	Coagulopathy	2,296 (2.9%)	2,388 (3.0%)	-0.007	-0.017, 0.003
	Depression	5,370 (6.8%)	5,417 (6.9%)	-0.002	-0.012, 0.008
	Diabetes Without Complications	15,690 (19.9%)	16,026 (20.4%)	-0.011	-0.021, -0.001
	Diabetes With Complications	5,004 (6.4%)	5,302 (6.7%)	-0.015	-0.025, -0.005
	Drug Abuse	567 (0.7%)	669 (0.8%)	-0.015	-0.025, -0.005
	Fluid or Electrolyte Disorders	5,808 (7.4%)	6,022 (7.6%)	-0.010	-0.020, 0.000
	HIV	520 (0.7%)	522 (0.7%)	0.000	-0.010, 0.010
	HTN Without Complications	41,657 (52.9%)	41,763 (53.0%)	-0.003	-0.013, 0.007
	HTN With Complications	7,807 (9.9%)	8,004 (10.2%)	-0.008	-0.018, 0.002
	Hypothyroidism	6,900 (8.8%)	7,077 (9.0%)	-0.008	-0.018, 0.002
	Liver Disease	2,452 (3.1%)	2,567 (3.3%)	-0.008	-0.018, 0.002
	Lymphoma	825 (1.0%)	874 (1.1%)	-0.006	-0.016, 0.004
	Metastatic Cancer	1,087 (1.4%)	1,165 (1.5%)	-0.008	-0.018, 0.002
	Other Neurologic Disorders	4,415 (5.6%)	4,593 (5.8%)	-0.010	-0.020, 0.000
	Obesity	4,461 (5.7%)	4,701 (6.0%)	-0.013	-0.023, -0.003
	Paralysis	959 (1.2%)	1,005 (1.3%)	-0.005	-0.015, 0.005
	Pulmonary Hypertension	1,520 (1.9%)	1,658 (2.1%)	-0.012	-0.022, -0.003
	Psychoses	4,086 (5.2%)	4,166 (5.3%)	-0.005	-0.014, 0.005
	Peptic Ulcer Disease	183 (0.2%)	210 (0.3%)	-0.007	-0.017, 0.003
	COPD	13,956 (17.7%)	14,204 (18.0%)	-0.008	-0.018, 0.002
	Peripheral Vascular Disease	8,241 (10.5%)	8,520 (10.8%)	-0.011	-0.021, -0.002
	Renal Failure	3,429 (4.4%)	3,546 (4.5%)	-0.007	-0.017, 0.003
	Rheumatoid Arthritis	2,783 (3.5%)	2,887 (3.7%)	-0.007	-0.017, 0.003
	Solid Tumor	10,368 (13.2%)	11,007 (14.0%)	-0.024	-0.034, -0.014
	Valvular Disease	9,068 (11.5%)	9,265 (11.8%)	-0.008	-0.018, 0.002
	Weight Loss	2,212 (2.8%)	2,292 (2.9%)	-0.006	-0.016, 0.004
Hypotension	Orthostatic Hypotension	745 (0.9%)	766 (1.0%)	-0.003	-0.013, 0.007
	Other Hypotension	1,450 (1.8%)	1,460 (1.9%)	-0.001	-0.011, 0.009
Prostate Specific Antigen	PSA Measurement Taken	47,756 (60.6%)	47,798 (60.7%)	-0.001	-0.011, 0.009
	Diagnosis of Abnormal PSA	23,236 (29.5%)	23,361 (29.7%)	-0.003	-0.013, 0.006
Bladder Function and Urinary Flow	Slow Urinary Stream Diagnosis	1,934 (2.5%)	1,944 (2.5%)	-0.001	-0.011, 0.009
	Uroflow Study Performed	5,524 (7.0%)	5,752 (7.3%)	-0.011	-0.021, -0.001
	Cystometrogram Performed	524 (0.7%)	572 (0.7%)	-0.007	-0.017, 0.003
	Diagnosis of BPH	35,390 (44.9%)	35,467 (45.0%)	-0.002	-0.012, 0.008
Other Comorbidities	Diagnosis of Anxiety	5,381 (6.8%)	5,491 (7.0%)	-0.006	-0.015, 0.004
	Diagnosis of Erectile Dysfunction	7,124 (9.0%)	7,168 (9.1%)	-0.002	-0.012, 0.008
1Cohen's D
2CI = Confidence Interval

Propensity Score Overlap

We can also visually inspect the estimated propensity scores for overlap:

ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tam_5ari.rds"))
bind_rows(
  model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "Before Matching"),
  matched_model_data %>%
    inner_join(ps, by = "enrolid") %>%
    mutate(type = "After Matching")
  ) %>%
  mutate(type = forcats::fct_rev(type)) %>%
  ggplot(aes(x = ptx, fill = drug, color = drug)) +
  geom_density(alpha = 0.5) + 
  labs(x = "Probability of Treatment with Tamsulosin", 
       y = "Density",
       color = "Observed Treatment",
       fill = "Observed Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom") + 
  facet_wrap(~ type, nrow = 2)

It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.

Survival Analysis

Before Matching

Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.

survfit(Surv(survival_time, develops_pd) ~ treatment, 
        data = model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "treatment=TRUE", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

A finding that is statistically significant using the log rank test:

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = model_data)

                     N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE  79133      765      973     44.57        54
treatment=TRUE  429741     4867     4659      9.31        54

 Chisq= 54  on 1 degrees of freedom, p= 2e-13

The Cox regression pegs the protection at HR at 1.32 (1.22, 1.43).

raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                 data = model_data,
                 robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	1.33	1.23, 1.44	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

However, this estimate is not trustworthy as we know the groups could be better balanced.

With Propensity Score Matching

Further adjustment yields similar results:

We observe the following overall incidence:

matched_model_data %>%
  group_by(drug) %>%
  summarize(
    cases = sum(develops_pd),
    time = sum(survival_time / 365),
    incidence = cases / time * 10000
  ) %>%
  knitr::kable()

drug	cases	time	incidence
5ari	757	241922.9	31.29096
tamsulosin	846	241542.2	35.02494

With a Kaplan Meier curve:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  mutate(treatment = ifelse(strata == "drug=tamsulosin", 
    "Tamsulosin", "5ARI")) %>%
  ggplot(
    aes(
      x = time / 365, 
      y = 100 * estimate,
      ymin = 100 * conf.low,
      ymax = 100 * conf.high
    )
  ) +
  geom_step(aes(color = treatment)) + 
  geom_ribbon(aes(fill = treatment), alpha = 0.25) +
  theme_minimal() + 
  labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
       color = "", fill = "") + 
  theme(legend.position = c(0.25, 0.25))

The population size at risk by year is given in the table below:

survfit(Surv(survival_time, develops_pd) ~ drug, 
        data = matched_model_data) %>%
  broom::tidy() %>%
  select(strata, time,, n.risk) %>%
  complete(strata, time = 0:(16*365)) %>%
  fill(n.risk, .direction = "down") %>%
  mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "Tamsulosin")) %>%
  select(time, drug, n.risk) %>%
  filter(time %in% c(1:16 * 365)) %>%
  mutate(year = round(time / 365)) %>%
  spread(drug, n.risk) %>%
  knitr::kable()

time	year	5ARI	Tamsulosin
365	1	58387	58426
730	2	41709	41987
1095	3	30354	30486
1460	4	22122	22078
1825	5	16032	16102
2190	6	11475	11533
2555	7	8076	8111
2920	8	5471	5463
3285	9	3584	3542
3650	10	2238	2204
4015	11	1316	1301
4380	12	643	658
4745	13	328	327
5110	14	163	161
5475	15	43	48
5840	16	2	1

A log rank test confirms that the curves remain different but only with p = 0.05.

survdiff(Surv(survival_time, develops_pd) ~ treatment, 
         data = matched_model_data)

Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment, 
    data = matched_model_data)

                N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 78747      757      802      2.55       5.1
treatment=1 78747      846      801      2.55       5.1

 Chisq= 5.1  on 1 degrees of freedom, p= 0.02

Survival times

tbl_survfit(
  survfit(
    Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
  ),
  probs = seq(0.5, 2, by = 0.5) / 100
)

Characteristic	0.5% Percentile	1.0% Percentile	1.5% Percentile	2.0% Percentile
drug
5ari	557 (497, 649)	1,175 (1,067, 1,279)	1,778 (1,666, 1,943)	2,385 (2,183, 2,694)
tamsulosin	530 (472, 581)	1,089 (980, 1,203)	1,665 (1,519, 1,822)	2,160 (1,995, 2,365)

We fit a Cox regression model to estimate the differences in hazard:

full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment), 
                       data = matched_model_data,
                       robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
as.numeric(treatment)	1.12	1.01, 1.23	0.024
¹ HR = Hazard Ratio, CI = Confidence Interval

The Cox regression yields a HR of 1.10 (95% CI: 1.00, 1.22; p = 0.0528).

Fit Summaries

tbl_merge(
  tbls = list(
    tbl_regression(raw_cox, exponentiate = TRUE),
    tbl_regression(full_cox, exponentiate = TRUE)
  ),
  tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>% 
  as_flex_table()

	Unadjusted			Propensity Score Matched
Characteristic	HR1	95% CI1	p-value	HR1	95% CI1	p-value
as.numeric(treatment)	1.33	1.23, 1.44	<0.001	1.12	1.01, 1.23	0.024
1HR = Hazard Ratio, CI = Confidence Interval

Diagnostics

scf_residuals <- cox.zph(full_cox)
scf_residuals

                      chisq df     p
as.numeric(treatment)  4.94  1 0.026
GLOBAL                 4.94  1 0.026

with(scf_residuals, cor.test(x, y))


    Pearson's product-moment correlation

data:  x and y
t = 2.2253, df = 1601, p-value = 0.0262
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.006587255 0.104205887
sample estimates:
       cor 
0.05552927

tibble(
  x = scf_residuals$time,
  y = scf_residuals$y
) %>%
  ggplot(aes(x = x / 365, y = y)) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") + 
  theme_minimal()

There is a weak correlation that is not statistically significant when checking the Schoenfeld residuals. It is unlikely to be a serious threat to our estimates.

Balance Assessment and Survival Analysis | Tamsulosin vs 5ARI

Author

Affiliation

Published

DOI

Data Load

Unadjusted Data

Matches

Balance and Cohort Description

Before Matching

After Matching

Propensity Score Overlap

Survival Analysis

Before Matching

With Propensity Score Matching

Fit Summaries

Diagnostics

Footnotes