This file was compiled on 2022-05-09 14:51:26 by jsimmeri
on argon-lc-f14-25.hpc.
The script analysis_functions.R
provides:
model_data
to the workspace)root_dir
according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOSsource("~/projects/pd-preprint/analysis/analysis_functions.R")
The tibble model_data
is loaded by analysis_functions
from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data
contains data on all users of the three drugs. We want to reduce that here to just tamsulosin and 5ARI.
We also want to load our matching results:
matches <- read_rds(glue::glue("{root_dir}/matches/tam_5ari.rds"))
And using the matches, build matched versions of the tamsulosin versus 5ARI model data:
matched_model_data <- model_data %>%
select(-treatment) %>%
inner_join(matches,
by = "enrolid")
describe_cohort(model_data)
Group | Characteristic | 5ari, N = 79,133 | tamsulosin, N = 429,741 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 62 (54, 70) | 62 (56, 71) | -0.097 | -0.105, -0.089 |
Medication Index Date | 2,010.0 (2,008.0, 2,013.0) | 2,012.0 (2,008.0, 2,014.0) | -0.273 | -0.281, -0.266 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.1 (1.8, 5.6) | 3.6 (2.0, 6.3) | -0.176 | -0.184, -0.168 |
Follow Up Time (Years) | 2.23 (0.98, 4.43) | 1.90 (0.86, 3.78) | 0.138 | 0.130, 0.145 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.21) | -0.181 | -0.188, -0.173 |
Outpatient Events Per Year | 9 (5, 16) | 10 (5, 18) | -0.114 | -0.122, -0.107 | |
Mean Number of DX Codes Per Outpatient Event | 1.40 (1.20, 1.87) | 1.56 (1.27, 2.09) | -0.243 | -0.251, -0.236 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.07) | 0.04 (0.02, 0.08) | -0.157 | -0.165, -0.150 | |
mean_ndx_out_rate | 13 (7, 24) | 16 (9, 29) | -0.157 | -0.165, -0.150 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 1,135 (1.4%) | 11,147 (2.6%) | -0.076 | -0.083, -0.068 |
Anemia | 9,051 (11.4%) | 66,743 (15.5%) | -0.115 | -0.123, -0.107 | |
Blood Loss | 886 (1.1%) | 6,996 (1.6%) | -0.041 | -0.049, -0.034 | |
Heart Failure | 5,190 (6.6%) | 41,392 (9.6%) | -0.107 | -0.114, -0.099 | |
Coagulopathy | 2,302 (2.9%) | 18,433 (4.3%) | -0.070 | -0.077, -0.062 | |
Depression | 5,380 (6.8%) | 42,310 (9.8%) | -0.105 | -0.112, -0.097 | |
Diabetes Without Complications | 15,703 (19.8%) | 118,841 (27.7%) | -0.177 | -0.185, -0.170 | |
Diabetes With Complications | 5,009 (6.3%) | 45,465 (10.6%) | -0.142 | -0.150, -0.135 | |
Drug Abuse | 568 (0.7%) | 6,533 (1.5%) | -0.068 | -0.076, -0.061 | |
Fluid or Electrolyte Disorders | 5,812 (7.3%) | 54,203 (12.6%) | -0.164 | -0.171, -0.156 | |
HIV | 546 (0.7%) | 1,426 (0.3%) | 0.058 | 0.050, 0.065 | |
HTN Without Complications | 41,769 (52.8%) | 266,446 (62.0%) | -0.189 | -0.197, -0.181 | |
HTN With Complications | 7,828 (9.9%) | 55,606 (12.9%) | -0.092 | -0.100, -0.085 | |
Hypothyroidism | 6,927 (8.8%) | 42,431 (9.9%) | -0.038 | -0.045, -0.030 | |
Liver Disease | 2,454 (3.1%) | 23,871 (5.6%) | -0.111 | -0.118, -0.103 | |
Lymphoma | 829 (1.0%) | 6,226 (1.4%) | -0.034 | -0.042, -0.027 | |
Metastatic Cancer | 1,090 (1.4%) | 10,529 (2.5%) | -0.072 | -0.079, -0.064 | |
Other Neurologic Disorders | 4,419 (5.6%) | 39,518 (9.2%) | -0.129 | -0.136, -0.121 | |
Obesity | 4,462 (5.6%) | 47,431 (11.0%) | -0.179 | -0.186, -0.171 | |
Paralysis | 960 (1.2%) | 10,106 (2.4%) | -0.078 | -0.086, -0.071 | |
Pulmonary Hypertension | 1,521 (1.9%) | 12,901 (3.0%) | -0.065 | -0.073, -0.058 | |
Psychoses | 4,098 (5.2%) | 31,075 (7.2%) | -0.081 | -0.089, -0.073 | |
Peptic Ulcer Disease | 183 (0.2%) | 1,672 (0.4%) | -0.026 | -0.034, -0.019 | |
COPD | 13,975 (17.7%) | 102,194 (23.8%) | -0.146 | -0.154, -0.138 | |
Peripheral Vascular Disease | 8,260 (10.4%) | 62,941 (14.6%) | -0.121 | -0.129, -0.114 | |
Renal Failure | 3,431 (4.3%) | 30,407 (7.1%) | -0.110 | -0.118, -0.102 | |
Rheumatoid Arthritis | 2,785 (3.5%) | 21,620 (5.0%) | -0.071 | -0.078, -0.063 | |
Solid Tumor | 10,371 (13.1%) | 72,259 (16.8%) | -0.101 | -0.108, -0.093 | |
Valvular Disease | 9,097 (11.5%) | 60,458 (14.1%) | -0.075 | -0.083, -0.067 | |
Weight Loss | 2,219 (2.8%) | 19,310 (4.5%) | -0.084 | -0.092, -0.076 | |
Hypotension | Orthostatic Hypotension | 748 (0.9%) | 5,000 (1.2%) | -0.021 | -0.028, -0.013 |
Other Hypotension | 1,451 (1.8%) | 12,550 (2.9%) | -0.066 | -0.074, -0.059 | |
Prostate Specific Antigen | PSA Measurement Taken | 48,034 (60.7%) | 247,089 (57.5%) | 0.065 | 0.057, 0.072 |
Diagnosis of Abnormal PSA | 23,538 (29.7%) | 69,520 (16.2%) | 0.354 | 0.389, 0.389 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 1,936 (2.4%) | 13,557 (3.2%) | -0.041 | -0.049, -0.034 |
Uroflow Study Performed | 5,556 (7.0%) | 27,946 (6.5%) | 0.021 | 0.013, 0.028 | |
Cystometrogram Performed | 524 (0.7%) | 3,304 (0.8%) | -0.012 | -0.020, -0.005 | |
Diagnosis of BPH | 35,618 (45.0%) | 179,924 (41.9%) | 0.064 | 0.056, 0.071 | |
Other Comorbidities | Diagnosis of Anxiety | 5,406 (6.8%) | 38,428 (8.9%) | -0.075 | -0.083, -0.068 |
Diagnosis of Erectile Dysfunction | 7,153 (9.0%) | 44,895 (10.4%) | -0.046 | -0.054, -0.039 | |
1Cohen's D | |||||
2CI = Confidence Interval |
describe_cohort(matched_model_data)
Group | Characteristic | 5ari, N = 78,747 | tamsulosin, N = 78,747 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 62 (54, 70) | 62 (54, 71) | -0.002 | -0.011, 0.008 |
Medication Index Date | 2,010.0 (2,008.0, 2,013.0) | 2,010.0 (2,008.0, 2,013.0) | -0.010 | -0.020, 0.000 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.15 (1.84, 5.57) | 3.31 (1.92, 5.78) | -0.059 | -0.069, -0.050 |
Follow Up Time (Years) | 2.22 (0.98, 4.41) | 2.23 (0.98, 4.41) | 0.000 | -0.009, 0.010 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | -0.016 | -0.026, -0.006 |
Outpatient Events Per Year | 9 (5, 16) | 9 (5, 16) | -0.012 | -0.022, -0.002 | |
Mean Number of DX Codes Per Outpatient Event | 1.41 (1.20, 1.87) | 1.41 (1.20, 1.88) | -0.006 | -0.015, 0.004 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.07) | 0.04 (0.02, 0.07) | -0.015 | -0.025, -0.005 | |
mean_ndx_out_rate | 13 (7, 24) | 13 (7, 24) | -0.015 | -0.025, -0.005 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 1,135 (1.4%) | 1,197 (1.5%) | -0.007 | -0.016, 0.003 |
Anemia | 9,030 (11.5%) | 9,143 (11.6%) | -0.004 | -0.014, 0.005 | |
Blood Loss | 886 (1.1%) | 888 (1.1%) | 0.000 | -0.010, 0.010 | |
Heart Failure | 5,180 (6.6%) | 5,327 (6.8%) | -0.007 | -0.017, 0.002 | |
Coagulopathy | 2,296 (2.9%) | 2,388 (3.0%) | -0.007 | -0.017, 0.003 | |
Depression | 5,370 (6.8%) | 5,417 (6.9%) | -0.002 | -0.012, 0.008 | |
Diabetes Without Complications | 15,690 (19.9%) | 16,026 (20.4%) | -0.011 | -0.021, -0.001 | |
Diabetes With Complications | 5,004 (6.4%) | 5,302 (6.7%) | -0.015 | -0.025, -0.005 | |
Drug Abuse | 567 (0.7%) | 669 (0.8%) | -0.015 | -0.025, -0.005 | |
Fluid or Electrolyte Disorders | 5,808 (7.4%) | 6,022 (7.6%) | -0.010 | -0.020, 0.000 | |
HIV | 520 (0.7%) | 522 (0.7%) | 0.000 | -0.010, 0.010 | |
HTN Without Complications | 41,657 (52.9%) | 41,763 (53.0%) | -0.003 | -0.013, 0.007 | |
HTN With Complications | 7,807 (9.9%) | 8,004 (10.2%) | -0.008 | -0.018, 0.002 | |
Hypothyroidism | 6,900 (8.8%) | 7,077 (9.0%) | -0.008 | -0.018, 0.002 | |
Liver Disease | 2,452 (3.1%) | 2,567 (3.3%) | -0.008 | -0.018, 0.002 | |
Lymphoma | 825 (1.0%) | 874 (1.1%) | -0.006 | -0.016, 0.004 | |
Metastatic Cancer | 1,087 (1.4%) | 1,165 (1.5%) | -0.008 | -0.018, 0.002 | |
Other Neurologic Disorders | 4,415 (5.6%) | 4,593 (5.8%) | -0.010 | -0.020, 0.000 | |
Obesity | 4,461 (5.7%) | 4,701 (6.0%) | -0.013 | -0.023, -0.003 | |
Paralysis | 959 (1.2%) | 1,005 (1.3%) | -0.005 | -0.015, 0.005 | |
Pulmonary Hypertension | 1,520 (1.9%) | 1,658 (2.1%) | -0.012 | -0.022, -0.003 | |
Psychoses | 4,086 (5.2%) | 4,166 (5.3%) | -0.005 | -0.014, 0.005 | |
Peptic Ulcer Disease | 183 (0.2%) | 210 (0.3%) | -0.007 | -0.017, 0.003 | |
COPD | 13,956 (17.7%) | 14,204 (18.0%) | -0.008 | -0.018, 0.002 | |
Peripheral Vascular Disease | 8,241 (10.5%) | 8,520 (10.8%) | -0.011 | -0.021, -0.002 | |
Renal Failure | 3,429 (4.4%) | 3,546 (4.5%) | -0.007 | -0.017, 0.003 | |
Rheumatoid Arthritis | 2,783 (3.5%) | 2,887 (3.7%) | -0.007 | -0.017, 0.003 | |
Solid Tumor | 10,368 (13.2%) | 11,007 (14.0%) | -0.024 | -0.034, -0.014 | |
Valvular Disease | 9,068 (11.5%) | 9,265 (11.8%) | -0.008 | -0.018, 0.002 | |
Weight Loss | 2,212 (2.8%) | 2,292 (2.9%) | -0.006 | -0.016, 0.004 | |
Hypotension | Orthostatic Hypotension | 745 (0.9%) | 766 (1.0%) | -0.003 | -0.013, 0.007 |
Other Hypotension | 1,450 (1.8%) | 1,460 (1.9%) | -0.001 | -0.011, 0.009 | |
Prostate Specific Antigen | PSA Measurement Taken | 47,756 (60.6%) | 47,798 (60.7%) | -0.001 | -0.011, 0.009 |
Diagnosis of Abnormal PSA | 23,236 (29.5%) | 23,361 (29.7%) | -0.003 | -0.013, 0.006 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 1,934 (2.5%) | 1,944 (2.5%) | -0.001 | -0.011, 0.009 |
Uroflow Study Performed | 5,524 (7.0%) | 5,752 (7.3%) | -0.011 | -0.021, -0.001 | |
Cystometrogram Performed | 524 (0.7%) | 572 (0.7%) | -0.007 | -0.017, 0.003 | |
Diagnosis of BPH | 35,390 (44.9%) | 35,467 (45.0%) | -0.002 | -0.012, 0.008 | |
Other Comorbidities | Diagnosis of Anxiety | 5,381 (6.8%) | 5,491 (7.0%) | -0.006 | -0.015, 0.004 |
Diagnosis of Erectile Dysfunction | 7,124 (9.0%) | 7,168 (9.1%) | -0.002 | -0.012, 0.008 | |
1Cohen's D | |||||
2CI = Confidence Interval |
We can also visually inspect the estimated propensity scores for overlap:
ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tam_5ari.rds"))
bind_rows(
model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "Before Matching"),
matched_model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "After Matching")
) %>%
mutate(type = forcats::fct_rev(type)) %>%
ggplot(aes(x = ptx, fill = drug, color = drug)) +
geom_density(alpha = 0.5) +
labs(x = "Probability of Treatment with Tamsulosin",
y = "Density",
color = "Observed Treatment",
fill = "Observed Treatment") +
theme_minimal() +
theme(legend.position = "bottom") +
facet_wrap(~ type, nrow = 2)
It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.
Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.
survfit(Surv(survival_time, develops_pd) ~ treatment,
data = model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "treatment=TRUE",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
A finding that is statistically significant using the log rank test:
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 79133 765 973 44.57 54
treatment=TRUE 429741 4867 4659 9.31 54
Chisq= 54 on 1 degrees of freedom, p= 2e-13
The Cox regression pegs the protection at HR at 1.32 (1.22, 1.43).
raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = model_data,
robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 1.33 | 1.23, 1.44 | <0.001 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
However, this estimate is not trustworthy as we know the groups could be better balanced.
Further adjustment yields similar results:
We observe the following overall incidence:
matched_model_data %>%
group_by(drug) %>%
summarize(
cases = sum(develops_pd),
time = sum(survival_time / 365),
incidence = cases / time * 10000
) %>%
knitr::kable()
drug | cases | time | incidence |
---|---|---|---|
5ari | 757 | 241922.9 | 31.29096 |
tamsulosin | 846 | 241542.2 | 35.02494 |
With a Kaplan Meier curve:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "drug=tamsulosin",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
The population size at risk by year is given in the table below:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
select(strata, time,, n.risk) %>%
complete(strata, time = 0:(16*365)) %>%
fill(n.risk, .direction = "down") %>%
mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "Tamsulosin")) %>%
select(time, drug, n.risk) %>%
filter(time %in% c(1:16 * 365)) %>%
mutate(year = round(time / 365)) %>%
spread(drug, n.risk) %>%
knitr::kable()
time | year | 5ARI | Tamsulosin |
---|---|---|---|
365 | 1 | 58387 | 58426 |
730 | 2 | 41709 | 41987 |
1095 | 3 | 30354 | 30486 |
1460 | 4 | 22122 | 22078 |
1825 | 5 | 16032 | 16102 |
2190 | 6 | 11475 | 11533 |
2555 | 7 | 8076 | 8111 |
2920 | 8 | 5471 | 5463 |
3285 | 9 | 3584 | 3542 |
3650 | 10 | 2238 | 2204 |
4015 | 11 | 1316 | 1301 |
4380 | 12 | 643 | 658 |
4745 | 13 | 328 | 327 |
5110 | 14 | 163 | 161 |
5475 | 15 | 43 | 48 |
5840 | 16 | 2 | 1 |
A log rank test confirms that the curves remain different but only with p = 0.05.
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = matched_model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 78747 757 802 2.55 5.1
treatment=1 78747 846 801 2.55 5.1
Chisq= 5.1 on 1 degrees of freedom, p= 0.02
Survival times
tbl_survfit(
survfit(
Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
),
probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic | 0.5% Percentile | 1.0% Percentile | 1.5% Percentile | 2.0% Percentile |
---|---|---|---|---|
drug | ||||
5ari | 557 (497, 649) | 1,175 (1,067, 1,279) | 1,778 (1,666, 1,943) | 2,385 (2,183, 2,694) |
tamsulosin | 530 (472, 581) | 1,089 (980, 1,203) | 1,665 (1,519, 1,822) | 2,160 (1,995, 2,365) |
We fit a Cox regression model to estimate the differences in hazard:
full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = matched_model_data,
robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 1.12 | 1.01, 1.23 | 0.024 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
The Cox regression yields a HR of 1.10 (95% CI: 1.00, 1.22; p = 0.0528).
tbl_merge(
tbls = list(
tbl_regression(raw_cox, exponentiate = TRUE),
tbl_regression(full_cox, exponentiate = TRUE)
),
tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>%
as_flex_table()
| Unadjusted | Propensity Score Matched | ||||
Characteristic | HR1 | 95% CI1 | p-value | HR1 | 95% CI1 | p-value |
as.numeric(treatment) | 1.33 | 1.23, 1.44 | <0.001 | 1.12 | 1.01, 1.23 | 0.024 |
1HR = Hazard Ratio, CI = Confidence Interval |
scf_residuals <- cox.zph(full_cox)
scf_residuals
chisq df p
as.numeric(treatment) 4.94 1 0.026
GLOBAL 4.94 1 0.026
Pearson's product-moment correlation
data: x and y
t = 2.2253, df = 1601, p-value = 0.0262
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.006587255 0.104205887
sample estimates:
cor
0.05552927
tibble(
x = scf_residuals$time,
y = scf_residuals$y
) %>%
ggplot(aes(x = x / 365, y = y)) +
geom_point() +
geom_smooth(method = "loess") +
labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") +
theme_minimal()
There is a weak correlation that is not statistically significant when checking the Schoenfeld residuals. It is unlikely to be a serious threat to our estimates.