This file was compiled on 2022-05-09 14:48:08 by jsimmeri
on argon-lc-f14-25.hpc.
The script analysis_functions.R
provides:
model_data
to the workspace)root_dir
according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOSsource("~/projects/pd-preprint/analysis/analysis_functions.R")
The tibble model_data
is loaded by analysis_functions
from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data
contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and tamsulosin.
We also want to load our matching results:
matches <- read_rds(glue::glue("{root_dir}/matches/tz_tam.rds"))
And using the matches, build matched versions of the tamsulosin versus 5ARI model data:
matched_model_data <- model_data %>%
select(-treatment) %>%
inner_join(matches,
by = "enrolid")
describe_cohort(model_data)
Group | Characteristic | tamsulosin, N = 429,741 | tz/dz/az, N = 124,905 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 62 (56, 71) | 62 (55, 70) | 0.051 | 0.045, 0.058 |
Medication Index Date | 2,012.0 (2,008.0, 2,014.0) | 2,009.0 (2,006.0, 2,012.0) | 0.481 | 0.529, 0.529 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.6 (2.0, 6.3) | 2.9 (1.7, 5.2) | 0.242 | 0.266, 0.266 |
Follow Up Time (Years) | 1.90 (0.86, 3.78) | 1.96 (0.93, 4.19) | -0.085 | -0.092, -0.079 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.21) | 0.00 (0.00, 0.12) | 0.080 | 0.073, 0.086 |
Outpatient Events Per Year | 10 (5, 18) | 9 (5, 16) | 0.065 | 0.059, 0.071 | |
Mean Number of DX Codes Per Outpatient Event | 1.56 (1.27, 2.09) | 1.37 (1.18, 1.83) | 0.279 | 0.307, 0.307 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.08) | 0.04 (0.02, 0.07) | 0.129 | 0.122, 0.135 | |
mean_ndx_out_rate | 16 (9, 29) | 14 (7, 25) | 0.129 | 0.122, 0.135 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 11,147 (2.6%) | 2,652 (2.1%) | 0.030 | 0.024, 0.037 |
Anemia | 66,743 (15.5%) | 16,683 (13.4%) | 0.061 | 0.055, 0.067 | |
Blood Loss | 6,996 (1.6%) | 1,482 (1.2%) | 0.036 | 0.030, 0.042 | |
Heart Failure | 41,392 (9.6%) | 9,957 (8.0%) | 0.057 | 0.051, 0.064 | |
Coagulopathy | 18,433 (4.3%) | 3,572 (2.9%) | 0.073 | 0.067, 0.080 | |
Depression | 42,310 (9.8%) | 8,769 (7.0%) | 0.098 | 0.091, 0.104 | |
Diabetes Without Complications | 118,841 (27.7%) | 36,097 (28.9%) | -0.028 | -0.034, -0.021 | |
Diabetes With Complications | 45,465 (10.6%) | 14,154 (11.3%) | -0.024 | -0.031, -0.018 | |
Drug Abuse | 6,533 (1.5%) | 1,311 (1.0%) | 0.040 | 0.034, 0.046 | |
Fluid or Electrolyte Disorders | 54,203 (12.6%) | 13,535 (10.8%) | 0.054 | 0.048, 0.061 | |
HIV | 1,426 (0.3%) | 481 (0.4%) | -0.009 | -0.015, -0.003 | |
HTN Without Complications | 266,446 (62.0%) | 86,651 (69.4%) | -0.154 | -0.160, -0.147 | |
HTN With Complications | 55,606 (12.9%) | 20,483 (16.4%) | -0.101 | -0.107, -0.094 | |
Hypothyroidism | 42,431 (9.9%) | 9,649 (7.7%) | 0.074 | 0.067, 0.080 | |
Liver Disease | 23,871 (5.6%) | 4,791 (3.8%) | 0.078 | 0.071, 0.084 | |
Lymphoma | 6,226 (1.4%) | 1,291 (1.0%) | 0.036 | 0.030, 0.042 | |
Metastatic Cancer | 10,529 (2.5%) | 1,525 (1.2%) | 0.084 | 0.078, 0.091 | |
Other Neurologic Disorders | 39,518 (9.2%) | 7,231 (5.8%) | 0.123 | 0.116, 0.129 | |
Obesity | 47,431 (11.0%) | 11,396 (9.1%) | 0.062 | 0.056, 0.068 | |
Paralysis | 10,106 (2.4%) | 2,017 (1.6%) | 0.050 | 0.044, 0.057 | |
Pulmonary Hypertension | 12,901 (3.0%) | 2,402 (1.9%) | 0.066 | 0.060, 0.072 | |
Psychoses | 31,075 (7.2%) | 6,706 (5.4%) | 0.074 | 0.068, 0.080 | |
Peptic Ulcer Disease | 1,672 (0.4%) | 293 (0.2%) | 0.026 | 0.020, 0.032 | |
COPD | 102,194 (23.8%) | 23,707 (19.0%) | 0.115 | 0.108, 0.121 | |
Peripheral Vascular Disease | 62,941 (14.6%) | 14,995 (12.0%) | 0.076 | 0.070, 0.082 | |
Renal Failure | 30,407 (7.1%) | 11,908 (9.5%) | -0.093 | -0.099, -0.086 | |
Rheumatoid Arthritis | 21,620 (5.0%) | 4,669 (3.7%) | 0.061 | 0.055, 0.067 | |
Solid Tumor | 72,259 (16.8%) | 13,665 (10.9%) | 0.163 | 0.143, 0.188 | |
Valvular Disease | 60,458 (14.1%) | 13,805 (11.1%) | 0.089 | 0.082, 0.095 | |
Weight Loss | 19,310 (4.5%) | 3,334 (2.7%) | 0.092 | 0.086, 0.099 | |
Hypotension | Orthostatic Hypotension | 5,000 (1.2%) | 898 (0.7%) | 0.043 | 0.037, 0.050 |
Other Hypotension | 12,550 (2.9%) | 2,129 (1.7%) | 0.076 | 0.069, 0.082 | |
Prostate Specific Antigen | PSA Measurement Taken | 247,089 (57.5%) | 71,601 (57.3%) | 0.003 | -0.003, 0.010 |
Diagnosis of Abnormal PSA | 69,520 (16.2%) | 13,950 (11.2%) | 0.140 | 0.134, 0.147 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 13,557 (3.2%) | 2,908 (2.3%) | 0.049 | 0.042, 0.055 |
Uroflow Study Performed | 27,946 (6.5%) | 7,554 (6.0%) | 0.019 | 0.012, 0.025 | |
Cystometrogram Performed | 3,304 (0.8%) | 892 (0.7%) | 0.006 | 0.000, 0.013 | |
Diagnosis of BPH | 179,924 (41.9%) | 41,958 (33.6%) | 0.169 | 0.149, 0.196 | |
Other Comorbidities | Diagnosis of Anxiety | 38,428 (8.9%) | 7,700 (6.2%) | 0.101 | 0.094, 0.107 |
Diagnosis of Erectile Dysfunction | 44,895 (10.4%) | 11,152 (8.9%) | 0.050 | 0.044, 0.057 | |
1Cohen's D | |||||
2CI = Confidence Interval |
describe_cohort(matched_model_data)
Group | Characteristic | tamsulosin, N = 119,944 | tz/dz/az, N = 119,944 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 62 (55, 70) | 62 (55, 70) | 0.015 | 0.007, 0.023 |
Medication Index Date | 2,009.0 (2,006.0, 2,012.0) | 2,009.0 (2,006.0, 2,012.0) | 0.011 | 0.003, 0.019 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.19 (1.88, 5.50) | 2.94 (1.76, 5.30) | 0.061 | 0.052, 0.069 |
Follow Up Time (Years) | 2.07 (0.92, 4.29) | 2.05 (0.93, 4.29) | -0.001 | -0.009, 0.007 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.14) | 0.00 (0.00, 0.13) | 0.005 | -0.003, 0.013 |
Outpatient Events Per Year | 9 (5, 16) | 9 (5, 16) | 0.008 | 0.000, 0.016 | |
Mean Number of DX Codes Per Outpatient Event | 1.39 (1.19, 1.87) | 1.38 (1.19, 1.86) | 0.006 | -0.002, 0.014 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.07) | 0.04 (0.02, 0.07) | 0.010 | 0.002, 0.018 | |
mean_ndx_out_rate | 14 (7, 25) | 14 (7, 25) | 0.010 | 0.002, 0.018 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 2,684 (2.2%) | 2,599 (2.2%) | 0.005 | -0.003, 0.013 |
Anemia | 16,334 (13.6%) | 16,209 (13.5%) | 0.003 | -0.005, 0.011 | |
Blood Loss | 1,494 (1.2%) | 1,456 (1.2%) | 0.003 | -0.005, 0.011 | |
Heart Failure | 9,925 (8.3%) | 9,705 (8.1%) | 0.007 | -0.001, 0.015 | |
Coagulopathy | 3,549 (3.0%) | 3,506 (2.9%) | 0.002 | -0.006, 0.010 | |
Depression | 8,731 (7.3%) | 8,660 (7.2%) | 0.002 | -0.006, 0.010 | |
Diabetes Without Complications | 34,932 (29.1%) | 34,735 (29.0%) | 0.004 | -0.004, 0.012 | |
Diabetes With Complications | 13,786 (11.5%) | 13,616 (11.4%) | 0.004 | -0.004, 0.012 | |
Drug Abuse | 1,300 (1.1%) | 1,299 (1.1%) | 0.000 | -0.008, 0.008 | |
Fluid or Electrolyte Disorders | 13,302 (11.1%) | 13,181 (11.0%) | 0.003 | -0.005, 0.011 | |
HIV | 493 (0.4%) | 467 (0.4%) | 0.003 | -0.005, 0.011 | |
HTN Without Complications | 82,991 (69.2%) | 82,654 (68.9%) | 0.006 | -0.002, 0.014 | |
HTN With Complications | 19,857 (16.6%) | 19,442 (16.2%) | 0.009 | 0.001, 0.017 | |
Hypothyroidism | 9,647 (8.0%) | 9,449 (7.9%) | 0.006 | -0.002, 0.014 | |
Liver Disease | 4,807 (4.0%) | 4,737 (3.9%) | 0.003 | -0.005, 0.011 | |
Lymphoma | 1,296 (1.1%) | 1,266 (1.1%) | 0.002 | -0.006, 0.010 | |
Metastatic Cancer | 1,542 (1.3%) | 1,514 (1.3%) | 0.002 | -0.006, 0.010 | |
Other Neurologic Disorders | 7,305 (6.1%) | 7,159 (6.0%) | 0.005 | -0.003, 0.013 | |
Obesity | 11,270 (9.4%) | 11,260 (9.4%) | 0.000 | -0.008, 0.008 | |
Paralysis | 2,013 (1.7%) | 1,985 (1.7%) | 0.002 | -0.006, 0.010 | |
Pulmonary Hypertension | 2,480 (2.1%) | 2,386 (2.0%) | 0.006 | -0.002, 0.014 | |
Psychoses | 6,816 (5.7%) | 6,637 (5.5%) | 0.006 | -0.002, 0.014 | |
Peptic Ulcer Disease | 290 (0.2%) | 290 (0.2%) | 0.000 | -0.008, 0.008 | |
COPD | 23,561 (19.6%) | 23,283 (19.4%) | 0.006 | -0.002, 0.014 | |
Peripheral Vascular Disease | 14,877 (12.4%) | 14,648 (12.2%) | 0.006 | -0.002, 0.014 | |
Renal Failure | 11,382 (9.5%) | 11,288 (9.4%) | 0.003 | -0.005, 0.011 | |
Rheumatoid Arthritis | 4,615 (3.8%) | 4,589 (3.8%) | 0.001 | -0.007, 0.009 | |
Solid Tumor | 13,662 (11.4%) | 13,507 (11.3%) | 0.004 | -0.004, 0.012 | |
Valvular Disease | 13,851 (11.5%) | 13,586 (11.3%) | 0.007 | -0.001, 0.015 | |
Weight Loss | 3,380 (2.8%) | 3,293 (2.7%) | 0.004 | -0.004, 0.012 | |
Hypotension | Orthostatic Hypotension | 957 (0.8%) | 893 (0.7%) | 0.006 | -0.002, 0.014 |
Other Hypotension | 2,160 (1.8%) | 2,110 (1.8%) | 0.003 | -0.005, 0.011 | |
Prostate Specific Antigen | PSA Measurement Taken | 69,252 (57.7%) | 68,586 (57.2%) | 0.011 | 0.003, 0.019 |
Diagnosis of Abnormal PSA | 14,040 (11.7%) | 13,746 (11.5%) | 0.008 | 0.000, 0.016 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 3,061 (2.6%) | 2,891 (2.4%) | 0.009 | 0.001, 0.017 |
Uroflow Study Performed | 7,682 (6.4%) | 7,428 (6.2%) | 0.009 | 0.001, 0.017 | |
Cystometrogram Performed | 910 (0.8%) | 886 (0.7%) | 0.002 | -0.006, 0.010 | |
Diagnosis of BPH | 41,863 (34.9%) | 41,127 (34.3%) | 0.013 | 0.005, 0.021 | |
Other Comorbidities | Diagnosis of Anxiety | 7,737 (6.5%) | 7,628 (6.4%) | 0.004 | -0.004, 0.012 |
Diagnosis of Erectile Dysfunction | 11,200 (9.3%) | 10,965 (9.1%) | 0.007 | -0.001, 0.015 | |
1Cohen's D | |||||
2CI = Confidence Interval |
We can also visually inspect the estimated propensity scores for overlap:
ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_tam.rds"))
bind_rows(
model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "Before Matching"),
matched_model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "After Matching")
) %>%
mutate(type = forcats::fct_rev(type)) %>%
ggplot(aes(x = ptx, fill = drug, color = drug)) +
geom_density(alpha = 0.5) +
labs(x = "Probability of Treatment with Tamsulosin",
y = "Density",
color = "Observed Treatment",
fill = "Observed Treatment") +
theme_minimal() +
theme(legend.position = "bottom") +
facet_wrap(~ type, nrow = 2)
It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.
Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.
survfit(Surv(survival_time, develops_pd) ~ treatment,
data = model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "treatment=TRUE",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
A finding that is statistically significant using the log rank test:
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 429741 4867 4476 34.2 143
treatment=TRUE 124905 1022 1413 108.3 143
Chisq= 143 on 1 degrees of freedom, p= <2e-16
The Cox regression pegs the protection at HR at 0.67 (0.62, 0.71).
raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = model_data,
robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 0.66 | 0.62, 0.71 | <0.001 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
However, this estimate is not trustworthy as we know the groups could be better balanced.
Further adjustment yields similar results.
We observe the following overall incidence:
matched_model_data %>%
group_by(drug) %>%
summarize(
cases = sum(develops_pd),
time = sum(survival_time / 365),
incidence = cases / time * 10000
) %>%
knitr::kable()
drug | cases | time | incidence |
---|---|---|---|
tamsulosin | 1391 | 358504.4 | 38.80008 |
tz/dz/az | 991 | 359705.2 | 27.55034 |
With a Kaplan Meier curve:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "drug=tamsulosin",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
The population size at risk by year is given in the table below:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
select(strata, time,, n.risk) %>%
complete(strata, time = 0:(16*365)) %>%
fill(n.risk, .direction = "down") %>%
mutate(drug = ifelse(strata == "drug=tamsulosin", "Tamsulosin", "TZ/DZ/AZ")) %>%
select(time, drug, n.risk) %>%
filter(time %in% c(1:16 * 365)) %>%
mutate(year = round(time / 365)) %>%
spread(drug, n.risk) %>%
knitr::kable()
time | year | Tamsulosin | TZ/DZ/AZ |
---|---|---|---|
365 | 1 | 86903 | 87067 |
730 | 2 | 60751 | 60535 |
1095 | 3 | 44291 | 44272 |
1460 | 4 | 32414 | 32461 |
1825 | 5 | 23443 | 23524 |
2190 | 6 | 16894 | 16944 |
2555 | 7 | 12166 | 12186 |
2920 | 8 | 8306 | 8332 |
3285 | 9 | 5535 | 5554 |
3650 | 10 | 3484 | 3568 |
4015 | 11 | 2124 | 2146 |
4380 | 12 | 1133 | 1121 |
4745 | 13 | 629 | 646 |
5110 | 14 | 304 | 317 |
5475 | 15 | 121 | 117 |
5840 | 16 | 2 | 3 |
A log rank test confirms that the curves remain different.
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = matched_model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 119944 1391 1189 34.3 68.5
treatment=1 119944 991 1193 34.2 68.5
Chisq= 68.5 on 1 degrees of freedom, p= <2e-16
Survival times
tbl_survfit(
survfit(
Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
),
probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic | 0.5% Percentile | 1.0% Percentile | 1.5% Percentile | 2.0% Percentile |
---|---|---|---|---|
drug | ||||
tamsulosin | 442 (412, 496) | 958 (890, 1,015) | 1,463 (1,380, 1,584) | 1,996 (1,891, 2,156) |
tz/dz/az | 595 (547, 651) | 1,317 (1,237, 1,481) | 2,100 (1,968, 2,264) | 2,768 (2,578, 3,030) |
We fit a Cox regression model to estimate the differences in hazard:
full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = matched_model_data,
robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 0.71 | 0.65, 0.77 | <0.001 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
The Cox regression yields a HR of 0.71 (0.66, 0.78; p < 0.001).
tbl_merge(
tbls = list(
tbl_regression(raw_cox, exponentiate = TRUE),
tbl_regression(full_cox, exponentiate = TRUE)
),
tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>%
as_flex_table()
| Unadjusted | Propensity Score Matched | ||||
Characteristic | HR1 | 95% CI1 | p-value | HR1 | 95% CI1 | p-value |
as.numeric(treatment) | 0.66 | 0.62, 0.71 | <0.001 | 0.71 | 0.65, 0.77 | <0.001 |
1HR = Hazard Ratio, CI = Confidence Interval |
scf_residuals <- cox.zph(full_cox)
scf_residuals
chisq df p
as.numeric(treatment) 1.98 1 0.16
GLOBAL 1.98 1 0.16
Pearson's product-moment correlation
data: x and y
t = -1.4089, df = 2380, p-value = 0.159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.06894934 0.01130818
sample estimates:
cor
-0.0288671
tibble(
x = scf_residuals$time,
y = scf_residuals$y
) %>%
ggplot(aes(x = x / 365, y = y)) +
geom_point() +
geom_smooth(method = "loess") +
labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") +
theme_minimal()
There is statistically significant correlation with the Schoenfeld residuals; however, we are likely over-powered to detect a deviation. Looking at the estimated correlation (-0.059), we think it is unlikely that this has a meaningful impact on our estimated hazard ratio.