This file was compiled on 2022-05-09 14:50:05 by jsimmeri
on argon-lc-f14-25.hpc. The script analysis_functions.R
provides:
model_data
to the workspace)root_dir
according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOSsource("~/projects/pd-preprint/analysis/analysis_functions.R")
The tibble model_data
is loaded by analysis_functions
from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data
contains data on all users of the three drugs. We want to reduce that here to just Tz/Dz/Az and 5ARI.
We also want to load our matching results:
matches <- read_rds(glue::glue("{root_dir}/matches/tz_5ari.rds"))
And using the matches, build matched versions of the tamsulosin versus 5ARI model data:
matched_model_data <- model_data %>%
select(-treatment) %>%
inner_join(matches,
by = "enrolid")
describe_cohort(model_data)
Group | Characteristic | 5ari, N = 79,133 | tz/dz/az, N = 124,905 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 62 (54, 70) | 62 (55, 70) | -0.047 | -0.055, -0.038 |
Medication Index Date | 2,010.0 (2,008.0, 2,013.0) | 2,009.0 (2,006.0, 2,012.0) | 0.220 | 0.194, 0.254 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.15 (1.83, 5.57) | 2.85 (1.71, 5.18) | 0.073 | 0.064, 0.082 |
Follow Up Time (Years) | 2.23 (0.98, 4.43) | 1.96 (0.93, 4.19) | 0.049 | 0.041, 0.058 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.12) | -0.114 | -0.123, -0.105 |
Outpatient Events Per Year | 9 (5, 16) | 9 (5, 16) | -0.051 | -0.060, -0.042 | |
Mean Number of DX Codes Per Outpatient Event | 1.40 (1.20, 1.87) | 1.37 (1.18, 1.83) | 0.037 | 0.029, 0.046 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.07) | 0.04 (0.02, 0.07) | -0.031 | -0.040, -0.022 | |
mean_ndx_out_rate | 13 (7, 24) | 14 (7, 25) | -0.031 | -0.040, -0.022 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 1,135 (1.4%) | 2,652 (2.1%) | -0.051 | -0.060, -0.042 |
Anemia | 9,051 (11.4%) | 16,683 (13.4%) | -0.058 | -0.067, -0.049 | |
Blood Loss | 886 (1.1%) | 1,482 (1.2%) | -0.006 | -0.015, 0.003 | |
Heart Failure | 5,190 (6.6%) | 9,957 (8.0%) | -0.054 | -0.063, -0.045 | |
Coagulopathy | 2,302 (2.9%) | 3,572 (2.9%) | 0.003 | -0.006, 0.012 | |
Depression | 5,380 (6.8%) | 8,769 (7.0%) | -0.009 | -0.018, 0.000 | |
Diabetes Without Complications | 15,703 (19.8%) | 36,097 (28.9%) | -0.209 | -0.218, -0.200 | |
Diabetes With Complications | 5,009 (6.3%) | 14,154 (11.3%) | -0.172 | -0.181, -0.163 | |
Drug Abuse | 568 (0.7%) | 1,311 (1.0%) | -0.035 | -0.044, -0.026 | |
Fluid or Electrolyte Disorders | 5,812 (7.3%) | 13,535 (10.8%) | -0.119 | -0.128, -0.110 | |
HIV | 546 (0.7%) | 481 (0.4%) | 0.043 | 0.034, 0.052 | |
HTN Without Complications | 41,769 (52.8%) | 86,651 (69.4%) | -0.348 | -0.357, -0.339 | |
HTN With Complications | 7,828 (9.9%) | 20,483 (16.4%) | -0.189 | -0.198, -0.180 | |
Hypothyroidism | 6,927 (8.8%) | 9,649 (7.7%) | 0.038 | 0.029, 0.047 | |
Liver Disease | 2,454 (3.1%) | 4,791 (3.8%) | -0.040 | -0.049, -0.031 | |
Lymphoma | 829 (1.0%) | 1,291 (1.0%) | 0.001 | -0.008, 0.010 | |
Metastatic Cancer | 1,090 (1.4%) | 1,525 (1.2%) | 0.014 | 0.005, 0.023 | |
Other Neurologic Disorders | 4,419 (5.6%) | 7,231 (5.8%) | -0.009 | -0.018, 0.000 | |
Obesity | 4,462 (5.6%) | 11,396 (9.1%) | -0.130 | -0.139, -0.122 | |
Paralysis | 960 (1.2%) | 2,017 (1.6%) | -0.034 | -0.042, -0.025 | |
Pulmonary Hypertension | 1,521 (1.9%) | 2,402 (1.9%) | 0.000 | -0.009, 0.009 | |
Psychoses | 4,098 (5.2%) | 6,706 (5.4%) | -0.008 | -0.017, 0.000 | |
Peptic Ulcer Disease | 183 (0.2%) | 293 (0.2%) | -0.001 | -0.010, 0.008 | |
COPD | 13,975 (17.7%) | 23,707 (19.0%) | -0.034 | -0.043, -0.025 | |
Peripheral Vascular Disease | 8,260 (10.4%) | 14,995 (12.0%) | -0.049 | -0.058, -0.040 | |
Renal Failure | 3,431 (4.3%) | 11,908 (9.5%) | -0.198 | -0.207, -0.189 | |
Rheumatoid Arthritis | 2,785 (3.5%) | 4,669 (3.7%) | -0.012 | -0.021, -0.003 | |
Solid Tumor | 10,371 (13.1%) | 13,665 (10.9%) | 0.067 | 0.058, 0.076 | |
Valvular Disease | 9,097 (11.5%) | 13,805 (11.1%) | 0.014 | 0.005, 0.023 | |
Weight Loss | 2,219 (2.8%) | 3,334 (2.7%) | 0.008 | -0.001, 0.017 | |
Hypotension | Orthostatic Hypotension | 748 (0.9%) | 898 (0.7%) | 0.025 | 0.016, 0.034 |
Other Hypotension | 1,451 (1.8%) | 2,129 (1.7%) | 0.010 | 0.001, 0.019 | |
Prostate Specific Antigen | PSA Measurement Taken | 48,034 (60.7%) | 71,601 (57.3%) | 0.069 | 0.060, 0.077 |
Diagnosis of Abnormal PSA | 23,538 (29.7%) | 13,950 (11.2%) | 0.493 | 0.543, 0.543 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 1,936 (2.4%) | 2,908 (2.3%) | 0.008 | -0.001, 0.017 |
Uroflow Study Performed | 5,556 (7.0%) | 7,554 (6.0%) | 0.040 | 0.031, 0.049 | |
Cystometrogram Performed | 524 (0.7%) | 892 (0.7%) | -0.006 | -0.015, 0.003 | |
Diagnosis of BPH | 35,618 (45.0%) | 41,958 (33.6%) | 0.237 | 0.208, 0.273 | |
Other Comorbidities | Diagnosis of Anxiety | 5,406 (6.8%) | 7,700 (6.2%) | 0.027 | 0.018, 0.036 |
Diagnosis of Erectile Dysfunction | 7,153 (9.0%) | 11,152 (8.9%) | 0.004 | -0.005, 0.013 | |
1Cohen's D | |||||
2CI = Confidence Interval |
describe_cohort(matched_model_data)
Group | Characteristic | 5ari, N = 64,558 | tz/dz/az, N = 64,558 | Difference1 | 95% CI12 |
Age and Medication Start Time | Age at Index Date | 61 (54, 70) | 61 (54, 70) | 0.005 | -0.006, 0.016 |
Medication Index Date | 2,010.0 (2,007.0, 2,013.0) | 2,010.0 (2,007.0, 2,012.0) | 0.032 | 0.021, 0.043 | |
Duration of Enrollment Time | Lookback Time (Years) | 3.10 (1.82, 5.48) | 3.02 (1.80, 5.43) | 0.012 | 0.001, 0.023 |
Follow Up Time (Years) | 2.16 (0.95, 4.37) | 2.15 (0.94, 4.37) | 0.001 | -0.010, 0.012 | |
Rates of Inpatient and Outpatient Encounters | Inpatient Events Per Year | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.001 | -0.010, 0.011 |
Outpatient Events Per Year | 9 (5, 16) | 9 (5, 16) | 0.002 | -0.008, 0.013 | |
Mean Number of DX Codes Per Outpatient Event | 1.40 (1.19, 1.88) | 1.39 (1.19, 1.86) | 0.019 | 0.008, 0.030 | |
Outpatient DX Code Incidence Per Year | 0.04 (0.02, 0.07) | 0.04 (0.02, 0.06) | 0.008 | -0.002, 0.019 | |
mean_ndx_out_rate | 13 (7, 24) | 13 (7, 24) | 0.008 | -0.002, 0.019 | |
Elixhauser/AHRQ Comorbidity Flags | Alcohol Abuse | 1,050 (1.6%) | 1,053 (1.6%) | 0.000 | -0.011, 0.011 |
Anemia | 7,560 (11.7%) | 7,490 (11.6%) | 0.003 | -0.008, 0.014 | |
Blood Loss | 724 (1.1%) | 733 (1.1%) | -0.001 | -0.012, 0.010 | |
Heart Failure | 4,400 (6.8%) | 4,335 (6.7%) | 0.004 | -0.007, 0.015 | |
Coagulopathy | 1,866 (2.9%) | 1,850 (2.9%) | 0.001 | -0.009, 0.012 | |
Depression | 4,526 (7.0%) | 4,598 (7.1%) | -0.004 | -0.015, 0.007 | |
Diabetes Without Complications | 13,925 (21.6%) | 13,976 (21.6%) | -0.002 | -0.013, 0.009 | |
Diabetes With Complications | 4,584 (7.1%) | 4,582 (7.1%) | 0.000 | -0.011, 0.011 | |
Drug Abuse | 524 (0.8%) | 515 (0.8%) | 0.002 | -0.009, 0.012 | |
Fluid or Electrolyte Disorders | 5,097 (7.9%) | 5,072 (7.9%) | 0.001 | -0.009, 0.012 | |
HIV | 371 (0.6%) | 356 (0.6%) | 0.003 | -0.008, 0.014 | |
HTN Without Complications | 36,318 (56.3%) | 36,374 (56.3%) | -0.002 | -0.013, 0.009 | |
HTN With Complications | 7,050 (10.9%) | 6,969 (10.8%) | 0.004 | -0.007, 0.015 | |
Hypothyroidism | 5,508 (8.5%) | 5,412 (8.4%) | 0.005 | -0.006, 0.016 | |
Liver Disease | 2,111 (3.3%) | 2,162 (3.3%) | -0.004 | -0.015, 0.006 | |
Lymphoma | 661 (1.0%) | 686 (1.1%) | -0.004 | -0.015, 0.007 | |
Metastatic Cancer | 873 (1.4%) | 893 (1.4%) | -0.003 | -0.014, 0.008 | |
Other Neurologic Disorders | 3,663 (5.7%) | 3,665 (5.7%) | 0.000 | -0.011, 0.011 | |
Obesity | 4,109 (6.4%) | 4,035 (6.3%) | 0.005 | -0.006, 0.016 | |
Paralysis | 851 (1.3%) | 839 (1.3%) | 0.002 | -0.009, 0.013 | |
Pulmonary Hypertension | 1,234 (1.9%) | 1,241 (1.9%) | -0.001 | -0.012, 0.010 | |
Psychoses | 3,418 (5.3%) | 3,494 (5.4%) | -0.005 | -0.016, 0.006 | |
Peptic Ulcer Disease | 153 (0.2%) | 147 (0.2%) | 0.002 | -0.009, 0.013 | |
COPD | 11,603 (18.0%) | 11,709 (18.1%) | -0.004 | -0.015, 0.007 | |
Peripheral Vascular Disease | 6,894 (10.7%) | 6,879 (10.7%) | 0.001 | -0.010, 0.012 | |
Renal Failure | 3,246 (5.0%) | 3,130 (4.8%) | 0.008 | -0.003, 0.019 | |
Rheumatoid Arthritis | 2,345 (3.6%) | 2,344 (3.6%) | 0.000 | -0.011, 0.011 | |
Solid Tumor | 7,988 (12.4%) | 7,987 (12.4%) | 0.000 | -0.011, 0.011 | |
Valvular Disease | 7,290 (11.3%) | 7,323 (11.3%) | -0.002 | -0.013, 0.009 | |
Weight Loss | 1,807 (2.8%) | 1,784 (2.8%) | 0.002 | -0.009, 0.013 | |
Hypotension | Orthostatic Hypotension | 566 (0.9%) | 547 (0.8%) | 0.003 | -0.008, 0.014 |
Other Hypotension | 1,158 (1.8%) | 1,154 (1.8%) | 0.000 | -0.010, 0.011 | |
Prostate Specific Antigen | PSA Measurement Taken | 38,395 (59.5%) | 38,094 (59.0%) | 0.009 | -0.001, 0.020 |
Diagnosis of Abnormal PSA | 13,277 (20.6%) | 12,868 (19.9%) | 0.016 | 0.005, 0.027 | |
Bladder Function and Urinary Flow | Slow Urinary Stream Diagnosis | 1,621 (2.5%) | 1,688 (2.6%) | -0.007 | -0.017, 0.004 |
Uroflow Study Performed | 4,459 (6.9%) | 4,715 (7.3%) | -0.015 | -0.026, -0.005 | |
Cystometrogram Performed | 452 (0.7%) | 502 (0.8%) | -0.009 | -0.020, 0.002 | |
Diagnosis of BPH | 26,978 (41.8%) | 27,429 (42.5%) | -0.014 | -0.025, -0.003 | |
Other Comorbidities | Diagnosis of Anxiety | 4,375 (6.8%) | 4,369 (6.8%) | 0.000 | -0.011, 0.011 |
Diagnosis of Erectile Dysfunction | 5,860 (9.1%) | 5,930 (9.2%) | -0.004 | -0.015, 0.007 | |
1Cohen's D | |||||
2CI = Confidence Interval |
We can also visually inspect the estimated propensity scores for overlap:
ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tz_5ari.rds"))
bind_rows(
model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "Before Matching"),
matched_model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "After Matching")
) %>%
mutate(type = forcats::fct_rev(type)) %>%
ggplot(aes(x = ptx, fill = drug, color = drug)) +
geom_density(alpha = 0.5) +
labs(x = "Probability of Treatment with Tamsulosin",
y = "Density",
color = "Observed Treatment",
fill = "Observed Treatment") +
theme_minimal() +
theme(legend.position = "bottom") +
facet_wrap(~ type, nrow = 2)
It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.
Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.
survfit(Surv(survival_time, develops_pd) ~ treatment,
data = model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "treatment=TRUE",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
A finding that is statistically significant using the log rank test:
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 79133 765 711 4.13 6.86
treatment=TRUE 124905 1022 1076 2.73 6.86
Chisq= 6.9 on 1 degrees of freedom, p= 0.009
The Cox regression pegs the protection at HR at 0.88 (0.80, 0.97).
raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = model_data,
robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 0.88 | 0.80, 0.97 | 0.009 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
However, this estimate is not trustworthy as we know the groups could be better balanced.
Further adjustment yields similar results:
We observe the following overall incidence:
matched_model_data %>%
group_by(drug) %>%
summarize(
cases = sum(develops_pd),
time = sum(survival_time / 365),
incidence = cases / time * 10000
) %>%
knitr::kable()
drug | cases | time | incidence |
---|---|---|---|
5ari | 620 | 196284.6 | 31.58680 |
tz/dz/az | 521 | 196256.5 | 26.54689 |
With a Kaplan Meier curve:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "drug=5ari",
"5ARI", "TZ/DZ/AZ")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
The population size at risk by year is given in the table below:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
select(strata, time,, n.risk) %>%
complete(strata, time = 0:(16*365)) %>%
fill(n.risk, .direction = "down") %>%
mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "TZ/DZ/AZ")) %>%
select(time, drug, n.risk) %>%
filter(time %in% c(1:16 * 365)) %>%
mutate(year = round(time / 365)) %>%
spread(drug, n.risk) %>%
knitr::kable()
time | year | 5ARI | TZ/DZ/AZ |
---|---|---|---|
365 | 1 | 47223 | 47125 |
730 | 2 | 33639 | 33552 |
1095 | 3 | 24454 | 24446 |
1460 | 4 | 17905 | 17858 |
1825 | 5 | 12985 | 12986 |
2190 | 6 | 9315 | 9341 |
2555 | 7 | 6648 | 6645 |
2920 | 8 | 4504 | 4493 |
3285 | 9 | 2993 | 2975 |
3650 | 10 | 1865 | 1877 |
4015 | 11 | 1089 | 1072 |
4380 | 12 | 536 | 511 |
4745 | 13 | 266 | 270 |
5110 | 14 | 115 | 127 |
5475 | 15 | 34 | 35 |
5840 | 16 | 2 | 2 |
A log rank test confirms that the curves remain different.
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = matched_model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 64558 620 571 4.28 8.57
treatment=1 64558 521 570 4.28 8.57
Chisq= 8.6 on 1 degrees of freedom, p= 0.003
Survival times
tbl_survfit(
survfit(
Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
),
probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic | 0.5% Percentile | 1.0% Percentile | 1.5% Percentile | 2.0% Percentile |
---|---|---|---|---|
drug | ||||
5ari | 556 (495, 649) | 1,160 (1,064, 1,307) | 1,746 (1,616, 1,943) | 2,385 (2,183, 2,702) |
tz/dz/az | 640 (569, 691) | 1,342 (1,188, 1,562) | 2,140 (1,995, 2,369) | 2,815 (2,575, 3,352) |
We fit a Cox regression model to estimate the differences in hazard:
full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = matched_model_data,
robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 0.84 | 0.75, 0.94 | 0.003 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
The Cox regression yields a HR of 0.81 (0.72, 0.92; p < 0.0001).
tbl_merge(
tbls = list(
tbl_regression(raw_cox, exponentiate = TRUE),
tbl_regression(full_cox, exponentiate = TRUE)
),
tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>%
as_flex_table()
| Unadjusted | Propensity Score Matched | ||||
Characteristic | HR1 | 95% CI1 | p-value | HR1 | 95% CI1 | p-value |
as.numeric(treatment) | 0.88 | 0.80, 0.97 | 0.009 | 0.84 | 0.75, 0.94 | 0.003 |
1HR = Hazard Ratio, CI = Confidence Interval |
scf_residuals <- cox.zph(full_cox)
scf_residuals
chisq df p
as.numeric(treatment) 0.00724 1 0.93
GLOBAL 0.00724 1 0.93
Pearson's product-moment correlation
data: x and y
t = 0.085003, df = 1139, p-value = 0.9323
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.05552426 0.06054465
sample estimates:
cor
0.002518679
tibble(
x = scf_residuals$time,
y = scf_residuals$y
) %>%
ggplot(aes(x = x / 365, y = y)) +
geom_point() +
geom_smooth(method = "loess") +
labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") +
theme_minimal()
There is a no meaningful or significant correlation in the Schoenfeld residuals.