library("Rcpp", lib.loc = "~/R/x86_64-pc-linux-gnu-library/4.0/")
library(tidyverse)
library(gtsummary)
library(survival)
library(effectsize)
library(patchwork)
This file was compiled on 2022-05-09 14:51:26 by jsimmeri
on argon-lc-f14-25.hpc.
The script analysis_functions.R
provides:
model_data
to the workspace)root_dir
according to the location where the script is being run to handle different syntax for referring to network drives in MacOS and CentOSsource("~/projects/pd-preprint/analysis/analysis_functions.R")
The tibble model_data
is loaded by analysis_functions
from /Shared/lss_jsimmeri_backup/data/tz-5ari-final. model_data
contains data on all users of the three drugs. We want to reduce that here to just tamsulosin and 5ARI.
model_data <- model_data %>%
filter(drug %in% c("5ari", "tamsulosin")) %>%
mutate(treatment = drug == "tamsulosin")
We also want to load our matching results:
matches <- read_rds(glue::glue("{root_dir}/matches/tam_5ari.rds"))
And using the matches, build matched versions of the tamsulosin versus 5ARI model data:
matched_model_data <- model_data %>%
select(-treatment) %>%
inner_join(matches,
by = "enrolid")
describe_cohort(model_data)
describe_cohort(matched_model_data)
We can also visually inspect the estimated propensity scores for overlap:
ps <- read_rds(glue::glue("{root_dir}/matching_candidates/tam_5ari.rds"))
bind_rows(
model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "Before Matching"),
matched_model_data %>%
inner_join(ps, by = "enrolid") %>%
mutate(type = "After Matching")
) %>%
mutate(type = forcats::fct_rev(type)) %>%
ggplot(aes(x = ptx, fill = drug, color = drug)) +
geom_density(alpha = 0.5) +
labs(x = "Probability of Treatment with Tamsulosin",
y = "Density",
color = "Observed Treatment",
fill = "Observed Treatment") +
theme_minimal() +
theme(legend.position = "bottom") +
facet_wrap(~ type, nrow = 2)
It is visually apparent that the two cohorts have a limited overlap before matching, which is largely addressed with matching.
Prior to matching, we observe a large survival difference with greater incidence among men taking tamsulosin then 5ARI.
survfit(Surv(survival_time, develops_pd) ~ treatment,
data = model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "treatment=TRUE",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
A finding that is statistically significant using the log rank test:
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=FALSE 79133 765 973 44.57 54
treatment=TRUE 429741 4867 4659 9.31 54
Chisq= 54 on 1 degrees of freedom, p= 2e-13
The Cox regression pegs the protection at HR at 1.32 (1.22, 1.43).
raw_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = model_data,
robust = TRUE)
tbl_regression(raw_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 1.33 | 1.23, 1.44 | <0.001 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
However, this estimate is not trustworthy as we know the groups could be better balanced.
Further adjustment yields similar results:
We observe the following overall incidence:
matched_model_data %>%
group_by(drug) %>%
summarize(
cases = sum(develops_pd),
time = sum(survival_time / 365),
incidence = cases / time * 10000
) %>%
knitr::kable()
drug | cases | time | incidence |
---|---|---|---|
5ari | 757 | 241922.9 | 31.29096 |
tamsulosin | 846 | 241542.2 | 35.02494 |
With a Kaplan Meier curve:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
mutate(treatment = ifelse(strata == "drug=tamsulosin",
"Tamsulosin", "5ARI")) %>%
ggplot(
aes(
x = time / 365,
y = 100 * estimate,
ymin = 100 * conf.low,
ymax = 100 * conf.high
)
) +
geom_step(aes(color = treatment)) +
geom_ribbon(aes(fill = treatment), alpha = 0.25) +
theme_minimal() +
labs(x = "Years of Follow Up", y = "Percent Remaining PD Free",
color = "", fill = "") +
theme(legend.position = c(0.25, 0.25))
The population size at risk by year is given in the table below:
survfit(Surv(survival_time, develops_pd) ~ drug,
data = matched_model_data) %>%
broom::tidy() %>%
select(strata, time,, n.risk) %>%
complete(strata, time = 0:(16*365)) %>%
fill(n.risk, .direction = "down") %>%
mutate(drug = ifelse(strata == "drug=5ari", "5ARI", "Tamsulosin")) %>%
select(time, drug, n.risk) %>%
filter(time %in% c(1:16 * 365)) %>%
mutate(year = round(time / 365)) %>%
spread(drug, n.risk) %>%
knitr::kable()
time | year | 5ARI | Tamsulosin |
---|---|---|---|
365 | 1 | 58387 | 58426 |
730 | 2 | 41709 | 41987 |
1095 | 3 | 30354 | 30486 |
1460 | 4 | 22122 | 22078 |
1825 | 5 | 16032 | 16102 |
2190 | 6 | 11475 | 11533 |
2555 | 7 | 8076 | 8111 |
2920 | 8 | 5471 | 5463 |
3285 | 9 | 3584 | 3542 |
3650 | 10 | 2238 | 2204 |
4015 | 11 | 1316 | 1301 |
4380 | 12 | 643 | 658 |
4745 | 13 | 328 | 327 |
5110 | 14 | 163 | 161 |
5475 | 15 | 43 | 48 |
5840 | 16 | 2 | 1 |
A log rank test confirms that the curves remain different but only with p = 0.05.
Call:
survdiff(formula = Surv(survival_time, develops_pd) ~ treatment,
data = matched_model_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0 78747 757 802 2.55 5.1
treatment=1 78747 846 801 2.55 5.1
Chisq= 5.1 on 1 degrees of freedom, p= 0.02
Survival times
tbl_survfit(
survfit(
Surv(survival_time, develops_pd) ~ drug, data = matched_model_data
),
probs = seq(0.5, 2, by = 0.5) / 100
)
Characteristic | 0.5% Percentile | 1.0% Percentile | 1.5% Percentile | 2.0% Percentile |
---|---|---|---|---|
drug | ||||
5ari | 557 (497, 649) | 1,175 (1,067, 1,279) | 1,778 (1,666, 1,943) | 2,385 (2,183, 2,694) |
tamsulosin | 530 (472, 581) | 1,089 (980, 1,203) | 1,665 (1,519, 1,822) | 2,160 (1,995, 2,365) |
We fit a Cox regression model to estimate the differences in hazard:
full_cox <- coxph(Surv(survival_time, develops_pd) ~ as.numeric(treatment),
data = matched_model_data,
robust = TRUE)
tbl_regression(full_cox, exponentiate = TRUE)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
as.numeric(treatment) | 1.12 | 1.01, 1.23 | 0.024 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
The Cox regression yields a HR of 1.10 (95% CI: 1.00, 1.22; p = 0.0528).
tbl_merge(
tbls = list(
tbl_regression(raw_cox, exponentiate = TRUE),
tbl_regression(full_cox, exponentiate = TRUE)
),
tab_spanner = c("Unadjusted", "Propensity Score Matched")
) %>%
as_flex_table()
scf_residuals <- cox.zph(full_cox)
scf_residuals
chisq df p
as.numeric(treatment) 4.94 1 0.026
GLOBAL 4.94 1 0.026
Pearson's product-moment correlation
data: x and y
t = 2.2253, df = 1601, p-value = 0.0262
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.006587255 0.104205887
sample estimates:
cor
0.05552927
tibble(
x = scf_residuals$time,
y = scf_residuals$y
) %>%
ggplot(aes(x = x / 365, y = y)) +
geom_point() +
geom_smooth(method = "loess") +
labs(x = "Follow Up Time in Years", y = "Schoenfeld Residual") +
theme_minimal()
There is a weak correlation that is not statistically significant when checking the Schoenfeld residuals. It is unlikely to be a serious threat to our estimates.