Bonnett, Laura
2018-Sep-27 15:26 UTC
[R] Choosing between functional forms using flexible parametric survival models
Dear all,
I am using R 3.4.3 on Windows 10. I am writing code to use in a forthcoming
teaching session. As part of the workshop the students are using breast cancer
data made available by Patrick Royston and available from
http://www.statapress.com/data/fpsaus.html (I didn't pick the dataset by the
way). I would like the students to visualise linear, fractional polynomial and
spline transformations of the "node" variable using a flexible
parametric model with 3 knots for the baseline hazard. I can do this using the
"predict" option within stpm2 as follows:
flex_nodes_lin <- stpm2(Surv(rfs/12,rfsi)~nodes,
data=Practical_Rott_dev,df=3)
haz_lin <- predict(flex_nodes_lin,type="hazard")
flex_nodes_fp <-
stpm2(Surv(rfs/12,rfsi)~log(nodes),data=Practical_Rott_dev,df=3)
haz_fp <- predict(flex_nodes_fp,type="hazard")
spline3 <- stpm2(Surv(rfs/12,rfsi)~1, data=Practical_Rott_dev,df=3)
haz_spline3 <- predict(spline3,type="hazard")
data_part9 <-
data.frame(nodes,haz_lin[nodes],haz_spline3[nodes],haz_fp[nodes])
data_part9_m <- melt(data_part9,id.vars='nodes',factorsAsStrings=F)
plot_part9 <-
ggplot(data_part9_m,aes(nodes,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","FP1","Spline
3
knots"),values=c("green","red","blue"))+theme_bw()
plot_part9 + labs(x="Number of positive
nodes",y="",color="") +
theme(legend.position=c(0.8,0.8))
However, to my mind using "hazard" (or "survival") leads to
a plot which do not help to understand the different functional form of
"nodes". Therefore, I would prefer to do this using the linear
predictor for each model instead. I've written the following code to do
this:
lp_nodes_lin <- flex_nodes_lin at lm$fitted.values
lp_nodes_spline <- flex_nodes_spline at lm$fitted.values
lp_nodes_fp <- flex_nodes_fp at lm$fitted.values
data_part9 <- data.frame(flex_nodes_lin at
lm$model$nodes,lp_nodes_lin,lp_nodes_spline,lp_nodes_fp)
colnames(data_part9)[1] <- "nodes"
data_part9_m <- melt(data_part9,id.vars='nodes')
plot_part9 <-
ggplot(data_part9_m,aes(nodes,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","Spline
(3 knots)",
"FP1"),values=c("green","red","blue"))+theme_bw()
plot_part9 + labs(x="Number of positive
nodes",y="Prediction",color="") +
theme(legend.position=c(0.8,0.8))
I have 2 concerns over this:
1. The plots are still not the shape I would expect them to be i.e. a line
along the 45 degree line for the linear transformation, and a curve for each of
the spline and FP transformations.
2. This code is really complicated - there must be an easier way?!
Any help gratefully received!
Kind regards,
Laura
P.S. If I was doing this in the logistic regression the code would be relatively
simple:
age_mod <- glm(DAY30~AGE,family="binomial")
lp_age_lin <- predict(age_mod)
agefp1_mod <- mfp(DAY30~fp(AGE,df=2,alpha=1),family="binomial")
lp_agefp1 <- predict(agefp1_mod)
age3_mod <- glm(DAY30~age3_spline,family="binomial")
lp_age3 <- predict(age3_mod)
data_part8 <- data.frame(AGE,lp_age_lin,lp_agefp1,lp_age3)
data_part8_m <- melt(data_part8,id.vars='AGE')
plot_part8 <-
ggplot(data_part8_m,aes(AGE,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","FP1","Spline
3
knots"),values=c("green","blue","red"))+theme_bw()
plot_part8 + labs(x="Age (years)",y="Linear Predictor (log
odds)",color="") + theme(legend.position=c(0.2,0.8))
[[alternative HTML version deleted]]