Dear list members, I'm unable to fit a parametric survival regression using survreg() in the survival package with data in "counting-process" ("long") form. To illustrate using a scaled-down problem with 10 subjects (with data placed on the web): --------------- snip ------------> library(survival) > RW <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RW.txt") > RL <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RL.txt")> RW # "wide" dataweek arrest age 1 20 1 27 2 17 1 18 3 25 1 19 4 52 0 23 5 52 0 19 6 52 0 24 7 23 1 25 8 52 0 21 9 52 0 22 10 52 0 20> head(RL, 20) # "long" data, counting-process formstart stop arrest.time age 1.1 0 1 0 27 1.2 1 2 0 27 1.3 2 3 0 27 1.4 3 4 0 27 1.5 4 5 0 27 1.6 5 6 0 27 1.7 6 7 0 27 1.8 7 8 0 27 1.9 8 9 0 27 1.10 9 10 0 27 1.11 10 11 0 27 1.12 11 12 0 27 1.13 12 13 0 27 1.14 13 14 0 27 1.15 14 15 0 27 1.16 15 16 0 27 1.17 16 17 0 27 1.18 17 18 0 27 1.19 18 19 0 27 1.20 19 20 1 27 --------------- snip ------------ I have no trouble fitting a Cox model to both the wide and long forms of the data, obtaining (as should be the case) identical results: --------------- snip ------------> coxph(Surv(week, arrest) ~ age, data=RW) # worksCall: coxph(formula = Surv(week, arrest) ~ age, data = RW) coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 0.64 Likelihood ratio test=0.21 on 1 df, p=0.643 n= 10, number of events= 4> coxph(Surv(start, stop, arrest.time) ~ age, data=RL) # works, sameCall: coxph(formula = Surv(start, stop, arrest.time) ~ age, data = RL) coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 0.64 Likelihood ratio test=0.21 on 1 df, p=0.643 n= 397, number of events= 4 --------------- snip ------------ But when I try to fit a parametric survival regression with survreg(), I get an error with the long form of the data: --------------- snip ------------> survreg(Surv(week, arrest) ~ age, data=RW) # worksCall: survreg(formula = Surv(week, arrest) ~ age, data = RW) Coefficients: (Intercept) age 6.35386771 -0.08982624 Scale= 0.7363196 Loglik(model)= -22.1 Loglik(intercept only)= -22.2 Chisq= 0.3 on 1 degrees of freedom, p= 0.58 n= 10> survreg(Surv(start, stop, arrest.time) ~ age, data=RL) # failsError in survreg(Surv(start, stop, arrest.time) ~ age, data = RL) : Invalid survival type --------------- snip ------------ I expect that there's something about survreg() that I'm missing. I first noted this problem quite some time ago but didn't look into it carefully because I didn't really need to use survreg(). Any help would be appreciated. Thanks, John ----------------------------- John Fox, Professor McMaster University Hamilton, Ontario Canada L8S 4M4 Web: socserv.mcmaster.ca/jfox
Göran Broström
2015-Aug-29 19:59 UTC
[R] using survreg() in survival package with "long" data
Dear John, I think you are missing that 'survreg' does not handle left truncated data. With that you should use the 'eha' package, for a PH model the function 'phreg', and for an AFT model the function 'aftreg' (you didn't tell which model you want to fit). Your attempt with the 'survreg' function implies that you are satisfied with a Weibull baseline distribution, in which case you could choose either model. G?ran On 08/29/2015 07:06 PM, Fox, John wrote:> Dear list members, > > I'm unable to fit a parametric survival regression using survreg() in > the survival package with data in "counting-process" ("long") form. > > To illustrate using a scaled-down problem with 10 subjects (with data > placed on the web): > > --------------- snip ------------ >> library(survival) RW <- >> read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RW.txt") RL <- >> read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RL.txt") > >> RW # "wide" data > week arrest age 1 20 1 27 2 17 1 18 3 25 1 > 19 4 52 0 23 5 52 0 19 6 52 0 24 7 23 > 1 25 8 52 0 21 9 52 0 22 10 52 0 20 > >> head(RL, 20) # "long" data, counting-process form > start stop arrest.time age 1.1 0 1 0 27 1.2 1 > 2 0 27 1.3 2 3 0 27 1.4 3 4 > 0 27 1.5 4 5 0 27 1.6 5 6 0 > 27 1.7 6 7 0 27 1.8 7 8 0 27 > 1.9 8 9 0 27 1.10 9 10 0 27 1.11 > 10 11 0 27 1.12 11 12 0 27 1.13 12 > 13 0 27 1.14 13 14 0 27 1.15 14 15 > 0 27 1.16 15 16 0 27 1.17 16 17 0 > 27 1.18 17 18 0 27 1.19 18 19 0 27 > 1.20 19 20 1 27 > > --------------- snip ------------ > > I have no trouble fitting a Cox model to both the wide and long forms > of the data, obtaining (as should be the case) identical results: > > --------------- snip ------------ > >> coxph(Surv(week, arrest) ~ age, data=RW) # works > Call: coxph(formula = Surv(week, arrest) ~ age, data = RW) > > > coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 > 0.64 > > Likelihood ratio test=0.21 on 1 df, p=0.643 n= 10, number of events> 4 >> coxph(Surv(start, stop, arrest.time) ~ age, data=RL) # works, >> same > Call: coxph(formula = Surv(start, stop, arrest.time) ~ age, data > RL) > > > coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 > 0.64 > > Likelihood ratio test=0.21 on 1 df, p=0.643 n= 397, number of > events= 4 > > --------------- snip ------------ > > But when I try to fit a parametric survival regression with > survreg(), I get an error with the long form of the data: > > --------------- snip ------------ > >> survreg(Surv(week, arrest) ~ age, data=RW) # works > Call: survreg(formula = Surv(week, arrest) ~ age, data = RW) > > Coefficients: (Intercept) age 6.35386771 -0.08982624 > > Scale= 0.7363196 > > Loglik(model)= -22.1 Loglik(intercept only)= -22.2 Chisq= 0.3 on 1 > degrees of freedom, p= 0.58 n= 10 > >> survreg(Surv(start, stop, arrest.time) ~ age, data=RL) # fails > Error in survreg(Surv(start, stop, arrest.time) ~ age, data = RL) : > Invalid survival type > > --------------- snip ------------ > > I expect that there's something about survreg() that I'm missing. I > first noted this problem quite some time ago but didn't look into it > carefully because I didn't really need to use survreg(). > > Any help would be appreciated. > > Thanks, John ----------------------------- John Fox, Professor > McMaster University Hamilton, Ontario Canada L8S 4M4 Web: > socserv.mcmaster.ca/jfox > > ______________________________________________ R-help at r-project.org > mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the > posting guide http://www.R-project.org/posting-guide.html and provide > commented, minimal, self-contained, reproducible code. >
Dear G?ran, Thank you for responding to my query; please see below:> -----Original Message----- > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of G?ran > Brostr?m > Sent: August 29, 2015 3:59 PM > To: r-help at r-project.org > Subject: Re: [R] using survreg() in survival package with "long" data > > Dear John, > > I think you are missing that 'survreg' does not handle left truncated data. With > that you should use the 'eha' package, for a PH model the function 'phreg', and > for an AFT model the function 'aftreg' (you didn't tell which model you want to > fit).That's odd, in that it's not true in general, since, e.g., survreg() can be used to fit the left-censored Tobit regression model, as illustrated in this example from ?survreg: tobinfit <- survreg(Surv(durable, durable>0, type='left') ~ age + quant, data=tobin, dist='gaussian') In fact, in my example, the data are right-censored, but in the second data set are represented in counting-process form as one-week intervals. I imagine that you're right in the sense that survreg() can't handle data like this in counting-process form. Yet, I've reviewed the documentation in the survey package but can't find any reference to this -- which is not to say that it's not there, only that I don't see it.> > Your attempt with the 'survreg' function implies that you are satisfied with a > Weibull baseline distribution, in which case you could choose either model.Right, I understand that this is the default. The problem is just a small toy example meant to illustrate the error. Thanks again, John> > G?ran > > > On 08/29/2015 07:06 PM, Fox, John wrote: > > Dear list members, > > > > I'm unable to fit a parametric survival regression using survreg() in > > the survival package with data in "counting-process" ("long") form. > > > > To illustrate using a scaled-down problem with 10 subjects (with data > > placed on the web): > > > > --------------- snip ------------ > >> library(survival) RW <- > >> read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RW.txt") RL <- > >> read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RL.txt") > > > >> RW # "wide" data > > week arrest age 1 20 1 27 2 17 1 18 3 25 1 > > 19 4 52 0 23 5 52 0 19 6 52 0 24 7 23 > > 1 25 8 52 0 21 9 52 0 22 10 52 0 20 > > > >> head(RL, 20) # "long" data, counting-process form > > start stop arrest.time age 1.1 0 1 0 27 1.2 1 > > 2 0 27 1.3 2 3 0 27 1.4 3 4 > > 0 27 1.5 4 5 0 27 1.6 5 6 0 > > 27 1.7 6 7 0 27 1.8 7 8 0 27 > > 1.9 8 9 0 27 1.10 9 10 0 27 1.11 > > 10 11 0 27 1.12 11 12 0 27 1.13 12 > > 13 0 27 1.14 13 14 0 27 1.15 14 15 > > 0 27 1.16 15 16 0 27 1.17 16 17 0 > > 27 1.18 17 18 0 27 1.19 18 19 0 27 > > 1.20 19 20 1 27 > > > > --------------- snip ------------ > > > > I have no trouble fitting a Cox model to both the wide and long forms > > of the data, obtaining (as should be the case) identical results: > > > > --------------- snip ------------ > > > >> coxph(Surv(week, arrest) ~ age, data=RW) # works > > Call: coxph(formula = Surv(week, arrest) ~ age, data = RW) > > > > > > coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 > > 0.64 > > > > Likelihood ratio test=0.21 on 1 df, p=0.643 n= 10, number of events> > 4 > >> coxph(Surv(start, stop, arrest.time) ~ age, data=RL) # works, same > > Call: coxph(formula = Surv(start, stop, arrest.time) ~ age, data > > RL) > > > > > > coef exp(coef) se(coef) z p age 0.0963 1.1011 0.2073 0.46 > > 0.64 > > > > Likelihood ratio test=0.21 on 1 df, p=0.643 n= 397, number of events> > 4 > > > > --------------- snip ------------ > > > > But when I try to fit a parametric survival regression with survreg(), > > I get an error with the long form of the data: > > > > --------------- snip ------------ > > > >> survreg(Surv(week, arrest) ~ age, data=RW) # works > > Call: survreg(formula = Surv(week, arrest) ~ age, data = RW) > > > > Coefficients: (Intercept) age 6.35386771 -0.08982624 > > > > Scale= 0.7363196 > > > > Loglik(model)= -22.1 Loglik(intercept only)= -22.2 Chisq= 0.3 on 1 > > degrees of freedom, p= 0.58 n= 10 > > > >> survreg(Surv(start, stop, arrest.time) ~ age, data=RL) # fails > > Error in survreg(Surv(start, stop, arrest.time) ~ age, data = RL) : > > Invalid survival type > > > > --------------- snip ------------ > > > > I expect that there's something about survreg() that I'm missing. I > > first noted this problem quite some time ago but didn't look into it > > carefully because I didn't really need to use survreg(). > > > > Any help would be appreciated. > > > > Thanks, John ----------------------------- John Fox, Professor > > McMaster University Hamilton, Ontario Canada L8S 4M4 Web: > > socserv.mcmaster.ca/jfox > > > > ______________________________________________ R-help at r-project.org > > mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the > > posting guide http://www.R-project.org/posting-guide.html and provide > > commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.