Hello,

As I am quite an ignorant user of R, excuse me for any wrongful usage of all the terms.

My question relates to the statistics behind the survdiff function in the package survival. My textbook knowledge of the logrank test tells me that if I want to compare two survival curves, I have to take the sum of the factors (O-E)^2/E of both groups, which will give me the Chisq. If I calculate this by hand, I get a different value than the one R is giving me. Actually, I agree with the (O-E)^2/E values that R gives me, but if I then take their sum, this is not the Chisq R gives.

Two questions:
- How is Chisq calculated in R?
- What does the column (O-E)^2/V mean? What is V, and how does this possibly relate to the calculated Chisq?

The syntax would be something like this:

gr1<-c(1,2,3,4,5)
gr <-rep(1,length(gr1))
gr2<-c(6,7,8,9)
gr<-c(gr,rep(2,length(gr2)))
surv<-c(gr1,gr2)
event<-rep(1,7)
event<-c(event,0,0)
mydatafile<-cbind(surv,gr,event)
mydatafile <- data.frame(mydatafile)
mydatafile$gr<-factor(mydatafile$gr)
library(survival)
mydatafile.LR<-survdiff(Surv(surv,event)~gr,data=mydatafile)
print(mydatafile.LR)

And the response would be:

Call:
survdiff(formula = Surv(surv, event) ~ gr, data = mydatafile)

     N Observed Expected (O-E)^2/E (O-E)^2/V
gr=1 5        5     2.02      4.41      7.91
gr=2 4        2     4.98      1.79      7.91

 Chisq= 7.9  on 1 degrees of freedom, p= 0.00491

But, as said, with my textbook knowledge I would calculate Chisq as: 4.41+1.79=6.20

Hopefully someone can clarify this for me.

Sincerely,
Krista Haanstra
"Krista Haanstra" <krista at aha.demon.nl> writes:> As I am quitte an ignorant user of R, excuse me for any wrongfull usage of > all the terms. > My question relates to the statistics behind the survdiff function in the > package survival. > My textbook knowledge of the logrank test tells me that if I want to compare > two survival curves, I have to take the sum of the factors: (O-E)^2/E of > both groups, which will give me the Chisq. > If I calculate this by hand, I get a different value than the one R is > giving me. > Actually, the (O-E)^2/E that R gives me, those I agree with, but if I then > take the sum, this is not the chisq R gives. > Two questions: > - How is Chisq calculated in R? > - What does the column (O-E)^2/V mean? What is V, and how does this possibly > relate to the calculated Chisq?You really need to read a theory book for this, but here's the basic idea: V is the theoretical variance of O-E for the first group. If O-E is approximately normally distributed, as it will be in large samples, then (O-E)^2/V will be approximately chi-squared distributed on 1 DF. In *other* models, notably those for contingency tables, the same idea works out as the familiar sum((O-E)^2/E) formula. That formula has historically been used for the logrank test too, and it still appears in some textbooks, but as it turns out, it is not actually correct (although often quite close). [To fix ideas, consider testing for a given p in the binomial distribution, you can either say O=x E=np V=npq and get chisq = (x-np)^2/npq or have O = (x, n-x), E = (np, nq) and get chisq = (x-np)^2/np + ((n-x) - nq)^2/nq and a little calculus show that the latter expression is = (x-np)^2*(1/np + 1/nq) = (x-np)^2 * (p+q)/npq so the two formulas are one and the same. In this case!] -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907