Liaw, Andy
2006-Mar-01 01:14 UTC
[R] Help - lm, glm, aov results inconsistent with other stati stical package
1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and A=1? 2. y = A + X + A*X means you're allowing the different groups of A to have different slopes. Probably not what you intended. 3. It's probably best to provide a small sample of the data (and R code) so we know how you got what you got. Andy From: Ben Ridenhour> > Hello, > > I 'm sure there must a be a simple explanation for what I'm > doing wrong but I am stumped. I am a novice R user and this > has shaken my confidence in what I'm doing! I am trying to > run a simple ANCOVA using the model y~A*x, where y and x are > continuous and A has two levels. Everything seems fine until > I compare the output with what I get from another statistical > package (JMP). JMP has the model y=A+x+A*x (this should be > the same as what I specified to R, correct?). In comparison > I get the line equations > > y = 7072.09-1024.94 x (for A=0) and > y = 7875.58-970.088 x (for A=1) > > from JMP. And from R I get > > y = 6276.7-1259.8 x (for A=0) and > y = 7867.5-1150.1 x (for A=1). > > Obviously, these aren't even close to the same answer. I've > tried this using glm(), lm(), and aov() and as expected they > all give the same answer. If I do > > >levels(A) > [1] "2" "4" > > which are the two levels of A. Why can't I get the same > answer from JMP as in R? This is very disturbing to me! > > Thanks, > Ben > > > ------------------------------- > Benjamin Ridenhour > School of Biological Sciences > Washigton State University > P.O. Box 644236 > Pullman, WA 99164-4236 > Phone (509)335-7218 > -------------------------------- > "Nothing in biology makes sense except in the light of evolution." > -T. Dobzhansky > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
Ben Ridenhour
2006-Mar-01 06:50 UTC
[R] Help - lm, glm, aov results inconsistent with other stati stical package
Alright, I'll try to give some sample code. # create A with 2 levels - 2 and 4 > A<-c(rep(2,times=30),rep(4,times=42)) # make A a factor > A<-as.factor(A) #generate 72 random x points > x<-rnorm(72) #create different slopes & intercepts for A=2 and A=4 #add a random error term > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2) #use model y~A*x for lm (or glm) > test.lm<-lm(y~A*x) #check the output > summary(test.lm) This essentially creates something like my data set and uses the same model. In response to (1), I was just using 0/1 because these are codings for the 2 levels, correct? (i.e., when A=2 the dummy variable=0, when A=4 the dummy variable=1?). In response to (2), yes, I do want different slopes for the two categories (that is what I am interested in testing). If I export the data created above to JMP and run what I believe to be the same model, I get a different answer for my equations :( ------------------------------- Benjamin Ridenhour School of Biological Sciences Washigton State University P.O. Box 644236 Pullman, WA 99164-4236 Phone (509)335-7218 -------------------------------- "Nothing in biology makes sense except in the light of evolution." -T. Dobzhansky ----- Original Message ---- From: "Liaw, Andy" <andy_liaw@merck.com> Sent: Tuesday, February 28, 2006 5:14:57 PM Subject: RE: [R] Help - lm, glm, aov results inconsistent with other stati stical package 1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and A=1? 2. y = A + X + A*X means you're allowing the different groups of A to have different slopes. Probably not what you intended. 3. It's probably best to provide a small sample of the data (and R code) so we know how you got what you got. Andy From: Ben Ridenhour> > Hello, > > I 'm sure there must a be a simple explanation for what I'm > doing wrong but I am stumped. I am a novice R user and this > has shaken my confidence in what I'm doing! I am trying to > run a simple ANCOVA using the model y~A*x, where y and x are > continuous and A has two levels. Everything seems fine until > I compare the output with what I get from another statistical > package (JMP). JMP has the model y=A+x+A*x (this should be > the same as what I specified to R, correct?). In comparison > I get the line equations > > y = 7072.09-1024.94 x (for A=0) and > y = 7875.58-970.088 x (for A=1) > > from JMP. And from R I get > > y = 6276.7-1259.8 x (for A=0) and > y = 7867.5-1150.1 x (for A=1). > > Obviously, these aren't even close to the same answer. I've > tried this using glm(), lm(), and aov() and as expected they > all give the same answer. If I do > > >levels(A) > [1] "2" "4" > > which are the two levels of A. Why can't I get the same > answer from JMP as in R? This is very disturbing to me! > > Thanks, > Ben > > > ------------------------------- > Benjamin Ridenhour > School of Biological Sciences > Washigton State University > P.O. Box 644236 > Pullman, WA 99164-4236 > Phone (509)335-7218 > -------------------------------- > "Nothing in biology makes sense except in the light of evolution." > -T. Dobzhansky > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >------------------------------------------------------------------------------ ------------------------------------------------------------------------------ [[alternative HTML version deleted]]
Prof Brian Ripley
2006-Mar-01 07:55 UTC
[R] Help - lm, glm, aov results inconsistent with other stati stical package
I think we should clarify your subject line. >One< other statistical package (JMP) is giving in your hands results inconsistent with R. I have to say I am puzzled by this. Why ask the volunteers here why you get a different result in JMP from the answer given by R? You could ask your support contact for JMP, where there will be people whom you have paid to answer your question. (People here are not going to be very interested in the reliability of JMP.) On Tue, 28 Feb 2006, Ben Ridenhour wrote:> Alright, I'll try to give some sample code. > > # create A with 2 levels - 2 and 4 > > A<-c(rep(2,times=30),rep(4,times=42)) > > # make A a factor > > A<-as.factor(A) > > #generate 72 random x points > > x<-rnorm(72) > > #create different slopes & intercepts for A=2 and A=4 > #add a random error term > > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2) > > #use model y~A*x for lm (or glm) > > test.lm<-lm(y~A*x) > > #check the output > > summary(test.lm) > > This essentially creates something like my data set and uses the same > model. In response to (1), I was just using 0/1 because these are > codings for the 2 levels, correct? (i.e., when A=2 the dummy variable=0, > when A=4 the dummy variable=1?). In response to (2), yes, I do want > different slopes for the two categories (that is what I am interested in > testing). > > If I export the data created above to JMP and run what I believe to be > the same model, I get a different answer for my equations :( > > > ------------------------------- > Benjamin Ridenhour > School of Biological Sciences > Washigton State University > P.O. Box 644236 > Pullman, WA 99164-4236 > Phone (509)335-7218 > -------------------------------- > "Nothing in biology makes sense except in the light of evolution." > -T. Dobzhansky > > > ----- Original Message ---- > From: "Liaw, Andy" <andy_liaw at merck.com> > > Sent: Tuesday, February 28, 2006 5:14:57 PM > Subject: RE: [R] Help - lm, glm, aov results inconsistent with other stati stical package > > 1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and > A=1? > > 2. y = A + X + A*X means you're allowing the different groups of A to have > different slopes. Probably not what you intended. > > 3. It's probably best to provide a small sample of the data (and R code) so > we know how you got what you got. > > Andy > > From: Ben Ridenhour >> >> Hello, >> >> I 'm sure there must a be a simple explanation for what I'm >> doing wrong but I am stumped. I am a novice R user and this >> has shaken my confidence in what I'm doing! I am trying to >> run a simple ANCOVA using the model y~A*x, where y and x are >> continuous and A has two levels. Everything seems fine until >> I compare the output with what I get from another statistical >> package (JMP). JMP has the model y=A+x+A*x (this should be >> the same as what I specified to R, correct?). In comparison >> I get the line equations >> >> y = 7072.09-1024.94 x (for A=0) and >> y = 7875.58-970.088 x (for A=1) >> >> from JMP. And from R I get >> >> y = 6276.7-1259.8 x (for A=0) and >> y = 7867.5-1150.1 x (for A=1). >> >> Obviously, these aren't even close to the same answer. I've >> tried this using glm(), lm(), and aov() and as expected they >> all give the same answer. If I do >> >> >levels(A) >> [1] "2" "4" >> >> which are the two levels of A. Why can't I get the same >> answer from JMP as in R? This is very disturbing to me! >> >> Thanks, >> Ben >> >> >> ------------------------------- >> Benjamin Ridenhour >> School of Biological Sciences >> Washigton State University >> P.O. Box 644236 >> Pullman, WA 99164-4236 >> Phone (509)335-7218 >> -------------------------------- >> "Nothing in biology makes sense except in the light of evolution." >> -T. Dobzhansky >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide! >> http://www.R-project.org/posting-guide.html >> >> > > > ------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------ > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Berwin A Turlach
2006-Mar-01 08:01 UTC
[R] Help - lm, glm, aov results inconsistent with other stati stical package
>>>>> "BR" == Ben Ridenhour <snake8mynewt at yahoo.com> writes:BR> Alright, I'll try to give some sample code. BR> # create A with 2 levels - 2 and 4 O.k., let us add a> set.seed(1)to make thinks reproducible. BR> > A<-c(rep(2,times=30),rep(4,times=42)) BR> # make A a factor BR> > A<-as.factor(A) BR> #generate 72 random x points BR> > x<-rnorm(72) BR> #create different slopes & intercepts for A=2 and A=4 BR> #add a random error term BR> > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2) BR> #use model y~A*x for lm (or glm) BR> > test.lm<-lm(y~A*x) BR> This essentially creates something like my data set and uses BR> the same model. In response to (1), I was just using 0/1 BR> because these are codings for the 2 levels, correct? Actually, no, the internal coding of the factor is with 1 and 2:> str(A)Factor w/ 2 levels "2","4": 1 1 1 1 1 1 1 1 1 1 ...> unique(as.numeric(A))[1] 1 2 and how factors are internally coded is documented on the help page of `factor'. BR> In response to (2), yes, I do want different slopes for the BR> two categories (that is what I am interested in testing). Whether you get estimates of different slopes does not depend on how the factor is internally coded but how your design matrix is constructed given your model specification and how the factor is coded when the design matrix is created. R has plenty of options here. In your above example, you created data from 1+x for one group and -2+2*x for the other groups. Assuming that you have not changed the way R treats factors, some possibilities would be:> lm(y~x*A)Call: lm(formula = y ~ x * A) Coefficients: (Intercept) x A4 x:A4 0.9515 1.2449 -3.1825 0.8744 In this case, the first and second values are estimates for the intercept and the slope, respectively, of the regression line in the first group. The other two values are the estimates for the *differences* in intercept and slope for the two groups.> lm(y~A/x)Call: lm(formula = y ~ A/x) Coefficients: (Intercept) A4 A2:x A4:x 0.9515 -3.1825 1.2449 2.1193 In this case, the first value is the estimate for the intercept in the first group. The second value is the *difference* in intercept for the two regression lines. The last two values are estimates for the slopes in the two groups.> lm(y~A/(x-1))Call: lm(formula = y ~ A/(x - 1)) Coefficients: A2 A4 A2:x A4:x 0.9515 -2.2309 1.2449 2.1193 In this case, the first two values are the estimates for the intercepts for the two regression lines and the last two values give the two slopes. Quite reasonable estimates in my book. BR> If I export the data created above to JMP and run what I BR> believe to be the same model, I get a different answer for my BR> equations :( Presumably because you are using different parameterisations and you interpret the estimates incorrectly. To interpret the output, you have to know what parameters your estimates are estimating. I do not know what JMP is doing or what kind of parameterisation it uses given a certain model specification. >> [...] Why can't I get the same answer from JMP as in R? This >> is very disturbing to me! Rest assured, no matter how disturbing this might be for you, it is much more disturbing for us to see people using statistical packages without knowing what they are doing. :) And I would find it particular disturbing if there is no statistician (or someone with sufficient statistical training) at Washigton State University who could explain all this to you. Best wishes, Berwin ========================== Full address ===========================Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) School of Mathematics and Statistics +61 (8) 6488 3383 (self) The University of Western Australia FAX : +61 (8) 6488 1028 35 Stirling Highway Crawley WA 6009 e-mail: berwin at maths.uwa.edu.au Australia http://www.maths.uwa.edu.au/~berwin