Jenny Hodgson
2007-Mar-13 15:43 UTC
[R] inconsistent behaviour of add1 and drop1 with a weighted linear model
Dear R Help, I have noticed some inconsistent behaviour of add1 and drop1 with a weighted linear model, which affects the interpretation of the results. I have these data to fit with a linear model, I want to weight them by the relative size of the geographical areas they represent. _________________________________________________________________________________________ > example y x1 x2 weights 1 -4.546477 0.1859556 50.00000 0.9466022 2 1.484246 0.4740497 29.88000 1.3252430 3 2.203681 0.8594264 16.93333 0.9466022 4 1.943163 0.8713360 42.11765 2.7766997 5 1.886473 0.9006082 19.00000 0.9466022 6 1.795393 0.8455183 23.68421 1.1674760 7 1.878077 0.5641396 35.00000 0.8203885 8 -4.215484 0.4356858 58.75000 0.4417477 9 1.993339 0.5440061 19.28571 0.8519420 10 1.560869 0.6285066 19.54545 0.8203885 11 2.761535 0.7017720 15.83333 0.1262136 12 0.995959 0.4638751 0.00000 0.9466022 13 -4.516514 0.2794199 77.85714 0.8834954 > sum(example$weights) [1] 13.00000 > model<-lm(y~1,data=example,weights=weights) > add1(model,.~.+x1+x2) Single term additions Model: y ~ 1 Df Sum of Sq RSS AIC <none> 94.000 27.719 x1 1 55.290 38.710 18.185 x2 1 58.630 35.371 17.012 > addterm(model,.~.+x1+x2) Single term additions Model: y ~ 1 Df Sum of Sq RSS AIC <none> 94.000 27.719 x1 1 55.290 38.710 18.185 x2 1 58.630 35.371 17.012 #so add1 and addterm (MASS package) give the same answer, nothing obviously untoward, but #both SS and RSS change when you do... > model<-lm(y~x1,data=example,weights=weights) > drop1(model) Single term deletions Model: y ~ x1 Df Sum of Sq RSS AIC <none> 30.164 14.942############I would expect RSS to be 38.710 x1 1 44.377 74.541 24.703############I would expect SS to be 55.290 and RSS 94.000 #{3}# > model<-lm(y~x2,data=example,weights=weights) > drop1(model) Single term deletions Model: y ~ x2 Df Sum of Sq RSS AIC <none> 37.922 17.918 x2 1 36.619 74.541 24.703 #{1}# #not only are SS and RSS different, the relative importance of x1 and x2 has swapped! I have checked that this #does not happen if weights are not used (everything adds up as expected, but the null RSS and other SSs are not the same as any quoted here). #The inconsistency is still there if you are not adding to a null model: #NB I realise that x1 and x2 are correlated so whichever is last to be added appears to have a lower SS - this is not the issue > add1(model,.~.+x1+x2)#This or... Single term additions Model: y ~ x2 Df Sum of Sq RSS AIC <none> 35.371 17.012 x1 1 18.576 16.794 9.329 > addterm(model,.~x1+x2)# ...this is inconsistent with: Single term additions Model: y ~ x2 Df Sum of Sq RSS AIC <none> 35.371 17.012 x1 1 18.576 16.794 9.329 > drop1(update(model,.~.+x1))#this or... Single term deletions Model: y ~ x2 + x1 Df Sum of Sq RSS AIC <none> 14.456 7.380 x2 1 15.708 30.164 14.942 #{4}# x1 1 23.466 37.922 17.918 #{2}# > dropterm(update(model,.~.+x1))#...this Single term deletions Model: y ~ x2 + x1 Df Sum of Sq RSS AIC <none> 14.456 7.380 x2 1 15.708 30.164 14.942 x1 1 23.466 37.922 17.918 #and the same thing happens with the x2 variable - compare below with above > add1(update(model,.~x1),.~x1+x2) Single term additions Model: y ~ x1 Df Sum of Sq RSS AIC <none> 38.710 18.185 x2 1 21.916 16.794 9.329 > addterm(update(model,.~x1),.~x1+x2) Single term additions Model: y ~ x1 Df Sum of Sq RSS AIC <none> 38.710 18.185 x2 1 21.916 16.794 9.329 #Why do I think add1/addterm are the problem rather than drop1/dropterm? #Because drop1/dropterm agree with the anova: > model<-lm(y~x2+x1,data=example,weights=weights) > anova(model) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x2 1 36.619 36.619 25.331 0.0005119 *** ####this agrees with #{1}# above x1 1 23.466 23.466 16.233 0.0024035 ** ####this agrees with #{2}# above Residuals 10 14.456 1.446 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > model<-lm(y~x1+x2,data=example,weights=weights) > anova(model) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x1 1 44.377 44.377 30.698 0.0002474 *** ####this agrees with #{3}# above x2 1 15.708 15.708 10.866 0.0080633 ** ####this agrees with #{4}# above Residuals 10 14.456 1.446 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ___________________________________________________________________________________________ My question is, why does this inconsistency happen? And is it safe to assume that anova and drop1/dropterm are giving me the answers I want, therefore to explore my model with these functions and avoid using add1/addterm? HUGE thanks for reading to the end! I would be extremely grateful if someone could help me with this problem. I wasn't able to find any clues in the docs or the r-help archives (but perhaps as it's a complex problem I wasn't using the right search terms, if so, apologies) Best wishes Jenny _____________ Jenny Hodgson Department of Biology - area 18 PO box 373 University of York YO10 5YW UK Tel: 01904 328623
Peter Dalgaard
2007-Mar-13 16:35 UTC
[R] inconsistent behaviour of add1 and drop1 with a weighted linear model
Jenny Hodgson wrote:> Dear R Help, > I have noticed some inconsistent behaviour of add1 and drop1 with a > weighted linear model, which affects the interpretation of the results. > I have these data to fit with a linear model, I want to weight them by > the relative size of the geographical areas they represent. > _________________________________________________________________________________________ > > example > y x1 x2 weights > 1 -4.546477 0.1859556 50.00000 0.9466022 > 2 1.484246 0.4740497 29.88000 1.3252430 > 3 2.203681 0.8594264 16.93333 0.9466022 > 4 1.943163 0.8713360 42.11765 2.7766997 > 5 1.886473 0.9006082 19.00000 0.9466022 > 6 1.795393 0.8455183 23.68421 1.1674760 > 7 1.878077 0.5641396 35.00000 0.8203885 > 8 -4.215484 0.4356858 58.75000 0.4417477 > 9 1.993339 0.5440061 19.28571 0.8519420 > 10 1.560869 0.6285066 19.54545 0.8203885 > 11 2.761535 0.7017720 15.83333 0.1262136 > 12 0.995959 0.4638751 0.00000 0.9466022 > 13 -4.516514 0.2794199 77.85714 0.8834954 > > sum(example$weights) > [1] 13.00000 > > > model<-lm(y~1,data=example,weights=weights) > > add1(model,.~.+x1+x2) > Single term additions > > Model: > y ~ 1 > Df Sum of Sq RSS AIC > <none> 94.000 27.719 > x1 1 55.290 38.710 18.185 > x2 1 58.630 35.371 17.012 >Which version of R??! I get (2.4.1 on Fedora 6): > add1(model,.~.+x1+x2) Single term additions Model: y ~ 1 Df Sum of Sq RSS AIC <none> 74.541 24.703 x1 1 44.377 30.164 14.942 x2 1 36.619 37.922 17.918
Bojanowski, M.J. (Michal)
2007-Mar-13 17:01 UTC
[R] inconsistent behaviour of add1 and drop1 with a weighted linear model
Hmmm, this is the same I got> add1(model,.~.+x1+x2)Single term additions Model: y ~ 1 Df Sum of Sq RSS AIC <none> 74.541 24.703 x1 1 44.377 30.164 14.942 x2 1 36.619 37.922 17.918 R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: LC_COLLATE=Polish_Poland.1250;LC_CTYPE=Polish_Poland.1250;LC_MONETARY=Po lish_Poland.1250;LC_NUMERIC=C;LC_TIME=Polish_Poland.1250 attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base">*** Note that my e-mail address has changed to m.j.bojanowski at uu.nl *** Please update your address books accordingly. Thank you! _________________________________________ Michal Bojanowski ICS / Sociology Utrecht University Heidelberglaan 2; 3584 CS Utrecht Room 1428 m.j.bojanowski at uu.nl http://www.fss.uu.nl/soc/bojanowski -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Dalgaard Sent: Tuesday, March 13, 2007 5:35 PM To: Jenny Hodgson Cc: r-help at stat.math.ethz.ch Subject: Re: [R] inconsistent behaviour of add1 and drop1 with a weighted linear model Jenny Hodgson wrote:> Dear R Help, > I have noticed some inconsistent behaviour of add1 and drop1 with a > weighted linear model, which affects the interpretation of theresults.> I have these data to fit with a linear model, I want to weight them by> the relative size of the geographical areas they represent. > ______________________________________________________________________ > ___________________ > > example > y x1 x2 weights > 1 -4.546477 0.1859556 50.00000 0.9466022 > 2 1.484246 0.4740497 29.88000 1.3252430 > 3 2.203681 0.8594264 16.93333 0.9466022 > 4 1.943163 0.8713360 42.11765 2.7766997 > 5 1.886473 0.9006082 19.00000 0.9466022 > 6 1.795393 0.8455183 23.68421 1.1674760 > 7 1.878077 0.5641396 35.00000 0.8203885 > 8 -4.215484 0.4356858 58.75000 0.4417477 > 9 1.993339 0.5440061 19.28571 0.8519420 > 10 1.560869 0.6285066 19.54545 0.8203885 > 11 2.761535 0.7017720 15.83333 0.1262136 > 12 0.995959 0.4638751 0.00000 0.9466022 > 13 -4.516514 0.2794199 77.85714 0.8834954 > sum(example$weights) [1] > 13.00000 > > > model<-lm(y~1,data=example,weights=weights) > > add1(model,.~.+x1+x2) > Single term additions > > Model: > y ~ 1 > Df Sum of Sq RSS AIC > <none> 94.000 27.719 > x1 1 55.290 38.710 18.185 > x2 1 58.630 35.371 17.012 >Which version of R??! I get (2.4.1 on Fedora 6): > add1(model,.~.+x1+x2) Single term additions Model: y ~ 1 Df Sum of Sq RSS AIC <none> 74.541 24.703 x1 1 44.377 30.164 14.942 x2 1 36.619 37.922 17.918 ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
hadley wickham
2007-Mar-13 23:46 UTC
[R] inconsistent behaviour of add1 and drop1 with a weighted linear model
On 3/13/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:> On 3/13/2007 3:14 PM, hadley wickham wrote: > > On 3/13/07, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote: > >> Jenny Hodgson wrote: > >>> I was using Version 2.3.1 for Windows (binary download version). Guess I > >>> should be using the latest version, (but the reason is I'm writing up my > >>> PhD and I thought my results would be more 'repeatable' if I didn't keep > >>> changing my version of the software, I didn't really think there would > >>> be any glitches as big as this). Sorry if this is a waste of your time. > >>> And thanks very much for replying so quickly. > >>> > >>> Jenny > >>> > >>> > >> Always a good idea to check the NEWS file: > >> > >> o add1.lm() had been broken by other changes for weighted fits. > > > > Yes, but a little impractical. Has any one ever considered a standard > > for the news file so changes could be extracted and included in online > > documentation somewhere? > > It's in a structured format, and is online. The readNEWS() function can > read it. It would be nice if someone would contribute a more friendly > reader... > > Try > > readNEWS(url("http://cran.r-project.org/bin/windows/base/NEWS.R-2.5.0dev"), > chop="keepAll")[[c("2.5","2.5.0","BUG FIXES")]] > > for the unfriendly but informative version.Oh, that's neat. One thing that would be very useful would be to figure out what (if any) functions an news item refers to. A grep against all function names in base packages wouldn't be too hard to do. I'll think about it, and how to render it nicely in a webpage. Hadley