john.maindonald@anu.edu.au
2000-Jul-12 04:41 UTC
[Rd] Faulty handling of aliasing in predict.lm; wrong constant (PR#600)
There are several residual problems in predict.lm (1) The constant is wrongly set; it should be the mean of the values of the y-variable. (2) Standard errors are wrongly calculated when columns of X are aliased, or when part of a term is aliased. (3) There is an associated column labelling issue. I have sent Thomas Lumley code that fixes these problems, and replaces looping by rows with matrix operations. There is an issue of whether predicted values should be set to zero or to NA for terms that are totally aliased.> "xx" <-+ structure(list(fac1 = structure(c(1, 2, 3, 1, 2, 3, 1, 2, 3, + 1, 2, 3, 1, 2, 3, 1, 2, 3), .Label = c("1", "2", "3"), class = "factor"), + fac2 = structure(c(1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1, + 3, 3, 1, 3, 3), .Label = c("1", "2", "3"), class = "factor"), + x1 = c(2, 6, 6, 8, 8, 10, 16, 6, 2, 6, 18, 4, 10, 6, 4, 10, + 14, 4), x2 = c(6, 25, 9, 8, 31, 29, 40, 8, 8, 6, 21, 20, + 22, 5, 15, 30, 22, 10), alias = c(7, 28, 12, 12, 35, 34, + 48, 11, 9, 9, 30, 22, 27, 8, 17, 35, 29, 12), x4 = c(16, + 48, 34, 46, 62, 73, 205, 36, 30, 40, 193, 43, 65, 44, 27, + 72, 98, 17), y = c(27, 103, 38, 35, 126, 119, 311, 34, 34, + 28, 95, 82, 92, 22, 62, 124, 94, 41)), .Names = c("fac1", + "fac2", "x1", "x2", "alias", "x4", "y"), row.names = c("1", "2", + "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", + "15", "16", "17", "18"), class = "data.frame")> > # fac2 is partly aliased with fac1 > # alias is aliased with x1 and x2 > > xxfac1 fac2 x1 x2 alias x4 y 1 1 1 2 6 7 16 27 2 2 2 6 25 28 48 103 3 3 2 6 9 12 34 38 4 1 1 8 8 12 46 35 5 2 2 8 31 35 62 126 6 3 2 10 29 34 73 119 7 1 1 16 40 48 205 311 8 2 2 6 8 11 36 34 9 3 2 2 8 9 30 34 10 1 1 6 6 9 40 28 11 2 3 18 21 30 193 95 12 3 3 4 20 22 43 82 13 1 1 10 22 27 65 92 14 2 3 6 5 8 44 22 15 3 3 4 15 17 27 62 16 1 1 10 30 35 72 124 17 2 3 14 22 29 98 94 18 3 3 4 10 12 17 41> xx.lm<-lm(y~x1+x2+alias+x4,data=xx) > coef(xx.lm)(Intercept) x1 x2 alias x4 -1.9862799 -7.9406705 4.6773165 NA 0.9931177> xx0.lm<-lm(y~x1+x2+x4,data=xx) > > predict(xx0.lm,se=T)$se[1:2][1] 9.874704 8.793774> predict(xx.lm,se=T)$se[1:2] # Should be the same as above[1] 10.939605 9.329757> > predict(xx.lm,type="terms")[1:2,]x1 x2 alias x4 1 45.87943 -53.78914 -47.50413 NA 2 14.11675 35.07987 -15.72436 NA> > attributes(predict(xx.lm,type="terms"))$constant(Intercept) -1.98628> > predict(xx0.lm,type="terms",se=T)$se[1:2,]x1 x2 x4 1 17.820298 7.791400 12.783126 2 5.483169 5.081348 4.231348> predict(xx.lm,type="terms",se=T)$se[1:2,] # Should be the same as abovex1 x2 alias x4 1 9.613248e+15 3.826812e+16 12.797489 NA 2 2.957923e+15 2.495747e+16 4.236103 NA> > xxfac1 fac2 x1 x2 alias x4 y 1 1 1 2 6 7 16 27 2 2 2 6 25 28 48 103 3 3 2 6 9 12 34 38 4 1 1 8 8 12 46 35 5 2 2 8 31 35 62 126 6 3 2 10 29 34 73 119 7 1 1 16 40 48 205 311 8 2 2 6 8 11 36 34 9 3 2 2 8 9 30 34 10 1 1 6 6 9 40 28 11 2 3 18 21 30 193 95 12 3 3 4 20 22 43 82 13 1 1 10 22 27 65 92 14 2 3 6 5 8 44 22 15 3 3 4 15 17 27 62 16 1 1 10 30 35 72 124 17 2 3 14 22 29 98 94 18 3 3 4 10 12 17 41 --please do not edit the information below-- Version: platform = Windows arch = x86 os = Win32 system = x86, Win32 status = major = 1 minor = 1.0 year = 2000 month = June day = 15 language = R Windows 9x 4.10 (build 2222) A Search Path: .GlobalEnv, Autoloads, package:base John Maindonald email : john.maindonald@anu.edu.au Statistical Consulting Unit, phone : (6249)3998 c/o CMA, SMS, fax : (6249)5549 John Dedman Mathematical Sciences Building Australian National University Canberra ACT 0200 Australia -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._