Bliese, Paul D MAJ WRAIR-Wash DC
2002-Nov-21 19:08 UTC
[R] Analysis of Data with Observation Weights Revisited
In lm and glm, the weights command should only be used when the variances of the observations are inversely proportional to the weights. Recently, however, a question came up regarding how one could estimate lm and glm models with weights where the weights refer to number of observations (counts). Because lm and glm do not handle this case, SE values will be wrong if one uses the weight command with counts. Notice that the SE values in the second model are larger than those in the first model, though paramater estimates are identical.> x <- rep(c(1,2), c(6,4)) > y <- rep(c(1,2,3,4),c(2,3,3,2)) > cbind(x,y)x y [1,] 1 1 [2,] 1 1 [3,] 1 2 [4,] 1 2 [5,] 1 2 [6,] 1 3 [7,] 2 3 [8,] 2 3 [9,] 2 4 [10,] 2 4> summary(m <- lm(y~x))$coefEstimate Std. Error t value Pr(>|t|) (Intercept) 0.1666667 0.6627489 0.2514778 0.807784044 x 1.6666667 0.4468252 3.7300192 0.005787586> > x1 <- rep(c(1,2), c(3,2)) > y1 <- rep(c(1,2,3,4), c(1,1,2,1)) > w <- c(2,3,1,2,2) > cbind(x1,y1,w)x1 y1 w [1,] 1 1 2 [2,] 1 2 3 [3,] 1 3 1 [4,] 2 3 2 [5,] 2 4 2> > summary(m1 <- lm(y1~x1, weights=w))$coefEstimate Std. Error t value Pr(>|t|) (Intercept) 0.1666667 1.0822644 0.1539981 0.8873876 x1 1.6666667 0.7296625 2.2841610 0.1065266 The thread left off with Thomas Lumley recommending that one use gee to estimate the SE values while sticking with the parameter estimates from the weighted lm or glm model. Specifically something like, mod<-gee(y1~x1,id=1:5, weights=w) #were each observation has its own id. The problem with this solution is that gee does NOT allow weights (probably something that would be fairly simple to modify); thus, the gee option doesn't really work. For those who are interested, one work around is to use the Thomas Lumley's infjack.glm function in the glmerrs.R file at http://faculty.washington.edu/tlumley/glmerrs.R and the code is all in the WEAVE package at http://faculty.washington.edu/tlumley/weave.html In the example, notice that the SE estimates are much closer to the SE values in the original model without weights.> model<-glm(y1~x1,weights=w) > varmat<-infjack.glm(model, groups=1:5) > se<-sqrt(diag(varmat)) > se(Intercept) x1 0.7827224 0.4969040 MAJ Paul Bliese, Ph.D. Walter Reed Army Institute of Research Phone: (301) 319-9873 Fax: (301) 319-9484 paul.bliese at na.amedd.army.mil -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._