Dear R Help Team,
I get some weird results when I use the lm function with weights. The issue
can be reproduced with the example below.
The input data is (the weights are intentionally designed to reflect some
structure in the data):
> df
y x weights
1.51156139 0.55209240 2.117337e-34
-0.63653132 -0.12599316 2.117337e-34
0.37782776 0.42095384 4.934135e-31
3.03792318 1.40315446 2.679495e-24
1.53646523 0.46076858 2.679495e-24
-2.37727874 -0.73963576 6.244160e-21
0.37183065 0.20407468 1.455107e-17
-1.53917553 -0.95519361 1.455107e-17
1.10926675 0.03897129 3.390908e-14
-0.37786333 -0.17523593 3.390908e-14
2.43973603 0.97970095 7.902000e-11
-0.35432394 -0.03742559 7.902000e-11
2.19296613 1.00355263 4.289362e-04
0.49845532 0.34816207 4.289362e-04
1.25005260 0.76306225 5.000000e-01
0.84360691 0.45152356 5.000000e-01
0.29565993 0.53880068 5.000000e-01
-0.54081334 -0.28104525 5.000000e-01
0.83612836 -0.12885659 9.995711e-01
-1.42526769 -0.87107631 9.999998e-01
0.10204789 -0.11649899 1.000000e+00
1.14292898 0.37249631 1.000000e+00
-3.02942081 -1.28966997 1.000000e+00
-1.37549764 -0.74676145 1.000000e+00
-2.00118016 -0.55182759 1.000000e+00
-4.24441674 -1.94603608 1.000000e+00
1.17168144 1.00868008 1.000000e+00
2.64007761 1.26333069 1.000000e+00
1.98550114 1.18509599 1.000000e+00
-0.58941683 -0.61972416 9.999998e-01
-4.57559611 -2.30914920 9.995711e-01
-0.82610544 -0.39347576 9.995711e-01
-0.02768220 0.20076910 9.995711e-01
0.78186399 0.25690215 9.995711e-01
-0.88314153 -0.20200148 5.000000e-01
-4.17076452 -2.03547588 5.000000e-01
0.93373070 0.54190626 4.289362e-04
-0.08517734 0.17692491 4.289362e-04
-4.47546619 -2.14876688 4.289362e-04
-1.65509103 -0.76898087 4.289362e-04
-0.39403030 -0.12689705 4.289362e-04
0.01203300 -0.18689898 1.841442e-07
-4.82762639 -2.31391121 1.841442e-07
-0.72658380 -0.39751171 3.397282e-14
-2.35886866 -1.01082109 0.000000e+00
-2.03762707 -0.96439902 0.000000e+00
0.90115123 0.60172286 0.000000e+00
1.55999194 0.83433953 0.000000e+00
3.07994058 1.30942776 0.000000e+00
1.78871462 1.10605530 0.000000e+00
Running a simple linear model returns:
> lm(y~x,data=df)
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
-0.04173 2.03790
and
> max(resid(lm(y~x,data=df)))
[1] 1.14046
*However, if I use the weighted model, then:*
lm(formula = y ~ x, data = df, weights = df$weights)
Coefficients:
(Intercept) x
-0.05786 1.96087
and
> max(resid(lm(y~x,data=df,weights=df$weights)))
[1] 60.91888
As you can see, the estimated coefficients are nearly the same, but the
resid() function returns a giant residual (in some of my cases the
value is far larger). Further, if I calculate the residuals
simply as predict(lm(y~x,data=df,weights=df$weights))-df$y, then I get the
values I expect for the residuals (up to sign).
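For reference, here is a minimal self-contained sketch of the comparison described above. It uses hypothetical simulated data, not the df shown earlier, and the names d, w, r_resid and r_manual are only illustrative. The weights are an assumed mix of extremely small and unit values that loosely mimics the structure of my real weights; I have not checked that this small example reproduces the same magnitude of discrepancy.

```r
# Hypothetical simulated data (NOT the df above); all names are illustrative.
set.seed(1)
n <- 20
d <- data.frame(x = rnorm(n))
d$y <- 2 * d$x + rnorm(n, sd = 0.5)
# Assumed mix of extremely small and unit weights.
d$w <- rep(c(1e-34, 1e-17, 1e-4, 0.5, 1), each = 4)

fit <- lm(y ~ x, data = d, weights = w)

# Residuals as reported by resid(): observed minus fitted.
r_resid <- resid(fit)

# Residuals recomputed by hand from the estimated coefficients.
X <- cbind(1, d$x)
r_manual <- d$y - drop(X %*% coef(fit))

# Compare the two, both on the same sign convention (y - fitted).
print(cbind(weight = d$w, resid = r_resid, manual = r_manual))
print(max(abs(r_resid - r_manual)))
```

Note that predict(fit) - y has the opposite sign to resid(fit), so the two should be compared via absolute values or after flipping the sign.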
Thanks.
Please do not hesitate to contact me for more details.
Regards,
Hamed.