Dear Hamed, When you post a question to r-help, generally you should cc subsequent messages there as well, as I've done to this response. The algorithm that lm() uses is much more numerically stable than inverting the weighted sum-of-squares-and-product matrix. If you want to see how the computations are done, look at lm.wfit(), in which the residuals and fits are computed as z$residuals <- z$residuals/wts z$fitted.values <- y - z$residuals Zero weights are handled specially, and your tiny weights are thus the source of the problem. When you divide by a number less than the machine double-epsilon, you can't expect numerically stable results. I suppose that lm.wfit() could check for 0 weights to a tolerance rather than exactly. John> -----Original Message----- > From: Hamed Ha [mailto:hamedhaseli at gmail.com] > Sent: Friday, September 14, 2018 5:34 PM > To: Fox, John <jfox at mcmaster.ca> > Subject: Re: [R] Problem with lm.resid() when weights are provided > > Hi John, > > Thank you for your reply. > > I agree that the small weights are the potential source of the instability in the > result. I also suspected that there are some failure/bugs in the actual > algorithm that R uses for fitting the model. I remember that at some points I > checked the theoretical estimation of the parameters, solve(t(x) %*% w %*% > x) %*% t(x) %*% w %*% y, (besides the point that I had to set tol parameter in > solve() to a super small value) and realised that lm() and the theoretical > results match together. That is the parameter estimation is right in R. > Moreover, I checked the predictions, predict(lm.fit), and it was right. Then the > only source of error remained was resid() function. I further checked this > function and it is nothing more than calling a sub-element from and lm() fit. > Putting all together, I think that there is something wrong/bug/miss- > configuration in the lm() algorithm and I highly recommend the R core team to > fix that. > > Please feel free to contact me for more details if required. > > Warm regards, > Hamed. > > > > > > > > > > On Fri, 14 Sep 2018 at 13:35, Fox, John <jfox at mcmaster.ca > <mailto:jfox at mcmaster.ca> > wrote: > > > Dear Hamed, > > I don't think that anyone has picked up on this problem. > > What's peculiar about your weights is that several are 0 within > rounding error but not exactly 0: > > > head(df) > y x weight > 1 1.5115614 0.5520924 2.117337e-34 > 2 -0.6365313 -0.1259932 2.117337e-34 > 3 0.3778278 0.4209538 4.934135e-31 > 4 3.0379232 1.4031545 2.679495e-24 > 5 1.5364652 0.4607686 2.679495e-24 > 6 -2.3772787 -0.7396358 6.244160e-21 > > I can reproduce the results that you report: > > > (mod.1 <- lm(y ~ x, data=df)) > > Call: > lm(formula = y ~ x, data = df) > > Coefficients: > (Intercept) x > -0.04173 2.03790 > > > max(resid(mod.1)) > [1] 1.14046 > > (mod.2 <- lm(y ~ x, data=df, weights=weight)) > > Call: > lm(formula = y ~ x, data = df, weights = weight) > > Coefficients: > (Intercept) x > -0.05786 1.96087 > > > max(resid(mod.2)) > [1] 36.84939 > > But the problem disappears when the tiny nonzero weight are set to 0: > > > df2 <- df > > df2$weight <- zapsmall(df2$weight) > > head(df2) > y x weight > 1 1.5115614 0.5520924 0 > 2 -0.6365313 -0.1259932 0 > 3 0.3778278 0.4209538 0 > 4 3.0379232 1.4031545 0 > 5 1.5364652 0.4607686 0 > 6 -2.3772787 -0.7396358 0 > > (mod.3 <- update(mod.2, data=df2)) > > Call: > lm(formula = y ~ x, data = df2, weights = weight) > > Coefficients: > (Intercept) x > -0.05786 1.96087 > > > max(resid(mod.3)) > [1] 1.146663 > > I don't know exactly why this happens, but suspect numerical > instability produced by the near-zero weights, which are smaller than the > machine double-epsilon > > > .Machine$double.neg.eps > [1] 1.110223e-16 > > The problem also disappears, e.g., if the tiny weight are set to 1e-15 > rather than 0. > > I hope this helps, > John > > ----------------------------------------------------------------- > John Fox > Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > Web: https://socialsciences.mcmaster.ca/jfox/ > > > > > -----Original Message----- > > From: R-help [mailto:r-help-bounces at r-project.org <mailto:r-help- > bounces at r-project.org> ] On Behalf Of Hamed Ha > > Sent: Tuesday, September 11, 2018 8:39 AM > > To: r-help at r-project.org <mailto:r-help at r-project.org> > > Subject: [R] Problem with lm.resid() when weights are provided > > > > Dear R Help Team. > > > > I get some weird results when I use the lm function with weight. The > issue can > > be reproduced by the example below: > > > > > > The input data is (weights are intentionally designed to reflect some > > structures in the data) > > > > > > > df > > y x weight > > 1.51156139 0.55209240 2.117337e-34 > > -0.63653132 -0.12599316 2.117337e-34 > > 0.37782776 0.42095384 4.934135e-31 > > 3.03792318 1.40315446 2.679495e-24 > > 1.53646523 0.46076858 2.679495e-24 > > -2.37727874 -0.73963576 6.244160e-21 > > 0.37183065 0.20407468 1.455107e-17 > > -1.53917553 -0.95519361 1.455107e-17 > > 1.10926675 0.03897129 3.390908e-14 > > -0.37786333 -0.17523593 3.390908e-14 > > 2.43973603 0.97970095 7.902000e-11 > > -0.35432394 -0.03742559 7.902000e-11 > > 2.19296613 1.00355263 4.289362e-04 > > 0.49845532 0.34816207 4.289362e-04 > > 1.25005260 0.76306225 5.000000e-01 > > 0.84360691 0.45152356 5.000000e-01 > > 0.29565993 0.53880068 5.000000e-01 > > -0.54081334 -0.28104525 5.000000e-01 > > 0.83612836 -0.12885659 9.995711e-01 > > -1.42526769 -0.87107631 9.999998e-01 > > 0.10204789 -0.11649899 1.000000e+00 > > 1.14292898 0.37249631 1.000000e+00 > > -3.02942081 -1.28966997 1.000000e+00 > > -1.37549764 -0.74676145 1.000000e+00 > > -2.00118016 -0.55182759 1.000000e+00 > > -4.24441674 -1.94603608 1.000000e+00 > > 1.17168144 1.00868008 1.000000e+00 > > 2.64007761 1.26333069 1.000000e+00 > > 1.98550114 1.18509599 1.000000e+00 > > -0.58941683 -0.61972416 9.999998e-01 > > -4.57559611 -2.30914920 9.995711e-01 > > -0.82610544 -0.39347576 9.995711e-01 > > -0.02768220 0.20076910 9.995711e-01 > > 0.78186399 0.25690215 9.995711e-01 > > -0.88314153 -0.20200148 5.000000e-01 > > -4.17076452 -2.03547588 5.000000e-01 > > 0.93373070 0.54190626 4.289362e-04 > > -0.08517734 0.17692491 4.289362e-04 > > -4.47546619 -2.14876688 4.289362e-04 > > -1.65509103 -0.76898087 4.289362e-04 > > -0.39403030 -0.12689705 4.289362e-04 > > 0.01203300 -0.18689898 1.841442e-07 > > -4.82762639 -2.31391121 1.841442e-07 > > -0.72658380 -0.39751171 3.397282e-14 > > -2.35886866 -1.01082109 0.000000e+00 > > -2.03762707 -0.96439902 0.000000e+00 > > 0.90115123 0.60172286 0.000000e+00 > > 1.55999194 0.83433953 0.000000e+00 > > 3.07994058 1.30942776 0.000000e+00 > > 1.78871462 1.10605530 0.000000e+00 > > > > > > > > Running simple linear model returns: > > > > > lm(y~x,data=df) > > > > Call: > > lm(formula = y ~ x, data = df) > > > > Coefficients: > > (Intercept) x > > -0.04173 2.03790 > > > > and > > > max(resid(lm(y~x,data=df))) > > [1] 1.14046 > > > > > > *HOWEVER if I use the weighted model then:* > > > > lm(formula = y ~ x, data = df, weights = df$weights) > > > > Coefficients: > > (Intercept) x > > -0.05786 1.96087 > > > > and > > > max(resid(lm(y~x,data=df,weights=df$weights))) > > [1] 60.91888 > > > > > > as you see, the estimation of the coefficients are nearly the same > but the > > resid() function returns a giant residual (I have some cases where > the value is > > much much higher). Further, if I calculate the residuals by simply > > predict(lm(y~x,data=df,weights=df$weights))-df$y then I get the true > value for > > the residuals. > > > > > > Thanks. > > > > Please do not hesitate to contact me for more details. > > Regards, > > Hamed. > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible > code. >
H i John, Thank you for your reply. I see your point, thanks. I checked lm.wfit() and realised that there is a tol parameter that is already set to 10^-7. This is not even the half decimal to the machine precision. Furthermore, plying with tol parameter does not solve the problem, as far as I checked. I still see this issue as critical and we should report it to the R core team to be investigated more. What do you think? Regards, Hamed. On Fri, 14 Sep 2018 at 22:46, Fox, John <jfox at mcmaster.ca> wrote:> Dear Hamed, > > When you post a question to r-help, generally you should cc subsequent > messages there as well, as I've done to this response. > > The algorithm that lm() uses is much more numerically stable than > inverting the weighted sum-of-squares-and-product matrix. If you want to > see how the computations are done, look at lm.wfit(), in which the > residuals and fits are computed as > > z$residuals <- z$residuals/wts > z$fitted.values <- y - z$residuals > > Zero weights are handled specially, and your tiny weights are thus the > source of the problem. When you divide by a number less than the machine > double-epsilon, you can't expect numerically stable results. I suppose that > lm.wfit() could check for 0 weights to a tolerance rather than exactly. > > John > > > -----Original Message----- > > From: Hamed Ha [mailto:hamedhaseli at gmail.com] > > Sent: Friday, September 14, 2018 5:34 PM > > To: Fox, John <jfox at mcmaster.ca> > > Subject: Re: [R] Problem with lm.resid() when weights are provided > > > > Hi John, > > > > Thank you for your reply. > > > > I agree that the small weights are the potential source of the > instability in the > > result. I also suspected that there are some failure/bugs in the actual > > algorithm that R uses for fitting the model. I remember that at some > points I > > checked the theoretical estimation of the parameters, solve(t(x) %*% w > %*% > > x) %*% t(x) %*% w %*% y, (besides the point that I had to set tol > parameter in > > solve() to a super small value) and realised that lm() and the > theoretical > > results match together. That is the parameter estimation is right in R. > > Moreover, I checked the predictions, predict(lm.fit), and it was right. > Then the > > only source of error remained was resid() function. I further checked > this > > function and it is nothing more than calling a sub-element from and lm() > fit. > > Putting all together, I think that there is something wrong/bug/miss- > > configuration in the lm() algorithm and I highly recommend the R core > team to > > fix that. > > > > Please feel free to contact me for more details if required. > > > > Warm regards, > > Hamed. > > > > > > > > > > > > > > > > > > > > On Fri, 14 Sep 2018 at 13:35, Fox, John <jfox at mcmaster.ca > > <mailto:jfox at mcmaster.ca> > wrote: > > > > > > Dear Hamed, > > > > I don't think that anyone has picked up on this problem. > > > > What's peculiar about your weights is that several are 0 within > > rounding error but not exactly 0: > > > > > head(df) > > y x weight > > 1 1.5115614 0.5520924 2.117337e-34 > > 2 -0.6365313 -0.1259932 2.117337e-34 > > 3 0.3778278 0.4209538 4.934135e-31 > > 4 3.0379232 1.4031545 2.679495e-24 > > 5 1.5364652 0.4607686 2.679495e-24 > > 6 -2.3772787 -0.7396358 6.244160e-21 > > > > I can reproduce the results that you report: > > > > > (mod.1 <- lm(y ~ x, data=df)) > > > > Call: > > lm(formula = y ~ x, data = df) > > > > Coefficients: > > (Intercept) x > > -0.04173 2.03790 > > > > > max(resid(mod.1)) > > [1] 1.14046 > > > (mod.2 <- lm(y ~ x, data=df, weights=weight)) > > > > Call: > > lm(formula = y ~ x, data = df, weights = weight) > > > > Coefficients: > > (Intercept) x > > -0.05786 1.96087 > > > > > max(resid(mod.2)) > > [1] 36.84939 > > > > But the problem disappears when the tiny nonzero weight are set to > 0: > > > > > df2 <- df > > > df2$weight <- zapsmall(df2$weight) > > > head(df2) > > y x weight > > 1 1.5115614 0.5520924 0 > > 2 -0.6365313 -0.1259932 0 > > 3 0.3778278 0.4209538 0 > > 4 3.0379232 1.4031545 0 > > 5 1.5364652 0.4607686 0 > > 6 -2.3772787 -0.7396358 0 > > > (mod.3 <- update(mod.2, data=df2)) > > > > Call: > > lm(formula = y ~ x, data = df2, weights = weight) > > > > Coefficients: > > (Intercept) x > > -0.05786 1.96087 > > > > > max(resid(mod.3)) > > [1] 1.146663 > > > > I don't know exactly why this happens, but suspect numerical > > instability produced by the near-zero weights, which are smaller than the > > machine double-epsilon > > > > > .Machine$double.neg.eps > > [1] 1.110223e-16 > > > > The problem also disappears, e.g., if the tiny weight are set to > 1e-15 > > rather than 0. > > > > I hope this helps, > > John > > > > ----------------------------------------------------------------- > > John Fox > > Professor Emeritus > > McMaster University > > Hamilton, Ontario, Canada > > Web: https://socialsciences.mcmaster.ca/jfox/ > > > > > > > > > -----Original Message----- > > > From: R-help [mailto:r-help-bounces at r-project.org <mailto: > r-help- > > bounces at r-project.org> ] On Behalf Of Hamed Ha > > > Sent: Tuesday, September 11, 2018 8:39 AM > > > To: r-help at r-project.org <mailto:r-help at r-project.org> > > > Subject: [R] Problem with lm.resid() when weights are provided > > > > > > Dear R Help Team. > > > > > > I get some weird results when I use the lm function with weight. > The > > issue can > > > be reproduced by the example below: > > > > > > > > > The input data is (weights are intentionally designed to reflect > some > > > structures in the data) > > > > > > > > > > df > > > y x weight > > > 1.51156139 0.55209240 2.117337e-34 > > > -0.63653132 -0.12599316 2.117337e-34 > > > 0.37782776 0.42095384 4.934135e-31 > > > 3.03792318 1.40315446 2.679495e-24 > > > 1.53646523 0.46076858 2.679495e-24 > > > -2.37727874 -0.73963576 6.244160e-21 > > > 0.37183065 0.20407468 1.455107e-17 > > > -1.53917553 -0.95519361 1.455107e-17 > > > 1.10926675 0.03897129 3.390908e-14 > > > -0.37786333 -0.17523593 3.390908e-14 > > > 2.43973603 0.97970095 7.902000e-11 > > > -0.35432394 -0.03742559 7.902000e-11 > > > 2.19296613 1.00355263 4.289362e-04 > > > 0.49845532 0.34816207 4.289362e-04 > > > 1.25005260 0.76306225 5.000000e-01 > > > 0.84360691 0.45152356 5.000000e-01 > > > 0.29565993 0.53880068 5.000000e-01 > > > -0.54081334 -0.28104525 5.000000e-01 > > > 0.83612836 -0.12885659 9.995711e-01 > > > -1.42526769 -0.87107631 9.999998e-01 > > > 0.10204789 -0.11649899 1.000000e+00 > > > 1.14292898 0.37249631 1.000000e+00 > > > -3.02942081 -1.28966997 1.000000e+00 > > > -1.37549764 -0.74676145 1.000000e+00 > > > -2.00118016 -0.55182759 1.000000e+00 > > > -4.24441674 -1.94603608 1.000000e+00 > > > 1.17168144 1.00868008 1.000000e+00 > > > 2.64007761 1.26333069 1.000000e+00 > > > 1.98550114 1.18509599 1.000000e+00 > > > -0.58941683 -0.61972416 9.999998e-01 > > > -4.57559611 -2.30914920 9.995711e-01 > > > -0.82610544 -0.39347576 9.995711e-01 > > > -0.02768220 0.20076910 9.995711e-01 > > > 0.78186399 0.25690215 9.995711e-01 > > > -0.88314153 -0.20200148 5.000000e-01 > > > -4.17076452 -2.03547588 5.000000e-01 > > > 0.93373070 0.54190626 4.289362e-04 > > > -0.08517734 0.17692491 4.289362e-04 > > > -4.47546619 -2.14876688 4.289362e-04 > > > -1.65509103 -0.76898087 4.289362e-04 > > > -0.39403030 -0.12689705 4.289362e-04 > > > 0.01203300 -0.18689898 1.841442e-07 > > > -4.82762639 -2.31391121 1.841442e-07 > > > -0.72658380 -0.39751171 3.397282e-14 > > > -2.35886866 -1.01082109 0.000000e+00 > > > -2.03762707 -0.96439902 0.000000e+00 > > > 0.90115123 0.60172286 0.000000e+00 > > > 1.55999194 0.83433953 0.000000e+00 > > > 3.07994058 1.30942776 0.000000e+00 > > > 1.78871462 1.10605530 0.000000e+00 > > > > > > > > > > > > Running simple linear model returns: > > > > > > > lm(y~x,data=df) > > > > > > Call: > > > lm(formula = y ~ x, data = df) > > > > > > Coefficients: > > > (Intercept) x > > > -0.04173 2.03790 > > > > > > and > > > > max(resid(lm(y~x,data=df))) > > > [1] 1.14046 > > > > > > > > > *HOWEVER if I use the weighted model then:* > > > > > > lm(formula = y ~ x, data = df, weights = df$weights) > > > > > > Coefficients: > > > (Intercept) x > > > -0.05786 1.96087 > > > > > > and > > > > max(resid(lm(y~x,data=df,weights=df$weights))) > > > [1] 60.91888 > > > > > > > > > as you see, the estimation of the coefficients are nearly the > same > > but the > > > resid() function returns a giant residual (I have some cases > where > > the value is > > > much much higher). Further, if I calculate the residuals by > simply > > > predict(lm(y~x,data=df,weights=df$weights))-df$y then I get the > true > > value for > > > the residuals. > > > > > > > > > Thanks. > > > > > > Please do not hesitate to contact me for more details. > > > Regards, > > > Hamed. > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org <mailto:R-help at r-project.org> mailing > list -- > > To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > http://www.R-project.org/posting- > > > guide.html > > > and provide commented, minimal, self-contained, reproducible > > code. > > > >[[alternative HTML version deleted]]
Dear Hamed,> -----Original Message----- > From: Hamed Ha [mailto:hamedhaseli at gmail.com] > Sent: Monday, September 17, 2018 3:56 AM > To: Fox, John <jfox at mcmaster.ca> > Cc: r-help at r-project.org > Subject: Re: [R] Problem with lm.resid() when weights are provided > > H i John, > > > Thank you for your reply. > > > I see your point, thanks. I checked lm.wfit() and realised that there is a tol > parameter that is already set to 10^-7. This is not even the half decimal to the > machine precision. Furthermore, plying with tol parameter does not solve the > problem, as far as I checked.tol plays a different role in lm.wfit(). It's for the QR decomposition (done in C code), I suppose to determine the rank of the weighted model matrix. Generally in this kind of context, you'd use something like the square root of the machine double epsilon to define a number that's effectively 0, and the tolerance used here isn't too far off that -- about an order of magnitude larger. I'm not an expert in computer arithmetic or numerical linear algebra, so I don't have anything more to say about this.> > > I still see this issue as critical and we should report it to the R core team to be > investigated more. What do you think?I don't think that it's a critical issue because it isn't sensible to specify nonzero weights so close to 0. A simple solution is to change these weights to 0 in your code calling lm(). That said, I suppose that it might be better to make lm.wfit() more robust to near-zero weights. If you feel strongly about this, you can file a bug report, but I'm not interested in pursuing it. Best, John> > > Regards, > Hamed. > > > On Fri, 14 Sep 2018 at 22:46, Fox, John <jfox at mcmaster.ca > <mailto:jfox at mcmaster.ca> > wrote: > > > Dear Hamed, > > When you post a question to r-help, generally you should cc > subsequent messages there as well, as I've done to this response. > > The algorithm that lm() uses is much more numerically stable than > inverting the weighted sum-of-squares-and-product matrix. If you want to see > how the computations are done, look at lm.wfit(), in which the residuals and > fits are computed as > > z$residuals <- z$residuals/wts > z$fitted.values <- y - z$residuals > > Zero weights are handled specially, and your tiny weights are thus the > source of the problem. When you divide by a number less than the machine > double-epsilon, you can't expect numerically stable results. I suppose that > lm.wfit() could check for 0 weights to a tolerance rather than exactly. > > John > > > -----Original Message----- > > From: Hamed Ha [mailto:hamedhaseli at gmail.com > <mailto:hamedhaseli at gmail.com> ] > > Sent: Friday, September 14, 2018 5:34 PM > > To: Fox, John <jfox at mcmaster.ca <mailto:jfox at mcmaster.ca> > > > Subject: Re: [R] Problem with lm.resid() when weights are provided > > > > Hi John, > > > > Thank you for your reply. > > > > I agree that the small weights are the potential source of the > instability in the > > result. I also suspected that there are some failure/bugs in the actual > > algorithm that R uses for fitting the model. I remember that at some > points I > > checked the theoretical estimation of the parameters, solve(t(x) > %*% w %*% > > x) %*% t(x) %*% w %*% y, (besides the point that I had to set tol > parameter in > > solve() to a super small value) and realised that lm() and the > theoretical > > results match together. That is the parameter estimation is right in > R. > > Moreover, I checked the predictions, predict(lm.fit), and it was right. > Then the > > only source of error remained was resid() function. I further checked > this > > function and it is nothing more than calling a sub-element from and > lm() fit. > > Putting all together, I think that there is something wrong/bug/miss- > > configuration in the lm() algorithm and I highly recommend the R > core team to > > fix that. > > > > Please feel free to contact me for more details if required. > > > > Warm regards, > > Hamed. > > > > > > > > > > > > > > > > > > > > On Fri, 14 Sep 2018 at 13:35, Fox, John <jfox at mcmaster.ca > <mailto:jfox at mcmaster.ca> > > <mailto:jfox at mcmaster.ca <mailto:jfox at mcmaster.ca> > > wrote: > > > > > > Dear Hamed, > > > > I don't think that anyone has picked up on this problem. > > > > What's peculiar about your weights is that several are 0 within > > rounding error but not exactly 0: > > > > > head(df) > > y x weight > > 1 1.5115614 0.5520924 2.117337e-34 > > 2 -0.6365313 -0.1259932 2.117337e-34 > > 3 0.3778278 0.4209538 4.934135e-31 > > 4 3.0379232 1.4031545 2.679495e-24 > > 5 1.5364652 0.4607686 2.679495e-24 > > 6 -2.3772787 -0.7396358 6.244160e-21 > > > > I can reproduce the results that you report: > > > > > (mod.1 <- lm(y ~ x, data=df)) > > > > Call: > > lm(formula = y ~ x, data = df) > > > > Coefficients: > > (Intercept) x > > -0.04173 2.03790 > > > > > max(resid(mod.1)) > > [1] 1.14046 > > > (mod.2 <- lm(y ~ x, data=df, weights=weight)) > > > > Call: > > lm(formula = y ~ x, data = df, weights = weight) > > > > Coefficients: > > (Intercept) x > > -0.05786 1.96087 > > > > > max(resid(mod.2)) > > [1] 36.84939 > > > > But the problem disappears when the tiny nonzero weight are set > to 0: > > > > > df2 <- df > > > df2$weight <- zapsmall(df2$weight) > > > head(df2) > > y x weight > > 1 1.5115614 0.5520924 0 > > 2 -0.6365313 -0.1259932 0 > > 3 0.3778278 0.4209538 0 > > 4 3.0379232 1.4031545 0 > > 5 1.5364652 0.4607686 0 > > 6 -2.3772787 -0.7396358 0 > > > (mod.3 <- update(mod.2, data=df2)) > > > > Call: > > lm(formula = y ~ x, data = df2, weights = weight) > > > > Coefficients: > > (Intercept) x > > -0.05786 1.96087 > > > > > max(resid(mod.3)) > > [1] 1.146663 > > > > I don't know exactly why this happens, but suspect numerical > > instability produced by the near-zero weights, which are smaller > than the > > machine double-epsilon > > > > > .Machine$double.neg.eps > > [1] 1.110223e-16 > > > > The problem also disappears, e.g., if the tiny weight are set to 1e- > 15 > > rather than 0. > > > > I hope this helps, > > John > > > > ----------------------------------------------------------------- > > John Fox > > Professor Emeritus > > McMaster University > > Hamilton, Ontario, Canada > > Web: https://socialsciences.mcmaster.ca/jfox/ > > > > > > > > > -----Original Message----- > > > From: R-help [mailto:r-help-bounces at r-project.org <mailto:r- > help-bounces at r-project.org> <mailto:r-help- <mailto:r-help-> > > bounces at r-project.org <mailto:bounces at r-project.org> > ] On > Behalf Of Hamed Ha > > > Sent: Tuesday, September 11, 2018 8:39 AM > > > To: r-help at r-project.org <mailto:r-help at r-project.org> > <mailto:r-help at r-project.org <mailto:r-help at r-project.org> > > > > Subject: [R] Problem with lm.resid() when weights are provided > > > > > > Dear R Help Team. > > > > > > I get some weird results when I use the lm function with weight. > The > > issue can > > > be reproduced by the example below: > > > > > > > > > The input data is (weights are intentionally designed to reflect > some > > > structures in the data) > > > > > > > > > > df > > > y x weight > > > 1.51156139 0.55209240 2.117337e-34 > > > -0.63653132 -0.12599316 2.117337e-34 > > > 0.37782776 0.42095384 4.934135e-31 > > > 3.03792318 1.40315446 2.679495e-24 > > > 1.53646523 0.46076858 2.679495e-24 > > > -2.37727874 -0.73963576 6.244160e-21 > > > 0.37183065 0.20407468 1.455107e-17 > > > -1.53917553 -0.95519361 1.455107e-17 > > > 1.10926675 0.03897129 3.390908e-14 > > > -0.37786333 -0.17523593 3.390908e-14 > > > 2.43973603 0.97970095 7.902000e-11 > > > -0.35432394 -0.03742559 7.902000e-11 > > > 2.19296613 1.00355263 4.289362e-04 > > > 0.49845532 0.34816207 4.289362e-04 > > > 1.25005260 0.76306225 5.000000e-01 > > > 0.84360691 0.45152356 5.000000e-01 > > > 0.29565993 0.53880068 5.000000e-01 > > > -0.54081334 -0.28104525 5.000000e-01 > > > 0.83612836 -0.12885659 9.995711e-01 > > > -1.42526769 -0.87107631 9.999998e-01 > > > 0.10204789 -0.11649899 1.000000e+00 > > > 1.14292898 0.37249631 1.000000e+00 > > > -3.02942081 -1.28966997 1.000000e+00 > > > -1.37549764 -0.74676145 1.000000e+00 > > > -2.00118016 -0.55182759 1.000000e+00 > > > -4.24441674 -1.94603608 1.000000e+00 > > > 1.17168144 1.00868008 1.000000e+00 > > > 2.64007761 1.26333069 1.000000e+00 > > > 1.98550114 1.18509599 1.000000e+00 > > > -0.58941683 -0.61972416 9.999998e-01 > > > -4.57559611 -2.30914920 9.995711e-01 > > > -0.82610544 -0.39347576 9.995711e-01 > > > -0.02768220 0.20076910 9.995711e-01 > > > 0.78186399 0.25690215 9.995711e-01 > > > -0.88314153 -0.20200148 5.000000e-01 > > > -4.17076452 -2.03547588 5.000000e-01 > > > 0.93373070 0.54190626 4.289362e-04 > > > -0.08517734 0.17692491 4.289362e-04 > > > -4.47546619 -2.14876688 4.289362e-04 > > > -1.65509103 -0.76898087 4.289362e-04 > > > -0.39403030 -0.12689705 4.289362e-04 > > > 0.01203300 -0.18689898 1.841442e-07 > > > -4.82762639 -2.31391121 1.841442e-07 > > > -0.72658380 -0.39751171 3.397282e-14 > > > -2.35886866 -1.01082109 0.000000e+00 > > > -2.03762707 -0.96439902 0.000000e+00 > > > 0.90115123 0.60172286 0.000000e+00 > > > 1.55999194 0.83433953 0.000000e+00 > > > 3.07994058 1.30942776 0.000000e+00 > > > 1.78871462 1.10605530 0.000000e+00 > > > > > > > > > > > > Running simple linear model returns: > > > > > > > lm(y~x,data=df) > > > > > > Call: > > > lm(formula = y ~ x, data = df) > > > > > > Coefficients: > > > (Intercept) x > > > -0.04173 2.03790 > > > > > > and > > > > max(resid(lm(y~x,data=df))) > > > [1] 1.14046 > > > > > > > > > *HOWEVER if I use the weighted model then:* > > > > > > lm(formula = y ~ x, data = df, weights = df$weights) > > > > > > Coefficients: > > > (Intercept) x > > > -0.05786 1.96087 > > > > > > and > > > > max(resid(lm(y~x,data=df,weights=df$weights))) > > > [1] 60.91888 > > > > > > > > > as you see, the estimation of the coefficients are nearly the > same > > but the > > > resid() function returns a giant residual (I have some cases > where > > the value is > > > much much higher). Further, if I calculate the residuals by > simply > > > predict(lm(y~x,data=df,weights=df$weights))-df$y then I get the > true > > value for > > > the residuals. > > > > > > > > > Thanks. > > > > > > Please do not hesitate to contact me for more details. > > > Regards, > > > Hamed. > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org <mailto:R-help at r-project.org> > <mailto:R-help at r-project.org <mailto:R-help at r-project.org> > mailing list -- > > To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide http://www.R- > project.org/posting- > > > guide.html > > > and provide commented, minimal, self-contained, reproducible > > code. > > > >