David J. Birke
2019-Sep-27 17:05 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
Dear R community,

I just stumbled upon the following behavior in R version 3.6.0:

    set.seed(42)
    y <- rep(0, 30)
    x <- rbinom(30, 1, prob = 0.91)
    # The following will not show any t-statistic or p-value
    summary(lm(y~x))
    # The following will show t-statistic and p-value
    summary(lm(1+y~x))

My expected output is that the first case should report t-statistic and p-value. My intuition might be tricking me, but I think that a constant shift of the data should be fully absorbed by the constant and not affect inference about the slope.

Is this a bug or is there a reason why there should be a discrepancy between the two outputs?

Best,
David
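The intuition that a constant shift is absorbed by the intercept can be checked on data where the fit is not perfect. The following is a minimal sketch; the alternating `x` and the simulated `y` are illustrative assumptions, not the data from the post.

```r
# Illustrative data (an assumption, not the thread's data): a binary regressor
# and a noisy response, so the residuals are far from zero.
set.seed(1)
x <- rep(c(0, 1), times = 15)
y <- 2 * x + rnorm(30)

sum_a <- summary(lm(y ~ x))
sum_b <- summary(lm(1 + y ~ x))

# Slope row: estimate, standard error, t value and p-value.
print(coef(sum_a)["x", ])
print(coef(sum_b)["x", ])

# The slope rows agree up to numerical tolerance ...
slope_same <- isTRUE(all.equal(coef(sum_a)["x", ], coef(sum_b)["x", ]))

# ... while the intercept estimate moves by the added constant.
intercept_shift <- unname(coef(sum_b)["(Intercept)", "Estimate"] -
                          coef(sum_a)["(Intercept)", "Estimate"])
print(slope_same)
print(intercept_shift)
```

With noisy data the inference about the slope is unaffected by the shift, so the discrepancy in the post appears only because the fit is perfect there.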
Mark Leeds
2019-Sep-27 18:35 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
Hi: In your example, you made the response zero in every case, which is going to cause problems. In GLMs, I think they call it the Donsker effect. I'm not sure what it's called in OLS; probably a lack of identifiability. Note that you probably shouldn't be using zeros and ones as the response in a regression anyway.

If you change the response to the one below, you get what you'd expect.

    y <- c(rep(0, 15), rep(1,15))
Mark Leeds
2019-Sep-27 18:43 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
Correction to my previous answer: I looked around and I don't think it's called the Donsker effect. It seems to be referred to as just a case of "perfect separability". If you google for "perfect separation in glms", you'll get a lot of information.
Rui Barradas
2019-Sep-27 19:01 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
Hello,

Maybe FAQ 7.31? Check the residuals, they are all "zero" in both cases:

    fit0 <- lm(y~x)
    fit1 <- lm(1+y~x)

    # residuals
    table(resid(fit0))
    #
    #  0
    # 30

    table(resid(fit1))
    #
    # -5.21223595241838e-16 -4.93038065763132e-31  3.12734157145103e-15
    #                     6                    23                     1

Hope this helps,

Rui Barradas
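In the spirit of FAQ 7.31, the residuals can be compared with zero using a tolerance instead of `==`. This is a sketch that reuses the data from the original post; the variable names `exact0`, `exact1`, `close0` and `close1` are introduced here for illustration only.

```r
set.seed(42)
y <- rep(0, 30)
x <- rbinom(30, 1, prob = 0.91)

fit0 <- lm(y ~ x)
fit1 <- lm(1 + y ~ x)

# Exact comparison with zero. For fit1 this can be FALSE, because rounding
# error may leave residuals of order 1e-16 (platform dependent).
exact0 <- all(resid(fit0) == 0)
exact1 <- all(resid(fit1) == 0)

# Tolerance-based comparison, as recommended in FAQ 7.31.
# unname() drops the observation names so only the values are compared.
close0 <- isTRUE(all.equal(unname(resid(fit0)), rep(0, 30)))
close1 <- isTRUE(all.equal(unname(resid(fit1)), rep(0, 30)))

print(c(exact0 = exact0, exact1 = exact1, close0 = close0, close1 = close1))
```

Under the tolerance-based check both fits count as having zero residuals, even where the exact check tells them apart.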
Mark Leeds
2019-Sep-28 17:32 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
Hi Berwin: Yes, that's it. Donsker is famous for a functional CLT, so I was mixing up statistics and stochastic processes. I'd better stick to statistics. It's safer! Thanks for the correction. I'm cc'ing R-help since it may be useful to someone there. See below for Berwin's comment.

Mark

On Sat, Sep 28, 2019 at 3:36 AM Berwin A Turlach <berwin.turlach at gmail.com> wrote:

> G'day Mark,
>
> On Fri, 27 Sep 2019 14:43:28 -0400
> Mark Leeds <markleeds2 at gmail.com> wrote:
>
> > correction to my previous answer. I looked around and I don't think
> > it's called the donsker effect.
>
> I think you meant the Hauck-Donner effect [1], which refers to the
> problem of separation for binomial GLMs (not all GLMs).
>
> Cheers,
>
> Berwin
>
> [1] Hauck, Jr., W.W. and Donner, A. (1977) Wald's test as applied to
> hypotheses in logit analysis. Journal of the American Statistical
> Association 72, 851-853.
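For readers unfamiliar with separation in binomial GLMs, here is a small sketch of what it looks like. The toy data are an assumption chosen so that `x` separates the two response classes perfectly; they are unrelated to the data in the original post.

```r
# Toy data (illustrative assumption): y is 0 for x <= 5 and 1 for x >= 6,
# so x separates the two classes perfectly.
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# glm() emits warnings for such data; they are suppressed here only to keep
# the output short.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Wald summary for the slope: estimate, standard error, z value, p-value.
wald <- coef(summary(fit))["x", ]
print(wald)
```

The slope estimate is large but its standard error is far larger, so the Wald test finds no evidence of an effect although the relationship in the data is as strong as it can be.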
Aleš Žiberna
2019-Oct-03 07:35 UTC
[R] stats::lm has inconsistent output when adding constant to dependent variable
In one case the residuals are exactly 0 and in the other they are almost zero. This is the reason for the different results. Of course, they should be exactly the same, but this is due to some integer values not being exactly represented as real values on binary computers.

Best,
Aleš Žiberna
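A sketch of why "exactly zero" and "almost zero" lead to different summaries, assuming the usual definition of a t-statistic as the estimate divided by its standard error. It reuses the data from the original post; `rss0` and `rss1` are names introduced here for illustration.

```r
set.seed(42)
y <- rep(0, 30)
x <- rbinom(30, 1, prob = 0.91)

fit0 <- lm(y ~ x)
fit1 <- lm(1 + y ~ x)

# Residual sums of squares: exactly zero for fit0. For fit1 it may be a tiny
# positive number instead, depending on rounding on the platform.
rss0 <- sum(resid(fit0)^2)
rss1 <- sum(resid(fit1)^2)
print(c(rss0 = rss0, rss1 = rss1))

# With a zero estimate and a zero standard error, the ratio is 0 / 0,
# which is not a number:
print(0 / 0)
```

When the residual sum of squares is a tiny positive number instead, the standard errors are tiny but nonzero, so the ratio is a finite number and a t-statistic and p-value are printed, though they reflect rounding noise and not the data.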