Dear Michael and Dingyuan Wang,> -----Original Message----- > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael > Dewey > Sent: Thursday, April 18, 2019 11:25 AM > To: Dingyuan Wang <gumblex at aosc.io>; r-help at r-project.org > Subject: Re: [R] lm fails on some large input > > Perhaps subtract 1506705766 from y? > > Saying some other software does it well implies you know what the _correct_ > answer is here but I would question what that means with this sort of data- > set.It's rather an interesting problem, though, because the na?ve computation of the LS solution works: plot(x, y) X <- cbind(1, x) b <- solve(t(X) %*% X) %*% t(X) %*% y b abline(b) That surprised me, because I expected that lm() computation, using the QR decomposition, would be more numerically stable. Best, John ----------------------------------------------------------------- John Fox Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: https://socialsciences.mcmaster.ca/jfox/> > On 17/04/2019 07:26, Dingyuan Wang wrote: > > Hi, > > > > This input doesn't have any interesting properties except y is unix > > time. Spreadsheets can do this well. > > Is this a bug that lm can't do x ~ y? > > > > R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" > > Copyright (C) 2018 The R Foundation for Statistical Computing > > Platform: x86_64-pc-linux-gnu (64-bit) > > > > > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, > > 101.632, 108.928, 94.08) > y = c(1506705739.385, 1506705766.895, > > 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351, > > 1506705756.26, 1506705761.307, > > 1506705747.372) > > > m = lm(x ~ y) > > > summary(m) > > > > Call: > > lm(formula = x ~ y) > > > > Residuals: > > ???? Min?????? 1Q?? Median?????? 3Q????? Max > > -27.0222 -14.9902? -0.6542? 14.1938? 29.1698 > > > > Coefficients: (1 not defined because of singularities) > > ??????????? Estimate Std. Error t value Pr(>|t|) > > (Intercept)?? 94.734????? 6.511?? 14.55 4.88e-07 *** y > > NA???????? NA????? NA?????? NA > > --- > > Signif. codes:? 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > > > Residual standard error: 19.53 on 8 degrees of freedom > > > > > summary(lm(y ~ x)) > > > > Call: > > lm(formula = y ~ x) > > > > Residuals: > > ??? Min????? 1Q? Median????? 3Q???? Max > > -2.1687 -1.3345 -0.9466? 1.3826? 2.6551 > > > > Coefficients: > > ???????????? Estimate Std. Error?? t value Pr(>|t|) > > (Intercept) 1.507e+09? 3.294e+00 4.574e+08? < 2e-16 *** x > > 6.136e-01? 3.413e-02 1.798e+01 4.07e-07 *** > > --- > > Signif. codes:? 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > > > Residual standard error: 1.885 on 7 degrees of freedom Multiple > > R-squared:? 0.9788,??? Adjusted R-squared:? 0.9758 > > F-statistic: 323.3 on 1 and 7 DF,? p-value: 4.068e-07 > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > --- > > This email has been checked for viruses by AVG. > > https://www.avg.com > > > > > > -- > Michael > http://www.dewey.myzen.co.uk/home.html > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
Um, you need to reverse y and x there. The question was about lm(y ~ x)....> X <- cbind(1, y) > solve(crossprod(X))Error in solve.default(crossprod(X)) : system is computationally singular: reciprocal condition number = 6.19587e-35 Actually, lm can QR perfectly OK, but it gets caught by its singularity detection:> qr <- qr(X, tol=1e-10) > qr # without the tol bit, you get same thing but $rank == 1$qr y [1,] -3.0000000 -4.520117e+09 [2,] 0.3333333 -3.426530e+01 [3,] 0.3333333 -2.947103e-02 [4,] 0.3333333 4.252164e-01 [5,] 0.3333333 -3.665468e-01 [6,] 0.3333333 -3.488029e-01 [7,] 0.3333333 2.614064e-01 [8,] 0.3333333 4.086982e-01 [9,] 0.3333333 2.018556e-03 $rank [1] 2 $qraux [1] 1.333333 1.571779 $pivot [1] 1 2 attr(,"class") [1] "qr"> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 108.928, 94.08) > qr.coef(qr,x)y -2.403345e+09 1.595099e+00> lm(x~y)Call: lm(formula = x ~ y) Coefficients: (Intercept) y 94.73 NA> lm(x~y, tol=1e-10)Call: lm(formula = x ~ y, tol = 1e-10) Coefficients: (Intercept) y -2.403e+09 1.595e+00> lm(x~I(y-mean(y)))Call: lm(formula = x ~ I(y - mean(y))) Coefficients: (Intercept) I(y - mean(y)) 94.734 1.595> On 18 Apr 2019, at 17:56 , Fox, John <jfox at mcmaster.ca> wrote: > > Dear Michael and Dingyuan Wang, > >> -----Original Message----- >> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael >> Dewey >> Sent: Thursday, April 18, 2019 11:25 AM >> To: Dingyuan Wang <gumblex at aosc.io>; r-help at r-project.org >> Subject: Re: [R] lm fails on some large input >> >> Perhaps subtract 1506705766 from y? >> >> Saying some other software does it well implies you know what the _correct_ >> answer is here but I would question what that means with this sort of data- >> set. > > It's rather an interesting problem, though, because the na?ve computation of the LS solution works: > > plot(x, y) > X <- cbind(1, x) > b <- solve(t(X) %*% X) %*% t(X) %*% y > b > abline(b) > > That surprised me, because I expected that lm() computation, using the QR decomposition, would be more numerically stable. > > Best, > John > > ----------------------------------------------------------------- > John Fox > Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > Web: https://socialsciences.mcmaster.ca/jfox/ > > > >> >> On 17/04/2019 07:26, Dingyuan Wang wrote: >>> Hi, >>> >>> This input doesn't have any interesting properties except y is unix >>> time. Spreadsheets can do this well. >>> Is this a bug that lm can't do x ~ y? >>> >>> R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" >>> Copyright (C) 2018 The R Foundation for Statistical Computing >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>>> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, >>> 101.632, 108.928, 94.08) > y = c(1506705739.385, 1506705766.895, >>> 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351, >>> 1506705756.26, 1506705761.307, >>> 1506705747.372) >>>> m = lm(x ~ y) >>>> summary(m) >>> >>> Call: >>> lm(formula = x ~ y) >>> >>> Residuals: >>> Min 1Q Median 3Q Max >>> -27.0222 -14.9902 -0.6542 14.1938 29.1698 >>> >>> Coefficients: (1 not defined because of singularities) >>> Estimate Std. Error t value Pr(>|t|) >>> (Intercept) 94.734 6.511 14.55 4.88e-07 *** y >>> NA NA NA NA >>> --- >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >>> >>> Residual standard error: 19.53 on 8 degrees of freedom >>> >>>> summary(lm(y ~ x)) >>> >>> Call: >>> lm(formula = y ~ x) >>> >>> Residuals: >>> Min 1Q Median 3Q Max >>> -2.1687 -1.3345 -0.9466 1.3826 2.6551 >>> >>> Coefficients: >>> Estimate Std. Error t value Pr(>|t|) >>> (Intercept) 1.507e+09 3.294e+00 4.574e+08 < 2e-16 *** x >>> 6.136e-01 3.413e-02 1.798e+01 4.07e-07 *** >>> --- >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >>> >>> Residual standard error: 1.885 on 7 degrees of freedom Multiple >>> R-squared: 0.9788, Adjusted R-squared: 0.9758 >>> F-statistic: 323.3 on 1 and 7 DF, p-value: 4.068e-07 >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> --- >>> This email has been checked for viruses by AVG. >>> https://www.avg.com >>> >>> >> >> -- >> Michael >> http://www.dewey.myzen.co.uk/home.html >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Dear Peter,> -----Original Message----- > From: peter dalgaard [mailto:pdalgd at gmail.com] > Sent: Thursday, April 18, 2019 12:23 PM > To: Fox, John <jfox at mcmaster.ca> > Cc: Michael Dewey <lists at dewey.myzen.co.uk>; Dingyuan Wang > <gumblex at aosc.io>; r-help at r-project.org > Subject: Re: [R] lm fails on some large input > > Um, you need to reverse y and x there. The question was about lm(y ~ x).... >Good catch! I missed that in the original posting, and lm() does indeed produce the LS solution for the regression of y on x. And, as I'd have expected, the na?ve approach also fails for the regression of x on y:> Y <- cbind(1, y) > b <- solve(t(Y) %*% Y) %*% t(Y) %*% xError in solve.default(t(Y) %*% Y) : system is computationally singular: reciprocal condition number = 6.19587e-35 resolving the mystery. Thanks, John> > X <- cbind(1, y) > > solve(crossprod(X)) > Error in solve.default(crossprod(X)) : > system is computationally singular: reciprocal condition number = 6.19587e- > 35 > > Actually, lm can QR perfectly OK, but it gets caught by its singularity detection: > > > qr <- qr(X, tol=1e-10) > > qr # without the tol bit, you get same thing but $rank == 1 > $qr > y > [1,] -3.0000000 -4.520117e+09 > [2,] 0.3333333 -3.426530e+01 > [3,] 0.3333333 -2.947103e-02 > [4,] 0.3333333 4.252164e-01 > [5,] 0.3333333 -3.665468e-01 > [6,] 0.3333333 -3.488029e-01 > [7,] 0.3333333 2.614064e-01 > [8,] 0.3333333 4.086982e-01 > [9,] 0.3333333 2.018556e-03 > > $rank > [1] 2 > > $qraux > [1] 1.333333 1.571779 > > $pivot > [1] 1 2 > > attr(,"class") > [1] "qr" > > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, > > 108.928, 94.08) > > qr.coef(qr,x) > y > -2.403345e+09 1.595099e+00 > > > lm(x~y) > > Call: > lm(formula = x ~ y) > > Coefficients: > (Intercept) y > 94.73 NA > > > lm(x~y, tol=1e-10) > > Call: > lm(formula = x ~ y, tol = 1e-10) > > Coefficients: > (Intercept) y > -2.403e+09 1.595e+00 > > > lm(x~I(y-mean(y))) > > Call: > lm(formula = x ~ I(y - mean(y))) > > Coefficients: > (Intercept) I(y - mean(y)) > 94.734 1.595 > > > > On 18 Apr 2019, at 17:56 , Fox, John <jfox at mcmaster.ca> wrote: > > > > Dear Michael and Dingyuan Wang, > > > >> -----Original Message----- > >> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of > >> Michael Dewey > >> Sent: Thursday, April 18, 2019 11:25 AM > >> To: Dingyuan Wang <gumblex at aosc.io>; r-help at r-project.org > >> Subject: Re: [R] lm fails on some large input > >> > >> Perhaps subtract 1506705766 from y? > >> > >> Saying some other software does it well implies you know what the > >> _correct_ answer is here but I would question what that means with > >> this sort of data- set. > > > > It's rather an interesting problem, though, because the na?ve computation of > the LS solution works: > > > > plot(x, y) > > X <- cbind(1, x) > > b <- solve(t(X) %*% X) %*% t(X) %*% y > > b > > abline(b) > > > > That surprised me, because I expected that lm() computation, using the QR > decomposition, would be more numerically stable. > > > > Best, > > John > > > > ----------------------------------------------------------------- > > John Fox > > Professor Emeritus > > McMaster University > > Hamilton, Ontario, Canada > > Web: https://socialsciences.mcmaster.ca/jfox/ > > > > > > > >> > >> On 17/04/2019 07:26, Dingyuan Wang wrote: > >>> Hi, > >>> > >>> This input doesn't have any interesting properties except y is unix > >>> time. Spreadsheets can do this well. > >>> Is this a bug that lm can't do x ~ y? > >>> > >>> R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" > >>> Copyright (C) 2018 The R Foundation for Statistical Computing > >>> Platform: x86_64-pc-linux-gnu (64-bit) > >>> > >>>> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, > >>> 101.632, 108.928, 94.08) > y = c(1506705739.385, 1506705766.895, > >>> 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351, > >>> 1506705756.26, 1506705761.307, > >>> 1506705747.372) > >>>> m = lm(x ~ y) > >>>> summary(m) > >>> > >>> Call: > >>> lm(formula = x ~ y) > >>> > >>> Residuals: > >>> Min 1Q Median 3Q Max > >>> -27.0222 -14.9902 -0.6542 14.1938 29.1698 > >>> > >>> Coefficients: (1 not defined because of singularities) > >>> Estimate Std. Error t value Pr(>|t|) > >>> (Intercept) 94.734 6.511 14.55 4.88e-07 *** y > >>> NA NA NA NA > >>> --- > >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > >>> > >>> Residual standard error: 19.53 on 8 degrees of freedom > >>> > >>>> summary(lm(y ~ x)) > >>> > >>> Call: > >>> lm(formula = y ~ x) > >>> > >>> Residuals: > >>> Min 1Q Median 3Q Max > >>> -2.1687 -1.3345 -0.9466 1.3826 2.6551 > >>> > >>> Coefficients: > >>> Estimate Std. Error t value Pr(>|t|) > >>> (Intercept) 1.507e+09 3.294e+00 4.574e+08 < 2e-16 *** x > >>> 6.136e-01 3.413e-02 1.798e+01 4.07e-07 *** > >>> --- > >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > >>> > >>> Residual standard error: 1.885 on 7 degrees of freedom Multiple > >>> R-squared: 0.9788, Adjusted R-squared: 0.9758 > >>> F-statistic: 323.3 on 1 and 7 DF, p-value: 4.068e-07 > >>> > >>> ______________________________________________ > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >>> --- > >>> This email has been checked for viruses by AVG. > >>> https://www.avg.com > >>> > >>> > >> > >> -- > >> Michael > >> http://www.dewey.myzen.co.uk/home.html > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting- > >> guide.html and provide commented, minimal, self-contained, > >> reproducible code. > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 > Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > >