Hi,

This input doesn't have any interesting properties except y is unix time. Spreadsheets can do this well.
Is this a bug that lm can't do x ~ y?

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 108.928, 94.08)
> y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307, 1506705747.372)
> m = lm(x ~ y)
> summary(m)

Call:
lm(formula = x ~ y)

Residuals:
     Min       1Q   Median       3Q      Max
-27.0222 -14.9902  -0.6542  14.1938  29.1698

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   94.734      6.511   14.55 4.88e-07 ***
y                 NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.53 on 8 degrees of freedom

> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1687 -1.3345 -0.9466  1.3826  2.6551

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.885 on 7 degrees of freedom
Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
Perhaps subtract 1506705766 from y?

Saying some other software does it well implies you know what the _correct_ answer is here but I would question what that means with this sort of data-set.

On 17/04/2019 07:26, Dingyuan Wang wrote:
> Hi,
>
> This input doesn't have any interesting properties except y is unix
> time. Spreadsheets can do this well.
> Is this a bug that lm can't do x ~ y?

--
Michael
dewey.myzen.co.uk/home.html

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
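A minimal sketch of the subtraction suggested above, assuming the x and y vectors from the original post. Centering on mean(y) is a choice made here for illustration; any constant close to the data, such as the 1506705766 mentioned above, should serve equally well. The names y0, m_default and m_centered are illustrative and do not come from the thread.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Shift y so that it is no longer nearly collinear with the intercept column.
y0 <- y - mean(y)          # illustrative name, not from the thread

m_default  <- lm(x ~ y)    # slope is reported as NA with the default tolerance
m_centered <- lm(x ~ y0)   # slope is estimated, in units of x per second

coef(m_default)
coef(m_centered)
```

Shifting y by a constant leaves the slope unchanged; only the intercept moves, and after centering it is simply mean(x).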
This sort of data arises quite easily if you deal with time/dates around now. E.g.,

> d <- data.frame(
+   when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
+   measurement = log2(1:10))
> coef(lm(data=d, measurement ~ when))
       (Intercept)               when
2.1791061114716954                 NA
> as.numeric(d$when)[1:2]
[1] 1506734521 1506734522

There are problems with the time units (seconds vs. hours) if you subtract off a time because the units of -.POSIXt depend on the data:

> coef(lm(data=d, measurement ~ I(when - min(when))))
        (Intercept) I(when - min(when))
0.68327571513124297 0.33240675474232279
> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
                                (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
                       -21978.3837546251634                          1196.6643170736229

Hence you have to use difftime and specify the units:

> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
                                                      (Intercept)
                                          -2.1978383754612696e+04
difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
                                           3.3240675474248449e-01
> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
                              (Intercept) difftime(when, min(when), units = "secs")
                      0.68327571513124297                       0.33240675474232279

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Apr 18, 2019 at 8:24 AM Michael Dewey <lists at dewey.myzen.co.uk> wrote:
> Perhaps subtract 1506705766 from y?
>
> Saying some other software does it well implies you know what the
> _correct_ answer is here but I would question what that means with this
> sort of data-set.
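The unit issue described above can be checked directly. The following is a small sketch, assuming the same ten one-second timestamps as in the example; tz = "UTC" is added here only so the result does not depend on the local time zone, and the names midnight, secs and fit are illustrative.

```r
# Ten one-second steps, as in the data frame above; tz fixed for reproducibility.
when <- seq(as.POSIXct("2017-09-29 18:22:01", tz = "UTC"),
            by = "secs", length.out = 10)
midnight <- as.POSIXct("2017-09-29 00:00:00", tz = "UTC")

# "-" on POSIXct picks the units from the size of the differences.
u_small <- units(when - min(when))   # differences of a few seconds
u_large <- units(when - midnight)    # differences of roughly 18 hours
c(u_small, u_large)

# Explicit units plus as.numeric() give a plain numeric predictor in seconds.
secs <- as.numeric(difftime(when, midnight, units = "secs"))
fit <- lm(log2(1:10) ~ secs)
coef(fit)
```

Converting to a plain numeric column before fitting also keeps the coefficient name short, at the cost of dropping the units attribute.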
I just want to fit a line to timestamps vs. some coordinates, so whether it is y~x or x~y doesn't matter.

Yes, I know the answer. When I tried R, I was surprised that it can't solve this either. I first noticed that PostgreSQL couldn't solve it, and found that they fixed that in pg 12.

postgresql.org/message-id/153313051300.1397.9594490737341194671@wrigleys.postgresql.org

Therefore I am asking whether someone knows how to fix this in R, or whether I must submit it as a bug.

2019/4/18 23:24, Michael Dewey:
> Perhaps subtract 1506705766 from y?
>
> Saying some other software does it well implies you know what the
> _correct_ answer is here but I would question what that means with this
> sort of data-set.
> On Apr 18, 2019, at 8:24 AM, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
>
> Perhaps subtract 1506705766 from y?

Good advice. Some further notes follow.

One can set `tol` to a smaller value than the default, e.g.

    m2 <- lm(x ~ y, tol=1e-12)

which is accurate:

    plot(y,x)
    abline(coef=coef(m2))

Users of numerical procedures need to be mindful of the default settings of the algorithms they use. As is well known, a default convergence tolerance that is too large can lead an optimization algorithm to seriously wrong results. There is an example described here:

science.sciencemag.org/content/296/5575/1945/tab-pdf

One might quibble with the choice of tol=1e-7 (the default in lm.fit), and 64 bit floating point will support much smaller values. However, there are usually statistical issues surrounding fitting highly collinear variables. So, `tol = 1e-07` seems more like a feature than a bug.

HTH,

Chuck

> Saying some other software does it well implies you know what the _correct_ answer is here but I would question what that means with this sort of data-set.
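To see why the default tolerance trips on this data, one can compare the norm of y after the intercept has been removed with the norm of y itself. This is a rough sketch of the quantity that, as far as I understand it, the pivoting QR behind lm.fit compares against `tol`; it is not a copy of the actual routine, and the names reduced, original and ratio are illustrative.

```r
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Norm of y once the intercept direction is projected out, versus its raw norm.
reduced  <- sqrt(sum((y - mean(y))^2))
original <- sqrt(sum(y^2))
ratio <- reduced / original

ratio
ratio < 1e-07   # compared with the default tol: y looks aliased with the intercept
ratio < 1e-12   # compared with the smaller tol used for m2 above
```

The values are all about 1.5e9 but vary by only a few tens of seconds, so almost all of y is explained by the constant column; subtracting a constant from y makes the ratio far larger and removes the problem without touching `tol`.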
Dear Dingyuan Wang,

But your question was answered clearly earlier in this thread (I forget by whom), showing that lm() provides the solution to the regression of x on y if the criterion for singularity is tightened:

> lm(x ~ y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y
      94.73           NA

> lm(x ~ y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y
 -2.403e+09    1.595e+00

Best,
John

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Dingyuan
> Wang
> Sent: Thursday, April 18, 2019 12:36 PM
> To: Michael Dewey <lists at dewey.myzen.co.uk>; r-help at r-project.org
> Subject: Re: [R] lm fails on some large input
>
> I just want to make a line out of timestamps vs some coordinates, so y~x or
> x~y doesn't matter.
>
> Yes, I know the answer. When trying R, I'm surprised that R can't solve that
> either. I first noticed that PostgreSQL can't solve it, and found that they fixed
> that in pg 12.
>
> postgresql.org/message-id/153313051300.1397.9594490737341194671%40wrigleys.postgresql.org
>
> Therefore I come to ask whether someone know how to fix this in R, or I must
> submit it as a bug?
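For comparison with the coefficients printed above, here is a sketch that fits on a centered predictor and then converts the intercept back to the original scale. It assumes the x and y vectors from the original post; ybar, fit_c, slope and intercept are illustrative names.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

ybar  <- mean(y)
fit_c <- lm(x ~ I(y - ybar))       # well conditioned, default tol is fine
b     <- unname(coef(fit_c))

# x = b[1] + b[2] * (y - ybar) = (b[1] - b[2] * ybar) + b[2] * y
slope     <- b[2]
intercept <- b[1] - slope * ybar

c(intercept = intercept, slope = slope)
```

The result should be close to the -2.403e+09 and 1.595 shown for tol=1e-10, which suggests the two routes describe the same line.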