GlenB
2017-May-28 01:28 UTC
[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
Bug: stats::line() does not produce the correct Tukey line when n mod 6 is 2 or 3.

Example: line(1:9,1:9) should have intercept 0 and slope 1, but it gives intercept -1 and slope 1.2.

Trying line(1:i,1:i) across a range of i makes it clear there's a cycle of length 6, with four of every six correct.

The bug has been present across many versions.

The machine I tried it on just now has R 3.2.3:

               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          2.3
year           2015
month          12
day            10
svn rev        69752
language       R
version.string R version 3.2.3 (2015-12-10)
nickname       Wooden Christmas-Tree
Joris Meys
2017-May-28 13:27 UTC
[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
Can confirm this in R 3.4.0:

end <- 6:100
res <- lapply(end, function(i) line(1:i, 1:i))
absresid <- sapply(res, function(i) mean(abs(resid(i))))
plot(absresid, type = "h")
coefs <- sapply(res, coef)
plot(coefs[1,], coefs[2,])

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0

--
Joris Meys
Statistical consultant, Ghent University
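The plots in the check above can be complemented by a table that groups the fits by n mod 6, which is how the original report describes the pattern. This is only a sketch: `check_tukey_line` and its `tol` argument are hypothetical names introduced here, and what the table shows depends on the R version in use (the thread only reports results for R 3.2.3 and 3.4.0).

```r
# Fit line(1:n, 1:n) for a range of n and flag fits that differ from y = x.
# check_tukey_line() is a hypothetical helper, not part of the stats package.
check_tukey_line <- function(ns = 6:40, tol = 1e-8) {
  # One row per n: column 1 is the intercept, column 2 the slope.
  coefs <- t(sapply(ns, function(n) coef(line(1:n, 1:n))))
  data.frame(
    n         = ns,
    n_mod_6   = ns %% 6,
    intercept = coefs[, 1],
    slope     = coefs[, 2],
    exact     = abs(coefs[, 1]) < tol & abs(coefs[, 2] - 1) < tol
  )
}

res <- check_tukey_line()
# Cross-tabulate the residue class against whether the fit recovered y = x.
print(table(n_mod_6 = res$n_mod_6, exact = res$exact))
```

If the behaviour is as reported, the rows for n mod 6 equal to 2 and 3 should be the only ones with `exact = FALSE`.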
Duncan Murdoch
2017-May-28 22:40 UTC
[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
On 27/05/2017 9:28 PM, GlenB wrote:
> Bug: stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
>
> Example: line(1:9,1:9) should have intercept 0 and slope 1 but it gives
> intercept -1 and slope 1.2
> [...]

If you look at the source (in src/library/stats/src/line.c), the explanation is clear: the x value is chosen as the 1/6 quantile (according to a particular definition of quantile), and the y value is chosen as the median of the y values where x is less than or equal to the 1/3 quantile. Those are different definitions (though I think they would be asymptotically equivalent under pretty weak assumptions), so it's not surprising the x value doesn't correspond perfectly to the y value, and the line ends up "wrong".

So is it a bug? Well, that depends on Tukey's definition. I don't have a copy of his book handy so I can't really say. Maybe the R function is doing exactly what Tukey said it should, and that's not a bug. Or maybe R is wrong.

Duncan Murdoch
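The mismatch described above can be sketched in a few lines of R. This is not a transcription of line.c: the quantile rule in `q_avg` (averaging the order statistics at the floor and ceiling of (n - 1) * p), the symmetric treatment of the upper group, and the intercept taken as the median of y - slope * x are all assumptions made for illustration, and both function names are hypothetical.

```r
# Assumed quantile rule: average the order statistics at the floor and
# ceiling of (n - 1) * p, counting positions from zero.
q_avg <- function(z, p) {
  z <- sort(z)
  h <- (length(z) - 1) * p
  0.5 * (z[floor(h) + 1] + z[ceiling(h) + 1])
}

# x summaries come from the 1/6 and 5/6 quantiles of x, while y summaries
# are medians of y over the outer thirds: two different definitions.
mismatched_line <- function(x, y) {
  xL <- q_avg(x, 1/6)
  xR <- q_avg(x, 5/6)
  yL <- median(y[x <= q_avg(x, 1/3)])
  yR <- median(y[x >= q_avg(x, 2/3)])
  slope <- (yR - yL) / (xR - xL)
  intercept <- median(y - slope * x)  # assumed intercept rule
  c(intercept = intercept, slope = slope)
}

fit9 <- mismatched_line(1:9, 1:9)
print(fit9)
```

For `1:9` the lower third is {1, 2, 3}, whose median is 2, but the 1/6 quantile under the assumed rule is 2.5, and likewise 8 versus 7.5 at the top. The slope is therefore (8 - 2) / (7.5 - 2.5) = 1.2, and the intercept comes out at -1, matching the values in the report.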
GlenB
2017-May-29 04:19 UTC
[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
Tukey divides the points into three groups, not the x and y values separately.

I'll try to get hold of the book for a direct quote; it might take a couple of days.
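A minimal sketch of that idea: split the points (not the coordinates) into three groups by x, then take the median of x and the median of y within the same outer groups. The function name `three_group_line`, the group-size rule, and the intercept rule are assumptions made for illustration, not a quotation of Tukey's definition, and ties in x are ignored.

```r
# Three-group line where x and y summaries come from the same points.
# three_group_line() is a hypothetical helper, not part of the stats package.
three_group_line <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(x)
  k <- floor((n + 1) / 3)      # assumed size of each outer group
  lower <- seq_len(k)          # the k points with the smallest x
  upper <- (n - k + 1):n       # the k points with the largest x
  slope <- (median(y[upper]) - median(y[lower])) /
           (median(x[upper]) - median(x[lower]))
  intercept <- median(y - slope * x)  # assumed intercept rule
  c(intercept = intercept, slope = slope)
}

fits <- t(sapply(6:40, function(n) three_group_line(1:n, 1:n)))
print(range(fits[, "slope"]))
print(range(fits[, "intercept"]))
```

Because both summaries in each outer group are computed from the same points, data lying exactly on y = x give slope 1 and intercept 0 for every n, with no dependence on n mod 6.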