Evgeni Chasnovski
2018-Dec-30 11:49 UTC
[Rd] Bug report with patch: `stats:::regularize.values()` always creates full copies of `x` and `y`
This is intended to be a bug report with proposed patch. I am posting to this mailing list as described in NOTE in "Bug Reporting in R". Function `stats:::regularize.values()` is meant to preprocess `x` and `y` arguments to have "proper" values for later use during interpolation. If input is already "proper", I would expect it to reuse the same objects without creating new ones. However, this isn't the case and is the source of unneccessary extra memory usage in `approx()` and others. The root cause of this seems to be a forceful reordering in lines 37-39 of 'approx.R' file. If reordering is done only if `x` is unsorted then no copies are created. Also this doesn't seem like breaking any existing code. There is a patch attached. Reproducable code: x <- seq(1, 100, 1) y <- seq(1, 100, 1) reg_xy <- stats:::regularize.values(x, y, mean) # Regularized versions of `x` and `y` are identical to input but are stored at # different places identical(x, reg_xy[["x"]]) #> [1] TRUE .Internal(inspect(x)) #> @15719b0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(reg_xy[["x"]])) #> @2b84130 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... identical(y, reg_xy[["y"]]) #> [1] TRUE .Internal(inspect(y)) #> @2c91be0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(reg_xy[["y"]])) #> @2bb4880 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... # Differs from original only by using `if (is.unsorted(x))` new_regularize.values <- function (x, y, ties) { x <- xy.coords(x, y, setLab = FALSE) y <- x$y x <- x$x if (any(na <- is.na(x) | is.na(y))) { ok <- !na x <- x[ok] y <- y[ok] } nx <- length(x) if (!identical(ties, "ordered")) { if (is.unsorted(x)) { o <- order(x) x <- x[o] y <- y[o] } if (length(ux <- unique(x)) < nx) { if (missing(ties)) warning("collapsing to unique 'x' values") y <- as.vector(tapply(y, match(x, x), ties)) x <- ux stopifnot(length(y) == length(x)) } } list(x = x, y = y) } new_reg_xy <- new_regularize.values(x, y, mean) # Output is still identical to input and also references to the same objects identical(x, new_reg_xy[["x"]]) #> [1] TRUE .Internal(inspect(x)) #> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(new_reg_xy[["x"]])) #> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... identical(y, new_reg_xy[["y"]]) #> [1] TRUE .Internal(inspect(y)) #> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(new_reg_xy[["y"]])) #> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... # Current R version R.version #> _ #> platform x86_64-pc-linux-gnu #> arch x86_64 #> os linux-gnu #> system x86_64, linux-gnu #> status #> major 3 #> minor 5.2 #> year 2018 #> month 12 #> day 20 #> svn rev 75870 #> language R #> version.string R version 3.5.2 (2018-12-20) #> nickname Eggshell Igloo -- Best regards, Evgeni Chasnovski -------------- next part -------------- A non-text attachment was scrubbed... Name: patch.diff Type: text/x-patch Size: 522 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20181230/63e44ead/attachment.bin>
Reasonably Related Threads
- Bug report with patch: `stats:::regularize.values()` always creates full copies of `x` and `y`
- Any plans for ALTREP lists (VECSXP)?
- [External] Re: Any plans for ALTREP lists (VECSXP)?
- Any plans for ALTREP lists (VECSXP)?
- [External] Re: Any plans for ALTREP lists (VECSXP)?