Davis Vaughan
2020-Feb-03 17:40 UTC
[Rd] ALTREP "wrapper" classes needs an Extract_subset method
Hi all, I believe I have found a bug (or perhaps just an oversight) with the ALTREP wrapper classes. The short form of this is that I believe that the wrapper classes need to override the default ALTREP `Extract_subset_method()` with a method that calls `Extract_subset()` on the "wrapped" object. I have a patch prepared here: https://github.com/DavisVaughan/r-source/pull/1 There is currently no call to `R_set_altvec_Extract_subset_method()` in the wrapper class init functions. This means that the default ALTREP method of `altvec_Extract_subset_default()` is called, which simply returns `NULL`. Consider what that means for an ALTREP compact integer seq that has been "wrapped". The default subsetting code will eventually call `ExtractSubset()`. That checks if the object is ALTREP, and calls the ALTREP Extract_subset() method if so. If the return value from that is NULL, then it will fallback to repeatedly calling `INTEGER_ELT()` to do the subsetting. See below for the relevant section: https://github.com/wch/r-source/blob/d1c0c6b921fc6a0cbe82c4354c6ec6ceb7f320aa/src/main/subset.c#L103 This wrapped compact integer seq is an ALTREP object, so `ALTREP(x)` returns true. But then it just calls the default method of returning NULL rather than calling the compact integer seq `Extract_subset()` method! This still "works" because it falls back to `INTEGER_ELT()` for which there is a `wrapper_integer_Elt()` method defined that will use the underlying compact seq's `integer_Elt()` method, but it is less efficient than it could be. My rough benchmarks show that in R 3.6.0 the subset benchmarks at the bottom of this message take 4ms on the compact seq, and 5ms on the wrapped compact seq. With a patch that I have prepared, both take 4ms. The other reason I bring this up is that it can result in bugs with some ALTREP objects. I was working on one where it makes sense to have an `Extract_subset()` method but not an `Elt()` method. When it gets "wrapped", my `Extract_subset()` method is bypassed as shown above, and the `Elt()` method is incorrectly called (which throws an error because it is not meaningful for me). If you all agree these changes should be made, I can submit the patch. Thanks, Davis # Ensure we have enough elements for "wrapping" to kick in x <- 1:65 # select the 1st element a large number of times index <- rep(1L, 1e6) + 0L # ALTREP - but not wrapped # .Internal(inspect(x)) bench::mark(x[index], iterations = 1000) # Wrap it by adding a dummy attribute attributes(x) <- list(foo = "bar") # ALTREP - wrapped + compact seq # .Internal(inspect(x)) bench::mark(x[index], iterations = 1000) [[alternative HTML version deleted]]
iuke-tier@ey m@iii@g oii uiow@@edu
2020-Feb-04 03:43 UTC
[Rd] [External] ALTREP "wrapper" classes needs an Extract_subset method
It's not a bug as there will always be cases where ALTREP will need to fall back. But it does look like something that would be good to address. So please file it as a wishlist item and I'll look at a patch if you have one. As to your issue at the end, it seems to me that you should probably have another look at your design. Supporting getting a subset, including one of size one, but not getting an element seems odd, and I suspect will get you in trouble somewhere else before long.. Best, luke On Mon, 3 Feb 2020, Davis Vaughan wrote:> Hi all, > > I believe I have found a bug (or perhaps just an oversight) with the ALTREP > wrapper classes. The short form of this is that I believe that the wrapper > classes need to override the default ALTREP `Extract_subset_method()` with > a method that calls `Extract_subset()` on the "wrapped" object. I have a > patch prepared here: > > https://github.com/DavisVaughan/r-source/pull/1 > > There is currently no call to `R_set_altvec_Extract_subset_method()` in the > wrapper class init functions. This means that the default ALTREP method of > `altvec_Extract_subset_default()` is called, which simply returns `NULL`. > > Consider what that means for an ALTREP compact integer seq that has been > "wrapped". The default subsetting code will eventually call > `ExtractSubset()`. That checks if the object is ALTREP, and calls the > ALTREP Extract_subset() method if so. If the return value from that is > NULL, then it will fallback to repeatedly calling `INTEGER_ELT()` to do the > subsetting. See below for the relevant section: > > https://github.com/wch/r-source/blob/d1c0c6b921fc6a0cbe82c4354c6ec6ceb7f320aa/src/main/subset.c#L103 > > This wrapped compact integer seq is an ALTREP object, so `ALTREP(x)` > returns true. But then it just calls the default method of returning NULL > rather than calling the compact integer seq `Extract_subset()` method! This > still "works" because it falls back to `INTEGER_ELT()` for which there is > a `wrapper_integer_Elt()` method defined that will use the underlying > compact seq's `integer_Elt()` method, but it is less efficient than it > could be. > > My rough benchmarks show that in R 3.6.0 the subset benchmarks at the > bottom of this message take 4ms on the compact seq, and 5ms on the wrapped > compact seq. With a patch that I have prepared, both take 4ms. > > The other reason I bring this up is that it can result in bugs with some > ALTREP objects. I was working on one where it makes sense to have an > `Extract_subset()` method but not an `Elt()` method. When it gets > "wrapped", my `Extract_subset()` method is bypassed as shown above, and the > `Elt()` method is incorrectly called (which throws an error because it is > not meaningful for me). > > If you all agree these changes should be made, I can submit the patch. > > Thanks, > Davis > > # Ensure we have enough elements for "wrapping" to kick in > x <- 1:65 > > # select the 1st element a large number of times > index <- rep(1L, 1e6) + 0L > > # ALTREP - but not wrapped > # .Internal(inspect(x)) > > bench::mark(x[index], iterations = 1000) > > # Wrap it by adding a dummy attribute > attributes(x) <- list(foo = "bar") > > # ALTREP - wrapped + compact seq > # .Internal(inspect(x)) > > bench::mark(x[index], iterations = 1000) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu