Travers Ching
2019-Jan-31 08:10 UTC
[Rd] Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method
Below is a toy alt-rep string example that generates N random strings:

https://gist.github.com/traversc/a48a504eb062554f2d6ff8043ca16f9c

Example:

`x <- altrandomStrings(1e8)`
`head(x)`
[1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
`object.size(x)`

object.size will call the Elt method (the one registered with `set_altstring_Elt_method`) for every single element, materializing (slowly) every element of the vector. This is a problem mostly in RStudio, since object.size is called automatically there, defeating the purpose of alt-rep entirely.
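The effect described above can be sketched with base R alone, without the gist. This is only an illustration under an assumption: in R >= 3.5.0, `as.character()` on a plain integer vector is expected to return a deferred (ALTREP) string vector whose elements are produced on demand. If that holds, the first `object.size()` call has to visit every element, and a second call may be faster because the elements have already been produced. The vector length `n` is an arbitrary choice, and no timings are claimed here.

```r
# Assumption: R >= 3.5.0, where as.character() on an integer vector may
# return a deferred (ALTREP) string vector instead of a fully built one.
n <- 1e6
x <- as.character(seq_len(n))

# First call: object.size() walks the string vector element by element.
first <- system.time(size1 <- object.size(x))

# Second call: the elements have been visited once already.
second <- system.time(size2 <- object.size(x))

print(first)
print(second)
print(size1)
```

Comparing `first` and `second` on a given machine gives a rough idea of how much of the cost comes from producing the elements rather than from measuring them.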
Tierney, Luke
2019-Jan-31 13:35 UTC
You should really take this up with RStudio. Calling object.size on every top-level assignment, as they appear to do, is a bad idea, even without ALTREP. object.size is only a cheap operation for simple atomic vectors. For anything with recursive structure it needs to walk the object, so the effort is proportional to object size:

> x <- rep("A", 1e8)
> system.time(object.size(x))
   user  system elapsed
  1.222   0.624   1.850
> x <- rep(list(1), 1e8)
> system.time(object.size(x))
   user  system elapsed
  1.247   0.022   1.273

The current help for object.size says

    Provides an estimate of the memory that is being used to store an
    R object.

If this is interpreted as the current memory use, which could change in the ALTREP context (or for environments, though there the changes are ignored), then we could define object.size for ALTREP objects to avoid any ALTREP-specific computation. I'm not convinced yet that this is a good idea, but even if we do change this at the R level, RStudio would still be well-advised to have another look at what they are doing.

Best,

luke

On Tue, 15 Jan 2019, Travers Ching wrote:

> Below is a toy alt-rep string example, that generates N random strings:
>
> https://gist.github.com/traversc/a48a504eb062554f2d6ff8043ca16f9c
>
> example:
> `x <- altrandomStrings(1e8)`
> `head(x)`
> [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
> `object.size(x)`
>
> Object.size will call the `set_altstring_Elt_method` for every single
> element, materializing (slowly) every element of the vector. This is
> a problem mostly in R-studio since object.size is called
> automatically, defeating the purpose of alt-rep entirely.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
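The point that object.size is only cheap for simple atomic vectors suggests one way a caller could avoid the expensive cases. The sketch below is not what RStudio does and is not part of R: `cheap_size` is a hypothetical helper that reports a size only for attribute-free, non-character atomic vectors and returns NA for everything else (character vectors, lists, objects carrying attributes), so that no per-element walk is triggered.

```r
# Hypothetical helper, for illustration only (not part of R or RStudio).
# Reports a size in bytes only when it should be cheap to compute.
cheap_size <- function(x) {
  is_simple <- is.atomic(x) && !is.character(x) && is.null(attributes(x))
  if (is_simple) {
    # Simple atomic vector: no per-element walk is expected here.
    as.numeric(object.size(x))
  } else {
    # Character vectors, lists, and anything with attributes would need
    # a walk over the object, so no size is reported for them.
    NA_real_
  }
}

cheap_size(1:10)         # a number of bytes
cheap_size(rep("A", 5))  # NA: character vectors are skipped
cheap_size(list(1, 2))   # NA: recursive structure is skipped
```

A caller that shows sizes for every top-level object could display the NA cases as "unknown" and compute the full size only on request.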
Travers Ching
2019-Jan-31 16:02 UTC
Hi Luke,

Thanks for the response. For some reason, this is a duplicate of a post I had sent WEEKS ago that is only showing up now. I initially thought it had been filtered out as spam because of the GitHub link, so I rewrote the email (several times, in fact), and you can see the other thread. Very weird.

Also, the good people at RStudio seem to have fixed the issue!

Thanks,
Travers