Hi Tomas,

On 3/27/20 07:01, Tomas Kalibera wrote:
> they provide an over-approximation

They can also provide an "under-approximation" (to say the least), e.g. on reference objects, where the entire substance of the object is ignored, which makes object.size() completely meaningless in that case:

  setRefClass("A", fields=c(stuff="ANY"))
  object.size(new("A", stuff=raw(0)))      # 680 bytes
  object.size(new("A", stuff=runif(1e8)))  # 680 bytes

Why wouldn't object.size() look at the content of environments?

Thanks,
H.

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
On 3/27/20 4:39 PM, Hervé Pagès wrote:
> They can also provide an "under-approximation" (to say the least) e.g.
> on reference objects where the entire substance of the object is
> ignored which makes object.size() completely meaningless in that case:
>
>   setRefClass("A", fields=c(stuff="ANY"))
>   object.size(new("A", stuff=raw(0)))      # 680 bytes
>   object.size(new("A", stuff=runif(1e8)))  # 680 bytes
>
> Why wouldn't object.size() look at the content of environments?

Yes, the treatment of environments is not "over-approximative". The traversal has to be bounded somewhere: you can't traverse all captured environments, getting to, say, package namespaces, the global environment, and the code of all functions. That would be too over-approximating. For environments used as hash maps that contain data, such as in reference classes, it would of course be much better to include them, but you can't differentiate programmatically. In principle the same environment can be used for both things: say, a namespace environment can contain data (not clearly related to any user-level R object) as well as code. Not to mention things like source references and parse data.

Tomas
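A minimal base-R sketch of the case discussed above, where an environment is used purely as a data container. The helper name env_contents_size() is hypothetical (it is not part of base R or lobstr), and it assumes the caller already knows the environment holds only data. It looks at direct bindings only, does not follow enclosing environments, and does not correct for objects shared between bindings.

```r
# Hypothetical helper (not part of base R or lobstr): adds up object.size()
# for every binding stored directly in an environment. It does not follow
# enclosing environments and ignores sharing between bindings.
env_contents_size <- function(env) {
  stopifnot(is.environment(env))
  names_in_env <- ls(env, all.names = TRUE)
  sizes <- vapply(
    names_in_env,
    function(nm) as.numeric(object.size(get(nm, envir = env, inherits = FALSE))),
    numeric(1)
  )
  sum(sizes)
}

# An environment used as a simple data store.
cache <- new.env()
cache$stuff <- runif(1e6)

object.size(cache)        # size reported for the environment itself
env_contents_size(cache)  # also counts the 1e6 doubles held in `stuff`
```

Because the helper stops at the direct bindings, it sidesteps the question of where to bound the traversal, which is exactly the part that cannot be decided programmatically for arbitrary environments.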
On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès <hpages at fredhutch.org> wrote:
> They can also provide an "under-approximation" (to say the least) e.g.
> on reference objects where the entire substance of the object is ignored
> which makes object.size() completely meaningless in that case:
>
>   setRefClass("A", fields=c(stuff="ANY"))
>   object.size(new("A", stuff=raw(0)))      # 680 bytes
>   object.size(new("A", stuff=runif(1e8)))  # 680 bytes
>
> Why wouldn't object.size() look at the content of environments?

As the author, I'm obviously biased, but I do like lobstr::obj_sizes(), which allows you to see the additional size occupied by one object given any number of other objects. This is particularly important for reference classes, since individual objects appear quite large:

  A <- setRefClass("A", fields=c(stuff="ANY"))
  lobstr::obj_size(new("A", stuff=raw(0)))
  #> 567,056 B

But the vast majority is shared across all instances of that class:

  lobstr::obj_size(A)
  #> 719,232 B
  lobstr::obj_sizes(A, new("A", stuff=raw(0)))
  #> * 719,232 B
  #> *     720 B
  lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
  #> *     719,232 B
  #> * 800,000,720 B

Hadley

-- 
http://hadley.nz
On Fri, Mar 27, 2020 at 11:08 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> Yes, the treatment of environments is not "over-approximative". It has
> to be bounded somewhere, you can't traverse all captured environments,
> getting to say package namespaces, global environment, code of all
> functions, that would be too over-approximating. For environments used
> as hash maps that contain data, such as in reference classes, it would
> of course be much better to include them, but you can't differentiate
> programmatically. In principle the same environment can be used for both
> things, say a namespace environment can contain data (not clearly
> related to any user-level R object) as well as code. Not mentioning
> things like source references and parse data.

I think the heuristic used in lobstr works well in practice: don't traverse further than the current environment (supplied as an argument so you can override), and don't ever traverse past the global or base environments.

Hadley

-- 
http://hadley.nz
On 3/27/20 12:00, Hadley Wickham wrote:
> As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
> which allows you to see the additional size occupied by one object given
> any number of other objects. This is particularly important for
> reference classes since individual objects appear quite large:
>
>   A <- setRefClass("A", fields=c(stuff="ANY"))
>   lobstr::obj_size(new("A", stuff=raw(0)))
>   #> 567,056 B
>
> But the vast majority is shared across all instances of that class:
>
>   lobstr::obj_size(A)
>   #> 719,232 B
>   lobstr::obj_sizes(A, new("A", stuff=raw(0)))
>   #> * 719,232 B
>   #> *     720 B
>   lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
>   #> *     719,232 B
>   #> * 800,000,720 B

Nice. Can you clarify the situation with lobstr::obj_size vs pryr::object_size? I've heard of the latter before and use it sometimes, but never heard of the former before seeing Stefan's post. Then I checked the authors of both and thought maybe they should talk to each other ;-)

Thanks,
H.