This example demonstrates the same behaviour as my problem. If I run

> save(formula, file = "abc.RData")

and then, in a newly launched R session,

> load("abc.RData")
> formula
x ~ y
<environment: 0x00000000171e4be8>

I want to know what is stored in <environment: 0x00000000171e4be8> and how to access it, or how to save the object without the environment.

Best,
Jinsong

On 2021/10/21 4:06, Henrik Bengtsson wrote:
> Example illustrating what Duncan says:
>
>> make_formula <- function() { large <- rnorm(1e6); x ~ y }
>> formula <- make_formula()
>
> # "Apparent" size of object
>> object.size(formula)
> 728 bytes
>
> # Actual serialization size
>> length(serialize(formula, connection = NULL))
> [1] 8000203
>
> # A better size estimate
>> lobstr::obj_size(formula)
> 8,000,888 B
>
> /Henrik
>
> On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>>
>> On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
>>> On 2021/10/20 21:05, Duncan Murdoch wrote:
>>>> On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
>>>>> Hi there,
>>>>>
>>>>> I have an RData file, created with save.image(), that is about
>>>>> 74.0 MB (77,608,222 bytes).
>>>>>
>>>>> After loading it into R, I measured the size of each object with object.size():
>>>>>
>>>>>> object.size(combn.rda.m)
>>>>> 105448 bytes
>>>>>> object.size(cross)
>>>>> 102064 bytes
>>>>>> object.size(denitr.1)
>>>>> 25032 bytes
>>>>>> object.size(rda.denitr.1)
>>>>> 600280 bytes
>>>>>> object.size(xh)
>>>>> 7792 bytes
>>>>>> object.size(xh.x)
>>>>> 6064 bytes
>>>>>> object.size(xh.x.1)
>>>>> 24144 bytes
>>>>>> object.size(xh.x.2)
>>>>> 24144 bytes
>>>>>> object.size(xh.x.3)
>>>>> 24144 bytes
>>>>>> object.size(xh.y)
>>>>> 2384 bytes
>>>>>
>>>>> These are all small objects.
>>>>>
>>>>> If I delete the largest one, "rda.denitr.1", and run save.image("xx.RData"),
>>>>> the file is 22.6 KB (23,244 bytes). Everything seems OK.
>>>>>
>>>>> However, when I run save(rda.denitr.1, file = "yy.RData"), the file is
>>>>> 73.9 MB (77,574,869 bytes).
>>>>>
>>>>> I don't know why...
>>>>>
>>>>> Any hint?
>>>>
>>>> As the docs for object.size() say, "Exactly which parts of the memory
>>>> allocation should be attributed to which object is not clear-cut." In
>>>> particular, if a function or formula has an associated environment, it
>>>> isn't included, but it is sometimes saved in the image.
>>>>
>>>> So I'd suspect rda.denitr.1 contains something that references an
>>>> environment, and it's an environment that would be saved. (I forget the
>>>> exact rules, but I think that means it's not the global environment and
>>>> it's not a package environment.)
>>>>
>>>> Duncan Murdoch
>>>
>>>
>>> rda.denitr.1 is just a list of length 2:
>>> rda.denitr.1[[1]] is a vector of length 10;
>>> rda.denitr.1[[2]] is a list of length 10. rda.denitr.1[[2]][[1]]
>>> to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
>>> the vegan package.
>>>
>>> If I run
>>> > a <- rda.denitr.1[[2]][[1]]
>>> > object.size(a)
>>> 59896 bytes
>>> > save(a, file = "abc.RData")
>>> the file is also large: 73.9 MB (77,536,611 bytes).
>>>
>>> Jinsong
>>>
>>
>> The rda() function uses formulas. If it saves the formula in the
>> result, then it references the environment of that formula, typically
>> the environment where the formula was created.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
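To find which part of an object such as `a` drags in an environment, one option is to walk the object and serialize every environment attached to a function, formula or terms object it contains. The sketch below is a hypothetical helper, `find_captured_envs()`, written for this note; it is not part of base R or vegan. It assumes the environments of interest are either closure environments or stored in the `.Environment` attribute (which is where formulas and `terms` objects keep theirs), and it skips the global, base and empty environments and namespaces, which are written by reference rather than copied.

```r
# Hypothetical helper (not from base R or vegan): walk a list-like object and
# report each captured environment with its approximate serialized size.
find_captured_envs <- function(obj, path = "obj") {
  hits <- numeric(0)

  # Closures carry their environment directly; formulas and terms objects
  # keep it in the ".Environment" attribute.
  env <- if (is.function(obj)) {
    environment(obj)
  } else {
    attr(obj, ".Environment", exact = TRUE)
  }

  # Skip environments that serialization writes as references, not copies.
  if (is.environment(env) &&
      !identical(env, globalenv()) &&
      !identical(env, baseenv()) &&
      !identical(env, emptyenv()) &&
      !isNamespace(env)) {
    # Approximate number of bytes save() would add for this environment.
    hits[path] <- length(serialize(env, connection = NULL))
  }

  # Model objects are usually lists, so recurse into their components.
  if (is.list(obj)) {
    nms <- names(obj)
    for (i in seq_along(obj)) {
      label <- if (!is.null(nms) && nzchar(nms[i])) {
        paste0(path, "$", nms[i])
      } else {
        paste0(path, "[[", i, "]]")
      }
      hits <- c(hits, find_captured_envs(obj[[i]], label))
    }
  }
  hits
}
```

Usage would be something like `sort(find_captured_envs(a), decreasing = TRUE)`. The same environment can be reported under several paths, so the numbers should not be summed. The helper also does not inspect attributes other than `.Environment` (for example a `terms` attribute attached to a model frame), so treat an empty result as inconclusive rather than as proof that nothing is captured.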
That depends on what was in the active environment when you created that formula. You would probably benefit from reading https://adv-r.hadley.nz/environments.html about now, though you are about to enter a complex interaction between functions, formulas and environments. A rational option is to consider not saving this object to a file at all, but instead to extract the values you need from it now and save those.

On October 20, 2021 11:09:10 PM PDT, Jinsong Zhao <jszhao at yeah.net> wrote:
> I want to know what is stored in <environment: 0x00000000171e4be8>
> and how to access it, or how to save the object without the environment.
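A minimal sketch of the "extract what you need and save that" approach. It uses base R's `lm()` as a stand-in for vegan's `rda()`, because the components Jinsong actually needs from an RDA object are not stated in the thread. `fit_model()`, the data and the `fit-summary.rds` file name are illustrative assumptions. The fitted object keeps the function's environment (including the unused `large`) through its stored terms, while plain numeric vectors carry no environment.

```r
# Hypothetical model-fitting function; `large` stands in for temporary data
# that happens to be alive when the model is fitted.
fit_model <- function() {
  large <- rnorm(1e6)  # not needed by the fit, but captured in its environment
  d <- data.frame(x = rnorm(100), y = rnorm(100))
  lm(y ~ x, data = d)
}
fit <- fit_model()

# Keep only the values needed later; these have no attached environment.
result <- list(
  coefficients = coef(fit),
  r.squared = summary(fit)$r.squared
)

# Compare what would be written to disk in each case.
size_fit <- length(serialize(fit, connection = NULL))
size_result <- length(serialize(result, connection = NULL))

# Save the extracted values rather than the fitted object.
out_file <- file.path(tempdir(), "fit-summary.rds")
saveRDS(result, file = out_file)
```

Here `size_fit` includes the roughly 8 MB of `large`, while `size_result` is a few hundred bytes. The trade-off is that the saved file no longer contains a model you can call `predict()` or `update()` on.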
On 21/10/2021 2:09 a.m., Jinsong Zhao wrote:
> I want to know what is stored in <environment: 0x00000000171e4be8>
> and how to access it, or how to save the object without the environment.

Using Henrik's example, the environment would contain all the local variables of the make_formula() call. In his case, that's just the "large" variable, but in real examples it can be quite a few things.

To access it, you can do

  e <- environment(formula)
  ls(e)      # shows just "large"
  e$large    # extracts that value

It is possible to save the formula without the environment, but you should *never* do that. That changes the meaning of the formula and is almost certain to lead to bugs in the future. For example, consider this slightly more complicated version of Henrik's example:

  make_formula <- function() {
    x <- rnorm(100)
    y <- rnorm(100)
    x ~ y
  }
  formula <- make_formula()
  lm(formula)
  #>
  #> Call:
  #> lm(formula = formula)
  #>
  #> Coefficients:
  #> (Intercept)            y
  #>     -0.1584      -0.0805

Here the lm() function finds the variables used in the formula in the formula's attached environment. If you removed the environment, you'd get a completely different (probably wrong) answer, or an error if no x and y could be found elsewhere.

In your real example, where the save files are too big, the solution is to find where those RDA objects were created and make sure there are no unused local variables at the time you return the result. Any local variable that's mentioned in the formula should be kept, but other variables that may have been used to construct them can be removed, e.g.
  make_formula <- function() {
    # Create a local variable
    large <- rnorm(100000)
    # Use it to create variables in the formula
    x <- large + 1
    y <- large + rnorm(100000)
    # Remove the temporary one
    rm(large)
    # Return the formula
    x ~ y
  }

Duncan Murdoch
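A minimal sketch checking both points from Duncan's reply, assuming a fresh R session with no global `x` or `y`. `make_formula_big()` and `make_formula_trimmed()` are illustrative names. The first keeps the temporary `large` in its environment; the second removes it before returning, as Duncan suggests. The last step shows that swapping in a different environment makes `lm()` look for `x` and `y` in the wrong place.

```r
# Leaves the temporary `large` in the environment captured by the formula.
make_formula_big <- function() {
  large <- rnorm(100000)
  x <- large + 1
  y <- large + rnorm(100000)
  x ~ y
}

# Same computation, but drops the temporary before returning.
make_formula_trimmed <- function() {
  large <- rnorm(100000)
  x <- large + 1
  y <- large + rnorm(100000)
  rm(large)
  x ~ y
}

big <- make_formula_big()
trimmed <- make_formula_trimmed()

# Serialized sizes approximate what save() writes for each formula.
size_big <- length(serialize(big, connection = NULL))
size_trimmed <- length(serialize(trimmed, connection = NULL))
ls(environment(trimmed))  # "x" "y"

# lm() still finds x and y through the formula's own environment.
fit <- lm(trimmed)

# Replacing the environment changes the formula's meaning: lm() now looks
# for x and y starting from the global environment instead.
stripped <- trimmed
environment(stripped) <- globalenv()
msg <- tryCatch({ lm(stripped); "no error" },
                error = function(e) conditionMessage(e))
```

In a clean session `msg` holds an error message because no `x` or `y` exists globally. If global variables with those names did exist, `lm()` would silently use them instead, which is the "completely different answer" case.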