Vaidotas Zemlys
2007-Jan-31 17:03 UTC
[R] features of save and save.image (unexpected file sizes)
Hi,

Today I came upon unexpected R behaviour. I did some modelling, and the result was an R object of about 28 MB (a nested list with matrices as list elements). When I saved the session with save.image, the resulting .RData file was 300 MB. There were no other large objects:

> sum(sapply(ls(),function(x)eval(parse(text=paste("object.size(",x,")",sep=""))))/1024^2)
[1] 30.10540

The interesting thing is that when I removed the large object with rm, save.image again produced a 300 MB .RData file. Only after rm(list=ls()) did I get a normal-sized .RData file. I used dump to dump my object, and the resulting dump file was 72 MB. So I assume that R was saving some large object that was not visible to me directly through ls(). Is there a way to find such objects and discard them before saving?

I use R 2.4.1 on Ubuntu 6.06, through Emacs 23.0 and ESS 5.3.1.

Vaidotas Zemlys
--
Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php
Vilnius University
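The size check in the message above can be written without eval(parse()). The following is only a sketch: the helper name object_sizes_mb is made up for illustration, and it reports what object.size() reports, which by its documentation does not include associated space such as the environment of a function. A total computed this way can therefore be far smaller than the saved image.

```r
# Hypothetical helper: sizes in MB of every object in an environment,
# including names that start with a dot (all.names = TRUE).
object_sizes_mb <- function(env = globalenv()) {
  nms <- ls(envir = env, all.names = TRUE)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  # Largest objects first
  sort(sizes / 1024^2, decreasing = TRUE)
}

# Example on a scratch environment rather than the real workspace
e <- new.env()
assign("big", numeric(1e5), envir = e)
assign(".hidden", 1:10, envir = e)
sizes <- object_sizes_mb(e)
print(sizes)
print(sum(sizes))
```

Calling object_sizes_mb() with no argument inspects the global workspace in the same way.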
Professor Brian Ripley
2007-Jan-31 18:06 UTC
[R] features of save and save.image (unexpected file sizes)
Two comments:

1) ls() does not list all the objects: it has an all.names argument.

2) save.image() does not just save the objects in the workspace, it also saves any environments they may have. Having a function with a large environment is the usual cause of a large saved image.

(And finally, a compressed binary representation from save.image is not comparable sizewise with an ASCII version from dump.)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
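The second point can be reproduced in a few lines. This is a minimal sketch rather than anything from the thread: make_f and saved_size are hypothetical names, and the exact file sizes will vary with the R version and compression.

```r
# A function created inside another function keeps that function's frame
# as its environment, including objects it never uses.
make_f <- function() {
  big <- rnorm(1e6)        # roughly 8 MB, stays alive in f's environment
  function(x) x + 1
}
f <- make_f()

# Hypothetical helper: size in bytes of the file written by save()
saved_size <- function(obj) {
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))
  save(obj, file = tmp)
  file.info(tmp)$size
}

size_with_env <- saved_size(f)       # includes 'big'

environment(f) <- globalenv()        # change the environment, as opposed to discarding it
size_without_env <- saved_size(f)    # 'big' is no longer reachable from f

print(c(with = size_with_env, without = size_without_env))
```

The function itself is unchanged between the two calls; only the environment it refers to differs.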
Vaidotas Zemlys
2007-Feb-01 07:43 UTC
[R] features of save and save.image (unexpected file sizes)
Hi,

On 1/31/07, Professor Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> Two comments:
>
> 1) ls() does not list all the objects: it has all.names argument.

Yes, I tried it with all.names, but the effect was the same. I forgot to mention it in my message.

> 2) save.image() does not just save the objects in the workspace, it also
> saves any environments they may have. Having a function with a
> large environment is the usual cause of a large saved image.

I have little experience dealing with environments, so is there a quick way to discard the environments of the functions? When saving the session I really do not need them.

> (And finally, a compressed binary representation from save.image is
> not comparable sizewise with an ASCII version from dump.)

I know, but that is how I found out that I was saving something besides my large object.

Thanks very much for your answer!

Vaidotas Zemlys
Vaidotas Zemlys
2007-Feb-01 13:30 UTC
[R] features of save and save.image (unexpected file sizes)
Hi,

On 2/1/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Thu, 1 Feb 2007, Vaidotas Zemlys wrote:
> > I have little experience dealing with environments, so is there a
> > quick way to discard the environments of the functions? When saving
> > the session I really do not need them.
>
> Change, not discard. E.g. environment(f) <- .GlobalEnv. If environments
> are not mentioned by anything saved, they will not be saved.

I found the culprit. I was parsing formulas in my code, and I saved them in that large object, so the environments came along with the saved formulas. Is there a nice way to tell R: "please do not save the environments with the formulas, I do not need them"?

This is what I was doing (irrelevant code omitted):

testf<- function(formula) {
  mainform <- formula
  if(deparse(mainform[[3]][[1]])!="|") pandterm("invalid conditioning for main regression")
  mmodel <- substitute(y~x,list(y=mainform[[2]],x=mainform[[3]][[2]]))
  mmodel <- as.formula(mmodel)
  list(formula=list(main=mmodel))
}

When called as

bu <- testf(lnp~I(CE/12000)+hhs|Country)

I get

ls(env=environment(bu$formula$main))
[1] "formula"  "mainform" "mmodel"

or, in the actual case, many more objects that I do not need but that take up a lot of space. For the moment I solved the problem with

environment(mmodel) <- NULL

but is this the correct R way?

Vaidotas Zemlys
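One way to locate such formulas (or functions) inside a nested list is to walk the list and report every element whose environment is something other than the global environment or a package namespace. The helper below is a sketch written for illustration; find_captured_envs is not a function from R or from this thread.

```r
# Hypothetical helper: paths of list elements (functions or formulas)
# that carry an environment other than .GlobalEnv, base, empty, or a namespace.
find_captured_envs <- function(x, path = "obj") {
  hits <- character(0)
  if (is.function(x) || inherits(x, "formula")) {
    env <- environment(x)   # NULL for primitives and formulas without an environment
    if (is.environment(env) &&
        !identical(env, globalenv()) &&
        !identical(env, baseenv()) &&
        !identical(env, emptyenv()) &&
        !isNamespace(env)) {
      hits <- path
    }
  }
  if (is.list(x)) {
    nms <- names(x)
    if (is.null(nms)) nms <- rep("", length(x))
    for (i in seq_along(x)) {
      label <- if (!is.na(nms[i]) && nzchar(nms[i])) nms[i] else as.character(i)
      hits <- c(hits, find_captured_envs(x[[i]], paste0(path, "$", label)))
    }
  }
  hits
}

# A stand-in for the modelling function: the formula is created inside
# the function, so it captures the function's frame (including 'scratch').
testf_like <- function() {
  scratch <- 1:10
  list(formula = list(main = y ~ x), n = 3)
}
bu <- testf_like()
hits <- find_captured_envs(bu, "bu")
print(hits)
print(ls(environment(bu$formula$main)))
```

Each reported path can then be inspected with ls(environment(...)) to see what would be saved along with it.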
Prof Brian Ripley
2007-Feb-02 09:12 UTC
[R] features of save and save.image (unexpected file sizes)
On Thu, 1 Feb 2007, Vaidotas Zemlys wrote:
> I found the culprit. I was parsing formulas in my code, and I saved
> them in that large object. So the environment came with saved
> formulas. Is there a nice way to say R: "please do not save the
> environments with the formulas, I do not need them?"

No, but why create them that way? You could do

mmodel <- as.formula(mmodel, env=.GlobalEnv)

The R way is to create what you want, not fix up afterwards.

(I find your code unreadable--spaces help a great deal, so am not sure if I have understood it correctly.)

--
Brian D. Ripley
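Applied to the function from the question, the suggestion might look like the sketch below. Two things here are assumptions rather than part of the thread's advice: pandterm() is replaced by stop() so the example runs without extra packages, and the code is spaced for readability.

```r
testf <- function(formula) {
  mainform <- formula
  # The right-hand side must be of the form  x | z
  if (deparse(mainform[[3]][[1]]) != "|")
    stop("invalid conditioning for main regression")
  # Build y ~ x from the left-hand side and the part before "|"
  mmodel <- substitute(y ~ x,
                       list(y = mainform[[2]], x = mainform[[3]][[2]]))
  # Create the formula with the global environment from the start,
  # so that the function's local frame is not captured
  mmodel <- as.formula(mmodel, env = globalenv())
  list(formula = list(main = mmodel))
}

bu <- testf(lnp ~ I(CE/12000) + hhs | Country)
print(bu$formula$main)
print(identical(environment(bu$formula$main), globalenv()))
```

A possible alternative, if the variables of the model live where the caller wrote the formula, would be env = environment(formula); whether that is preferable depends on how the formula is used later.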
Vaidotas Zemlys
2007-Feb-02 12:33 UTC
[R] features of save and save.image (unexpected file sizes)
Hi,

On 2/2/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> No, but why create them that way? You could do
>
> mmodel <- as.formula(mmodel, env=.GlobalEnv)

Hm, but say I have some large object in .GlobalEnv, and I generate mmodel 10 different times and save the results as a list of length 10. If I then try to save this list, will R save 10 different copies of .GlobalEnv together with the aforementioned large object?

> The R way is to create what you want, not fix up afterwards.
>
> (I find your code unreadable--spaces help a great deal, so am not sure if
> I have understood it correctly.)

Hm, I copied this code directly from Emacs+ESS, so maybe the mailer mangled something. What I want to do with this piece of code (I will repaste it here)

testf<- function(formula) {
  mainform <- formula
  if(deparse(mainform[[3]][[1]])!="|") stop("invalid conditioning")
  mmodel <- substitute(y~x,list(y=mainform[[2]],x=mainform[[3]][[2]]))
  mmodel <- as.formula(mmodel)
  list(formula=list(main=mmodel))
}

is to read a formula with a condition:

formula(y~x|z)

and construct the formula

formula(y~x)

I looked for examples in the code of coplot in the graphics package and latticeParseFormula in the lattice package.

Vaidotas Zemlys
Prof Brian Ripley
2007-Feb-02 13:14 UTC
[R] features of save and save.image (unexpected file sizes)
On Fri, 2 Feb 2007, Vaidotas Zemlys wrote:
> Hm, but say I have some large object in .GlobalEnv, and I generate
> mmodel 10 different times and save the result as a list with length
> 10. Now if I try to save this list, R will save 10 different copies of
> .GlobalEnv together with aforementioned large object?

No, it saves the environment (here .GlobalEnv), not objects, and there can be many shared references.

> Hm, I copied this code directly from Emacs+ESS, maybe the mailer
> mangled something. What I want to do with this piece of code (I will
> repaste it here)
>
> testf<- function(formula) {
>   mainform <- formula
>   if(deparse(mainform[[3]][[1]])!="|") stop("invalid conditioning")
>   mmodel <- substitute(y~x,list(y=mainform[[2]],x=mainform[[3]][[2]]))
>   mmodel <- as.formula(mmodel)
>   list(formula=list(main=mmodel))
> }

You use no spaces around your operators or after commas. R does when deparsing:

> testf
function (formula)
{
    mainform <- formula
    if (deparse(mainform[[3]][[1]]) != "|")
        stop("invalid conditioning")
    mmodel <- substitute(y ~ x, list(y = mainform[[2]], x = mainform[[3]][[2]]))
    mmodel <- as.formula(mmodel)
    list(formula = list(main = mmodel))
}

because it is (at least to old hands) much easier to read.

IcanreadEnglishtextwithoutanyspacesbutIchoosenotto.Similarly,Rcode.Occasional spacesare evenharderto parse.

--
Brian D. Ripley
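The point about shared references can be checked by comparing file sizes. The sketch below uses an ordinary environment e holding a large object (an assumption made so that the effect is measurable; .GlobalEnv itself is handled specially) and a hypothetical helper saved_size.

```r
# One environment holding a large object
e <- new.env()
assign("big", rnorm(1e5), envir = e)   # roughly 0.8 MB

# Hypothetical helper: size in bytes of the file written by save()
saved_size <- function(obj) {
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))
  save(obj, file = tmp)
  file.info(tmp)$size
}

# One formula versus ten formulas, all referring to the same environment
one <- list(as.formula("y ~ x", env = e))
ten <- lapply(1:10, function(i) as.formula("y ~ x", env = e))

size_one <- saved_size(one)
size_ten <- saved_size(ten)
print(c(one = size_one, ten = size_ten))
```

If the environment were written once per formula, the second file would be about ten times the first; with shared references the two sizes should differ only slightly.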
Luke Tierney
2007-Feb-02 18:48 UTC
[R] features of save and save.image (unexpected file sizes)
On Fri, 2 Feb 2007, Prof Brian Ripley wrote:
> On Fri, 2 Feb 2007, Vaidotas Zemlys wrote:
>> Hm, but say I have some large object in .GlobalEnv, and I generate
>> mmodel 10 different times and save the result as a list with length
>> 10. Now if I try to save this list, R will save 10 different copies of
>> .GlobalEnv together with aforementioned large object?
>
> No, it saves the environment (here .GlobalEnv), not objects, and there can
> be many shared references.

Just to amplify this point: Only a marker representing .GlobalEnv is saved; on load into a new session that marker becomes the .GlobalEnv of the new session.

Best,

luke

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
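A small round trip illustrates the marker behaviour. This sketch runs within a single session, so it can only show that the restored formula points at the current .GlobalEnv; it cannot demonstrate the cross-session case directly.

```r
# A formula whose environment is the global environment
f <- y ~ x
environment(f) <- globalenv()

tmp <- tempfile(fileext = ".RData")
save(f, file = tmp)

# Load into a separate environment so the original 'f' is not overwritten
restored <- new.env()
load(tmp, envir = restored)
unlink(tmp)

# The restored formula refers to .GlobalEnv itself, not to a copy of its contents
print(identical(environment(restored$f), globalenv()))
```

Because only the marker is written, objects that merely happen to sit in .GlobalEnv are not pulled into the file through the formula.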