I understand R is a "Pass-By-Value" language. I have a few practical questions, however.

I'm dealing with a "large" dataset (~1GB) and so my understanding of the nuances of memory usage in R is becoming important.

In an example such as:

> d <- read.csv("file.csv");
> n <- apply(d, 1, sum);

must "d" be copied to another location in memory in order to be used by apply? In general, is copying only done when a variable is updated within a function?

Would the following example be any different in terms of memory usage?

> d <- read.csv("file.csv");
> n <- apply(d[,2:10], 1, sum);

or can R reference the original "d" object since no changes to the object are being made?

I'm familiar with FF and BigMemory, but are there any packages/tricks which allow for passing such objects by reference without having to code in C?

Regards,
Jeff Allen
On 19/08/2010 12:57 PM, lists at jdadesign.net wrote:
> I understand R is a "Pass-By-Value" language. I have a few practical
> questions, however.
>
> I'm dealing with a "large" dataset (~1GB) and so my understanding of the
> nuances of memory usage in R is becoming important.
>
> In an example such as:
> > d <- read.csv("file.csv");
> > n <- apply(d, 1, sum);
> must "d" be copied to another location in memory in order to be used by
> apply? In general, is copying only done when a variable is updated within
> a function?
>

Generally R only copies when the variable is modified, but its rules for detecting this are sometimes overly conservative, so you may get some unnecessary copying. For example,

d[1,1] <- 3

will probably not make a full copy of d when the internal version of "[<-" is used, but if you have an R-level version, it probably will. I forget whether the dataframe method is internal or R level.

In the apply(d, 1, sum) example, it would probably make a copy of each row to pass to sum, but never a copy of the whole dataframe/array.

> Would the following example be any different in terms of memory usage?
> > d <- read.csv("file.csv");
> > n <- apply(d[,2:10], 1, sum);
> or can R reference the original "d" object since no changes to the object
> are being made?
>

This would make a new object containing d[,2:10], and would pass that to apply.

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?
>

Duncan Murdoch
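
The copy-on-modify behaviour described above can be checked with a small experiment. This is only a sketch: the vector `x` and the helper functions `read_only()` and `modify()` are made up for illustration, and `tracemem()` is only available when R was built with memory profiling, so that part is guarded.

```r
# Illustrative data; stands in for a much larger object.
x <- c(1, 2, 3)

# Hypothetical helper that only reads its argument.
read_only <- function(v) sum(v)

# Hypothetical helper that writes to its argument, which forces
# R to duplicate the object before changing it.
modify <- function(v) {
  v[1] <- 100
  v
}

total <- read_only(x)
y <- modify(x)

# The caller's object is untouched; the change lives in the copy.
print(x)
print(y)

# Where memory profiling is compiled in, tracemem() prints a message
# each time the traced object is duplicated.
if (capabilities("profmem")) {
  tracemem(x)
  z <- modify(x)
  untracemem(x)
}
```

When trying the same thing on a data frame, keep in mind that ?apply documents that a two-dimensional non-array object such as a data frame is first coerced with as.matrix(), which allocates a new matrix, so it may be worth measuring that case rather than assuming no full copy is made.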
Jeff,

R has 'environments' as a general mechanism to pass around objects by reference. However, that does not help with most functions like 'apply' which take arguments other than environments.

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?

With ff (and I assume with bigmemory as well) you can pass around objects by reference without C-coding.

To be more precise with regard to ff: atomic ff objects have 'hybrid copying semantics', which means that two references to an ff object will share the data and SOME features (like the 'length') while OTHER features (like 'dim') are copied on modify (see 'vt' for a powerful application of this concept).

You might want to have a look at 'ffapply' and friends and at 'chunk'.

HTH

Jens Oehlschlägel
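
As a rough illustration of the environment mechanism mentioned above, the sketch below keeps a data frame inside an environment and lets a function change it in place from the caller's point of view. The names `store` and `scale_column()` and the tiny data frame are assumptions made up for the example; R may still duplicate the affected column internally, so this shows reference semantics for the caller rather than a guarantee of zero copying.

```r
# An environment acts as a mutable container that is passed by reference.
store <- new.env()
store$d <- data.frame(a = 1:3, b = 4:6)  # illustrative data only

# Hypothetical helper: multiplies one column of the data frame held in
# 'env'. The change is visible to the caller without returning the data.
scale_column <- function(env, col, factor) {
  env$d[[col]] <- env$d[[col]] * factor
  invisible(env)
}

scale_column(store, "a", 10)

# The data frame inside 'store' was changed by the function call.
print(store$d)
```

The limitation noted above still applies: functions such as apply() expect the data itself, so store$d has to be extracted before it can be passed to them.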