David Khabie-Zeitoune
2003-Jul-23 16:17 UTC
[R] Passing references to data objects into R functions
Hi.

I have the following question about reading from large data objects from within R functions; I have tried to simplify my problem as much as possible in what follows.

Imagine I have various large data objects sitting in my global environment (call them "data1", "data2", ...). I want to write a function "extract" that extracts some of the rows of a particular data object, does some further manipulations on the extract and then returns the result. The function takes the data object's name and an index vector; for example, the following call would return the first 3 rows of object data1:

    ans = extract("data1", 1:3)

I could write a simple function like this:

    extract1 = function(object.name, index) {
      temp = get(object.name, envir = .GlobalEnv)
      temp = temp[index, , drop = FALSE]
      # do some further manipulations here ....
      return(temp)
    }

The problem is that the function makes a copy "temp" of the object in the function frame, which (in my application) is very memory inefficient as the data objects are very large. It is especially inefficient when the length of the "index" vector is much smaller than the number of rows in the data object. What I would really like to do is read from the underlying data object directly, without making a copy (in other programming languages this would be achieved by passing a pointer to the object instead).

Given the rules of variable name scoping in R, I could avoid making a copy with a function like this:

    extract2 = function(object.name, index) {
      # paste() builds the single string parse() expects, e.g. "temp = data1[index, , drop = FALSE]"
      eval(parse(text = paste("temp = ", object.name, "[index, , drop = FALSE]", sep = "")))
      # do some further manipulations here ....
      return(temp)
    }

But this seems very messy. Is there a better way?

Thanks for your help
David Khabie-Zeitoune
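One way to keep the idea of extract2 without pasting strings is to build the call as a language object: substitute() fills the placeholders OBJ and IDX into the template OBJ[IDX, , drop = FALSE], and eval() runs it in the global environment. This is a sketch only, not something proposed in the thread; the name extract3, its envir argument and the example matrix are hypothetical.

```r
# Hypothetical variant of extract2 that builds the call instead of parsing text.
extract3 <- function(object.name, index, envir = .GlobalEnv) {
  # Template call; OBJ becomes the symbol named by object.name and IDX the index value.
  cl <- substitute(OBJ[IDX, , drop = FALSE],
                   list(OBJ = as.name(object.name), IDX = index))
  # Evaluate e.g. data1[1:3, , drop = FALSE] where the data object lives.
  temp <- eval(cl, envir)
  # do some further manipulations here ....
  temp
}

# Example data (made up for illustration).
data1 <- matrix(1:20, nrow = 5)
ans <- extract3("data1", 1:3)  # first 3 rows of data1, a 3 x 4 matrix
```

Using as.name() also means a malformed object name cannot inject arbitrary code the way a pasted string passed to parse() could.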
Henrik Bengtsson
2003-Jul-24 10:21 UTC
[R] Passing references to data objects into R functions
One way is to use an object-oriented design and wrap up the reference functionality in a common superclass. At http://www.maths.lth.se/help/R/ImplementingReferences/ I have some discussions that are in line with what you are trying to achieve and that you might be able to adopt.

Also, note that passing huge objects as arguments to functions is NOT expensive (in memory or time) in R if they are used for read-only purposes. It only becomes expensive if you modify the argument inside the function (for example, assign to some of its elements). In such cases R *has to* copy the whole object to make sure you only modify a local instance of the object. Thus, objects can be thought of as being passed by reference to functions as long as they are not modified; if they are modified, they are passed by value. This is intentional, as R is a (single-threaded) functional language.

Best wishes

Henrik Bengtsson
Lund University
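A minimal sketch of both points in the reply, using base R only. All function names below (extract_rows, zero_first_row, new_ref, ref_extract, ref_set) are hypothetical, and the environment-based wrapper is a simple stand-in for the object-oriented reference design mentioned, not the code from the linked page. It relies on environments not being copied when they are modified, so every handle to one environment sees the same data.

```r
# Read-only use: the function only reads x, so no copy of x is needed;
# only the extracted rows are allocated.
extract_rows <- function(x, index) {
  x[index, , drop = FALSE]
}

# Modifying the argument forces a local copy; the caller's object is unchanged.
zero_first_row <- function(x) {
  x[1, ] <- 0
  x
}

# Reference-style wrapper: an environment holding the data.
new_ref <- function(value) {
  ref <- new.env(parent = emptyenv())
  ref$value <- value
  ref
}
ref_extract <- function(ref, index) {
  ref$value[index, , drop = FALSE]
}
ref_set <- function(ref, i, j, v) {
  ref$value[i, j] <- v  # updates the shared environment in place
  invisible(ref)
}

data1 <- matrix(1:20, nrow = 5)
ans <- extract_rows(data1, 1:3)  # 3 x 4 matrix
z <- zero_first_row(data1)       # data1[1, 1] is still 1

r <- new_ref(data1)
ref_set(r, 1, 1, 100L)
ref_extract(r, 1)                # first row of the wrapped copy now starts with 100
```

For the original question, passing the object itself, as in extract_rows(data1, 1:3), is usually the simplest option; a reference wrapper mainly pays off when several functions need to update the same large object in place.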