thr3ads.net - R help - [R] Passing references to data objects into R functions [Jul 2003]


David Khabie-Zeitoune

2003-Jul-23 16:17 UTC

[R] Passing references to data objects into R functions

Hi. 

I have the following question about reading from large data objects from
within R functions; I have tried to simplify my problem as much as
possible in what follows.

Imagine I have various large data objects sitting in my global
environment (call them "data1", "data2", ...). I want to write a
function "extract" that extracts some of the rows of a particular data
object, does some further manipulations on the extract and then returns
the result. The function takes the data object's name and an index
vector; for example, the following call would return the first 3 rows
of object data1.

ans = extract("data1", 1:3)

I could write a simple function like this:

extract1 = function(object.name, index) {

    temp = get(object.name, envir = .GlobalEnv)
    temp = temp[index, , drop=FALSE]

    # do some further manipulations here ....

    return(temp)

}

The problem is that the function makes a copy "temp" of the object in
the function frame, which (in my application) is very memory inefficient
as the data objects are very large. It is especially inefficient when
the length of the "index" vector is much smaller than the number of
rows in the data object. What I really would like to do is to be able
to read from the underlying data object directly (in other programming
languages this would be achieved by passing a pointer to the object
instead), without making a copy.

Given the rules of variable name scoping in R, I could avoid making a
copy with the following call:

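extract2 = function(object.name, index) {

    # build the text "temp = data1[index, , drop = FALSE]" and evaluate
    # it in this function's frame
    eval(parse(text = paste("temp = ", object.name,
                            "[index, , drop = FALSE]", sep = "")))
    # do some further manipulations here ....

    return(temp)
}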

But this seems very messy. Is there a better way?
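For comparison, here is a minimal sketch of a tidier variant that avoids building code as a string: look the object up by name with get() and index the result in one expression. Like extract1, it assumes the object lives in the global environment; the name extract3 and the sample data1 are hypothetical.

```r
# Hypothetical alternative to extract2: no eval(parse()).
# get() returns the stored object itself; R does not duplicate it just to
# read from it, and the indexing step allocates only the selected rows.
extract3 <- function(object.name, index) {
  temp <- get(object.name, envir = .GlobalEnv)[index, , drop = FALSE]
  # do some further manipulations here ....
  temp
}

# Hypothetical sample data in the global environment
data1 <- data.frame(x = 1:10, y = letters[1:10])
ans <- extract3("data1", 1:3)
print(ans)
```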

Thanks for your help

David Khabie-Zeitoune

Henrik Bengtsson

2003-Jul-24 10:21 UTC


[R] Passing references to data objects into R functions

One way is to use an object-oriented design and wrap up the reference
functionality in a common superclass. At
http://www.maths.lth.se/help/R/ImplementingReferences/ I have some
discussions that are in line with what you are trying to achieve and
that you might be able to adopt.
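The linked page describes a class-based design. As a much smaller sketch of the same idea, note that a plain R environment already has reference semantics: environments are not copied when passed to a function, so a data object stored inside one can be read or updated "through" it. The helpers make_ref, extract_ref and set_first_x are hypothetical and are not the design at that URL.

```r
# Hypothetical helper: wrap a data object in an environment so it can be
# handed to functions by reference (environments are never copied).
make_ref <- function(value) {
  ref <- new.env(parent = emptyenv())
  ref$value <- value
  ref
}

# Reads rows through the reference; only the selected rows are allocated.
extract_ref <- function(ref, index) {
  ref$value[index, , drop = FALSE]
}

# Writes through the reference; the change is visible to every holder of ref.
set_first_x <- function(ref, v) {
  ref$value[1, "x"] <- v
  invisible(ref)
}

data1 <- data.frame(x = 1:10, y = letters[1:10])
r1 <- make_ref(data1)
ans <- extract_ref(r1, 1:3)
set_first_x(r1, 100L)
print(r1$value$x[1])  # the value held in r1 was updated
print(data1$x[1])     # the original data1 was not
```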

Also, note that passing huge objects as arguments to functions is NOT
expensive (in memory or time) in R if they are used for read-only
purposes. It only becomes expensive if you modify the argument, e.g.
assign a new value to one of its elements. In such cases R *has to*
copy the whole object to make sure you only modify a local instance of
the object. Thus, objects can be thought of as being passed by reference
to functions as long as they are not modified; if they are modified,
they are passed by value. This is intentional, as R is a
(single-threaded) functional language.
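A small sketch of the copy-on-modify behaviour just described; the names big, read_rows and modify_arg are hypothetical. tracemem() prints a message whenever the traced object is duplicated, but it only works if R was built with memory profiling, hence the capabilities("profmem") guard.

```r
big <- matrix(0, nrow = 1e5, ncol = 5)

# Read-only use: big is not duplicated; only the selected rows are allocated.
read_rows <- function(m, index) m[index, , drop = FALSE]

# Modifying the argument forces R to duplicate it first, so the caller's
# object is left untouched (pass-by-value semantics).
modify_arg <- function(m) {
  m[1, 1] <- 99
  m[1, 1]
}

# Print a message on each duplication of big (if memory profiling is available)
if (capabilities("profmem")) invisible(tracemem(big))

small <- read_rows(big, 1:3)  # should not trigger a tracemem message
changed <- modify_arg(big)    # tracemem reports a duplication of big here

if (capabilities("profmem")) untracemem(big)
```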

Best wishes

Henrik Bengtsson
Lund University
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David 
> Khabie-Zeitoune
> Sent: 23 July 2003 18:18
> To: r-help at r-project.org
> Subject: [R] Passing references to data objects into R functions
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
