Terry,
On Oct 3, 2011, at 10:32 AM, Terry Therneau wrote:
> I'm looking at memory efficiency for some of the survival code. The
> following fragment appears in coxph.fit
> coxfit <- .C("coxfit2", iter=as.integer(maxiter),
> as.integer(n),
> as.integer(nvar), stime,
> sstat,
> x= x[sorted,] ,
> ...
>
> Does this make a second copy of x to pass to the routine (my
> expectation) or will I end up with 3: x and x[sorted,] in the local
> frame of reference, and another due to dup=TRUE?
>
I'm not sure I'm counting your copies right, but I'd say the latter
(although the sorting cannot technically be called a copy ;)).
There are four distinct objects:
x -> x[sorted,] -> double array passed to C -> result vector
If you care about speed, you should definitely use .Call().
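Just to illustrate (this is not the actual survival code, and "coxfit_call"
is a made-up routine name that does not exist in the package), the R side of
a .Call version could look roughly like this; the C routine receives the R
objects as SEXPs and reads them in place, so no extra double arrays are
allocated for the inputs:

   ## hypothetical .Call interface - "coxfit_call" is an assumed name
   xs <- x[sorted, ]                # one subset copy, as before
   coxfit <- .Call("coxfit_call",
                   as.integer(maxiter),   # coercions still allocate, but only once
                   as.integer(n),
                   as.integer(nvar),
                   stime, sstat, xs)      # handed over as SEXPs, no further copies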
A note for debugging: tracemem is actually smart about this and flags the
intermediate memory object that .C creates for passing as a proper duplication,
even though it is not a real one (no duplicate() is involved, since that object
is not an R object at all). It then also flags the allocation of the result
object as a duplication from the intermediate object, so in summary tracemem
does give you the true number of copies.
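A tiny sketch of what that looks like in practice (assuming an R build with
memory profiling enabled, as the CRAN binaries are):

   x <- matrix(rnorm(1e5), ncol = 10)
   sorted <- sample(nrow(x))
   tracemem(x)        # the subset below creates a new object, but is not
   xs <- x[sorted, ]  #   reported, because x itself is never duplicated
   tracemem(xs)       # a .C(..., xs, ...) call with DUP = TRUE (the default)
                      #   would now print one tracemem line for the double
                      #   array handed to C and another for the result
                      #   vector copied back from it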
As far as I remember, .C is a legacy left-over from the ancient Fortran
interface in original S. It is not really a C interface at all: it is a Fortran
interface that happens not to care about the source language, and C can be used
to produce Fortran-looking object code. So unless one needs Fortran, one should
not be using .C ;) - it can be used, but IMHO it should not be used for anything
beyond perhaps didactic purposes.
Cheers,
Simon