Terry,
On Oct 3, 2011, at 10:32 AM, Terry Therneau wrote:
> I'm looking at memory efficiency for some of the survival code. The
> following fragment appears in coxph.fit
> coxfit <- .C("coxfit2", iter=as.integer(maxiter),
> as.integer(n),
> as.integer(nvar), stime,
> sstat,
> x= x[sorted,] ,
> ...
>
> Does this make a second copy of x to pass to the routine (my
> expectation) or will I end up with 3: x and x[sorted,] in the local
> frame of reference, and another due to dup=TRUE?
>
I'm not sure I'm counting your copies right, but I'd say the latter
(although the sorting cannot technically be called a copy ;)).
There are four distinct objects:
x -> x[sorted,] -> double array passed to C -> result vector
If you care about speed, you should definitely use .Call().
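Just to illustrate (this is not the actual survival code, and "coxfit_call"
is a made-up routine name that does not exist in the package), the R side of
a .Call version could look roughly like this; the C routine receives the R
objects as SEXPs and reads them in place, so no extra double arrays are
allocated for the inputs:

   ## hypothetical .Call interface - "coxfit_call" is an assumed name
   xs <- x[sorted, ]                # one subset copy, as before
   coxfit <- .Call("coxfit_call",
                   as.integer(maxiter),   # coercions still allocate, but only once
                   as.integer(n),
                   as.integer(nvar),
                   stime, sstat, xs)      # handed over as SEXPs, no further copies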
A note for debugging: tracemem is actually smart about this and flags the
intermediate memory object that .C creates for passing as a proper duplication,
even though it is not a real one (no duplicate() is involved, since that object
is not an R object at all). It then also flags the allocation of the result
object as a duplication from the intermediate object, so in summary tracemem
does give you the true number of copies.
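A tiny sketch of what that looks like in practice (assuming an R build with
memory profiling enabled, as the CRAN binaries are):

   x <- matrix(rnorm(1e5), ncol = 10)
   sorted <- sample(nrow(x))
   tracemem(x)        # the subset below creates a new object, but is not
   xs <- x[sorted, ]  #   reported, because x itself is never duplicated
   tracemem(xs)       # a .C(..., xs, ...) call with DUP = TRUE (the default)
                      #   would now print one tracemem line for the double
                      #   array handed to C and another for the result
                      #   vector copied back from it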
As far as I remember, .C is a legacy left-over from the ancient Fortran
interface in original S. It is not really a C interface at all: it is a Fortran
interface that happens not to care about the source language, and C can be used
to produce Fortran-looking object code. So unless one needs Fortran, one should
not be using .C ;) - it can be used, but IMHO it should not be used for anything
beyond perhaps didactic purposes.
Cheers,
Simon