Hi Radford,
On 08/25/2010 07:50 PM, Radford Neal wrote:> I've looked at how to speed up the transpose function in R
> (ie, t(X)).
>
> The existing code does the work with loops like the following:
>
> for (i = 0; i< len; i++)
> REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow];
>
> It seems a bit optimistic to expect a compiler to produce good code
> from this. I've re-written these loops as follows:
>
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> REAL(r)[i] = REAL(a)[j];
> }
You can even avoid the j>=len test within the loop by using 2 nested
loops:
dest = REAL(r);
src = REAL(a);
// walk the dest col by col (i.e. linearly)
// walk the src row by row (i.e. not linearly)
for (i = 0; i < nrow; i++, src++)
for (j = 0; j < len; j += nrow, dest++)
*dest = src[j];
1 test, 3 additions (src[j] being equiv to *(src + j)), 1 assignment
in the inner loop. I wonder if it's possible to get rid of one of the
additions.
This gives me another 10% gain in speed (on my 64-bit Ubuntu laptop),
not that much, but still...
BTW isn't that surprising that using div() is actually slower than using
of / and %, at least on Linux? Since div() is kind of the atomic version
of it, I would have expected it to be slightly faster :-/
Cheers,
H.
>
> The resulting improvement is sometimes dramatic. Here's a test
> program:
>
> M<- matrix(seq(0,1,12000),200,60)
>
> print(system.time({for (i in 1:10000) S<- t(M)}))
> print(system.time({for (i in 1:10000) R<- t(S)}))
>
> v<- seq(0,2,12000)
>
> print(system.time({for (i in 1:100000) u<- t(v)}))
> print(system.time({for (i in 1:100000) w<- t(u)}))
>
> Here are the times on an Intel Linux system:
>
> R version 2.11.1: Modified version:
>
> user system elapsed user system elapsed
> 1.190 0.040 1.232 0.610 0.010 0.619
> user system elapsed user system elapsed
> 1.200 0.020 1.226 0.610 0.000 0.616
> user system elapsed user system elapsed
> 0.800 0.010 0.813 0.750 0.000 0.752
> user system elapsed user system elapsed
> 0.910 0.010 0.921 0.860 0.000 0.864
>
> Here are the times on a SPARC Solaris system:
>
> R version 2.11.1: Modified version:
>
> user system elapsed user system elapsed
> 18.643 0.041 18.685 2.994 0.039 3.033
> user system elapsed user system elapsed
> 18.574 0.041 18.616 3.123 0.039 3.163
> user system elapsed user system elapsed
> 3.803 0.271 4.075 3.868 0.296 4.163
> user system elapsed user system elapsed
> 4.184 0.273 4.457 4.238 0.302 4.540
>
> So with the modification, transpose for a 60x200 or 200x60 matrix is
> about a factor of two faster on the Intel system, and a factor of six
> faster on the SPARC system. There is little or no gain from the
> modification when transposing a row or column vector, however. (I
> think it must be that on these machines multiplies and divides do not
> take constant time, but are faster when, for instance, dividing by 1.)
>
> I've appended below the new version of the modified part of the
> do_transpose function in src/main/array.c.
>
> Radford Neal
>
> ----------------------------------------------------------------------
>
> PROTECT(r = allocVector(TYPEOF(a), len));
> switch (TYPEOF(a)) {
> case LGLSXP:
> case INTSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> INTEGER(r)[i] = INTEGER(a)[j];
> }
> case REALSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> REAL(r)[i] = REAL(a)[j];
> }
> break;
> case CPLXSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> COMPLEX(r)[i] = COMPLEX(a)[j];
> }
> break;
> case STRSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> SET_STRING_ELT(r, i, STRING_ELT(a,j));
> }
> break;
> case VECSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
> }
> break;
> case RAWSXP:
> for (i = 0, j = 0; i<len; i += 1, j += nrow) {
> if (j>=len) j -= (len-1);
> RAW(r)[i] = RAW(a)[j];
> }
> break;
> default:
> UNPROTECT(1);
> goto not_matrix;
> }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319