thr3ads.net - R devel - [Rd] Speeding up transpose [Aug 2010]

If this information is useful, please help other people find it:
Share via:

Radford Neal

2010-Aug-26 02:50 UTC

[Rd] Speeding up transpose

I've looked at how to speed up the transpose function in R 
(ie, t(X)).

The existing code does the work with loops like the following:

	for (i = 0; i < len; i++)
	    REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow];

It seems a bit optimistic to expect a compiler to produce good code 
from this.  I've re-written these loops as follows:

        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            REAL(r)[i] = REAL(a)[j];
        }

The resulting improvement is sometimes dramatic.  Here's a test 
program:

        M <- matrix(seq(0,1,12000),200,60)
        
        print(system.time({for (i in 1:10000) S <- t(M)}))
        print(system.time({for (i in 1:10000) R <- t(S)}))
        
        v <- seq(0,2,12000)
        
        print(system.time({for (i in 1:100000) u <- t(v)}))
        print(system.time({for (i in 1:100000) w <- t(u)}))

Here are the times on an Intel Linux system:

    R version 2.11.1:            Modified version:

         user  system elapsed          user  system elapsed 
        1.190   0.040   1.232         0.610   0.010   0.619 
         user  system elapsed          user  system elapsed 
        1.200   0.020   1.226         0.610   0.000   0.616 
         user  system elapsed          user  system elapsed 
        0.800   0.010   0.813         0.750   0.000   0.752 
         user  system elapsed          user  system elapsed 
        0.910   0.010   0.921         0.860   0.000   0.864 

Here are the times on a SPARC Solaris system:

    R version 2.11.1:            Modified version:

         user  system elapsed          user  system elapsed 
       18.643   0.041  18.685         2.994   0.039   3.033
         user  system elapsed          user  system elapsed 
       18.574   0.041  18.616         3.123   0.039   3.163
         user  system elapsed          user  system elapsed 
        3.803   0.271   4.075         3.868   0.296   4.163
         user  system elapsed          user  system elapsed 
        4.184   0.273   4.457         4.238   0.302   4.540

So with the modification, transpose for a 60x200 or 200x60 matrix is
about a factor of two faster on the Intel system, and a factor of six
faster on the SPARC system.  There is little or no gain from the
modification when transposing a row or column vector, however.  (I
think it must be that on these machines multiplies and divides do not
take constant time, but are faster when, for instance, dividing by 1.)

I've appended below the new version of the modified part of the
do_transpose function in src/main/array.c.

    Radford Neal

----------------------------------------------------------------------

    PROTECT(r = allocVector(TYPEOF(a), len));
    switch (TYPEOF(a)) {
    case LGLSXP:
    case INTSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            INTEGER(r)[i] = INTEGER(a)[j];
        }
    case REALSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            REAL(r)[i] = REAL(a)[j];
        }
        break;
    case CPLXSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            COMPLEX(r)[i] = COMPLEX(a)[j];
        }
        break;
    case STRSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            SET_STRING_ELT(r, i, STRING_ELT(a,j));
        }
        break;
    case VECSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
        }
        break;
    case RAWSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            RAW(r)[i] = RAW(a)[j];
        }
        break;
    default:
        UNPROTECT(1);
        goto not_matrix;
    }

Radford Neal

2010-Aug-26 17:41 UTC

head link

[Rd] Speeding up transpose

> I've appended below the new version of the modified part of the
> do_transpose function in src/main/array.c.
A quick correction...  A "break" went missing from the case INTSXP:
section of the code I posted.  Corrected version below.
 
     Radford Neal

----------------------------------------------------------------------

    PROTECT(r = allocVector(TYPEOF(a), len));
    switch (TYPEOF(a)) {
    case LGLSXP:
    case INTSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            INTEGER(r)[i] = INTEGER(a)[j];
        }
        break;
    case REALSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            REAL(r)[i] = REAL(a)[j];
        }
        break;
    case CPLXSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            COMPLEX(r)[i] = COMPLEX(a)[j];
        }
        break;
    case STRSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            SET_STRING_ELT(r, i, STRING_ELT(a,j));
        }
        break;
    case VECSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
        }
        break;
    case RAWSXP:
        for (i = 0, j = 0; i<len; i += 1, j += nrow) {
            if (j>=len) j -= (len-1);
            RAW(r)[i] = RAW(a)[j];
        }
        break;
    default:
        UNPROTECT(1);
        goto not_matrix;
    }

Hervé Pagès

2010-Aug-28 00:01 UTC

head link

[Rd] Speeding up transpose

Hi Radford,

On 08/25/2010 07:50 PM, Radford Neal wrote:> I've looked at how to speed up the transpose function in R
> (ie, t(X)).
>
> The existing code does the work with loops like the following:
>
> 	for (i = 0; i<  len; i++)
> 	    REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow];
>
> It seems a bit optimistic to expect a compiler to produce good code
> from this.  I've re-written these loops as follows:
>
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              REAL(r)[i] = REAL(a)[j];
>          }
You can even avoid the j>=len test within the loop by using 2 nested
loops:

            dest = REAL(r);
            src = REAL(a);
            // walk the dest col by col (i.e. linearly)
            // walk the src row by row (i.e. not linearly)
            for (i = 0; i < nrow; i++, src++)
                for (j = 0; j < len; j += nrow, dest++)
                    *dest = src[j];

1 test, 3 additions (src[j] being equiv to *(src + j)), 1 assignment
in the inner loop. I wonder if it's possible to get rid of one of the
additions.

This gives me another 10% gain in speed (on my 64-bit Ubuntu laptop),
not that much, but still...

BTW isn't that surprising that using div() is actually slower than using
of / and %, at least on Linux? Since div() is kind of the atomic version
of it, I would have expected it to be slightly faster :-/

Cheers,
H.
>
> The resulting improvement is sometimes dramatic.  Here's a test
> program:
>
>          M<- matrix(seq(0,1,12000),200,60)
>
>          print(system.time({for (i in 1:10000) S<- t(M)}))
>          print(system.time({for (i in 1:10000) R<- t(S)}))
>
>          v<- seq(0,2,12000)
>
>          print(system.time({for (i in 1:100000) u<- t(v)}))
>          print(system.time({for (i in 1:100000) w<- t(u)}))
>
> Here are the times on an Intel Linux system:
>
>      R version 2.11.1:            Modified version:
>
>           user  system elapsed          user  system elapsed
>          1.190   0.040   1.232         0.610   0.010   0.619
>           user  system elapsed          user  system elapsed
>          1.200   0.020   1.226         0.610   0.000   0.616
>           user  system elapsed          user  system elapsed
>          0.800   0.010   0.813         0.750   0.000   0.752
>           user  system elapsed          user  system elapsed
>          0.910   0.010   0.921         0.860   0.000   0.864
>
> Here are the times on a SPARC Solaris system:
>
>      R version 2.11.1:            Modified version:
>
>           user  system elapsed          user  system elapsed
>         18.643   0.041  18.685         2.994   0.039   3.033
>           user  system elapsed          user  system elapsed
>         18.574   0.041  18.616         3.123   0.039   3.163
>           user  system elapsed          user  system elapsed
>          3.803   0.271   4.075         3.868   0.296   4.163
>           user  system elapsed          user  system elapsed
>          4.184   0.273   4.457         4.238   0.302   4.540
>
> So with the modification, transpose for a 60x200 or 200x60 matrix is
> about a factor of two faster on the Intel system, and a factor of six
> faster on the SPARC system.  There is little or no gain from the
> modification when transposing a row or column vector, however.  (I
> think it must be that on these machines multiplies and divides do not
> take constant time, but are faster when, for instance, dividing by 1.)
>
> I've appended below the new version of the modified part of the
> do_transpose function in src/main/array.c.
>
>      Radford Neal
>
> ----------------------------------------------------------------------
>
>      PROTECT(r = allocVector(TYPEOF(a), len));
>      switch (TYPEOF(a)) {
>      case LGLSXP:
>      case INTSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              INTEGER(r)[i] = INTEGER(a)[j];
>          }
>      case REALSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              REAL(r)[i] = REAL(a)[j];
>          }
>          break;
>      case CPLXSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              COMPLEX(r)[i] = COMPLEX(a)[j];
>          }
>          break;
>      case STRSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              SET_STRING_ELT(r, i, STRING_ELT(a,j));
>          }
>          break;
>      case VECSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
>          }
>          break;
>      case RAWSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              RAW(r)[i] = RAW(a)[j];
>          }
>          break;
>      default:
>          UNPROTECT(1);
>          goto not_matrix;
>      }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Herv? Pag?s

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Possibly Parallel Threads

Search for more seemingly similar threads

R devel - Aug 2010 - Speeding up transpose

[Rd] Speeding up transpose

[Rd] Speeding up transpose

[Rd] Speeding up transpose

Possibly Parallel Threads