thr3ads.net - R devel - [Rd] storage.mode, C data types and speed [Oct 2005]

If this information is useful, please help other people find it:
Share via:

Heather Turner

2005-Oct-03 15:00 UTC

[Rd] storage.mode, C data types and speed

Hi,

I am trying to speed up part of an algorithm in which certain columns of a large
matrix (X) are replaced by the element-wise product of a matrix (M) and a vector
(v). In R, the code might be

X[, ind] <- M * v

I have written a small C routine to do this for me, but the timing depends on
how I define the storage.mode of the objects in R and the data types in C, in a
way which I don't understand.

To illustrate, suppose I have the following for X, M and v:
nr <- 10000
X <- matrix(1, nr, 500)
M <- matrix(1:(nr*450), nr, 450)
v <- 1:nr
storage.mode(X) <- storage.mode(M) <- storage.mode(v) <-
"double"

Then I have the following integers required by my C code - these are the objects
which I'm not sure how to handle
a <- 50*nr            #index of X indicating where to start replacing values
nm <- length(M)
nv <- length(v)

I would have thought I wanted

storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"integer"

to go with the following C code

# include <Rinternals.h>

SEXP prod_integer(SEXP X, SEXP M, SEXP v, SEXP a, SEXP nm, SEXP nv) {
  int i = INTEGER(a)[0], i1 = 0, i2 = 0;

  for ( ; i1 < INTEGER(nm)[0]; i2 = (++i2 == INTEGER(nv)[0]) ? 0 : i2) { 
    REAL(X)[i++] = REAL(M)[i1++] * REAL(v)[i2];
  }
  return(X);
}

Running this is R gives the following timings on my PC
> dyn.load("D:/C_routines/prod_integer")
> for(i in 1:3) {print(system.time(.Call("prod_integer", X, M, v,
a, nm, nv)))}[1] 0.17 0.00 0.18   NA   NA
[1] 0.18 0.00 0.17   NA   NA
[1] 0.15 0.00 0.17   NA   NA

But strangely (to me) if I change the storage mode of my integers to
"double", I get
> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"double"
> for(i in 1:3) {print(system.time(.Call("prod_integer", X, M, v,
a, nm, nv)))}[1]  0  0  0 NA NA
[1]  0  0  0 NA NA
[1]  0  0  0 NA NA

If on the other hand I use REAL instead of INTEGER in my C code:

# include <Rinternals.h>

SEXP prod_real(SEXP X, SEXP M, SEXP v, SEXP a, SEXP nm, SEXP nv) {
  int i = REAL(a)[0], i1 = 0, i2 = 0;

  for ( ; i1 < REAL(nm)[0]; i2 = (++i2 == REAL(nv)[0]) ? 0 : i2) { 
    REAL(X)[i++] = REAL(M)[i1++] * REAL(v)[i2];
  }
  return(X);
}

The reverse is true:> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"integer"
> for(i in 1:3) {print(system.time(.Call("prod_real", X, M, v, a,
nm, nv)))}[1]  0  0  0 NA NA
[1]  0  0  0 NA NA
[1]  0  0  0 NA NA> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"double"
> for(i in 1:3) {print(system.time(.Call("prod_real", X, M, v, a,
nm, nv)))}[1] 0.22 0.00 0.22   NA   NA
[1] 0.21 0.00 0.20   NA   NA
[1] 0.21 0.00 0.22   NA   NA> identical(.Call("prod_integer", X, M, v, a, nm, nv),
.Call("prod_real", X, M, v, a, nm, nv))[1] TRUE

So I seem to get the fastest results if I store a, nm and nv as doubles in R and
treat them as integers in C or if I store them as integers in R and treat them
as doubles in C, rather than matching the storage in R to the data type in C.

I must be misunderstanding something here. Can someone explain what's going
on - please note I have only just begun to learn C, apologies if this is a basic
C issue.

Thanks,

Heather

Dr H Turner
Research Assistant
Dept. of Statistics
The University of Warwick
Coventry
CV4 7AL

Tel: 024 76575870
Url: www.warwick.ac.uk/go/heatherturner

Prof Brian Ripley

2005-Oct-03 15:13 UTC

head link

[Rd] storage.mode, C data types and speed

If nm is double, very likely INTEGER(nm)[0] is negative and your C code 
does nothing at all.  Hence it could be fast but useless.

On Mon, 3 Oct 2005, Heather Turner wrote:
> Hi,
>
> I am trying to speed up part of an algorithm in which certain columns of 
> a large matrix (X) are replaced by the element-wise product of a matrix 
> (M) and a vector (v). In R, the code might be
>
> X[, ind] <- M * v
>
> I have written a small C routine to do this for me, but the timing 
> depends on how I define the storage.mode of the objects in R and the 
> data types in C, in a way which I don't understand.
>
> To illustrate, suppose I have the following for X, M and v:
> nr <- 10000
> X <- matrix(1, nr, 500)
> M <- matrix(1:(nr*450), nr, 450)
> v <- 1:nr
> storage.mode(X) <- storage.mode(M) <- storage.mode(v) <-
"double"
>
> Then I have the following integers required by my C code - these are the 
> objects which I'm not sure how to handle
> a <- 50*nr            #index of X indicating where to start replacing
values
> nm <- length(M)
> nv <- length(v)
>
> I would have thought I wanted
>
> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"integer"
>
> to go with the following C code
>
> # include <Rinternals.h>
>
> SEXP prod_integer(SEXP X, SEXP M, SEXP v, SEXP a, SEXP nm, SEXP nv) {
>  int i = INTEGER(a)[0], i1 = 0, i2 = 0;
>
>  for ( ; i1 < INTEGER(nm)[0]; i2 = (++i2 == INTEGER(nv)[0]) ? 0 : i2) {
>    REAL(X)[i++] = REAL(M)[i1++] * REAL(v)[i2];
>  }
>  return(X);
> }
>
> Running this is R gives the following timings on my PC
>
>> dyn.load("D:/C_routines/prod_integer")
>> for(i in 1:3) {print(system.time(.Call("prod_integer", X, M,
v, a, nm, nv)))}
> [1] 0.17 0.00 0.18   NA   NA
> [1] 0.18 0.00 0.17   NA   NA
> [1] 0.15 0.00 0.17   NA   NA
>
> But strangely (to me) if I change the storage mode of my integers to
"double", I get
>
>> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"double"
>> for(i in 1:3) {print(system.time(.Call("prod_integer", X, M,
v, a, nm, nv)))}
> [1]  0  0  0 NA NA
> [1]  0  0  0 NA NA
> [1]  0  0  0 NA NA
>
> If on the other hand I use REAL instead of INTEGER in my C code:
>
> # include <Rinternals.h>
>
> SEXP prod_real(SEXP X, SEXP M, SEXP v, SEXP a, SEXP nm, SEXP nv) {
>  int i = REAL(a)[0], i1 = 0, i2 = 0;
>
>  for ( ; i1 < REAL(nm)[0]; i2 = (++i2 == REAL(nv)[0]) ? 0 : i2) {
>    REAL(X)[i++] = REAL(M)[i1++] * REAL(v)[i2];
>  }
>  return(X);
> }
>
> The reverse is true:
>> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"integer"
>> for(i in 1:3) {print(system.time(.Call("prod_real", X, M, v,
a, nm, nv)))}
> [1]  0  0  0 NA NA
> [1]  0  0  0 NA NA
> [1]  0  0  0 NA NA
>> storage.mode(a) <- storage.mode(nm) <- storage.mode(nv) <-
"double"
>> for(i in 1:3) {print(system.time(.Call("prod_real", X, M, v,
a, nm, nv)))}
> [1] 0.22 0.00 0.22   NA   NA
> [1] 0.21 0.00 0.20   NA   NA
> [1] 0.21 0.00 0.22   NA   NA
>> identical(.Call("prod_integer", X, M, v, a, nm, nv),
.Call("prod_real", X, M, v, a, nm, nv))
> [1] TRUE
>
> So I seem to get the fastest results if I store a, nm and nv as doubles in
R and treat them as integers in C or if I store them as integers in R and treat
them as doubles in C, rather than matching the storage in R to the data type in
C.
>
> I must be misunderstanding something here. Can someone explain what's
going on - please note I have only just begun to learn C, apologies if this is a
basic C issue.
>
> Thanks,
>
> Heather
>
> Dr H Turner
> Research Assistant
> Dept. of Statistics
> The University of Warwick
> Coventry
> CV4 7AL
>
> Tel: 024 76575870
> Url: www.warwick.ac.uk/go/heatherturner
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Apparently Analagous Threads

Search for more maybe matching threads

R devel - Oct 2005 - storage.mode, C data types and speed

[Rd] storage.mode, C data types and speed

[Rd] storage.mode, C data types and speed

Apparently Analagous Threads