thr3ads.net - R help - [R] On the speed of apply and alternatives? [May 2006]

If this information is useful, please help other people find it:
Share via:

Monty B.

2006-May-08 21:45 UTC

[R] On the speed of apply and alternatives?

Dear all,

I have to handle a large matrix (1000 x 10001) where in the
last column i have a value that all the preceding values in the same row
has to be compared to.

I have made the following code :

# generate a (1000 x 10001) matrix, testm
# generate statistics matrix 1000 x 4:

qnt <- c(0.01, 0.05)
cmp_fun  <- function(x)
{
  LAST <- length(x)
  smpls <- x[1:(LAST-1)]
  real  <- x[LAST]

  ret <- vector(length=length(qnt)*2)
  for (i in 1:length(qnt))
  {
    q_i  <- quantile(smpls, qnt[i])            # the quantile i
    m_i <- mean(smpls[smpls<q_i ] )     # mean of obs less than q_i
    ret[i] <- ifelse(real < q_i, 1, 0)
    ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
  }
  ret
}
hcvx  <- apply(testm, 1, cmp_fun)

The code is functioning well, but seems to take forever to calculate
the statistics matrix. As I have to repeat this snippet 2000 times, I
have a problem. Can anyone advise as to how I can optimize the runtime
of this problem? Should i drop the apply function altogether and just
loop through the rows with a for loop? Does anyone know of matrix
functions I can use to do the same operations I use within the cmp_fun
function to avoid this looping?

All suggestions are welcome! I have little experience optimizing code
in R, so I am quite stumped at the moment.

Cheers,

Monty

Gabor Grothendieck

2006-May-09 01:33 UTC

head link

[R] On the speed of apply and alternatives?

Is testm really of class "matrix"?  If its a "data.frame"
then manipulation
of matrices is often faster.

On 5/8/06, Monty B. <montezumasrevenge at gmail.com>
wrote:> Dear all,
>
> I have to handle a large matrix (1000 x 10001) where in the
> last column i have a value that all the preceding values in the same row
> has to be compared to.
>
> I have made the following code :
>
> # generate a (1000 x 10001) matrix, testm
> # generate statistics matrix 1000 x 4:
>
> qnt <- c(0.01, 0.05)
> cmp_fun  <- function(x)
> {
>  LAST <- length(x)
>  smpls <- x[1:(LAST-1)]
>  real  <- x[LAST]
>
>  ret <- vector(length=length(qnt)*2)
>  for (i in 1:length(qnt))
>  {
>    q_i  <- quantile(smpls, qnt[i])            # the quantile i
>    m_i <- mean(smpls[smpls<q_i ] )     # mean of obs less than q_i
>    ret[i] <- ifelse(real < q_i, 1, 0)
>    ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
>  }
>  ret
> }
> hcvx  <- apply(testm, 1, cmp_fun)
>
> The code is functioning well, but seems to take forever to calculate
> the statistics matrix. As I have to repeat this snippet 2000 times, I
> have a problem. Can anyone advise as to how I can optimize the runtime
> of this problem? Should i drop the apply function altogether and just
> loop through the rows with a for loop? Does anyone know of matrix
> functions I can use to do the same operations I use within the cmp_fun
> function to avoid this looping?
>
> All suggestions are welcome! I have little experience optimizing code
> in R, so I am quite stumped at the moment.
>
> Cheers,
>
> Monty

François Pinard

2006-May-09 11:51 UTC

head link

[R] On the speed of apply and alternatives?

[Monty B. ]
>I have to handle a large matrix (1000 x 10001) where in the
>last column i have a value that all the preceding values in the same row
>has to be compared to.  I have made the following code :
># generate a (1000 x 10001) matrix, testm
># generate statistics matrix 1000 x 4:
>qnt <- c(0.01, 0.05)
>cmp_fun  <- function(x)
>{
>  LAST <- length(x)
>  smpls <- x[1:(LAST-1)]
>  real  <- x[LAST]
>  ret <- vector(length=length(qnt)*2)
>  for (i in 1:length(qnt))
>  {
>    q_i  <- quantile(smpls, qnt[i])            # the quantile i
>    m_i <- mean(smpls[smpls<q_i ] )     # mean of obs less than q_i
>    ret[i] <- ifelse(real < q_i, 1, 0)
>    ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
>  }
>  ret
>}
>hcvx  <- apply(testm, 1, cmp_fun)
>Can anyone advise as to how I can optimize the runtime of this problem?  
>All suggestions are welcome!
You may speed it up a bit, not so much, with the following:

stats.testm <- function (testm, qnt=c(0.01, 0.05)) {
    quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)
    smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
    reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
    keeps <- smpls < rep(quants, ncol(smpls))
    means <- rowSums(smpls * keeps) / rowSums(keeps)
    matrix(rbind((reals < quants) + 0,
                 (reals < quants) * (reals - means)),
           length(qnt) * 2)
}

Try it with something like:

gen.testm <- function (n, m) {
    matrix(sample(0:99, n * (m + 1), TRUE), n)
}

testm <- gen.testm(100, 100)
stats.testm(testm)

Without checking, I would suspect that quantile is the big consumer.
If you could make it without quantile interpolation, maybe some more
vectorisation could be possible, but in any case, I do not think you can 
avoid sorting each row separately, in one way or another (currently done 
within quantile).

-- 
Fran?ois Pinard   http://pinard.progiciels-bpi.ca

Reasonably Related Threads

Search for more possibly parallel threads

R help - May 2006 - On the speed of apply and alternatives?

[R] On the speed of apply and alternatives?

[R] On the speed of apply and alternatives?

[R] On the speed of apply and alternatives?

Reasonably Related Threads