Dear all,
I have to handle a large matrix (1000 x 10001) where in the
last column i have a value that all the preceding values in the same row
has to be compared to.
I have made the following code :
# generate a (1000 x 10001) matrix, testm
# generate statistics matrix 1000 x 4:
qnt <- c(0.01, 0.05)
cmp_fun <- function(x)
{
LAST <- length(x)
smpls <- x[1:(LAST-1)]
real <- x[LAST]
ret <- vector(length=length(qnt)*2)
for (i in 1:length(qnt))
{
q_i <- quantile(smpls, qnt[i]) # the quantile i
m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i
ret[i] <- ifelse(real < q_i, 1, 0)
ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
}
ret
}
hcvx <- apply(testm, 1, cmp_fun)
The code is functioning well, but seems to take forever to calculate
the statistics matrix. As I have to repeat this snippet 2000 times, I
have a problem. Can anyone advise as to how I can optimize the runtime
of this problem? Should i drop the apply function altogether and just
loop through the rows with a for loop? Does anyone know of matrix
functions I can use to do the same operations I use within the cmp_fun
function to avoid this looping?
All suggestions are welcome! I have little experience optimizing code
in R, so I am quite stumped at the moment.
Cheers,
Monty
Is testm really of class "matrix"? If its a "data.frame" then manipulation of matrices is often faster. On 5/8/06, Monty B. <montezumasrevenge at gmail.com> wrote:> Dear all, > > I have to handle a large matrix (1000 x 10001) where in the > last column i have a value that all the preceding values in the same row > has to be compared to. > > I have made the following code : > > # generate a (1000 x 10001) matrix, testm > # generate statistics matrix 1000 x 4: > > qnt <- c(0.01, 0.05) > cmp_fun <- function(x) > { > LAST <- length(x) > smpls <- x[1:(LAST-1)] > real <- x[LAST] > > ret <- vector(length=length(qnt)*2) > for (i in 1:length(qnt)) > { > q_i <- quantile(smpls, qnt[i]) # the quantile i > m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i > ret[i] <- ifelse(real < q_i, 1, 0) > ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) > } > ret > } > hcvx <- apply(testm, 1, cmp_fun) > > The code is functioning well, but seems to take forever to calculate > the statistics matrix. As I have to repeat this snippet 2000 times, I > have a problem. Can anyone advise as to how I can optimize the runtime > of this problem? Should i drop the apply function altogether and just > loop through the rows with a for loop? Does anyone know of matrix > functions I can use to do the same operations I use within the cmp_fun > function to avoid this looping? > > All suggestions are welcome! I have little experience optimizing code > in R, so I am quite stumped at the moment. > > Cheers, > > Monty
[Monty B. ]>I have to handle a large matrix (1000 x 10001) where in the >last column i have a value that all the preceding values in the same row >has to be compared to. I have made the following code :># generate a (1000 x 10001) matrix, testm ># generate statistics matrix 1000 x 4:>qnt <- c(0.01, 0.05) >cmp_fun <- function(x) >{ > LAST <- length(x) > smpls <- x[1:(LAST-1)] > real <- x[LAST]> ret <- vector(length=length(qnt)*2) > for (i in 1:length(qnt)) > { > q_i <- quantile(smpls, qnt[i]) # the quantile i > m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i > ret[i] <- ifelse(real < q_i, 1, 0) > ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) > } > ret >} >hcvx <- apply(testm, 1, cmp_fun)>Can anyone advise as to how I can optimize the runtime of this problem? >All suggestions are welcome!You may speed it up a bit, not so much, with the following: stats.testm <- function (testm, qnt=c(0.01, 0.05)) { quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt) smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)] reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)] keeps <- smpls < rep(quants, ncol(smpls)) means <- rowSums(smpls * keeps) / rowSums(keeps) matrix(rbind((reals < quants) + 0, (reals < quants) * (reals - means)), length(qnt) * 2) } Try it with something like: gen.testm <- function (n, m) { matrix(sample(0:99, n * (m + 1), TRUE), n) } testm <- gen.testm(100, 100) stats.testm(testm) Without checking, I would suspect that quantile is the big consumer. If you could make it without quantile interpolation, maybe some more vectorisation could be possible, but in any case, I do not think you can avoid sorting each row separately, in one way or another (currently done within quantile). -- Fran?ois Pinard http://pinard.progiciels-bpi.ca