Dear all, I have to handle a large matrix (1000 x 10001) where in the last column i have a value that all the preceding values in the same row has to be compared to. I have made the following code : # generate a (1000 x 10001) matrix, testm # generate statistics matrix 1000 x 4: qnt <- c(0.01, 0.05) cmp_fun <- function(x) { LAST <- length(x) smpls <- x[1:(LAST-1)] real <- x[LAST] ret <- vector(length=length(qnt)*2) for (i in 1:length(qnt)) { q_i <- quantile(smpls, qnt[i]) # the quantile i m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i ret[i] <- ifelse(real < q_i, 1, 0) ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) } ret } hcvx <- apply(testm, 1, cmp_fun) The code is functioning well, but seems to take forever to calculate the statistics matrix. As I have to repeat this snippet 2000 times, I have a problem. Can anyone advise as to how I can optimize the runtime of this problem? Should i drop the apply function altogether and just loop through the rows with a for loop? Does anyone know of matrix functions I can use to do the same operations I use within the cmp_fun function to avoid this looping? All suggestions are welcome! I have little experience optimizing code in R, so I am quite stumped at the moment. Cheers, Monty
Is testm really of class "matrix"? If its a "data.frame" then manipulation of matrices is often faster. On 5/8/06, Monty B. <montezumasrevenge at gmail.com> wrote:> Dear all, > > I have to handle a large matrix (1000 x 10001) where in the > last column i have a value that all the preceding values in the same row > has to be compared to. > > I have made the following code : > > # generate a (1000 x 10001) matrix, testm > # generate statistics matrix 1000 x 4: > > qnt <- c(0.01, 0.05) > cmp_fun <- function(x) > { > LAST <- length(x) > smpls <- x[1:(LAST-1)] > real <- x[LAST] > > ret <- vector(length=length(qnt)*2) > for (i in 1:length(qnt)) > { > q_i <- quantile(smpls, qnt[i]) # the quantile i > m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i > ret[i] <- ifelse(real < q_i, 1, 0) > ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) > } > ret > } > hcvx <- apply(testm, 1, cmp_fun) > > The code is functioning well, but seems to take forever to calculate > the statistics matrix. As I have to repeat this snippet 2000 times, I > have a problem. Can anyone advise as to how I can optimize the runtime > of this problem? Should i drop the apply function altogether and just > loop through the rows with a for loop? Does anyone know of matrix > functions I can use to do the same operations I use within the cmp_fun > function to avoid this looping? > > All suggestions are welcome! I have little experience optimizing code > in R, so I am quite stumped at the moment. > > Cheers, > > Monty
[Monty B. ]>I have to handle a large matrix (1000 x 10001) where in the >last column i have a value that all the preceding values in the same row >has to be compared to. I have made the following code :># generate a (1000 x 10001) matrix, testm ># generate statistics matrix 1000 x 4:>qnt <- c(0.01, 0.05) >cmp_fun <- function(x) >{ > LAST <- length(x) > smpls <- x[1:(LAST-1)] > real <- x[LAST]> ret <- vector(length=length(qnt)*2) > for (i in 1:length(qnt)) > { > q_i <- quantile(smpls, qnt[i]) # the quantile i > m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i > ret[i] <- ifelse(real < q_i, 1, 0) > ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0) > } > ret >} >hcvx <- apply(testm, 1, cmp_fun)>Can anyone advise as to how I can optimize the runtime of this problem? >All suggestions are welcome!You may speed it up a bit, not so much, with the following: stats.testm <- function (testm, qnt=c(0.01, 0.05)) { quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt) smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)] reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)] keeps <- smpls < rep(quants, ncol(smpls)) means <- rowSums(smpls * keeps) / rowSums(keeps) matrix(rbind((reals < quants) + 0, (reals < quants) * (reals - means)), length(qnt) * 2) } Try it with something like: gen.testm <- function (n, m) { matrix(sample(0:99, n * (m + 1), TRUE), n) } testm <- gen.testm(100, 100) stats.testm(testm) Without checking, I would suspect that quantile is the big consumer. If you could make it without quantile interpolation, maybe some more vectorisation could be possible, but in any case, I do not think you can avoid sorting each row separately, in one way or another (currently done within quantile). -- Fran?ois Pinard http://pinard.progiciels-bpi.ca