Henning Redestig
2004-Oct-18 13:51 UTC
[R] Increasing computation time per column using lapply
Hi,

I would be very glad for help on this problem. I am using this code:

    temp <- function(x, bins, tot) {
      return(as.numeric(lapply(split(x, bins), wtest, tot)))
    }

    wtest <- function(x, y) {
      return(wilcox.test(x, y)$p.value)
    }

    rs <- function(x, bins) {
      binCount <- length(split(x[, 1], bins))
      tot <- as.numeric(x)
      result <- matrix(apply(x, 2, temp, bins, tot), nrow = binCount, byrow = FALSE)
      rownames(result) <- names(split(x[, 1], bins))
      colnames(result) <- colnames(x)
      return(result)
    }

where x is a matrix and bins is the grouping vector used to split every column of x.

Calling

    > rs(x, bins)

takes ~100 s to execute if x has 22000 rows and 2 columns and bins splits these into 226 groups of similar length. That is all right, but if I instead increase to 3 columns it takes ~300 s, and with 50 columns it takes more than 13 h to execute. I cannot understand why execution time does not increase linearly with the number of columns. Memory status is all fine and I never need to start swapping. I tried removing the temp function and using a for-loop to iterate over the columns instead of apply, but it does not solve my problem.

Thanx!

/Henning, redestig at mpimp-golm.mpg.de
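For anyone who wants to experiment, here is a self-contained sketch of the same setup at a much smaller scale. The row count, number of bins, column names, and random seed are my own assumptions for illustration, not values from the post:

```r
# Hypothetical small-scale reproduction of the rs()/temp()/wtest() setup.
# Sizes here (2000 rows, 3 columns, 20 bins) are assumptions for speed.
wtest <- function(x, y) {
  return(wilcox.test(x, y)$p.value)
}

temp <- function(x, bins, tot) {
  # One Wilcoxon test per bin of this column, each against tot
  return(as.numeric(lapply(split(x, bins), wtest, tot)))
}

rs <- function(x, bins) {
  binCount <- length(split(x[, 1], bins))
  tot <- as.numeric(x)  # the whole matrix flattened into one vector
  result <- matrix(apply(x, 2, temp, bins, tot),
                   nrow = binCount, byrow = FALSE)
  rownames(result) <- names(split(x[, 1], bins))
  colnames(result) <- colnames(x)
  return(result)
}

set.seed(1)
x <- matrix(rnorm(2000 * 3), nrow = 2000,
            dimnames = list(NULL, paste0("col", 1:3)))
bins <- sample(paste0("bin", 1:20), 2000, replace = TRUE)

p <- rs(x, bins)
dim(p)  # one row per bin, one column per column of x
```

Note that tot is the entire matrix as one vector, so every wilcox.test call compares a bin (of roughly constant size) against a vector that grows with each added column; wrapping the rs() call in system.time() for 2, 3, and more columns makes the scaling easy to measure.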