Chris Carleton
2010-Nov-17 19:11 UTC
[R] efficient conversion of matrix column rows to list elements
Hi List,

I'm hoping to get opinions for enhancing the efficiency of the following code designed to take a vector of probabilities (outcomes) and calculate a union of the probability space. As part of the union calculation, combn() must be used, which returns a matrix, and the parallelized version of lapply() provided in the multicore package requires a list. I've found that parallelization is very necessary for vectors of outcomes greater in length than about 10 or 15 elements, which is why I need to make use of multicore (and, therefore, convert the combn() matrix to a list). It would speed the process up if there was a more direct way to convert the columns of combn() to elements of a single list. Any constructive suggestions will be greatly appreciated.

Thanks for your consideration,

C

code:
------------
unionIndependant <- function(outcomes) {
    intsctn <- c()
    column2list <- function(x){list(x)}
    pb <- ProgressBar(max=length(outcomes),stepLength=1,newlineWhenDone=TRUE)
    for (i in 2:length(outcomes)){
        increase(pb)
        outcomes_ <- apply(combn(outcomes,i),2,column2list)
        for (j in 1:length(outcomes_)){outcomes_[[j]] <- outcomes_[[j]][[1]]}
        outcomes_container <- mclapply(outcomes_,prod,mc.cores=3)
        intsctn[i] <- sum(unlist(outcomes_container))
    }
    intsctn <- intsctn[-1]
    return(sum(outcomes) -
           sum(intsctn[which(which((intsctn %in% intsctn)) %% 2 == 1)]) +
           sum(intsctn[which(which((intsctn %in% intsctn)) %% 2 == 0)]) +
           ((-1)^length(intsctn) * prod(outcomes)))
}
------------

PS This code has been tested on vectors of up to length(outcomes) == 25 and it should be noted that ProgressBar() requires the R.utils package.
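For reference, the calculation described in the post is the inclusion-exclusion formula for the union of independent events. The following is only a minimal sketch, assuming the entries of `outcomes` are independent probabilities and that there are at least two of them. The name `union_incl_excl` is hypothetical and does not appear in the thread, and the closed form `1 - prod(1 - outcomes)` is used here purely as a cross-check.

```r
# Hypothetical helper (not from the thread): inclusion-exclusion for the
# probability of a union of independent events.
union_incl_excl <- function(outcomes) {
  n <- length(outcomes)
  # For each subset size k = 1..n, sum the products over all k-subsets.
  # combn() applies FUN to every combination, so no list is needed here.
  terms <- vapply(
    seq_len(n),
    function(k) sum(combn(outcomes, k, FUN = prod)),
    numeric(1)
  )
  # Alternate the signs: + singles, - pairs, + triples, ...
  sum((-1)^(seq_len(n) + 1) * terms)
}

p <- c(0.1, 0.2, 0.3, 0.4)
# For independent events the union also equals 1 - prod(1 - p).
stopifnot(isTRUE(all.equal(union_incl_excl(p), 1 - prod(1 - p))))
```

The number of subsets still grows as 2^n, so this sketch is meant to show the structure of the sum rather than to be fast for long vectors.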
Charles C. Berry
2010-Nov-17 20:10 UTC
[R] efficient conversion of matrix column rows to list elements
On Wed, 17 Nov 2010, Chris Carleton wrote:

> Hi List,
>
> I'm hoping to get opinions for enhancing the efficiency of the following
> code designed to take a vector of probabilities (outcomes) and calculate a
> union of the probability space. As part of the union calculation, combn()
> must be used, which returns a matrix, and the parallelized version of
> lapply() provided in the multicore package requires a list. I've found that
> parallelization is very necessary for vectors of outcomes greater in length
> than about 10 or 15 elements, which is why I need to make use of multicore
> (and, therefore, convert the combn() matrix to a list). It would speed the
> process up if there was a more direct way to convert the columns of combn()
> to elements of a single list.

I think you are mistaken. Is this what Rprof() tells you?

On my system, combn() is the culprit:

> Rprof()
> outcomes <- 1:25
> nada <- replicate(200, {apply(combn(outcomes,2),2,column2list);NULL})
> Rprof(NULL)
> summaryRprof()$by.self
          self.time self.pct total.time total.pct
"combn"        0.64    61.54       0.70     67.31
"apply"        0.20    19.23       1.04    100.00
"FUN"          0.10     9.62       1.04    100.00
"!="           0.04     3.85       0.04      3.85
"<"            0.02     1.92       0.02      1.92
"-"            0.02     1.92       0.02      1.92
"is.null"      0.02     1.92       0.02      1.92

And it hardly takes any time at that!

HTH,

Chuck

p.s. Isn't

    as.data.frame( combn( outcomes, 2 ) )

or

    combn(outcomes, 2, list )

good enough?

> Any constructive suggestions will be greatly appreciated.
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at tajo.ucsd.edu                     UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
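The alternatives in the p.s. above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the example vector and the names `pairs_list`, `pairs_df` and `pair_products` are made up, and the `simplify = FALSE` form is swapped in for `combn(outcomes, 2, list)` as another way to get a list back from combn().

```r
# Illustrative values only; any numeric vector of length >= 2 would do.
outcomes <- c(0.1, 0.2, 0.3, 0.4)

# Option 1: ask combn() for a list instead of a matrix.
# Each element is one combination (here, one pair).
pairs_list <- combn(outcomes, 2, simplify = FALSE)

# Option 2: a data frame is a list of columns, so converting the
# combn() matrix gives one list element per combination.
pairs_df <- as.data.frame(combn(outcomes, 2))

# Option 3: skip the list entirely and let combn() apply the function
# to each combination as it is generated.
pair_products <- combn(outcomes, 2, FUN = prod)

# All three routes give the same per-combination products.
stopifnot(isTRUE(all.equal(sapply(pairs_list, prod),
                           as.vector(pair_products))))
stopifnot(isTRUE(all.equal(unname(sapply(pairs_df, prod)),
                           as.vector(pair_products))))
```

Either list form can be handed to lapply() or mclapply() unchanged. The third option removes the conversion step altogether when the per-combination work is a simple function such as prod().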