ivo welch
2013-Feb-27 02:51 UTC
[R] Parallelizing Other Apply Functions, e.g. by, the Easy (Wrong?) Way
Dear R Users, this is more curiosity than a real problem. I am wondering how to add mc* (multicore) versions of all of R's *apply functions. Stack Overflow question 3505701 has a nice overview of these functions. Roughly:

  apply:     apply a function to the rows or columns of a matrix
  lapply:    apply a function to each element of a list; returns a list
  sapply:    apply a function to each element of a list; returns a vector
  vapply:    like sapply, but declares the type of the function's return value in advance (for speed)
  mapply:    apply a function to the first elements of several lists, then to the second elements, etc.
  rapply:    recursive version (uncommon)
  tapply:    apply a function to one vector, grouped by the values of another vector
  by:        apply a function to a data frame or list, grouped by one vector in it; relies on tapply
  aggregate: seems to be sort of like by
  ave:       merges by-group results back into a vector of the same length

Many of these functions use lapply internally. by() uses tapply(), which in turn uses lapply(). Ideally, these functions would have had a parameter naming the lapply function they use, so that one could simply write

    mc.by <- function(...) by(..., lapply = mclapply)

Unfortunately, they do not. So I am wondering what the preferred way is to "patch" them. One way to do this would be:

    library(parallel)  # provides mclapply()

    mc.by <- function(...) {
      oc.lapply <- lapply    # remember the original lapply
      lapply <<- mclapply    # rebind lapply to the multicore version
      result <- by(...)
      lapply <<- oc.lapply   # put the original back
      return(result)
    }

The disadvantage is that if the function passed to mc.by itself relies on, say, sapply somewhere, both would use the multicore mclapply function. I don't think it is possible to spoof the GlobalEnv for only one function but not for the lower-tier functions it calls. Another big problem is that I would probably have to trap errors so that the original lapply is restored if the call aborts. Not pretty. The advantage is that future changes to by() by the R core team would be picked up automatically, with nothing for me to update.
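A possible middle ground, sketched under stated assumptions: by() and tapply() are defined in the base namespace and look up lapply there, not in the global environment, so rebinding lapply from the global environment does not reach them. Instead, copies of by.data.frame and tapply can be given a private environment (a child of the base namespace) in which lapply means something else. The helper name make_by_with_lapply is hypothetical, and the sketch relies on by.data.frame calling tapply and tapply calling lapply, which holds for recent R versions but is an internal detail that may change. Nothing global is modified, so there is nothing to restore on abort, and the user's FUN (and any sapply() it calls) still sees the ordinary lapply.

```r
library(parallel)  # provides mclapply()

# Hypothetical helper (not part of base R): returns a copy of by.data.frame()
# in which the lapply() calls made by tapply() go to `lapply_fun` instead.
make_by_with_lapply <- function(lapply_fun) {
  # Private environment whose parent is the base namespace, so every other
  # name still resolves exactly as it does for the original functions.
  env <- new.env(parent = environment(base::by.data.frame))
  env$lapply <- lapply_fun

  # Copy of tapply() that finds `lapply` in `env` first.
  tapply_copy <- base::tapply
  environment(tapply_copy) <- env
  env$tapply <- tapply_copy

  # Copy of by.data.frame() that finds the modified tapply() in `env`.
  by_copy <- base::by.data.frame
  environment(by_copy) <- env
  by_copy
}

# Run the groups in forked workers; mclapply() cannot fork on Windows,
# so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
mc_lapply <- function(X, FUN, ...) mclapply(X, FUN, ..., mc.cores = n_cores)

mc.by <- make_by_with_lapply(mc_lapply)
res <- mc.by(warpbreaks[, 1:2], warpbreaks[, "tension"],
             function(d) mean(d$breaks))
print(res)
```

Because the copies are taken from the installed R each time make_by_with_lapply() is called, they track changes to by.data.frame and tapply without hand-edited source. Only the data-frame method is wrapped here: by.default calls the generic by() on as.data.frame(data), which dispatches to the original base method, not to the copy.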
The alternative is to copy the definitions of by.data.frame and by.default into my own functions, with only one change: the optional lapply argument. This is not hard to do, but I then run the risk that the R team changes by(). I wish I could at least test whether the by() function changes from release to release, so that I get alerted, but functions are not atomic and therefore cannot be compared.

What is the recommended way to do this?

/iaw
----
Ivo Welch (ivo.welch@gmail.com)
http://www.ivo-welch.info/
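On the question of detecting changes: within one session, identical() can compare two functions directly, and deparse() turns a function into a character vector that can be saved and compared across sessions. The following is a minimal sketch, assuming the deparsed text is a good enough fingerprint. The helper names and the snapshot file are hypothetical, and a reported difference could also come from changes in how deparse() formats code, so treat it as a prompt to diff the sources rather than proof that by() changed.

```r
# Hypothetical helpers (not part of base R): snapshot the deparsed source of
# the base functions an edited copy of by() was based on, so that a later R
# release can be checked against the snapshot.
watched <- c("by.data.frame", "by.default", "tapply")

snapshot_functions <- function(fun_names, env = baseenv()) {
  # One character vector of deparsed source lines per function.
  snap <- lapply(fun_names, function(nm) deparse(get(nm, envir = env)))
  names(snap) <- fun_names
  snap
}

changed_functions <- function(old, new) {
  # Names of functions whose deparsed source differs between two snapshots.
  same <- mapply(identical, old, new[names(old)])
  names(old)[!same]
}

snap_file <- file.path(tempdir(), "by_snapshot.rds")  # use a permanent path in practice
saveRDS(snapshot_functions(watched), snap_file)

# Later, e.g. after installing a new R release:
changed <- changed_functions(readRDS(snap_file), snapshot_functions(watched))
if (length(changed) > 0L) {
  warning("base function(s) changed since the snapshot: ",
          paste(changed, collapse = ", "))
}
```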