Dear all,
I am trying to parallelize the function npnewpar given below. When I compare an application of "apply" with "parApply", the parallelized version seems to be much slower (cf. output below). Therefore I would like to ask how the function could be parallelized more efficiently. (With increasing sample size the difference becomes smaller, but I was wondering about this big difference and how it could be improved.)
Thank you very much in advance for your help!
Best,
Martin
library(microbenchmark)
library(doParallel)
n <- 500
y <- rnorm(n)
Xc <- rnorm(n)
Xd <- sample(c(0,1), n, replace=TRUE)   # discrete covariate: n draws from {0, 1}
Weights <- diag(n)
n1 <- 50
Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE))
detectCores()
cl <- makeCluster(4)
registerDoParallel(cl)
microbenchmark(
  apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5),
  parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5),
  times = 100)
stopCluster(cl)
Unit: milliseconds
 expr                                          min         lq       mean     median         uq        max  neval
 apply(Xeval, 1, npnewpar, ...)           4.674914   4.726463   5.455323   4.771016   4.843324   57.01519    100
 parApply(cl, Xeval, 1, npnewpar, ...)   34.168250  35.434829  56.553296  39.438899  49.777265  347.77887    100
npnewpar <- function(y, Xc, Xd, Weights, h, xeval) {
  xc <- xeval[1]          # continuous evaluation point
  xd <- xeval[2]          # discrete evaluation point
  l <- function(x, X) {
    w <- Weights[x, X]    # weights for the discrete covariate, looked up in Weights
    return(w)
  }
  u <- (Xc - xc) / h
  # K <- kernel(u)
  K <- dnorm(u)           # Gaussian kernel weights for the continuous covariate
  L <- l(xd, Xd)
  nom <- sum(y * K * L)
  denom <- sum(K * L)
  ghat <- nom / denom     # Nadaraya-Watson type estimate at xeval
  return(ghat)
}
Parallelizing comes at a price... and there is no guarantee that you can afford it. Vectorizing your algorithms is often a better approach. Microbenchmarking is usually overkill for evaluating parallelization.
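For illustration, a fully vectorized version of the loop over Xeval could look roughly like the sketch below (untested; the name npnew_vec is made up, and it replaces the Weights[xd, Xd] lookup in npnewpar with a plain indicator 1*(Xd == xd), because indexing the Weights matrix with the 0/1 values in xd looks unintended - adjust if the matrix lookup really is what you want):

npnew_vec <- function(y, Xc, Xd, h, Xeval) {
  K <- dnorm(outer(Xeval[, 1], Xc, "-") / h)  # n1 x n Gaussian kernel weights
  L <- 1 * outer(Xeval[, 2], Xd, "==")        # n1 x n indicator weights for the discrete covariate
  W <- K * L
  as.vector((W %*% y) / rowSums(W))           # one estimate per row of Xeval
}

A single call npnew_vec(y, Xc, Xd, h = 0.5, Xeval = Xeval) then replaces the whole apply()/parApply() loop, with no worker startup or communication at all.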
You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make
each core look like two.
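For what it is worth, detectCores() counts logical CPUs by default, so asking for physical cores shows whether hyperthreading is in play (logical = FALSE is only honoured on some platforms and may return NA elsewhere):

library(parallel)
detectCores()                  # logical CPUs: hyperthreads count double
detectCores(logical = FALSE)   # physical cores, where the platform reports them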
The operating system can also make a difference... Windows processes are more expensive to start and to communicate between than *nix processes are. In particular, Windows seems to require duplicated RAM pages while *nix can share process RAM (at least until it is written to), so you end up needing more memory, and disk paging of virtual memory becomes more likely.
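On Linux or macOS that sharing can be exploited directly with forked workers, e.g. via mclapply, which inherits the master's workspace copy-on-write instead of shipping y, Xc, Xd and Weights to each worker (rough, untested sketch; forking is not available on Windows):

library(parallel)
res <- mclapply(seq_len(nrow(Xeval)), function(i)
  npnewpar(y, Xc, Xd, Weights, h = 0.5, xeval = Xeval[i, ]),
  mc.cores = 4)   # forked workers share the parent's RAM until it is written to
ghat <- unlist(res)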
---------------------------------------------------------------------------
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On July 30, 2015 8:26:34 AM EDT, Martin Spindler <Martin.Spindler at gmx.de> wrote:
> [original message quoted above - snipped]
I ran a test on my Windows box with 4 CPUs. There were 4 Rscript processes started in response to the request for a cluster of 4. Each of these ran for an elapsed time of around 23 seconds, making the median time around 0.2 seconds per iteration across the 100 iterations, which is what microbenchmark reports. The 'apply' version only takes about 0.003 seconds for a single iteration - again, what microbenchmark is reporting. The 4 Rscript processes each used only about 3 CPU seconds of those 23 seconds of elapsed time; most of the rest is probably communication, process startup, and reporting of results. So, as was pointed out previously, there is overhead in running in parallel. You probably need at least several seconds of heavy computation per iteration to make parallelizing worthwhile. You should also investigate exactly what is happening on your system so that you can account for the time being spent.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 30, 2015 at 8:56 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> [quoted thread snipped]
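One way to see how much of that elapsed time is communication rather than computation is to take the big objects out of the per-call arguments: export them to the workers once with clusterExport (the 500 x 500 Weights matrix alone is about 2 MB) and let each worker run a plain apply() on its own block of Xeval. A rough, untested sketch along those lines (chunk_id and chunks are just illustrative names):

library(parallel)

cl <- makeCluster(4)
clusterExport(cl, c("npnewpar", "y", "Xc", "Xd", "Weights"))  # ship the big objects once, up front

chunk_id <- cut(seq_len(nrow(Xeval)), 4, labels = FALSE)   # assign each row of Xeval to one of 4 blocks
chunks   <- split.data.frame(Xeval, chunk_id)              # list of 4 sub-matrices, one per worker

res <- parLapply(cl, chunks, function(Xchunk)
  apply(Xchunk, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, h = 0.5))
ghat <- unlist(res)

stopCluster(cl)

Comparing the time of the parLapply() call alone against the original parApply() call (which re-serializes y, Xc, Xd and Weights on every invocation) gives a rough split between transfer cost and actual computation.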