Parallel processing usually involves quite a lot of overhead, which is
expensive when the computation itself is quick. This is a clear case of a
function (mean) that is too simple to benefit from parallelization. In
addition, your example contains an error that makes the effect look even
stronger: the loops only run over i = 1:3, so you are averaging only the
first three elements of the list.
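As a rough illustration of the overhead (a minimal sketch on my side, not
part of your code; the cluster size of 2 and the toy data are arbitrary):

library(parallel)
x <- replicate(1000, rnorm(100), simplify = FALSE)
system.time(lapply(x, mean))         # serial: essentially no overhead
cl <- makeCluster(2)
system.time(parLapply(cl, x, mean))  # parallel: communication dominates
stopCluster(cl)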
I have modified the example below to call a more complicated function
than mean. With that change, the parallelized example is faster
(although not by much). To see the difference, replace the lapply lines
with "lapply(dat, mean)". Below the foreach example, you can also see
the same computation with clusterApply, which seems to be much more
efficient for this problem.
# Simulate a large list of numeric vectors
N <- 200000
myList <- vector('list', N)
for (i in 1:N) {
  myList[[i]] <- rnorm(100)
}
library(foreach)
library(doParallel)

ncores <- 7
registerDoParallel(cores = ncores)

# Assign each list element to one of ncores groups ("X1" ... "X7"), so
# that each foreach task processes one large chunk instead of a single
# element
names(myList) <- make.names(rep(1:ncores, length.out = N))
nms <- 1:ncores

# Serial version
system.time(result <- foreach(i = 1:ncores) %do% {
  dat <- myList[which(names(myList) == make.names(nms[i]))]
  lapply(dat, FUN = function(x) log(sd(x)) + sd(x) + var(x))
})
# Parallel version: one chunk per worker
system.time(result2 <- foreach(i = 1:ncores) %dopar% {
  dat <- myList[which(names(myList) == make.names(nms[i]))]
  lapply(dat, FUN = function(x) log(sd(x)) + sd(x) + var(x))
})
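As a quick sanity check (my addition, not in your original code), you can
verify that the serial and parallel versions return the same thing:

all.equal(result, result2)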
foreach is not always the best choice for parallel processing. You could
also have a look at clusterApply from the parallel package:
library(parallel)
f1 <- function(x) mean(x)
f2 <- function(x) log(sd(x)) + sd(x) + var(x)
cl <- makeCluster(ncores)
clusterExport(cl, list("f1", "f2"))
# Split the list into one chunk per worker, reusing the names set above
dats <- split(myList, names(myList))
# Simple function: the serial lapply wins
system.time(res <- clusterApply(cl, dats, fun = function(x) lapply(x, f1)))
system.time(res <- lapply(dats, FUN = function(x) lapply(x, f1)))
# More expensive function: the parallel version wins
system.time(res <- clusterApply(cl, dats, fun = function(x) lapply(x, f2)))
system.time(res <- lapply(dats, FUN = function(x) lapply(x, f2)))
stopCluster(cl)
lapply is still faster for the example with mean, but much slower for
the more complicated function.
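If you are on a Unix-alike, parallel::mclapply may also be worth a try; it
forks the workers, so the data do not have to be exported to them. A sketch
under that assumption (I have not benchmarked this variant here):

library(parallel)
system.time(res <- mclapply(dats, function(x) lapply(x, f2), mc.cores = ncores))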
Best,
Jon
On 12/4/2016 3:11 AM, Doran, Harold wrote:
> As a follow up to this, I have been able to generate a toy example of
> reproducible code that generates the same problem. Below is just a sample
> to represent the issue, but my data and subsequent functions acting on the
> data are much more involved.
>
> I no longer have the error, but the loop running in parallel is extremely
> slow relative to its serialized counterpart.
>
> I have narrowed down the problem to the fact that I am searching through a
> very large list, grabbing the data from that list by indexing to subset and
> then doing stuff to it. Both "work", but the parallel version is very, very
> slow. I believe I am sending data files to each core and the number of
> searches happening is prohibitive.
>
> I am very much stuck in the design-based way of how I would do this
> particular problem on a single core and am not sure if there is a better
> design-based approach for solving this problem in the parallel version.
>
> Any advice on better ways to work with the %dopar% version here?
>
> N <- 200000
> myList <- vector('list', N)
> names(myList) <- 1:N
> for(i in 1:N){
> myList[[i]] <- rnorm(100)
> }
> nms <- 1:N
> library(foreach)
> library(doParallel)
> registerDoParallel(cores=7)
>
> result <- foreach(i = 1:3) %do% {
> dat <- myList[[which(names(myList) == nms[i])]]
> mean(dat)
> }
>
> result <- foreach(i = 1:3) %dopar% {
> dat <- myList[[which(names(myList) == nms[i])]]
> mean(dat)
> }
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
> Sent: Saturday, December 03, 2016 4:26 PM
> To: r-help at r-project.org
> Subject: [R] error serialize (foreach)
>
> I have a portion of a foreach loop that I cannot run in parallel but that
> works fine when serialized. Below is a representation of the problem; in
> this instance I cannot provide reproducible data to generate the same
> error, as the actual data I am working with are confidential.
>
> Within each foreach loop are a series of custom functions acting on my
> data. When using %do% I get the expected result, but replacing it with
> %dopar% generates the error.
>
> I have searched the archives and also stackexchange and see this is an
> issue that arises. I have tried a couple of the recommendations, like
> using an outfile in makeCluster, but I am not having success.
>
> Oddly (or perhaps not oddly), other portions of my program run in
> parallel and do not generate this same error.
>
> library(foreach)
> library(doParallel)
> registerDoParallel(cores=3)
>
> # This portion runs and produces expected result result <- foreach(i =
1:N) %do% {
> tmp1 <- function1(...)
> tmp2 <- function2(...)
> tmp2
> }
>
> # This portion generates error in serialize
> result <- foreach(i = 1:N) %dopar% {
>   tmp1 <- function1(...)
>   tmp2 <- function2(...)
>   tmp2
> }
>
> error in serialize(data, node$con) : error writing to connection
--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Space, Security & Migration
Disaster Risk Management Unit
Via E. Fermi 2749, TP 122, I-21027 Ispra (VA), ITALY
jon.skoien at jrc.ec.europa.eu
Tel: +39 0332 789205
Disclaimer: Views expressed in this email are those of the individual
and do not necessarily represent official views of the European Commission.