Hello,
I'm running some code that requires untarring many files as a first step.
This takes a lot of time and I'd like to do this in parallel, if possible.
If disk read speed is the bottleneck, I guess I shouldn't expect an
improvement, but perhaps the process is CPU-bound, so I want to try this
out.
I'm working on Windows 7 with R 2.15.1 and the latest foreach and doSNOW
packages. See sessionInfo() below. Thanks in advance for any input!
# With lapply it works (i.e., each .tar.gz file is decompressed into several
# directories with the files of interest inside)
lapply(tar.files.vector, FUN=untar)
# It also works with foreach in serial mode:
foreach(i=1:length(tar.files.vector)) %do% untar(tar.files.vector[i])
# However, foreach in parallel mode gives an error:
foreach(i=1:length(tar.files.vector)) %dopar% untar(tar.files.vector[i])
Error in untar(tar.files.vector[i]) :
task 1 failed - "cannot open the connection"
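One guess is that the snow workers start in a different working directory
than the master R session, so relative paths in tar.files.vector cannot be
opened there; passing absolute paths via normalizePath() might avoid this.
Below is a self-contained sketch of that idea (the demo archives, output
directories, and worker count are made up for illustration; my real
tar.files.vector holds many more files):

```r
library(foreach)
library(doSNOW)

# --- demo setup (made-up data): build two small .tar.gz archives ---
work.dir <- tempfile("untar-demo-")
dir.create(work.dir)
old.wd <- setwd(work.dir)
tar.files.vector <- character(2)
for (i in 1:2) {
  f <- sprintf("file%d.txt", i)
  writeLines("hello", f)
  tar.files.vector[i] <- sprintf("archive%d.tar.gz", i)
  tar(tar.files.vector[i], files = f, compression = "gzip")
}

cl <- makeCluster(2)  # one worker per archive, just for the demo
registerDoSNOW(cl)

# Workers may resolve relative paths against their own working
# directory, which can trigger "cannot open the connection".
# Absolute input paths plus an explicit exdir sidestep that.
full.paths <- normalizePath(tar.files.vector)
out.dirs   <- file.path(work.dir, sprintf("out%d", seq_along(full.paths)))

res <- foreach(i = seq_along(full.paths)) %dopar%
  untar(full.paths[i], exdir = out.dirs[i])

stopCluster(cl)
setwd(old.wd)
```

If the relative-path theory is right, each archive should end up extracted
under its own out directory; if not, the same connection error should
reappear here too.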
Any ideas on how to address this problem (with these packages or others)?
Thanks in advance.
Ariel
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doSNOW_1.0.6 snow_0.3-10 iterators_1.0.6 foreach_1.4.0
[5] raster_2.0-08 rgdal_0.7-12 sp_0.9-99
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.1 grid_2.15.1 lattice_0.20-6
--
Sent from the R help mailing list archive at Nabble.com.