Hello,
I'm running some code that requires untarring many files in the first step.
This takes a lot of time and I'd like to do this in parallel, if possible.
If it's the disk reading speed that is the bottleneck I guess I should not
expect an improvement, but perhaps it's the processor. So I want to try this
out.
I'm working on Windows 7 with R 2.15.1 and the latest foreach and doSNOW
packages. See sessionInfo() below. Thanks in advance for any inputs!
# With lapply it works (i.e. each .tar.gz file is decompressed into several
# directories with the files of interest inside)
lapply(tar.files.vector, FUN=untar) 
# It also works with foreach in serial mode:
foreach(i=1:length(tar.files.vector)) %do% untar(tar.files.vector[i]) 
# However, foreach in parallel mode gives an error:
foreach(i=1:length(tar.files.vector)) %dopar% untar(tar.files.vector[i]) 
Error in untar(tar.files.vector[i]) : 
  task 1 failed - "cannot open the connection"
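One thing that may be worth ruling out (this is an assumption, not something confirmed above): the snow workers might not share the master session's working directory, in which case relative paths in tar.files.vector could fail to open on the workers. Below is a minimal, self-contained sketch of that check. The archive names, the temporary directory, the two-worker SOCK cluster and the out.dir variable are all made up for illustration; only tar.files.vector comes from the code above.

```r
library(foreach)
library(doSNOW)

# Hypothetical test data: two small .tar.gz archives in a temp directory
work.dir <- file.path(tempdir(), "untar_demo")
dir.create(work.dir, showWarnings = FALSE)
old.wd <- setwd(work.dir)
for (n in c("a", "b")) {
  dir.create(n, showWarnings = FALSE)
  writeLines(n, file.path(n, "data.txt"))
  tar(paste0(n, ".tar.gz"), files = n, compression = "gzip")
}
setwd(old.wd)

# Use absolute paths so the result does not depend on each worker's
# working directory
tar.files.vector <- normalizePath(
  list.files(work.dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
)
out.dir <- file.path(work.dir, "extracted")
dir.create(out.dir, showWarnings = FALSE)

# Assumed backend: a local two-worker socket cluster
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Iterate over the paths directly and extract into an explicit directory
res <- foreach(f = tar.files.vector, .combine = c) %dopar% {
  utils::untar(f, exdir = out.dir)
}

stopCluster(cl)
```

If this runs cleanly while the version with relative paths fails, the working directory of the workers would be the likely culprit; if it fails in the same way, the cause is presumably elsewhere.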
Any ideas on how to address this problem (with these packages or other
ones)?
Thanks in advance.
Ariel
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] doSNOW_1.0.6    snow_0.3-10     iterators_1.0.6 foreach_1.4.0  
[5] raster_2.0-08   rgdal_0.7-12    sp_0.9-99      
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.1 grid_2.15.1     lattice_0.20-6 