Strunk, Jacob (DNR)
2016-Aug-08 16:07 UTC
[R] interaction between clusterMap(), read.csv() and try() - try does not catch error
Hello I am attempting to process a list of csv files in parallel, some of which may be empty and fail with read.csv. I tend to use clusterMap as my go-to parallel function but have run into an interesting behavior. The behavior is that try(read.csv(x)) does not catch read errors resulting from having an empty csv file inside of clusterMap. I have not tested this with other functions (e.g. read.table, mean, etc.). The parLapply function does, it appears, correctly catch the errors. Any suggestions on how I should code with clusterMap such that try is guaranteed to catch the error?

I am working on windows server 2012
I have the latest version of R and parallel
I am executing the code from within the rstudio ide Version 0.99.896

Here is a demonstration of the failure

R code used in demonstration:

#prepare csv files - an empty file and a file with data
close(file("c:/temp/badcsv.csv",open="w"))
write.table(data.frame(x=2),"c:/temp/goodcsv.csv")

#prepare a parallel cluster
clus0=makeCluster(1, rscript_args = "--no-site-file")

#read good / bad files in parallel with parLapply - which succeeds: try Does catch err
x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
print(x1)

#read good / bad files in parallel with clusterMap - which fails: try does Not catch error
x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
print(x0)

R output:

> #prepare csv files - an empty file and a file with data
> close(file("c:/temp/badcsv.csv",open="w"))
> write.table(data.frame(x=2),"c:/temp/goodcsv.csv")
>
> #prepare a parallel cluster
> clus0=makeCluster(1, rscript_args = "--no-site-file")
>
> #read good / bad files in parallel with parLapply - which succeeds: try Does catch err
> x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
> print(x1)
[[1]]
[1] "Error in read.table(file = file, header = header, sep = sep, quote = quote, : \n no lines available in input\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in read.table(file = file, header = header, sep = sep, quote = quote, dec = dec, fill = fill, comment.char = comment.char, ...): no lines available in input>

[[2]]
    x
1 1 2

>
> #read good / bad files in parallel with clusterMap - which fails: try does Not catch error
> x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
Error in checkForRemoteErrors(val) :
  one node produced an error: Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
> print(x0)
Error in print(x0) : object 'x0' not found
>

Thanks for any help,
Jacob
luke-tierney at uiowa.edu
2016-Aug-08 19:17 UTC
[R] interaction between clusterMap(), read.csv() and try() - try does not catch error
try is working fine. The problem is that your remote function is returning the try-error result, which the parallel infrastructure is interpreting as an error on the remote node, since the remote calling infrastructure is using try as well. This could be implemented more robustly, but it would probably be better in any case for your code to use tryCatch and have the error function return something easier to work with, like NULL.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
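
The tryCatch suggestion above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the temporary files stand in for c:/temp/badcsv.csv and c:/temp/goodcsv.csv, and safe_read is a hypothetical helper name. The idea is that the worker function returns NULL on failure instead of a try-error object, so the result no longer looks like a remote error to clusterMap.

```r
library(parallel)

# Assumption: temporary files replace the c:/temp paths from the original post
bad_csv  <- tempfile(fileext = ".csv")
good_csv <- tempfile(fileext = ".csv")
close(file(bad_csv, open = "w"))                           # empty file, read.csv will fail
write.csv(data.frame(x = 2), good_csv, row.names = FALSE)  # file with one row of data

# safe_read is a hypothetical name: read a csv, return NULL if reading fails
safe_read <- function(path) {
  tryCatch(read.csv(path), error = function(e) NULL)
}

clus0 <- makeCluster(1)
x0 <- clusterMap(clus0, safe_read, c(bad_csv, good_csv), SIMPLIFY = FALSE)
stopCluster(clus0)

print(x0)

# Optionally drop the files that could not be read
x_ok <- Filter(Negate(is.null), x0)
```

With this shape the empty file should come back as a NULL entry and the good file as a data frame, and Filter(Negate(is.null), x0) keeps only the successful reads. A likely reason parLapply appeared to behave differently is that it sends chunks of the input to each worker, so each worker returns a list of results rather than a bare try-error object; treat that as an explanation to verify against your R version rather than a guarantee.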
Strunk, Jacob (DNR)
2016-Aug-08 20:06 UTC
[R] interaction between clusterMap(), read.csv() and try() - try does not catch error
Ok - got it, I can handle that.

Thank you Luke!

Jacob L Strunk