Seth Russell
2018-Sep-19 21:19 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
I have an lapply function call that I want to parallelize. Below is a very
simplified version of the code:

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s))

Instead of downloading a couple of files in parallel, I get a segfault per
process with a 'memory not mapped' message. I've been working with Henrik
Bengtsson on resolving this issue and he recommended I send a message to
the R-Devel mailing list.

Here's the output (the two child processes crash at the same time, so their
identical tracebacks were interleaved on the console; one copy is shown
below):

trying URL 'https://cloud.r-project.org/src/contrib/A3_1.0.0.tar.gz'
trying URL 'https://cloud.r-project.org/src/contrib/ABC.RAP_0.9.0.tar.gz'

 *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

 *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

Traceback:
 1: download.file(paste0(url_base, s), s)
 2: FUN(X[[i]], ...)
 3: lapply(X = S, FUN = FUN, ...)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch(expr, error = function(e) { call <- conditionCall(e) ... })
 8: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
 9: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
10: FUN(X[[i]], ...)
11: lapply(seq_len(cores), inner.do)
12: parallel::mclapply(files, function(s) download.file(paste0(url_base,
    s), s))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Here's my sessionInfo():

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.3/lib/libopenblasp-r0.3.3.dylib

locale:
[1] en_US/en_US/en_US/C/en_US/en_US

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] compiler_3.5.1

The version of R I'm running was installed via homebrew with "brew install r
--with-java --with-openblas".

Also, the provided example code works as expected on Linux. And if I
provide a non-default download method to the download.file() call, such as:

res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s, method="wget"))
res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s, method="curl"))

it works correctly - no segfault. If I use method="libcurl" it does
segfault.

I'm not sure what steps to take to further narrow down the source of the
error.

Is this a known bug? If not, is this a new bug or an unexpected feature?

Thanks,
Seth
Martin Maechler
2018-Sep-20 08:33 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
>>>>> Seth Russell
>>>>> on Wed, 19 Sep 2018 15:19:48 -0600 writes:

 > I have an lapply function call that I want to parallelize. Below is a very
 > simplified version of the code:
 >
 > url_base <- "https://cloud.r-project.org/src/contrib/"
 > files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
 > res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
 > s), s))
 >
 > Instead of downloading a couple of files in parallel, I get a segfault per
 > process with a 'memory not mapped' message. I've been working with Henrik
 > Bengtsson on resolving this issue and he recommended I send a message to
 > the R-Devel mailing list.

Thank you for the simple reproducible (*) example.

If I run the above in either R-devel or R 3.5.1, it works flawlessly [on
Linux Fedora 28]. .... ah, now I see you say so much later... also that
other methods than "libcurl" work. To note here is that "libcurl" is also
the default method on Linux, where things work.

I've also tried it on the Windows server I have easy access to, and the
following code -- also explicitly using "libcurl" --

##--------------------------------------------------------------
url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
res <- parallel::mclapply(files, function(s)
          download.file(paste0(url_base, s), s, method="libcurl"))
##--------------------------------------------------------------

works fine there too.

- So maybe this should have gone to the R-SIG-Mac mailing list instead of
  this one ??

- Can other macOS R users try and see?

--
*) at least till one of the 2 packages gets updated !
;-)

 > [rest of original message quoted in full: output, interleaved
 > tracebacks, and sessionInfo; trimmed]

 > ______________________________________________
 > R-devel at r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel
Gábor Csárdi
2018-Sep-20 08:53 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
This code actually happens to work for me on macOS, but I think in general
you cannot rely on performing HTTP requests in fork clusters, i.e. with
mclapply().

Fork clusters create worker processes by forking the R process and then
_not_ executing another R binary. (Which is often convenient, because the
new processes will inherit the memory image of the parent process.)

Fork without exec is not supported by macOS; basically any calls to system
libraries might crash. (I.e. not just HTTP-related calls.) For HTTP calls I
have seen errors, crashes, and sometimes it works. Depends on the
combination of libcurl version, macOS version and probably luck.

It usually (always?) works on Linux, but I would not rely on that, either.

So, yes, this is a known issue.

Creating new processes to perform HTTP in parallel is very often bad
practice, actually. Whenever you can, use I/O multiplexing instead, since
the main R process is not doing anything anyway, just waiting for the data
to come in. So you don't need more processes, you need parallel I/O. Take a
look at the curl::multi_add() etc. functions.

Btw. download.file() can actually download files in parallel if the libcurl
method is used; just give it a list of URLs in a character vector. This API
is very restricted, though, so I suggest looking at the curl package.

Gabor

On Thu, Sep 20, 2018 at 8:44 AM Seth Russell <seth.russell at gmail.com> wrote:
>
> I have an lapply function call that I want to parallelize. Below is a very
> simplified version of the code:
>
> url_base <- "https://cloud.r-project.org/src/contrib/"
> files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
> res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
> s), s))
>
> Instead of downloading a couple of files in parallel, I get a segfault per
> process with a 'memory not mapped' message. I've been working with Henrik
> Bengtsson on resolving this issue and he recommended I send a message to
> the R-Devel mailing list.
> [rest of original message quoted in full; trimmed]
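Gábor's curl::multi_add() suggestion might look something like the sketch
below. This is an illustration, not code from the thread: it assumes the
curl package is installed, and it writes each response into the working
directory. All transfers run concurrently inside the one R process, so no
fork is involved.

```r
library(curl)

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")

# Queue one handle per file; libcurl performs the transfers concurrently
# when multi_run() is called -- single process, no fork-safety problems.
for (f in files) {
  local({
    dest <- f   # capture the current file name in the callback's environment
    multi_add(new_handle(url = paste0(url_base, dest)),
              done = function(res) writeBin(res$content, dest),
              fail = function(msg) warning(dest, " failed: ", msg))
  })
}
multi_run()  # blocks until all queued transfers finish
```

Buffering the whole response in memory (res$content) is fine for small
tarballs; for large files a streaming approach via the data callback would
be preferable.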
Seth Russell
2018-Sep-20 15:09 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
Thanks for the warning about fork without exec(). A co-worker of mine, also
on Mac, ran the sample code and got an error about that exact problem.

Thanks also for the pointer to try curl::multi_add() or download.file()
with a vector of files. My actual use case includes downloading the files
and then untar() for analysis of the files contained in the tar.gz file. I'm
currently parallelizing both the download and untar operations and found
that using a parallel form of lapply resulted in a 4x - 8x improvement,
depending on hardware, network latency, etc. I'll see how much of that
improvement can be attributed to I/O multiplexing for the downloading
portion, using your recommendations.

Seth

Trimmed reply from Gábor Csárdi <csardi.gabor at gmail.com>:

> Fork without exec is not supported by macOS; basically any calls to
> system libraries might crash. (I.e. not just HTTP-related calls.) For
> HTTP calls I have seen errors, crashes, and sometimes it works.
> Depends on the combination of libcurl version, macOS version and
> probably luck.
>
> It usually (always?) works on Linux, but I would not rely on that, either.
>
> So, yes, this is a known issue.
>
> Creating new processes to perform HTTP in parallel is very often bad
> practice, actually. Whenever you can, use I/O multiplexing instead,
> since the main R process is not doing anything anyway, just waiting
> for the data to come in. So you don't need more processes, you need
> parallel I/O. Take a look at the curl::multi_add() etc. functions.
>
> Btw. download.file() can actually download files in parallel if the
> libcurl method is used; just give it a list of URLs in a character
> vector. This API is very restricted, though, so I suggest looking at
> the curl package.
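The download-then-untar workflow described above could combine Gábor's
vectorized download.file() hint with forking only for the filesystem-bound
part. A sketch (not code from the thread; the exdir naming via sub() is
made up for the example):

```r
url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")

# With method = "libcurl", url and destfile may be equal-length character
# vectors: the files are fetched concurrently by one process, no forking.
download.file(paste0(url_base, files), destfile = files, method = "libcurl")

# untar() touches only the filesystem, not macOS system frameworks, so
# forking for this CPU/disk-bound step should be comparatively safe.
res <- parallel::mclapply(files, function(f)
  untar(f, exdir = sub("_.*", "", f)))  # e.g. "A3_1.0.0.tar.gz" -> "A3/"
```

Whether the untar step still benefits from mclapply() is worth measuring;
on fast disks the decompression may no longer be the bottleneck once the
downloads are multiplexed.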
Tomas Kalibera
2018-Oct-04 16:12 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
Thanks for the report, but unfortunately I cannot reproduce on my system
(neither macOS nor Linux, from the command line) to debug. Did you run this
in the command line version of R?

I would not be surprised to see such a crash if executed from a
multi-threaded application, say from some GUI or frontend that runs
multiple threads, or from some other R session where a third-party library
(curl?) had already started some threads. In such situations mcfork/mclapply
is unsafe (?mcfork warns against GUIs and frontends, and I've now expanded
that warning slightly), and it could not be fixed without being turned into
something like parLapply(). parLapply() on a non-FORK cluster should work
fine even with such applications.

Best
Tomas

On 09/19/2018 11:19 PM, Seth Russell wrote:
> I have an lapply function call that I want to parallelize. Below is a very
> simplified version of the code:
>
> url_base <- "https://cloud.r-project.org/src/contrib/"
> files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
> res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
> s), s))
>
> [rest of original message quoted in full; trimmed]
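A parLapply() variant of the original example, per Tomas's suggestion,
might look like the following sketch (a PSOCK cluster starts fresh R worker
processes over sockets, so no fork is involved; the extra `base` argument
is one way to get url_base onto the workers without clusterExport()):

```r
url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")

cl <- parallel::makeCluster(2)  # default type "PSOCK": new R processes
# Arguments after the function are passed through to every call, so the
# workers do not need url_base in their global environments.
res <- parallel::parLapply(cl, files, function(s, base)
  download.file(paste0(base, s), s), base = url_base)
parallel::stopCluster(cl)
```

Startup cost per worker is higher than with a fork cluster, but the workers
are ordinary single-threaded R sessions, which sidesteps the fork-without-
exec problem entirely.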
Jeroen Ooms
2018-Oct-04 17:29 UTC
[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X
On Thu, Oct 4, 2018 at 6:12 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>
> Thanks for the report, but unfortunately I cannot reproduce on my system
> (neither macOS nor Linux, from the command line) to debug. Did you run
> this in the command line version of R?

It depends on which version of MacOS you are using and specifically which
TLS back-end curl has been configured with. When libcurl uses DarwinSSL, it
may crash when opening HTTPS connections in a fork, because CoreFoundation
is not fork-safe. OTOH when using OpenSSL or LibreSSL for TLS, you usually
get away with forking (though it's still bad practice).

The standard version of libcurl that ships with MacOS was using
CoreFoundation until 10.12, but starting with 10.13 they switched to
LibreSSL in order to support HTTP/2. See curl --version or
curl::curl_version() for your local config. Don't count on this though;
Apple might switch back to the fork-unsafe DarwinSSL once they support
ALPN, which is needed for HTTP/2.

As Gabor already suggested, libcurl has built-in systems for concurrent
connections. The curl package exposes this via the multi_add function. Not
only is this safer than forking, it will be much faster, because it takes
advantage of HTTP keep-alive and, when supported, uses HTTP/2 multiplexing,
which allows efficiently performing thousands of concurrent HTTPS requests
over a single TCP connection.
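Checking which TLS back-end your libcurl was built against, as Jeroen
suggests, is a one-liner from R (note that libcurl reports DarwinSSL under
the name "SecureTransport"):

```r
# Prints the TLS back-end string, e.g. "LibreSSL/2.0.20" or
# "OpenSSL/1.1.1"; "SecureTransport" would indicate the fork-unsafe
# DarwinSSL back-end discussed above.
curl::curl_version()$ssl_version
```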