Yes I am running on Rstudio 1.2.5033. I was also running this code without error on Ubuntu in Rstudio. Checking again on the terminal and it does indeed work fine even with large data.frames.

Any idea as to what interaction between Rstudio and mclapply causes this?

Thanks,
Shian

On 28 Apr 2020, at 7:29 pm, Simon Urbanek <simon.urbanek at R-project.org> wrote:

Sorry, the code works perfectly fine for me in R even for 1e6 observations (but I was testing with R 4.0.0). Are you using some kind of GUI?

Cheers,
Simon

On 28/04/2020, at 8:11 PM, Shian Su <su.s at wehi.edu.au> wrote:

Dear R-devel,

I am experiencing issues with running GAM models using mclapply, it fails to return any values if the data input becomes large. For example here the code runs fine with a df of 100 rows, but fails at 1000.

    > library(mgcv)
    > library(parallel)
    >
    > df <- data.frame(
    +     x = 1:100,
    +     y = 1:100
    + )
    >
    > mclapply(1:2, function(i, df) {
    +     fit <- gam(y ~ s(x, bs = "cs"), data = df)
    +   },
    +   df = df,
    +   mc.cores = 2L
    + )
    [[1]]

    Family: gaussian
    Link function: identity

    Formula:
    y ~ s(x, bs = "cs")

    Estimated degrees of freedom:
    9  total = 10

    GCV score: 0

    [[2]]

    Family: gaussian
    Link function: identity

    Formula:
    y ~ s(x, bs = "cs")

    Estimated degrees of freedom:
    9  total = 10

    GCV score: 0

    > df <- data.frame(
    +     x = 1:1000,
    +     y = 1:1000
    + )
    >
    > mclapply(1:2, function(i, df) {
    +     fit <- gam(y ~ s(x, bs = "cs"), data = df)
    +   },
    +   df = df,
    +   mc.cores = 2L
    + )
    [[1]]
    NULL

    [[2]]
    NULL

There is no error message returned, and the code runs perfectly fine in lapply.

I am on a MacBook 15 (2016) running MacOS 10.14.6 (Mojave) and R version 3.6.2. This bug could not be reproduced on my Ubuntu 19.10 running R 3.6.1.

Kind regards,
Shian Su
----
Shian Su
PhD Student, Ritchie Lab 6W, Epigenetics and Development
Walter & Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville VIC 3052, Australia
_______________________________________________
The information in this email is confidential and intend...{{dropped:26}}
Henrik Bengtsson
2020-Apr-28 16:08 UTC
[Rd] mclapply returns NULLs on MacOS when running GAM
Hi, a few comments below.

First, from my experience and troubleshooting similar reports from others, a returned NULL from parallel::mclapply() is often because the corresponding child process crashed/died. However, when this happens you should see a warning, e.g.

    > y <- parallel::mclapply(1:2, FUN = function(x) if (x == 2) quit("no") else x)
    Warning message:
    In parallel::mclapply(1:2, FUN = function(x) if (x == 2) quit("no") else x) :
      scheduled core 2 did not deliver a result, all values of the job will be affected
    > str(y)
    List of 2
     $ : int 1
     $ : NULL

This warning is produced on R 4.0.0 and R 3.6.2 on Linux, and I would assume that it is also produced on macOS. It's not clear from your message whether you also got that warning or not.

Second, forked processing, as used by parallel::mclapply(), is advised against when using the RStudio Console [0]. Unfortunately, there's no way to disable forked processing in R [1]. You could add the following to your ~/.Rprofile startup file:

    ## Warn when forked processing is used in the RStudio Console
    if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM"))) {
      invisible(trace(parallel:::mcfork, tracer =
        quote(warning("parallel::mcfork() was used. Note that forked processes, e.g. parallel::mclapply(), may be unstable when used from the RStudio Console [https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011]", call.=FALSE))))
    }

to detect when forked processing is used in the RStudio Console - either by you or by some package code that you use directly or indirectly. You could even use stop() here if you wanna be conservative.

[0] https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011
[1] https://stat.ethz.ch/pipermail/r-devel/2020-January/078896.html

/Henrik
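As a small illustration of the point about NULL results, the sketch below flags which elements of an mclapply() result look like lost jobs. The helper name failed_jobs() is hypothetical and not part of the parallel package; it assumes that a NULL element means the child did not deliver a result and that R-level errors come back as "try-error" objects, so a function that legitimately returns NULL would be flagged too.

```r
# Hypothetical helper (not part of 'parallel'): indices of jobs whose
# result is missing (NULL) or is an R-level error ("try-error").
failed_jobs <- function(res) {
  which(vapply(
    res,
    function(r) is.null(r) || inherits(r, "try-error"),
    logical(1)
  ))
}

# Stand-in for an mclapply() result in which the second job was lost
res <- list(1L, NULL, 3L)
failed_jobs(res)  # index 2 is flagged
```

A caller could then rerun only the flagged indices with plain lapply(), or stop with an informative error instead of silently passing NULLs on.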
Thanks Henrik,

That clears things up significantly. I did see the warning but failed to include it in my initial email. It sounds like an RStudio issue, and it seems that it's quite intrinsic to how forks interact with RStudio.

Given this code is eventually going to be a part of a package, should I expect it to fail mysteriously in RStudio for my users? Is the best solution here to migrate all my parallelism to PSOCK for the foreseeable future?

Thanks,
Shian

_______________________________________________
The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender.

The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of the Kulin Nation as the traditional owners of the land where our campuses are located and the continuing connection to country and community.
_______________________________________________
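For reference, a minimal sketch of what the PSOCK route asked about above might look like for the GAM example from the original report. The names fit_one() and psock_fits() are hypothetical and chosen only for illustration; the sketch assumes mgcv is installed on the machine, and it is not presented as the recommended answer to the question.

```r
library(parallel)

# Worker function; it loads mgcv itself because each PSOCK worker
# starts as a fresh R session with no packages attached by the caller.
fit_one <- function(i, df) {
  library(mgcv)
  gam(y ~ s(x, bs = "cs"), data = df)
}

# Hypothetical helper: run n_fits model fits on a PSOCK cluster
# instead of forking with mclapply().
psock_fits <- function(n_fits, df, cores = 2L) {
  cl <- makePSOCKcluster(cores)
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  parLapply(cl, seq_len(n_fits), fit_one, df = df)
}

df <- data.frame(x = 1:1000, y = 1:1000)
fits <- psock_fits(2L, df)
```

Unlike forked children, PSOCK workers do not share memory with the parent, so df is serialized and sent to each worker. That costs some overhead but does not depend on fork() behaving well inside a GUI.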