Hi all,

The call parallel::makeCluster(1L) hangs indefinitely on my macOS machine, which seems to have been reported already by other people (e.g., https://stat.ethz.ch/pipermail/r-devel/2018-February/075565.html). However, the solutions posted on SO, GH or R-devel do not work in my case.

So far, I have unsuccessfully tried:

1. A couple of reboots
2. Adding 192.0.0.1 to /etc/hosts
3. Using R.app instead of RStudio.app
4. Turning off the firewall

Following Henrik's advice, 'cl <- future::makeClusterPSOCK(1L, verbose = TRUE, timeout = 60)' gives the following (note: without the timeout parameter, R just hangs):

> Sys.setenv(LANGUAGE='en')
> cl <- future::makeClusterPSOCK(1L, verbose = TRUE, timeout = 60)
[local output] Workers: [n = 1] ‘localhost’
[local output] Base port: 11867
[local output] Creating node 1 of 1 ...
[local output] - setting up node
Testing if worker's PID can be inferred: ‘'/Library/Frameworks/R.framework/Resources/bin/Rscript' -e 'try(cat(Sys.getpid(),file="/var/folders/5s/kgm05t2s0_52gz1s445mnlgw0000gn/T//RtmpZp1RX6/future.parent=835.3434fe0c5c6.pid"), silent = TRUE)' -e "file.exists('/var/folders/5s/kgm05t2s0_52gz1s445mnlgw0000gn/T//RtmpZp1RX6/future.parent=835.3434fe0c5c6.pid')"’
- Possible to infer worker's PID: TRUE
[local output] Starting worker #1 on ‘localhost’: '/Library/Frameworks/R.framework/Resources/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'try(cat(Sys.getpid(),file="/var/folders/5s/kgm05t2s0_52gz1s445mnlgw0000gn/T//RtmpZp1RX6/future.parent=835.3434fe0c5c6.pid"), silent = TRUE)' -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11867 OUT=/dev/null TIMEOUT=60 XDR=TRUE
[local output] - Exit code of system() call: 0
[local output] Waiting for worker #1 on ‘localhost’ to connect back
[local output] Detected a warning from socketConnection(): ‘problem in listening on this socket’
Killing worker process (PID 903) if still alive
Worker (PID 903) was successfully killed: TRUE
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
  Failed to launch and connect to R worker on local machine ‘localhost’ from local machine ‘Dominiks-MBP.local’.
 * The error produced by socketConnection() was: ‘cannot open the connection’
 * In addition, socketConnection() produced 1 warning(s):
   - Warning #1: ‘problem in listening on this socket’
 * The localhost socket connection that failed to connect to the R worker used port 11867 using a communication timeout of 60 seconds and a connection timeout of 120 seconds.
 * Worker launch call: '/Library/Frameworks/R.framework/Resources/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'try(cat(Sys.getpid(),file="/var/folders/5s/kgm05t2s0_52gz1s445mnlgw0000gn/T//RtmpZp1RX6/future.parent=835.3434fe0c5c6.pid"), silent = TRUE)' -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11867 OUT=/dev/null TIMEOUT=60 XDR=TRUE.
 * Worker (PID 903) was successfully killed: TRUE
 * Troubleshooting suggestions:
   - Suggestion #1: Set 'outfile=NULL' to see output from worker.
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
  problem in listening on this socket

My session looks like:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG: Mersenne-Twister
 Normal: Inversion
 Sample: Rounding

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.6.0

Any help is greatly appreciated.

Best regards
Dominik

Dr. Dominik Leutnant
Muenster University of Applied Sciences
Department of Civil Engineering
Institute for Infrastructure·Water·Resources·Environment (IWARU)
WG Urban Hydrology and Water Management
Corrensstr. 25
FRG-48149 Münster, Germany
Tel.: +49 (0) 251/83-65274
Fax: +49 (0) 251/83-65915
Mail: leutnant at fh-muenster.de
Web: https://www.fh-muenster.de/
Hi Dominik,

from the output, the master process could not "listen" on the port where it expects a connection from the worker. We need to find out why.

I'd recommend first creating a minimal reproducible example (one that does not use future, only parallel, and a minimal number of threads, ideally just 2). Then I'd recommend checking whether the problem still exists with R-devel. Then I'd check whether the problem happens in all invocations, even after reboots, on a clean system, without many running applications - if it does, this is good news. Then you could post such an example and we could help more: if we can reproduce it on our systems, we could debug it; if not, there could at least be more directed advice on how to debug on your side.

What I'd do myself, if I could reproduce it on my system, would be to instrument R around Sock_listen in the internet module to see exactly what has failed and with which error. Maybe dtruss would help too, but instrumenting may be easier.

The earlier problem you mention has never been diagnosed (it was only intermittent on the reporter's machine, we could not reproduce it on our systems, and despite a lot of effort on our side and on the reporter's, we could not reliably diagnose it). In principle, it could be some race condition in R (one has been fixed since the previous report), but especially if it is deterministic, it would more likely be some OS limit on your system. You could of course try playing with OS limits (on the number of open files, etc.), with changing the port number (port= option), etc., but I would recommend the systematic approach of debugging the cause.

Best
Tomas
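As an illustration of the suggestions above (a parallel-only example with two workers and an explicitly chosen port), one possible sketch follows. The helper name try_cluster and the port 11868 are assumptions made for illustration; port and outfile are documented options of parallel::makePSOCKcluster(), and outfile = "" leaves worker output unredirected so that worker-side messages become visible.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): start a small PSOCK cluster on
# an explicit port and return the cluster, or the error condition if setup fails.
# Caveat: if setup blocks inside makePSOCKcluster(), tryCatch() cannot
# interrupt it; it only captures errors that are actually signalled.
try_cluster <- function(n = 2L, port = 11868L, outfile = "") {
  tryCatch(
    makePSOCKcluster(n, port = port, outfile = outfile),
    error = function(e) e
  )
}

# Guarded so that merely sourcing this file non-interactively starts no workers.
if (interactive()) {
  res <- try_cluster(2L, port = 11868L)
  if (inherits(res, "cluster")) {
    print(clusterCall(res, Sys.getpid))  # one PID per worker
    stopCluster(res)
  } else {
    message("Cluster setup failed: ", conditionMessage(res))
  }
}
```

Trying a few different port values this way may help separate a port-specific problem from a general one, but it does not replace the systematic debugging recommended above.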
Hi Tomas,

thanks for your reply (and thanks for your patience...).

I am now using the following minimal reprex:

> library(parallel)
> cl <- makeCluster(2L)

I freshly started the machine and did not open any other app, just R.app (3.6.1). After executing the second line of code, R seems to hang indefinitely and does not respond. The R process itself uses almost no CPU.

Unfortunately, I have no experience with either "Sock_listen" or "dtruss". Is there an example available somewhere?

Best
Dominik
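Before instrumenting Sock_listen or tracing with dtruss, a few checks can be run from R itself. The sketch below is only an assumption about what might be worth checking, not a substitute for the debugging steps discussed in this thread. The helper names resolves and can_listen are hypothetical. utils::nsl() exists on Unix-alikes only, and serverSocket() requires R >= 4.1.0, which is newer than the R 3.6.x used here, so can_listen() returns NA on older versions.

```r
# Does the host name resolve? utils::nsl() (Unix-alikes only) returns the IP
# address as a string, or NULL when the lookup fails.
resolves <- function(host = "localhost") {
  ip <- utils::nsl(host)
  !is.null(ip)
}

# Can this R process open a listening socket on `port`?
# Assumption: serverSocket() is available (R >= 4.1.0); otherwise NA is returned.
can_listen <- function(port) {
  if (!exists("serverSocket", mode = "function")) return(NA)
  srv <- tryCatch(serverSocket(port), error = function(e) NULL)
  if (is.null(srv)) return(FALSE)
  close(srv)  # release the port again
  TRUE
}

port <- 11867L  # the base port shown in the verbose output earlier in the thread
cat("localhost resolves:", resolves("localhost"), "\n")
cat("can listen on port", port, ":", can_listen(port), "\n")
```

If both checks succeed and makeCluster() still hangs, the cause more likely lies in the worker's connection back to the master, which is where the lower-level tracing becomes relevant.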