Hi,

I wondered about the behavior described in the following Stack Overflow question:

https://stackoverflow.com/questions/20674538/mclapply-returns-null-randomly

More specifically, I would like to know whether you ever considered the suggestion made in the comments on the first answer, namely to warn the user if one of the processes has been killed by the out-of-memory killer.

I am always surprised to see the random NULLs without a message, warning, or error of any kind, and I think it would be useful to know whether the function executed by mclapply returned NULL or the process was killed for some reason.

In the following gist, I have an example of this (in this case non-random) behavior:

https://gist.github.com/tvatter/2fcf3a9a99c256f9b9360f596b300715

For the record, I generated the list of NULLs in the 4th mclapply in the gist above on a late 2013 MacBook Pro with macOS High Sierra and 16GB of memory, and my sessionInfo() is:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 yaml_2.1.19

------------------------------------------------------------
Thibault Vatter
Department of Statistics
Columbia University
Hi Thibault,

mclapply has been designed to signal an error in two ways. User code errors are returned as special objects (of class "try-error") in the respective elements of the result list. All other errors (including a killed process) are returned as NULL in the respective elements of the result list. To detect these errors reliably, one needs to implement FUN so that it never returns NULL normally (it also cannot return a raw vector). This is how mclapply was designed and implemented (and also mccollect, etc.). It may be surprising to see multiple NULL elements when a single process is killed, but this is expected with pre-scheduling when that process has been tasked to compute multiple elements.

To make this API more user-friendly, I've added a warning that is now emitted when a job does not deliver a result (that is, when a vector element is NULL because of such an error). I've also made it more explicit in the documentation that NULL signals an error.

Best,
Tomas

On 07/26/2018 08:37 PM, Thibault Vatter wrote:
> [...]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
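A minimal sketch of the convention described in this reply, for readers who want to tell the two failure modes apart. The helper classify_results() and the test function f() are hypothetical names introduced here for illustration and are not part of the parallel package. The sketch assumes a Unix-alike (mclapply relies on forking) and simulates a killed worker by having the child send SIGKILL to itself; it also sets mc.preschedule = FALSE so that each element runs in its own child process.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): label each element of an
# mclapply() result as "ok", "error" (FUN signalled an error, so the element
# is a "try-error" object) or "missing" (NULL, no result was delivered).
classify_results <- function(res) {
  vapply(res, function(x) {
    if (is.null(x)) "missing"
    else if (inherits(x, "try-error")) "error"
    else "ok"
  }, character(1))
}

# Illustrative FUN: it never returns NULL on success, so a NULL element can
# only mean that the result was lost.
f <- function(i) {
  if (i == 2) stop("user code error")       # user code error
  if (i == 3) {
    # Assumption for illustration: simulate a worker killed from outside
    # (e.g. by the out-of-memory killer) by killing the child itself.
    tools::pskill(Sys.getpid(), tools::SIGKILL)
  }
  i^2
}

# mclapply() forks, so only run the demonstration on Unix-alikes.
if (.Platform$OS.type == "unix") {
  # Warnings are suppressed here only to keep the output short; newer R
  # versions warn when a job does not deliver a result.
  res <- suppressWarnings(
    mclapply(1:4, f, mc.cores = 2, mc.preschedule = FALSE)
  )
  print(classify_results(res))
}
```

Under these assumptions, one would expect the element whose child was killed to be labelled "missing" and the element that called stop() to be labelled "error". With the default mc.preschedule = TRUE, a single killed worker loses every element that was assigned to it, which is why several NULLs can appear at once.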
Hi Tomas,

Thanks a lot for the explanation and the changes. The update in the documentation is especially helpful.

Best,
Thibault

On Thu, Oct 18, 2018 at 10:48 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> [...]