On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:> > On 5/1/19 12:25 AM, Gergely Dar?czi wrote: > > Dear All, > > > > I'm running into issues with calling mccollect on a list containing NULL > > using R 3.6 (this used to work in 3.5.3): > > > > jobs <- lapply( > > list(NULL, 'foobar'), > > function(x) mcparallel(identity(x))) > > mccollect(jobs, wait = FALSE, timeout = 0) > > #> Error in names(res) <- pnames[match(s, pids)] : > > #> 'names' attribute [2] must be the same length as the vector [1] > > > > Note, setting a "name" for the jobs does not help, but the above works with > > "wait=TRUE", and also if I change the order of NULL and "foobar", although > > in that case, the second value (NULL) is ommitted. It also works with > > mclapply fine. > > > > Any ideas/suggestion on how to get mccollect work with the above example?. > > NULL is not a valid job identification. Perhaps mccollect() could give a > clearer error message, but I don't see, given its documentation, what > else than throwing an error it should do. What is the problem you were > trying to solve?Thank you very much for looking into this! What was interesting to me is that it used to work before 3.6 -- I have a script iterating over a list of data frames to train models, but it started to fail with the 3.6 release. The "NULL is not a valid job identification" problem doesn't seem to stand for my production job, as each list element has a proper name, but I think I can reproduce this with this minimal example as well: library(parallel) jobs <- lapply(1:2, function(x) { mcparallel(if (x == 1) NULL else x, name = as.character(x)) }) mccollect(jobs, wait = FALSE, timeout = 2) #> Error in names(res) <- pnames[match(s, pids)] : #> 'names' attribute [1] must be the same length as the vector [0] So the jobs have proper name, but the NULL return value is causing problems. Note, that it only causes problems when the NULL value is the first, eg switching 1 and 2 works, also running this on 1:3 and returning NULL on 2 etc. Now, I'm aware that 7 months ago this was added to the docs at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4 that NULL should not be returned, so it seems to be a user error on my end, but it seems to fail only when NULL is the first returned element in mccollect, and working OK eg when NULL is the 2nd or other element (although with side effects, eg missed elements). So maybe failing with an explicit error message whenever mccollect hits a NULL for consistency might help here instead of skipping "delivered.result <- delivered.result + 1L" when the returned value is not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72 ? Or even better (at least for my use case), maybe allowing to return NULL and throwing the warning on line 108 in that case. Thanks for considering this, Gergely> > Best > Tomas > > > > > Thanks, > > Gergely > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-devel at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > >
On 5/3/19 3:04 PM, Gergely Dar?czi wrote:> On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote: >> On 5/1/19 12:25 AM, Gergely Dar?czi wrote: >>> Dear All, >>> >>> I'm running into issues with calling mccollect on a list containing NULL >>> using R 3.6 (this used to work in 3.5.3): >>> >>> jobs <- lapply( >>> list(NULL, 'foobar'), >>> function(x) mcparallel(identity(x))) >>> mccollect(jobs, wait = FALSE, timeout = 0) >>> #> Error in names(res) <- pnames[match(s, pids)] : >>> #> 'names' attribute [2] must be the same length as the vector [1] >>> >>> Note, setting a "name" for the jobs does not help, but the above works with >>> "wait=TRUE", and also if I change the order of NULL and "foobar", although >>> in that case, the second value (NULL) is ommitted. It also works with >>> mclapply fine. >>> >>> Any ideas/suggestion on how to get mccollect work with the above example?. >> NULL is not a valid job identification. Perhaps mccollect() could give a >> clearer error message, but I don't see, given its documentation, what >> else than throwing an error it should do. What is the problem you were >> trying to solve? > Thank you very much for looking into this! > > What was interesting to me is that it used to work before 3.6 -- I > have a script iterating over a list of data frames to train models, > but it started to fail with the 3.6 release. > > The "NULL is not a valid job identification" problem doesn't seem to > stand for my production job, as each list element has a proper name, > but I think I can reproduce this with this minimal example as well: > > library(parallel) > jobs <- lapply(1:2, function(x) { > mcparallel(if (x == 1) NULL else x, name = as.character(x)) > }) > mccollect(jobs, wait = FALSE, timeout = 2) > #> Error in names(res) <- pnames[match(s, pids)] : > #> 'names' attribute [1] must be the same length as the vector [0] > > So the jobs have proper name, but the NULL return value is causing > problems. Note, that it only causes problems when the NULL value is > the first, eg switching 1 and 2 works, also running this on 1:3 and > returning NULL on 2 etc.Thanks for the clarification, I got confused by that you wrote about "calling mccollect on a list containing NULL", so I was answering a different question. Now I see you actually meant collecting results from jobs that return NULL, and even though that writing jobs that return NULL is incorrect, I see that there is really a bug that causes the R error.> Now, I'm aware that 7 months ago this was added to the docs at > https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4 > that NULL should not be returned, so it seems to be a user error on my > end, but it seems to fail only when NULL is the first returned element > in mccollect, and working OK eg when NULL is the 2nd or other element > (although with side effects, eg missed elements).NULL has been reserved even well before that documentation update, and if it somewhat worked as a result, it had been a coincidence. The documentation change was only to make it more obvious, but it already said "?mccollect? will return ?NULL? for a terminating job that has sent its results already after which the job is no longer available."> So maybe failing with an explicit error message whenever mccollect > hits a NULL for consistency might help here instead of skipping > "delivered.result <- delivered.result + 1L" when the returned value is > not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72 > ? Or even better (at least for my use case), maybe allowing to return > NULL and throwing the warning on line 108 in that case.Thanks for the report, I've fixed mccollect() in R-devel to return NULL from the job. To be ported to R-pathed unless problems appear. Best Tomas> > Thanks for considering this, > Gergely> >> Best >> Tomas >> >>> Thanks, >>> Gergely >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>
On Fri, May 3, 2019 at 4:34 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:> > On 5/3/19 3:04 PM, Gergely Dar?czi wrote: > > On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote: > >> On 5/1/19 12:25 AM, Gergely Dar?czi wrote: > >>> Dear All, > >>> > >>> I'm running into issues with calling mccollect on a list containing NULL > >>> using R 3.6 (this used to work in 3.5.3): > >>> > >>> jobs <- lapply( > >>> list(NULL, 'foobar'), > >>> function(x) mcparallel(identity(x))) > >>> mccollect(jobs, wait = FALSE, timeout = 0) > >>> #> Error in names(res) <- pnames[match(s, pids)] : > >>> #> 'names' attribute [2] must be the same length as the vector [1] > >>> > >>> Note, setting a "name" for the jobs does not help, but the above works with > >>> "wait=TRUE", and also if I change the order of NULL and "foobar", although > >>> in that case, the second value (NULL) is ommitted. It also works with > >>> mclapply fine. > >>> > >>> Any ideas/suggestion on how to get mccollect work with the above example?. > >> NULL is not a valid job identification. Perhaps mccollect() could give a > >> clearer error message, but I don't see, given its documentation, what > >> else than throwing an error it should do. What is the problem you were > >> trying to solve? > > Thank you very much for looking into this! > > > > What was interesting to me is that it used to work before 3.6 -- I > > have a script iterating over a list of data frames to train models, > > but it started to fail with the 3.6 release. > > > > The "NULL is not a valid job identification" problem doesn't seem to > > stand for my production job, as each list element has a proper name, > > but I think I can reproduce this with this minimal example as well: > > > > library(parallel) > > jobs <- lapply(1:2, function(x) { > > mcparallel(if (x == 1) NULL else x, name = as.character(x)) > > }) > > mccollect(jobs, wait = FALSE, timeout = 2) > > #> Error in names(res) <- pnames[match(s, pids)] : > > #> 'names' attribute [1] must be the same length as the vector [0] > > > > So the jobs have proper name, but the NULL return value is causing > > problems. Note, that it only causes problems when the NULL value is > > the first, eg switching 1 and 2 works, also running this on 1:3 and > > returning NULL on 2 etc. > Thanks for the clarification, I got confused by that you wrote about > "calling mccollect on a list containing NULL", so I was answering a > different question. Now I see you actually meant collecting results from > jobs that return NULL, and even though that writing jobs that return > NULL is incorrect, I see that there is really a bug that causes the R error. > > Now, I'm aware that 7 months ago this was added to the docs at > > https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4 > > that NULL should not be returned, so it seems to be a user error on my > > end, but it seems to fail only when NULL is the first returned element > > in mccollect, and working OK eg when NULL is the 2nd or other element > > (although with side effects, eg missed elements). > NULL has been reserved even well before that documentation update, and > if it somewhat worked as a result, it had been a coincidence. The > documentation change was only to make it more obvious, but it already > said "?mccollect? will return ?NULL? for a terminating job that has sent > its results already after which the job is no longer available." > > So maybe failing with an explicit error message whenever mccollect > > hits a NULL for consistency might help here instead of skipping > > "delivered.result <- delivered.result + 1L" when the returned value is > > not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72 > > ? Or even better (at least for my use case), maybe allowing to return > > NULL and throwing the warning on line 108 in that case. > > Thanks for the report, I've fixed mccollect() in R-devel to return NULL > from the job. To be ported to R-pathed unless problems appear.Thank you very much for the great news and your work!> > Best > Tomas > > > > > Thanks for considering this, > > Gergely > > > > >> Best > >> Tomas > >> > >>> Thanks, > >>> Gergely > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> ______________________________________________ > >>> R-devel at r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-devel > >> >