Michael Chirico
2022-Aug-25 02:51 UTC
[Rd] simplify2array assumes non-NA dim() -- base or method bug?
If any component of the dim() returned as c.dim here[1] are missing, simplify2array() errors inscrutably (specifically, because the last && condition is missing): Error in if (higher && length(c.dim <- unique(lapply(x, dim))) == 1 && : missing value where TRUE/FALSE needed At root here is that dim.tbl_lazy ({dbplyr} package method [2]) very intentionally neglects to count the # of rows in the result -- the whole point of a lazy table is to avoid calculating full queries unless specifically requested, so the # of rows is left as missing, i.e., there _is_ a quantity of rows, but the exact number is not known. That seems to me like a proper usage of NA, and hence this is a simplify2array() bug, but I am curious other thoughts here before attempting a patch. [1] https://github.com/r-devel/r-svn/blob/2d4f8c283d53ff2c98d92c7b77b11e652297742c/src/library/base/R/sapply.R#L46-L48 [2] https://github.com/tidyverse/dbplyr/blob/36b146e36d6d9af215dc48e60862d4b807b9e606/R/tbl-lazy.R#L45-L47
Sebastian Meyer
2022-Aug-25 09:55 UTC
[Rd] simplify2array assumes non-NA dim() -- base or method bug?
Hmm. I think I haven't seen NA dims in base R before. R-lang says> The 'dim' attribute is used to implement arrays. The content of > the array is stored in a vector in column-major order and the 'dim' > attribute is a vector of integers specifying the respective extents of > the array. R ensures that the length of the vector is the product of > the lengths of the dimensions.And R-intro says:> A dimension vector is a vector of non-negative integers.I understand "vector of [non-negative] integers" as excluding NA_integer_. Correspondingly:> a <- 1 > dim(a) <- NA_integer_Error in dim(a) <- NA_integer_ : the dims contain missing values> matrix(1, NA_integer_)Error in matrix(1, NA_integer_) : invalid 'nrow' value (too large or NA)> array(1, NA_integer_)Error in array(1, NA_integer_) : negative length vectors are not allowed There must be many more examples where non-NULL dim() is assumed to not contain missing values. simplify2array() aligns with that specification and only needs to check for numeric (non-NULL) dims at this point and not also !anyNA(c.dim). So my take is that there is no bug. Best regards, Sebastian Meyer Am 25.08.22 um 04:51 schrieb Michael Chirico via R-devel:> If any component of the dim() returned as c.dim here[1] are missing, > simplify2array() errors inscrutably (specifically, because the last && > condition is missing): > > Error in if (higher && length(c.dim <- unique(lapply(x, dim))) == 1 && : > missing value where TRUE/FALSE needed > > At root here is that dim.tbl_lazy ({dbplyr} package method [2]) very > intentionally neglects to count the # of rows in the result -- the > whole point of a lazy table is to avoid calculating full queries > unless specifically requested, so the # of rows is left as missing, > i.e., there _is_ a quantity of rows, but the exact number is not > known. > > That seems to me like a proper usage of NA, and hence this is a > simplify2array() bug, but I am curious other thoughts here before > attempting a patch. > > [1] https://github.com/r-devel/r-svn/blob/2d4f8c283d53ff2c98d92c7b77b11e652297742c/src/library/base/R/sapply.R#L46-L48 > [2] https://github.com/tidyverse/dbplyr/blob/36b146e36d6d9af215dc48e60862d4b807b9e606/R/tbl-lazy.R#L45-L47 > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel