Martin
2020-Oct-17 19:18 UTC
[Rd] sum() (and similar methods) should work for zero row data.frames
The "Summary" group generics always throw errors for a data.frame with zero rows, for example:

> sum(data.frame(x = numeric(0)))
#> Error in FUN(X[[i]], ...) :
#>   only defined on a data frame with all numeric variables

Same behaviour for min, max, any, all, ... . I believe this is inconsistent with what these methods do for other empty objects (vectors, matrices), where the return value is chosen to ensure transitivity: sum(numeric(0)) == 0.

The reason for this is that the return type of as.matrix() for empty (no rows or no columns) data.frame objects is always a matrix of type "logical". The Summary method for data.frame, in turn, throws an error when the data.frame, converted to a matrix, is not of numeric type.

I suggest two ways that make sum, min, max, ... more consistent. IMHO it would be fitting to implement both of these fixes, because they also make other things more consistent.

1. Make the return type of as.matrix() for zero-row data.frames consistent with the type that would have been returned, had the data.frame had more than zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric; if there is an empty "character" column, the returned matrix should be of type character, etc. This would make subsetting by row and conversion to matrix commute (except for row names sometimes):

> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = FALSE])

Furthermore, this change would make as.matrix.data.frame obey the documentation, which indicates that the coercion hierarchy is used for the return type.

2. Make the Summary.data.frame method accept data.frames that produce non-numeric matrices. Next to the main focus of this message, I believe it would e.g. be fitting to have any() and all() work on logical data.frame objects. The current behaviour is such that

> any(data.frame(x = 1))
#> [1] TRUE
#> Warning message:
#> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical

and

> any(data.frame(x = TRUE))
#> Error in FUN(X[[i]], ...) :
#>   only defined on a data frame with all numeric variables

So a numeric data.frame warns about implicit coercion, while a logical data.frame (which would not need coercion) does not work at all.

(I feel more strongly about fixing 1. than 2., because I don't know the discussion that led to the behaviour described in 2.)

Best,
Martin
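As a rough illustration of proposal 1, a wrapper along the following lines could give a zero-row data.frame the matrix type it would have had with rows present. This is only a sketch: `as_matrix_typed` is a hypothetical name, not part of base R, and it assumes plain atomic or factor columns. It takes the target type from combining the zero-length columns with c(), which only approximates the coercion rules of as.matrix() and may differ for classed columns such as dates.

```r
# Hypothetical helper, not part of base R: fix up the storage type of the
# matrix returned by as.matrix() for a zero-row data.frame.
as_matrix_typed <- function(df) {
  m <- as.matrix(df)
  if (nrow(df) == 0L && ncol(df) > 0L) {
    # Combine the (empty) columns; c() applies the usual coercion hierarchy
    # logical < integer < double < complex < character.
    proto <- do.call(c, unname(lapply(df, as.vector)))
    # Change the storage type while keeping dim and dimnames.
    storage.mode(m) <- typeof(proto)
  }
  m
}

df <- data.frame(x = numeric(0))
typeof(as.matrix(df))        # "logical" according to the thread; may differ by R version
typeof(as_matrix_typed(df))  # "double"
sum(as_matrix_typed(df))     # 0
```

With such a matrix type, a numeric-only check like the one described above would no longer reject the zero-row case.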
peter dalgaard
2020-Oct-18 07:19 UTC
[Rd] sum() (and similar methods) should work for zero row data.frames
Hmm, yes, this is probably wrong. E.g., we are likely to get inconsistencies out of boundary cases like this

> a <- na.omit(airquality)
> sum(a)
[1] 37495.3
> sum(a[FALSE,])
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

Or, closer to an actual use case:

> sum(subset(a, Ozone>100))
[1] 3330.5
> sum(subset(a, Ozone>200))
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

However, given that numeric summaries generally treat logicals as 0/1, wouldn't it be easiest just to extend the check inside Summary.data.frame with "&& !is.logical(x)"?

> sum(as.matrix(a[FALSE,]))
[1] 0

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
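The suggested extension of the check can be sketched as follows. This is a stand-in written for illustration, not R's actual source: `summary_df_sketch` and its `op` argument are hypothetical, and the shape of the check (convert to matrix, reject anything that is not numeric or complex) is assumed from the description earlier in the thread.

```r
# Hypothetical stand-in for the data.frame Summary method (not R's source).
# `op` is the summary function to apply, e.g. sum, any, all.
summary_df_sketch <- function(op, ..., na.rm = FALSE) {
  args <- lapply(list(...), function(x) {
    x <- as.matrix(x)
    # Assumed original check: !is.numeric(x) && !is.complex(x)
    # Extended as suggested with "&& !is.logical(x)":
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x))
      stop("only defined on a data frame with all numeric or logical variables")
    x
  })
  do.call(op, c(args, na.rm = na.rm))
}

a <- na.omit(airquality)
summary_df_sketch(sum, a[FALSE, ])            # 0
summary_df_sketch(any, data.frame(x = TRUE))  # TRUE
```

Because the zero-row matrix is accepted whether it is logical or numeric, this variant covers the empty case without changing as.matrix(), and it lets any() and all() work on logical data.frames as a side effect. Character data.frames are still rejected.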
Gabriel Becker
2020-Oct-18 19:49 UTC
[Rd] sum() (and similar methods) should work for zero row data.frames
Peter et al,

I had the same thought, in particular for any() and all(), which, inasmuch as they should work on data.frames in the first place (which, to be perfectly honest, I do find quite debatable myself), should certainly work on "logical" data.frames if they are going to work on "numeric" ones.

I can volunteer to prepare a patch if Martin (the reporter) does not want to take a crack at it, and if it is not already being done within R-core.

Best,
~G