Hilmar Berger
2019-Sep-04 13:25 UTC
[Rd] '==' operator: inconsistency in data.frame(...) == NULL
Dear all,

I just stumbled upon some behavior of the == operator which is at least somewhat inconsistent.

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

> list(a=1:3, b=LETTERS[1:3]) == NULL
logical(0)
> matrix(1:6, 2,3) == NULL
logical(0)
> data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  :
  length of 'dimnames' [2] not equal to array extent
> data.frame(NULL) == 1
<0 x 0 matrix>
> data.frame(NULL) == NULL
<0 x 0 matrix>
> data.frame(NULL) == logical(0)
<0 x 0 matrix>

I wonder if data.frame(<some non-empty data>) == NULL should also return a value instead of an error. R help reads:

"At least one of x and y must be an atomic vector, but if the other is a list R attempts to coerce it to the type of the atomic vector: this will succeed if the list is made up of elements of length one that can be coerced to the correct type.

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw."

It is not clear from the help what to expect for NULL or empty atomic vectors. It is also strange that for list() there is no error but for data.frame() with the same data an error is thrown. I can see that there might be reasons to return logical(0) instead of FALSE, but I do not fully understand why there should be differences between e.g. matrix() and data.frame().

Also, it is at least somewhat strange that data.frame(NULL) == NULL and similar expressions return an empty matrix, while comparing a normal filled matrix to NULL returns logical(0).

Even if this behavior is expected, the error message shown by data.frame(...) == NULL is not very informative.

Thanks and best regards,
Hilmar
Martin Maechler
2019-Sep-11 07:56 UTC
[Rd] '==' operator: inconsistency in data.frame(...) == NULL
>>>>> Hilmar Berger
>>>>>     on Wed, 4 Sep 2019 15:25:46 +0200 writes:

    > Dear all,
    > I just stumbled upon some behavior of the == operator which is at least
    > somewhat inconsistent.

    > R version 3.6.1 (2019-07-05) -- "Action of the Toes"
    > Copyright (C) 2019 The R Foundation for Statistical Computing
    > Platform: x86_64-w64-mingw32/x64 (64-bit)

    > > list(a=1:3, b=LETTERS[1:3]) == NULL
    > logical(0)
    > > matrix(1:6, 2,3) == NULL
    > logical(0)
    > > data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
    > Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
    > dimnames = list(rn,  :
    >   length of 'dimnames' [2] not equal to array extent
    > > data.frame(NULL) == 1
    > <0 x 0 matrix>
    > > data.frame(NULL) == NULL
    > <0 x 0 matrix>
    > > data.frame(NULL) == logical(0)
    > <0 x 0 matrix>

    > I wonder if data.frame(<some non-empty data>) == NULL should also return
    > a value instead of an error. R help reads:

    > "At least one of x and y must be an atomic vector, but
    > if the other is a list R attempts to coerce it to the
    > type of the atomic vector: this will succeed if the list
    > is made up of elements of length one that can be coerced
    > to the correct type.

    > If the two arguments are atomic vectors of different
    > types, one is coerced to the type of the other, the
    > (decreasing) order of precedence being character, complex,
    > numeric, integer, logical and raw."

    > It is not clear from the help what to expect for NULL or
    > empty atomic vectors.

Well, strictly speaking an error would be expected for NULL, as it is *not* an atomic vector, and your main issue

    data.frame(..) == NULL

would already be settled by the first half sentence from the doc, and strictly speaking, even data.frame(NULL) == NULL "should" return an error. (Note: I'm not saying it really should, but at least the reference does not say it should work at all.)

Now, logical(0) on the other hand *is* an atomic vector ...
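A minimal sketch of the atomic versus non-atomic distinction drawn above, using the objects from the original examples. The variable names (`x_list`, `x_matrix`, `x_df`) are only illustrative. `is.atomic(NULL)` is left out on purpose: it is reported as TRUE for the R 3.6.1 session in this thread, and as far as I know R 4.4.0 and later return FALSE, so the answer depends on the R version.

```r
# Objects from the examples in this thread (names are illustrative)
x_list   <- list(a = 1:3, b = LETTERS[1:3])
x_matrix <- matrix(1:6, 2, 3)
x_df     <- data.frame(a = 1:3, b = LETTERS[1:3])

# A base R matrix is an atomic vector with a dim attribute
print(is.atomic(x_matrix))

# A list is not atomic, and a data frame is a list of columns
print(is.atomic(x_list))
print(is.atomic(x_df))
print(is.list(x_df))

# A zero-length logical vector is still an atomic vector
print(is.atomic(logical(0)))
```

On any recent R this should print TRUE for the matrix and for `logical(0)`, FALSE for the list and the data frame, and TRUE for `is.list(x_df)`.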
    > It is also strange that for list()
    > there is no error but for data.frame() with the same data
    > an error is thrown. I can see that there might be reasons
    > to return logical(0) instead of FALSE, but I do not fully
    > understand why there should be differences between
    > e.g. matrix() and data.frame().

Well, a [regular base R] matrix() is atomic and a data frame is not.

    > Also, it is at least somewhat strange that
    > data.frame(NULL) == NULL and similar expressions return an
    > empty matrix, while comparing a normal filled matrix to
    > NULL returns logical(0).

    > Even if this behavior is expected, the error message shown
    > by data.frame(...) == NULL is not very informative.

I'm not at all sure there's any need for a change here.

I would say the following general thinking should be applied:

1. The general rule that '==' should be used only for comparing atomic objects (as it returns an atomic object, a 'logical' with corresponding attributes) is really fundamental, and using '==' for anything else has never been "the idea".

2. There are (two) "semi-exceptions" to the above:

   2a) Sometimes it has been convenient to treat NULL as if it was a zero-length atomic object (of "arbitrary" type/mode).

   2b) data.frame()s "should typically" behave like matrices in many situations, notably when indexed {and that rule is violated (on purpose) by tibbles .. ("drop=FALSE" etc, but that's another story)}

So because of these exceptions, you and possibly others may think '==' should "work" with data.frame()s and/or NULL, but I would not tend to agree.

    > Thanks and best regards,
    > Hilmar

You are welcome!
Martin
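If '==' is reserved for atomic operands as argued above, a test against NULL is probably better written without '=='. The following is only a sketch of that reading, not something proposed in the thread: `is.null()` and `identical()` are base R functions that always return a single TRUE or FALSE, and the data frame `df` is the example object from the original post.

```r
df <- data.frame(a = 1:3, b = LETTERS[1:3])

# Testing for NULL without '==': the result is always one TRUE or FALSE
print(is.null(df))
print(identical(df, NULL))
print(is.null(NULL))

# '==' with an atomic right-hand side is the supported case for data frames:
# the result is a logical matrix with the shape of the data frame
res <- df == "A"
print(dim(res))
print(res)
```

This keeps '==' for elementwise comparison against atomic values and uses a dedicated predicate for the "is this object NULL?" question.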
Hilmar Berger
2019-Sep-11 09:55 UTC
[Rd] '==' operator: inconsistency in data.frame(...) == NULL
Dear Martin,

On 11/09/2019 09:56, Martin Maechler wrote:
> > I wonder if data.frame(<some non-empty data>) == NULL should also return
> > a value instead of an error. R help reads:
>
> > "At least one of x and y must be an atomic vector, but
> > if the other is a list R attempts to coerce it to the
> > type of the atomic vector: this will succeed if the list
> > is made up of elements of length one that can be coerced
> > to the correct type.
>
> > If the two arguments are atomic vectors of different
> > types, one is coerced to the type of the other, the
> > (decreasing) order of precedence being character, complex,
> > numeric, integer, logical and raw."
>
> > It is not clear from the help what to expect for NULL or
> > empty atomic vectors.
>
> Well, strictly speaking an error would be expected for NULL,
> as it is *not* an atomic vector, and your main issue
>
>     data.frame(..) == NULL
>
> would already be settled by the first half sentence from the
> doc, and strictly speaking, even data.frame(NULL) == NULL
> "should" return an error ((Note: I'm not saying it really
> should, but at least the reference does not say it should work at all))

Thanks, this explanation makes total sense to me. I did not consider that NULL might be non-atomic. Strangely, is.atomic(NULL) returns TRUE. On the other hand, I understand that one would not like to treat it like atomic in ==.

However, in this case one might expect that the error message would be more like that for S4 objects (which always seem to report an informative error message for ==):

> Pos <- setClass("Pos", slots = c(latitude = "numeric", longitude = "numeric", altitude = "numeric"))
> p = Pos()
> p == NULL
Error in p == NULL :
  comparison (1) is possible only for atomic and list types
> p == "FOO"
Error in p == "FOO" :
  comparison (1) is possible only for atomic and list types

In the data.frame()==NULL cases I have the impression that the fact that both sides are non-atomic is not properly detected and therefore R tries to go on with the == method for data.frames.

From a cursory check in Ops.data.frame() and some debugging I have the impression that the case of the second argument being non-atomic or empty is not handled at all and the function progresses until the end, where it fails in the last step on an empty value:

matrix(unlist(value, recursive = FALSE, use.names = FALSE),
       nrow = nr, dimnames = list(rn, cn))

Best regards,
Hilmar

--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone: + 49 30 28460 430
Fax: + 49 30 28460 401

E-Mail: berger at mpiib-berlin.mpg.de
Web : www.mpiib-berlin.mpg.de
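The failing last step described in the message above can be reproduced in isolation. This is a sketch under assumptions: `nr`, `rn` and `cn` are hypothetical stand-ins for the row count, row names and column names of the 3 x 2 example data frame, `value` is assumed to be empty, and the call uses the form shown in the error message of the original post rather than R's actual source.

```r
# Hypothetical stand-ins for the local variables named in the thread
nr <- 3L
rn <- c("1", "2", "3")
cn <- c("a", "b")

# Assume no per-column results were produced; unlist() of an empty list is NULL
value <- unlist(list(), recursive = FALSE, use.names = FALSE)

# Building the result matrix from zero values gives a 3 x 0 matrix,
# which cannot carry two column names
msg <- tryCatch(
  {
    matrix(if (is.null(value)) logical() else value,
           nrow = nr, dimnames = list(rn, cn))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With zero data values and `nrow = 3`, `matrix()` infers zero columns, so the two column names in `cn` do not match the array extent. That is consistent with the 'dimnames' [2] error reported at the start of the thread.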