Jon Clayden
2015-Jul-28 10:58 UTC
[Rd] all.equal: possible mismatch between behaviour and documentation
Dear all, The documentation for `all.equal.numeric` says Numerical comparisons for ?scale = NULL? (the default) are done by first computing the mean absolute difference of the two numerical vectors. If this is smaller than ?tolerance? or not finite, absolute differences are used, otherwise relative differences scaled by the mean absolute difference. But the actual behaviour of the function is to use relative differences if the mean value of the first argument is greater than `tolerance`: all.equal(0.1, 0.102, tolerance=0.01) # [1] "Mean relative difference: 0.02" It seems to me that this example should produce `TRUE`, because abs(0.1-0.102) < 0.01, but it does not, because abs(0.1) > 0.01. The relevant section in the source seems to be what <- if (is.null(scale)) { xn <- mean(abs(target)) if (is.finite(xn) && xn > tolerance) { xy <- xy/xn "relative" } else "absolute" } I think `xy`, not `xn`, should be tested here. The last line of the documentation, indicating that relative differences are "scaled by the mean absolute difference" also seems not to match the code, but in this aspect the code is surely right, i.e., the relative difference is relative to the mean value, not the mean difference. All the best, Jon
Jon Clayden
2015-Jul-28 11:14 UTC
[Rd] all.equal: possible mismatch between behaviour and documentation
Sorry; minor clarification. The actual test criterion in the example I gave is of course abs((0.1-0.102)/0.1) < 0.01, not abs(0.1) < 0.01. In any case, this does not match (my reading of) the docs, and the result is not `TRUE`. Regards, Jon On 28 July 2015 at 11:58, Jon Clayden <jon.clayden at gmail.com> wrote:> Dear all, > > The documentation for `all.equal.numeric` says > > Numerical comparisons for ?scale = NULL? (the default) are done by > first computing the mean absolute difference of the two numerical > vectors. If this is smaller than ?tolerance? or not finite, > absolute differences are used, otherwise relative differences > scaled by the mean absolute difference. > > But the actual behaviour of the function is to use relative > differences if the mean value of the first argument is greater than > `tolerance`: > > all.equal(0.1, 0.102, tolerance=0.01) > # [1] "Mean relative difference: 0.02" > > It seems to me that this example should produce `TRUE`, because > abs(0.1-0.102) < 0.01, but it does not, because abs(0.1) > 0.01. The > relevant section in the source seems to be > > what <- if (is.null(scale)) { > xn <- mean(abs(target)) > if (is.finite(xn) && xn > tolerance) { > xy <- xy/xn > "relative" > } > else "absolute" > } > > I think `xy`, not `xn`, should be tested here. > > The last line of the documentation, indicating that relative > differences are "scaled by the mean absolute difference" also seems > not to match the code, but in this aspect the code is surely right, > i.e., the relative difference is relative to the mean value, not the > mean difference. > > All the best, > Jon
Martin Maechler
2015-Jul-30 09:39 UTC
[Rd] all.equal: possible mismatch between behaviour and documentation
Dear Jon, thank you for raising the issue,>>>>> Jon Clayden <jon.clayden at gmail.com> >>>>> on Tue, 28 Jul 2015 12:14:48 +0100 writes:> Sorry; minor clarification. The actual test criterion in the example I > gave is of course abs((0.1-0.102)/0.1) < 0.01, not abs(0.1) < 0.01. In > any case, this does not match (my reading of) the docs, and the result > is not `TRUE`.> Regards, > Jon> On 28 July 2015 at 11:58, Jon Clayden <jon.clayden at gmail.com> wrote: > > Dear all, > > > > The documentation for `all.equal.numeric` says > > > > Numerical comparisons for ?scale = NULL? (the default) are done by > > first computing the mean absolute difference of the two numerical > > vectors. If this is smaller than ?tolerance? or not finite, > > absolute differences are used, otherwise relative differences > > scaled by the mean absolute difference. > > > > But the actual behaviour of the function is to use relative > > differences if the mean value of the first argument is greater than > > `tolerance`: > > > > all.equal(0.1, 0.102, tolerance=0.01) > > # [1] "Mean relative difference: 0.02" > > > > It seems to me that this example should produce `TRUE`, because > > abs(0.1-0.102) < 0.01, but it does not, because abs(0.1) > 0.01.Irrespective of the documentation, the above example should continue to produce what it does now. These numbers are not close to zero (compared to tol), and so relative error should be used. The whole idea of all.equal.numeric() is to use *relative* error/difference __unless__ that is not sensible anymore, namely when the denominator of the ratio which defines the relative error becomes too close to zero (and hence has to be seen as "unstable" / "unreliable"). The exact behavior of all.equal.numeric() has __ I'm pretty sure, but can no longer easily prove __ been inherited from the original S implementation in most parts, and (if that's correct) has been in place for about 30 years [ If not, it has "only" been in place about 17 years... ] notably the code below has been unchanged for a long time, and been in use in too many places to be changed now. So it is about the *documentation* only we should discuss changing.> > The relevant section in the source seems to be > > > > what <- if (is.null(scale)) { > > xn <- mean(abs(target)) > > if (is.finite(xn) && xn > tolerance) { > > xy <- xy/xn > > "relative" > > } > > else "absolute" > > } > > > > I think `xy`, not `xn`, should be tested here.as I said above, no such change is acceptable {but I don't see *why* either }> > The last line of the documentation, indicating that relative > > differences are "scaled by the mean absolute difference" also seems > > not to match the code, but in this aspect the code is surely right, > > i.e., the relative difference is relative to the mean value, not the > > mean difference.Indeed... interestingly, 'target' at the point in the function is containing only those values of the original target which are not NA (fine, of course) but also which are not equal to 'current'. This is a bit more surprising, also does make sense, let's say if we have too long numeric vectors which are mostly zero (and hence often both equal to zero), but in other cases, this may be more problematic (think of a case when also both vectors are mostly (exactly) equal, but then there are a few outliers...). But in spite of such cases, as said above, I'm almost sure this won't be changed in the code. My current conclusion would be to change only scaled by the mean absolute difference to scaled by the mean absolute target value or (less easy, but more precise) scaled by the mean absolute value of those finite entries of \code{target} where it differs from \code{current} or yet another version which is both precise and understandable ? Martin Maechler, ETH Zurich> > > > All the best, > > Jon