Meyners, Michael
2015-Jun-08 10:01 UTC
[R] mismatch between match and unique causing ecdf (well, approxfun) to fail
All,
I encountered the following issue with ecdf. It originally occurred on a vector
of length 10,000, but I have been able to reduce it to a minimal reproducible
example (just to avoid questions about why I'd want to do this for a vector of
length 2...):
test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
                  .Names = c("X817", "X4789"), row.names = 74L,
                  class = "data.frame")
ecdf(test2)
# Error in xy.coords(x, y) : 'x' and 'y' lengths differ
In an attempt to track this down, it turns out that
unique(test2)
#       X817    X4789
#74 3.398247 3.398247
while
match(test2, unique(test2))
#[1] 1 1
matches both values to the first one. This causes a hiccup in the call to ecdf,
as this uses (the equivalent of) a call to approxfun with x = test2 and y =
cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one
entry fewer than the former, so xy.coords fails.
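
The length mismatch described above can be reproduced step by step. This is only a sketch of the relevant pieces, not the actual ecdf source; the names u, m and y are introduced here purely for illustration.

```r
# Same object as in the example above: a one-row, two-column data frame.
test2 <- structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
                   .Names = c("X817", "X4789"), row.names = 74L,
                   class = "data.frame")

u <- unique(test2)        # still has two columns
m <- match(test2, u)      # both entries are matched to position 1
y <- cumsum(tabulate(m))  # tabulate(c(1, 1)) has a single entry

length(u)  # 2
length(y)  # 1, hence 'x' and 'y' lengths differ
```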
I understand that the issue should be somehow related to FAQ 7.31, but I would
have hoped that unique and match would be using the same precision and hence
both or neither would consider the two values identical, rather than match
treating them as identical while unique does not.
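
One way to check whether floating-point precision is really involved is to run the same calls on a plain numeric vector instead of the one-row data frame. The sketch below assumes that unlist(test2) is an acceptable stand-in for the original data; v, uv, mv and Fn are illustrative names only.

```r
test2 <- structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
                   .Names = c("X817", "X4789"), row.names = 74L,
                   class = "data.frame")

v  <- unlist(test2)  # plain (named) numeric vector of length 2
uv <- unique(v)      # compares the values themselves
mv <- match(v, uv)   # positions of v in uv

length(uv)  # 1
mv          # 1 1
Fn <- ecdf(v)  # runs without error on the vector
```

If unique and match agree here, as they appear to, the discrepancy would seem to come from how the two functions treat a data frame (unique works on rows, match on the columns as list elements) rather than from numerical precision.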
Last but not least, it doesn't really cause an issue on my end (other than
breaking my code and hence dropping out of a loop in the first place...);
rounding will help w/o noteworthy changes to the outcome, so no need to propose
a workaround :-)
I'd rather like to raise the issue and learn whether there is a purpose for
this behavior, and/or whether there is a generic fix to this, or whether I am
completely missing something.
Version info (under Windows 7):
R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Platform: x86_64-w64-mingw32/x64 (64-bit)
Cheers, Michael