Meyners, Michael
2015-Jun-08 10:01 UTC
[R] mismatch between match and unique causing ecdf (well, approxfun) to fail
All,

I encountered the following issue with ecdf, originally on a vector of length 10,000, but I have been able to reduce it to a minimal reproducible example (just to avoid questions about why I'd want to do this for a vector of length 2...):

    test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344), .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
    ecdf(test2)
    # Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it turns out that

    unique(test2)
    #       X817    X4789
    #74 3.398247 3.398247

while

    match(test2, unique(test2))
    #[1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, as this uses (an equivalent to) a call to approxfun with x = test2 and y = cumsum(tabulate(match(test2, unique(test2)))), the latter now containing one entry fewer than the former, so xy.coords fails.

I understand that the issue should somehow be related to FAQ 7.31, but I would have hoped that unique and match would use the same precision, so that either both or neither would consider the two values identical, rather than match treating them as identical while unique does not.

Last but not least, it doesn't really cause an issue on my end (other than breaking my code and hence dropping out of a loop in the first place...); rounding will help without noteworthy changes to the outcome, so no need to propose a workaround :-) I'd rather like to raise the issue and learn whether there is a purpose for this behavior, and/or whether there is a generic fix for this, or whether I am completely missing something.

Version info (under Windows 7):

    R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
    Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael
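
The length mismatch described above can be seen by running the two steps separately. What follows is only a minimal sketch, under the assumption that unique() dispatches to its data-frame method and works on rows, while match() treats the data frame as a list of columns. The unlist() step and the names u, m, y, x and Fn are illustrative and not part of the original report.

```r
# One-row data frame with two identical values, as in the report
test2 <- data.frame(X817 = 3.39824670255344, X4789 = 3.39824670255344)

# unique() on a data frame removes duplicated ROWS; there is only one row,
# so both columns are kept
u <- unique(test2)
length(u)                      # number of columns

# match() compares the columns as list elements; both columns hold the
# same value, so both are matched to the first column of u
m <- match(test2, u)

# The y values built from the match are therefore shorter than the x values
y <- cumsum(tabulate(m))
length(y)

# Assumption: the input was meant to be a plain numeric vector.
# On a vector, unique() and match() agree and ecdf() can be built.
x <- unlist(test2)
length(unique(x)) == length(cumsum(tabulate(match(x, unique(x)))))
Fn <- ecdf(x)
Fn(3.39824670255344)
```

If this reading is right, the two functions disagree because they look at different dimensions of the data frame, not because they use different numeric precision.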