Matthieu Stigler
2008-Apr-17 10:44 UTC
[Rd] Suggestion: add a warning in the help-file of unique()
Hello,

I'm sorry if this suggestion/correction has already been made, but after searching the devel list I did not find any mention of it. I would just suggest adding a warning or an example to the help file of the function unique(), such as:

"Note that unique() compares only identical values. Values which are printed identically but are in fact not identical will be treated as different."

> a<-c(0.2, 0.3, 0.2, 0.4-0.1)
> a
[1] 0.2 0.3 0.2 0.3
> unique(a)
[1] 0.2 0.3 0.3

Well, this is just the idea, and the sentence could be made better (my poor English...). Maybe a reference to R FAQ 7.31 could be made. Maybe this behaviour is clear and logical for experienced users, but I don't think it is for beginners. I personally spent two hours working out that the problem in my code came from this.

I was thinking about modifying the function unique() to introduce a "tol" argument that allows comparison with a tolerance level (with a default value of zero to keep unique() consistent), like all.equal(), but it seemed too complicated with my limited understanding.

Best regards, and many thanks for what you do for R!

Matthieu Stigler
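The behaviour described in this message can be reproduced and inspected with base R alone. The following is a minimal sketch; the choice of 10 significant digits in the last step is an arbitrary assumption made for illustration, not something proposed in the thread.

```r
a <- c(0.2, 0.3, 0.2, 0.4 - 0.1)

# The default print method shows 7 significant digits, so the two
# values near 0.3 look the same.
print(a)

# Printing more digits reveals that they are different doubles.
print(a, digits = 17)

# Exact comparison treats the two values as different ...
0.3 == 0.4 - 0.1                    # FALSE
# ... while all.equal() compares with a tolerance.
isTRUE(all.equal(0.3, 0.4 - 0.1))   # TRUE

# unique() uses exact comparison, hence three values.
length(unique(a))                   # 3

# One possible workaround: round to a chosen precision first.
# The 10 significant digits used here are an arbitrary choice.
unique(signif(a, 10))
```

Rounding before calling unique() sidesteps the question of how a tolerance should be applied, because each value is mapped to a fixed representative before any comparison takes place. Values that straddle a rounding boundary can still end up in different groups.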
Ted Harding
2008-Apr-17 13:54 UTC
[Rd] Suggestion: add a warning in the help-file of unique()
On 17-Apr-08 10:44:32, Matthieu Stigler wrote:
> Hello
>
> I'm sorry if this suggestion/correction has already been made,
> but after searching the devel list I did not find any mention
> of it. I would just suggest adding a warning or an example
> to the help file of the function unique(), such as:
>
> "Note that unique() compares only identical values. Values
> which are printed identically but are in fact not identical
> will be treated as different."
>
> > a<-c(0.2, 0.3, 0.2, 0.4-0.1)
> > a
> [1] 0.2 0.3 0.2 0.3
> > unique(a)
> [1] 0.2 0.3 0.3
>
> Well, this is just the idea, and the sentence could be made better
> (my poor English...). Maybe a reference to R FAQ 7.31 could be made.
> Maybe this behaviour is clear and logical for experienced users,
> but I don't think it is for beginners. I personally spent two
> hours working out that the problem in my code came from this.

The above is potentially a useful suggestion, and I would be inclined to support it. However, for your other suggestion:

> I was thinking about modifying the function unique() to introduce
> a "tol" argument that allows comparison with a tolerance level
> (with a default value of zero to keep unique() consistent), like
> all.equal(), but it seemed too complicated with my limited
> understanding.
>
> Best regards, and many thanks for what you do for R!
> Matthieu Stigler

What is really complicated about it is that the results may depend on the order of elements. When unique() eliminates only values which are strictly identical to values which have been scanned earlier, there is no problem.
But suppose you set "tol=0.11" in

  unique(c(20.0, 30.0, 30.1, 30.2, 40.0))
  # 20.0, 30.0, 40.0
  [30.1 rejected because within 0.11 of previous 30.0;
   30.2 rejected because within 0.11 of previous 30.1]

and compare with

  unique(c(20.0, 30.0, 30.2, 30.1, 40.0))
  # 20.0, 30.0, 30.2, 40.0
  [30.2 accepted because not within 0.11 of any previous;
   30.1 rejected because within 0.11 of previous 30.2 or 30.0]

This kind of problem is always present in situations where there are potential "chained tolerances". You cannot see the difference between the position of the hour-hand of a clock now and its position one minute later. But you may not chain this logic, for, if you could:

  If A is indistinguishable from B, and B is indistinguishable from C,
  then A is indistinguishable from C.

  10:00 is indistinguishable from 10:01 (on the hour-hand)
  10:[n] is indistinguishable from 10:[n+1]

  Hence, by induction, 10:00 is indistinguishable from 11:00

Which you do not want!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-08                                       Time: 14:54:19
------------------------------ XFMail ------------------------------
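The order dependence in the two examples above can be checked with a small sketch. The function unique_tol() below is hypothetical and not part of R. It assumes the rule used in the examples: an element is dropped when it lies within tol of any element scanned before it, whether that earlier element was kept or not.

```r
# Hypothetical helper, not part of R: drop x[i] when it lies within
# `tol` of any element scanned before it (kept or rejected).
unique_tol <- function(x, tol = 0) {
  keep <- logical(length(x))
  for (i in seq_along(x)) {
    earlier <- x[seq_len(i - 1)]          # empty when i == 1
    keep[i] <- !any(abs(x[i] - earlier) <= tol)
  }
  x[keep]
}

# Same values, different order, different results.
r1 <- unique_tol(c(20.0, 30.0, 30.1, 30.2, 40.0), tol = 0.11)
r2 <- unique_tol(c(20.0, 30.0, 30.2, 30.1, 40.0), tol = 0.11)
print(r1)   # 20 30 40
print(r2)   # 20.0 30.0 30.2 40.0
```

With tol = 0 the helper keeps exactly the first occurrence of each distinct value, so the order of the input affects only the order of the result. With any positive tolerance, which elements survive depends on the order in which they are scanned.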