thr3ads.net - R devel - [Rd] What to do with a inconsistency in rank() that's in S+ and R ever since? [Oct 2006]

Jens Oehlschlägel

2006-Oct-27 09:14 UTC

[Rd] What to do with a inconsistency in rank() that's in S+ and R ever since?

Dear R-developers,

I just realized that rank() behaves inconsistently when one of na.last in
{TRUE|FALSE} is combined with a ties.method in
{"average"|"random"|"max"|"min"}.
The documentation suggests that e.g. with na.last=TRUE NAs are treated like the
last (=highest) value, which obviously is not the case:
> rank(c(1,2,2,NA,NA), na.last = TRUE, ties.method = c("average",
"first", "random", "max", "min")[1])
[1] 1.0 2.5 2.5 4.0 5.0

I'd expect 

[1] 1.0 2.5 2.5 4.5 4.5

rather, but in fact NAs always seem to be treated as if ties.method =
"first". I cannot think of a situation in which one would want e.g.
ties.method = "average" for everything except the NAs!?

I am aware that the prototype behaves like this and that R has behaved like
this ever since; however, to me this appears very unfortunate. In order not to
'break' existing code, what about adding ties.methods
{"NAaverage"|"NArandom"|"NAmax"|"NAmin"}
that behave consistently?
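
A minimal sketch of what such consistent behaviour could look like, written as a wrapper around the existing rank(). The name rank_na_ties is hypothetical and not part of base R; the sketch covers only "average", "min" and "max" and assumes na.last is a single TRUE or FALSE.

```r
# Hypothetical helper (not part of base R): rank x so that all NAs form one
# tied group that is resolved with the same ties.method as the other values.
rank_na_ties <- function(x, na.last = TRUE,
                         ties.method = c("average", "min", "max")) {
  ties.method <- match.arg(ties.method)
  stopifnot(is.logical(na.last), length(na.last) == 1L, !is.na(na.last))
  is_na <- is.na(x)
  n_na <- sum(is_na)
  n_ok <- length(x) - n_na
  r <- numeric(length(x))
  # Rank the non-missing values among themselves with the requested method.
  r[!is_na] <- rank(x[!is_na], ties.method = ties.method)
  if (n_na > 0L) {
    # Positions taken by the NA block: the last n_na ranks or the first n_na.
    block <- if (na.last) n_ok + seq_len(n_na) else seq_len(n_na)
    r[is_na] <- switch(ties.method,
                       average = mean(block),
                       min = min(block),
                       max = max(block))
    # With NAs first, shift the non-missing ranks up past the NA block.
    if (!na.last) r[!is_na] <- r[!is_na] + n_na
  }
  r
}

rank_na_ties(c(1, 2, 2, NA, NA))
# [1] 1.0 2.5 2.5 4.5 4.5
```

Wrapping rank() rather than changing it leaves existing code untouched, which is the same concern that motivates the separate ties.methods proposed above.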

Best regards


Jens Oehlschlägel


P.S. Please cc me; I am not on the list.

> version
               _
platform       i386-pc-mingw32
arch           i386                        
os             mingw32                     
system         i386, mingw32               
status                                     
major          2                           
minor          4.0                         
year           2006                        
month          10                          
day            03                          
svn rev        39566                       
language       R                           
version.string R version 2.4.0 (2006-10-03)

Andrew Piskorski

2006-Oct-27 15:38 UTC

On Fri, Oct 27, 2006 at 11:14:25AM +0200, Jens Oehlschlägel wrote:

> rather, but in fact NAs seem to be always treated ties.method =
> "first". I have no idea in which situation one could desire
> e.g. ties.method = "average" except for NAs!?

Interesting.  I was aware of the S-Plus vs. R difference, but I didn't
realize that it appears to be because R rank() ignores
ties.method="average" for NA values.

> I am aware that the prototype behaves like this and R ever since
> behaves like this, however to me this appears very unfortunate. In
> order not to 'break' existing code, what about adding ties.methods

If you only care about ranking integers and floating point numbers,
it's pretty straightforward to take the S-Plus implementation of
rank(), call it my.rank(), and use it in both R and S-Plus.  (Since
the R rank() makes calls to .Internal(), you can't re-use its
implementation in S-Plus.)

Note though that the S-Plus-style my.rank() will still sort strings
differently in R than in S-Plus.  I never looked into why.

Some old notes I have on this issue:

  R and S-Plus rank() treat NAs differently (which can magnify other
  floating point differences):

  # S-Plus 6.2.1:            # R 2.1.0:
  > rank(1:5)                > rank(1:5)
  [1] 1 2 3 4 5              [1] 1 2 3 4 5
  > rank(c(1,2,NA,4,NA))     > rank(c(1,2,NA,4,NA))
  [1] 1.0 2.0 4.5 3.0 4.5    [1] 1 2 4 3 5
  > rank(c(1,NA,3,4,NA))     > rank(c(1,NA,3,4,NA))
  [1] 1.0 4.5 2.0 3.0 4.5    [1] 1 4 2 3 5
  > rank(c(1,NA,3))          > rank(c(1,NA,3))
  [1] 1 3 2                  [1] 1 3 2
  > rank(c(NA,NA,3))         > rank(c(NA,NA,3))
  [1] 2.5 2.5 1.0            [1] 2 3 1
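
The S-Plus column above can be reproduced in R for numeric input with a short sketch like the following. It is not the S-Plus implementation; the name splus_like_rank is hypothetical, and it relies only on rank() with na.last = "keep".

```r
# Hypothetical helper (numeric input only): NAs sort last and share the
# average of the positions they occupy, as in the S-Plus column above.
splus_like_rank <- function(x) {
  # Rank the non-missing values among themselves; NAs stay NA for now.
  r <- rank(x, na.last = "keep", ties.method = "average")
  n_na <- sum(is.na(x))
  n_ok <- length(x) - n_na
  # The NAs occupy positions n_ok + 1, ..., n_ok + n_na; assign their mean.
  r[is.na(x)] <- n_ok + (n_na + 1) / 2
  r
}

splus_like_rank(c(1, 2, NA, 4, NA))
# [1] 1.0 2.0 4.5 3.0 4.5
splus_like_rank(c(NA, NA, 3))
# [1] 2.5 2.5 1.0
```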

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/
