thr3ads.net - R devel - [Rd] (no subject) [Oct 2015]

Suharto Anggono Suharto Anggono

2015-Oct-22 05:44 UTC

[Rd] (no subject)

------------------
>>>>> Henric Winell <[hidden email]>
>>>>>     on Wed, 21 Oct 2015 13:43:02 +0200 writes:
    > On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
    >> Marius Hofert-4
    >> ------------------------------
    >>> On 2015-10-09 at 12:14, Martin Maechler wrote:
    >>> I think so: the code above doesn't seem to do the right thing.  Consider
    >>> the following example:
    >>>
    >>> > x <- c(1, 1, 2, 3)
    >>> > rank2(x, ties.method = "last")
    >>> [1] 1 2 4 3
    >>>
    >>> That doesn't look right to me -- I had expected
    >>>
    >>> > rev(sort.list(x, decreasing = TRUE))
    >>> [1] 2 1 3 4
    >>>
    >>
    >> Indeed, well spotted, that seems to be correct.
    >>
    >>>
    >>> Henric Winell
    >>>
    >> ------------------------------
    >>
    >> In the particular example (of length 4), what is really wanted is the following.
    >> ind <- integer(4)
    >> ind[sort.list(x, decreasing=TRUE)] <- 4:1
    >> ind

    > You don't provide the output here, but 'ind' is, of course,

    >> ind
    > [1] 2 1 3 4
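The length-4 assignment above generalizes to any length n: the positions listed by sort.list(x, decreasing = TRUE) receive n, n-1, ..., 1. A minimal sketch, assuming x has no NAs and that sort.list(x, decreasing = TRUE) keeps tied elements in their original order (which the outputs in this thread imply); the helper name rank_last_assign is hypothetical.

```r
## Hypothetical helper generalizing  ind[sort.list(x, decreasing=TRUE)] <- 4:1
## Assumes: no NAs in x.
rank_last_assign <- function(x) {
  n <- length(x)
  ind <- integer(n)
  ## The largest element (first in decreasing order) gets rank n, and so on
  ## down to rank 1; rev(seq_len(n)) is safe for n == 0, unlike n:1.
  ind[sort.list(x, decreasing = TRUE)] <- rev(seq_len(n))
  ind
}

rank_last_assign(c(1, 1, 2, 3))   # [1] 2 1 3 4
rank_last_assign(c(3, 1, 1, 2))   # [1] 4 2 1 3
```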

    >> The following gives the desired result:
    >> sort.list(rev(sort.list(x, decreasing=TRUE)))

    > And, again, no output, but

    >> sort.list(rev(sort.list(x, decreasing=TRUE)))
    > [1] 2 1 3 4

    > Why is it necessary to use 'sort.list' on the result from
    > 'rev(sort.list(...'?
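One way to see why the outer sort.list() matters: for x <- c(1, 1, 2, 3) the order rev(sort.list(x, decreasing = TRUE)) happens to be its own inverse, so the order and the ranks print identically. With a vector whose order is not self-inverse, the two differ. A small illustration, assuming (as the thread's outputs imply) that sort.list(..., decreasing = TRUE) keeps ties in their original order; the vector y is an arbitrary example.

```r
x <- c(1, 1, 2, 3)
o <- rev(sort.list(x, decreasing = TRUE))
o                # [1] 2 1 3 4   an order: indices of x, smallest first
sort.list(o)     # [1] 2 1 3 4   ranks: identical only because o is self-inverse

y <- c(3, 1, 1, 2)
p <- rev(sort.list(y, decreasing = TRUE))
p                # [1] 3 2 4 1   the order
sort.list(p)     # [1] 4 2 1 3   the "last" ranks, which differ from p
```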

You can try all kinds of code on this *too* simple example and do
experiments.  But let's approach this a bit more scientifically
and hence systematically:

Look at  rank  {the R function definition} to see that
for the case of no NA's,

 rank(x, ties.method = "first")   ===    sort.list(sort.list(x))
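A quick numerical check of this identity, assuming no NAs in x; the seed and vector size are arbitrary choices for illustration. sort.list(x) is the order (smallest first, ties by position), and applying sort.list() once more inverts that permutation into ranks.

```r
set.seed(42)
x <- sample(5, 20, replace = TRUE)   # integer vector with many ties, no NAs

r1 <- rank(x, ties.method = "first")
r2 <- sort.list(sort.list(x))        # inverse of the ordering permutation
stopifnot(all(r1 == r2))
```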

If you assume that to be correct and want to define "last" to be
correct as well (in the sense of being  "first"-consistent),
it is clear that

  rank(x, ties.method = "last")   ===  rev(sort.list(sort.list(rev(x))))

must also be correct.  I don't think that *any* of the proposals
so far had a correct version [but the too simplistic examples
did not show the problems].
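The reasoning behind the "last" formula, sketched: reversing x makes the last member of each group of ties come first, "first" ranking is applied to the reversed vector, and the final rev() maps those ranks back to the original positions. The function name rank_last_rev and the example vector y are hypothetical; no NAs are assumed.

```r
## Hypothetical name; implements  rev(sort.list(sort.list(rev(x))))
rank_last_rev <- function(x) rev(sort.list(sort.list(rev(x))))

y <- c(3, 1, 1, 2)
rank(y, ties.method = "first")   # [1] 4 1 2 3   tied 1s ranked by position
rank_last_rev(y)                 # [1] 4 2 1 3   tied 1s ranked in reverse
```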

In R-devel (the R development version) as of today, i.e., svn
revision >= 69549, the implementation of  ties.method = "last"
uses
        ## == rev(sort.list(sort.list(rev(x)))) :
        if(length(x) == 0) integer(0)
        else { i <- length(x):1L
               sort.list(sort.list(x[i]))[i] },

which is equivalent to using rev() but a bit more efficient.
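The snippet above appears to be one branch of a switch() over ties.method inside rank(), hence the trailing comma. A minimal sketch that lifts it into a standalone function (the names rank_last_idx and rank_last_rev are hypothetical) and checks it against the rev() form on random short vectors with ties, including length 0. The length guard matters because length(x):1L would be c(0L, 1L) for an empty x.

```r
## Index form, as in the R-devel snippet (hypothetical function name)
rank_last_idx <- function(x) {
  if (length(x) == 0) integer(0)     # length(x):1L would be c(0L, 1L) here
  else {
    i <- length(x):1L                # reversing index, built once
    sort.list(sort.list(x[i]))[i]    # x[i] is rev(x); [i] reverses the result
  }
}
## rev() form, for comparison (hypothetical function name)
rank_last_rev <- function(x) rev(sort.list(sort.list(rev(x))))

set.seed(1)
for (k in 1:200) {
  x <- sample(4, sample(0:15, 1), replace = TRUE)  # short vectors, many ties
  stopifnot(identical(rank_last_idx(x), rank_last_rev(x)))
}
```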

Martin Maechler, ETH Zurich 
------------------

I maintain that my code is correct in general.

It all comes from the fact that, if p is a permutation of 1:n,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives the same result as
sort.list(p)
You can make sense of it like this. After ind[p] <- 1:n, ind[1] is the position
where p == 1, i.e., the position of the smallest element of p, so it is the
first element of sort.list(p). The same argument gives the following elements.
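A short check of this identity on a random permutation (the seed and n are arbitrary). It also confirms that ind is the inverse permutation of p, since composing the two in either order gives 1:n.

```r
set.seed(7)
n <- 10
p <- sample(n)                    # a random permutation of 1:n

ind <- integer(n)
ind[p] <- 1:n                     # ind[p[k]] <- k, so ind[j] is where p == j
stopifnot(identical(ind, sort.list(p)))

## ind is the inverse permutation: composing in either order gives 1:n
stopifnot(identical(p[ind], 1:n), identical(ind[p], 1:n))
```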

That's why 'sort.list' is used for ties.method="first" and
ties.method="random" in function 'rank' in R. When p gives the
desired order,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives ranks of the original elements based on the order. The original element in
position p[1] has rank 1, the original element in position p[2] has rank 2, and
so on.
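A sketch of that recipe as a helper (the name ranks_from_order is hypothetical), applied to the "first" order and to an order whose ties are broken by uniform random draws. The latter only mirrors the idea behind ties.method = "random"; it is not claimed to be rank()'s exact code.

```r
## Hypothetical helper: convert an ordering permutation p into ranks
ranks_from_order <- function(p) {
  ind <- integer(length(p))
  ind[p] <- seq_along(p)          # element at position p[k] receives rank k
  ind
}

x <- c(3, 1, 1, 2)
ranks_from_order(sort.list(x))                   # [1] 4 1 2 3  ("first")
ranks_from_order(order(x, runif(length(x))))     # ties broken at random
```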

Now, I claim that rev(sort.list(x, decreasing=TRUE)) gives the desired order for
ties.method="last". In that order, the elements go from smallest to largest,
and equal elements are ordered by their positions backwards.
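To make the claim concrete, a small example with two groups of ties (an arbitrary vector without NAs; stability of sort.list(..., decreasing = TRUE) is assumed as above). The order lists values from smallest to largest, and within each group of equal values the positions run backwards; passing that order to sort.list() then yields the "last" ranks.

```r
x <- c(2, 1, 2, 1, 2)
p <- rev(sort.list(x, decreasing = TRUE))
p              # [1] 4 2 5 3 1   1s at positions 4, 2; 2s at 5, 3, 1
x[p]           # [1] 1 1 2 2 2
sort.list(p)   # [1] 5 2 4 1 3   ranks with ties.method = "last"
```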

Martin Maechler

2015-Oct-22 07:06 UTC


[Rd] rank(, ties.method="last")

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 21 Oct 2015 22:44:57 -0700 writes:
    > ------------------
    > >>>>> Henric Winell <[hidden email]>
    > >>>>>     on Wed, 21 Oct 2015 13:43:02 +0200 writes:
    >> On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
    >>> Marius Hofert-4
    >>> ------------------------------
    >>>> On 2015-10-09 at 12:14, Martin Maechler wrote:
    >>>> I think so: the code above doesn't seem to do the right thing.  Consider
    >>>> the following example:
    >>>> 
    >>>> > x <- c(1, 1, 2, 3)
    >>>> > rank2(x, ties.method = "last")
    >>>> [1] 1 2 4 3
    >>>> 
    >>>> That doesn't look right to me -- I had expected
    >>>> 
    >>>> > rev(sort.list(x, decreasing = TRUE))
    >>>> [1] 2 1 3 4
    >>>> 
    >>> 
    >>> Indeed, well spotted, that seems to be correct.
    >>> 
    >>>> 
    >>>> Henric Winell
    >>>> 
    >>> ------------------------------
    >>> 
    >>> In the particular example (of length 4), what is really wanted is the following.
    >>> ind <- integer(4)
    >>> ind[sort.list(x, decreasing=TRUE)] <- 4:1
    >>> ind

    >> You don't provide the output here, but 'ind' is, of course,

    >>> ind
    >> [1] 2 1 3 4

    >>> The following gives the desired result:
    >>> sort.list(rev(sort.list(x, decreasing=TRUE)))

    >> And, again, no output, but

    >>> sort.list(rev(sort.list(x, decreasing=TRUE)))
    >> [1] 2 1 3 4

    >> Why is it necessary to use 'sort.list' on the result from
    >> 'rev(sort.list(...'?

    > You can try all kinds of code on this *too* simple example and do
    > experiments.  But let's approach this a bit more scientifically
    > and hence systematically:

    > Look at  rank  {the R function definition} to see that
    > for the case of no NA's,

    > rank(x, ties.method = "first")   ===   sort.list(sort.list(x))

    > If you assume that to be correct and want to define "last" to be
    > correct as well (in the sense of being  "first"-consistent),
    > it is clear that

    > rank(x, ties.method = "last")   ===  rev(sort.list(sort.list(rev(x))))

    > must also be correct.  I don't think that *any* of the proposals
    > so far had a correct version [but the too simplistic examples
    > did not show the problems].

    > In R-devel (the R development version) as of today, i.e., svn
    > revision >= 69549, the implementation of  ties.method = "last"
    > uses
    >         ## == rev(sort.list(sort.list(rev(x)))) :
    >         if(length(x) == 0) integer(0)
    >         else { i <- length(x):1L
    >                sort.list(sort.list(x[i]))[i] },

    > which is equivalent to using rev() but a bit more efficient.

    > Martin Maechler, ETH Zurich 
    > ------------------

    > I maintain that my code is correct in general.

    > It all comes from the fact that, if p is a permutation of 1:n,
    > { ind <- integer(n); ind[p] <- 1:n; ind }
    > gives the same result as
    > sort.list(p)

Definitely; a known fact
(and that's how sort.list() -> order() is basically
 implemented in R's C source code.)

    > You can make sense of it like this. After ind[p] <- 1:n, ind[1] is the position
    > where p == 1, i.e., the position of the smallest element of p, so it is the
    > first element of sort.list(p). The same argument gives the following elements.

    > That's why 'sort.list' is used for ties.method="first" and
    > ties.method="random" in function 'rank' in R. When p gives the desired order,
    > { ind <- integer(n); ind[p] <- 1:n; ind }
    > gives ranks of the original elements based on the order. The original
    > element in position p[1] has rank 1, the original element in position p[2]
    > has rank 2, and so on.

    > Now, I claim that rev(sort.list(x, decreasing=TRUE)) gives the desired order for
    > ties.method="last". In that order, the elements go from smallest to largest,
    > and equal elements are ordered by their positions backwards.

You are right, Suharto:
Your proposed

     sort.list(rev(sort.list(x, decreasing=TRUE)))

is also correct and the same as my

     rev(sort.list(sort.list(rev(x))))

from above, and your variant will even be slightly more efficient if
implemented optimally!
     	   
Indeed, I was wrong when I wrote
  "I don't think that *any* of the proposals so far had a correct version"

because your proposal *was* correct.
I apologize for my wrong claim.

Best regards,
Martin
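A randomized check that the two formulas agree with each other and with rank(x, ties.method = "last"), under the same assumptions as above (no NAs; stable decreasing sort.list(); an R version that includes the fix discussed in this thread). The function names are hypothetical, and the seed and sizes are arbitrary. Both forms sort twice; Suharto's needs one rev() where Martin's needs two.

```r
## Hypothetical names for the two formulas discussed above
rank_last_suharto <- function(x) sort.list(rev(sort.list(x, decreasing = TRUE)))
rank_last_martin  <- function(x) rev(sort.list(sort.list(rev(x))))

set.seed(2015)
for (k in 1:200) {
  x <- sample(5, sample(1:25, 1), replace = TRUE)   # ties are common
  a <- rank_last_suharto(x)
  stopifnot(identical(a, rank_last_martin(x)),
            all(a == rank(x, ties.method = "last")))
}
```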
