I don't follow why this happens:> sort(c(LETTERS[1:5], letters[1:5]))[1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" The help for sort() says: method: character string specifying the algorithm used. Not available for partial sorting. Can be abbreviated. But what are the methods available? The help mentions xtfrm but that doesn't illuminate, I'd have thought that at least by default it would have something to do with ASCII codes. But that's not the case since all the uppercase ones would be before the lowercase ones. I know something different is happening but I don't know what it is (do you, Mr Jones?). Apologies to Bob Dylan. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___ Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) ..... Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
On 18 Mar 2016, at 10:02 , Patrick Connolly <p_connolly at slingshot.co.nz> wrote:> I don't follow why this happens: > >> sort(c(LETTERS[1:5], letters[1:5])) > [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" > > The help for sort() says: > > method: character string specifying the algorithm used. Not > available for partial sorting. Can be abbreviated. > > But what are the methods available? The help mentions xtfrm but that > doesn't illuminate, I'd have thought that at least by default it would > have something to do with ASCII codes. But that's not the case since > all the uppercase ones would be before the lowercase ones. > > I know something different is happening but I don't know what it is > (do you, Mr Jones?). Apologies to Bob Dylan. >Um, read _all_ of the help file? sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE, method = c("shell", "quick"), index.return = FALSE) [snip] Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick (1986)). If x has names a stable modification is used, so ties are not reordered. (This only matters if names are present.) Method "quick" uses Singleton (1969)'s implementation of Hoare's Quicksort method and is only available when x is numeric (double or integer) and partial is NULL. (For other types of x Shellsort is used, silently.) It is normally somewhat faster than Shellsort (perhaps 50% faster on vectors of length a million and twice as fast at a billion) but has poor performance in the rare worst case. (Peto's modification using a pseudo-random midpoint is used to make the worst case rarer.) This is not a stable sort, and ties may be reordered. Factors with less than 100,000 levels are sorted by radix sorting when method is not supplied: see sort.list. -pd -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Ooops, that was answering the question you actually asked. The one you meant to ask is answered by this part: The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison. ...and collating sequences is a weird and woolly subject, where you cannot even be sure that locales of the same name on two different platforms sort strings in the same order. -pd On 18 Mar 2016, at 10:13 , peter dalgaard <pdalgd at gmail.com> wrote:> > On 18 Mar 2016, at 10:02 , Patrick Connolly <p_connolly at slingshot.co.nz> wrote: > >> I don't follow why this happens: >> >>> sort(c(LETTERS[1:5], letters[1:5])) >> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" >> >> The help for sort() says: >> >> method: character string specifying the algorithm used. Not >> available for partial sorting. Can be abbreviated. >> >> But what are the methods available? The help mentions xtfrm but that >> doesn't illuminate, I'd have thought that at least by default it would >> have something to do with ASCII codes. But that's not the case since >> all the uppercase ones would be before the lowercase ones. >> >> I know something different is happening but I don't know what it is >> (do you, Mr Jones?). Apologies to Bob Dylan. >> > > > Um, read _all_ of the help file? > > sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE, > method = c("shell", "quick"), index.return = FALSE) > > [snip] > > Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick (1986)). If x has names a stable modification is used, so ties are not reordered. (This only matters if names are present.) > > Method "quick" uses Singleton (1969)'s implementation of Hoare's Quicksort method and is only available when x is numeric (double or integer) and partial is NULL. (For other types of x Shellsort is used, silently.) It is normally somewhat faster than Shellsort (perhaps 50% faster on vectors of length a million and twice as fast at a billion) but has poor performance in the rare worst case. (Peto's modification using a pseudo-random midpoint is used to make the worst case rarer.) This is not a stable sort, and ties may be reordered. > > Factors with less than 100,000 levels are sorted by radix sorting when method is not supplied: see sort.list. > > -pd > > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com