Yes, collation is a strange thing, and ...

Collation order will depend on locale settings, and there are quite a few cases where the collation order of two items is not defined.

To add to the confusion, on OSX Mavericks, I see

> x <- "\u0663"
> y <- 3
>
> x == y
[1] FALSE
> rank(c(x, y))
[1] 2 1
> x
[1] "٣"
> x == y
[1] FALSE
> x > y
[1] TRUE
> x < y
[1] FALSE

> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
> Sys.getlocale("LC_COLLATE")
[1] "en_US.UTF-8"

Notice the differences from en_US.UTF8 (sans hyphen) on your system....

-pd

On 13 Aug 2015, at 16:01 , John McKown <john.archie.mckown at gmail.com> wrote:

> 2015-08-13 8:39 GMT-05:00 Hadley Wickham <h.wickham at gmail.com>:
>
>> x <- "\u0663"
>> y <- 3
>>
>> x == y
>> # FALSE
>> rank(c(x, y))
>> # c(1.5, 1.5)
>
> Also interesting, and confusing to me:
>
>> x == y
> [1] FALSE
>> x > y
> [1] FALSE
>> x < y
> [1] FALSE
>
> With some slight changes:
>
>> x <- "\u0663"
>> y <- "3"
>> xy <- c(x,y)
>> rank(xy);
> [1] 1.5 1.5
>> Sys.getlocale();
> [1] "LC_CTYPE=en_US.UTF8;LC_NUMERIC=C;LC_TIME=en_US.UTF8;LC_COLLATE=en_US.UTF8;LC_MONETARY=en_US.UTF8;LC_MESSAGES=en_US.UTF8;LC_PAPER=en_US.UTF8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF8;LC_IDENTIFICATION=C"
>> Sys.setlocale(category="LC_COLLATE", locale="C");
> [1] "C"
>> rank(xy);
> [1] 2 1
>
>> --
>> http://had.co.nz/
>
> --
> John McKown
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

On 13/08/2015 15:19, peter dalgaard wrote:
> Yes, collation is a strange thing, and ...

And remember that on some platforms (including yours) ICU is used, so LC_COLLATE is not particularly relevant (unless it is 'C'). See ?Comparisons and ?icuGetCollate.

E.g. on my Yosemite system in en_US.UTF-8

> rank(c(x, y))
[1] 1.5 1.5
> icuGetCollate()
[1] "root"
> icuSetCollate(locale="ASCII")
> rank(c(x, y))
[1] 2 1

whereas on Fedora 21

> rank(c(x, y))
[1] 2 1
> icuGetCollate()
[1] "root"

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

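For readers who want the same answer on every platform, here is a minimal sketch of the one setting that behaved consistently in the runs quoted in this thread: collating in the "C" locale. The helper name with_c_collation is hypothetical (it is not part of R or of this thread); it only wraps Sys.setlocale(), which appears in John McKown's example, and restores the previous LC_COLLATE value afterwards. It assumes that byte-wise "C" ordering is acceptable for the data at hand, which is often not the case for text meant for human readers.

```r
# Hypothetical helper (not from the thread, not part of R): evaluate an
# expression with LC_COLLATE temporarily set to "C", then restore the
# previous collation locale on exit.
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  force(expr)  # lazy evaluation: expr is first evaluated here, under "C"
}

x <- "\u0663"  # ARABIC-INDIC DIGIT THREE
y <- "3"

capabilities("ICU")               # does this build of R use ICU for collation?
with_c_collation(rank(c(x, y)))   # 2 1, matching the "C" run quoted earlier
with_c_collation(x > y)           # TRUE
```

Because expr is a lazily evaluated argument, the comparison or ranking only runs after the locale has been switched, so callers can pass any expression without quoting it.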
> On 14 Aug 2015, at 08:10 , Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>
> E.g. on my Yosemite system in en_US.UTF-8
>
>> rank(c(x, y))
> [1] 1.5 1.5

... which differs from my Mavericks system but not my Yosemite system, both in en_US.UTF-8, both with icuGetCollate returning "root"... Oh, well.

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com