Consider the following:

    x <- letters[1:5]
    x < 0

This gives

    [1] FALSE FALSE FALSE FALSE FALSE

which kind of makes sense, I guess, though I would a priori have expected all NAs.

But then do:

    x[3] <- "*"
    x < 0

This gives

    [1] FALSE FALSE TRUE FALSE FALSE

which puzzles me. Why is "*" considered to be less than 0?

At one point I made the conjecture that it had something to do with the ordering of ASCII characters, but it does not seem to. A little more investigation led me to conjecture that all ASCII characters except real-live letters and numerals come out as being less than 0.

Can anyone explain the rationale to me? Not that it matters a damn. Just idle curiosity.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

> On Feb 26, 2020, at 8:09 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> [...]
> Why is "*" considered to be less than 0?
> [...]

Rolf,

Does this help? From ?"<":

  "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw."

Thus:

    > c(0, x)
    [1] "0" "a" "b" "*" "d" "e"

    > sort(c(0, x))
    [1] "*" "0" "a" "b" "d" "e"

Thus, "*" is less than "0", at least in my locale, and presumably yours, since lexical sort ordering is locale dependent.

Regards,

Marc Schwartz

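The coercion rule Marc quotes can be checked directly. What follows is a minimal sketch in base R; the vector x is rebuilt here so the snippet runs on its own, and the names same_result and coerced are only illustrative.

```r
# Rebuild the vector from the original question.
x <- c("a", "b", "*", "d", "e")

# Character outranks numeric, so 0 is coerced to a string before comparing.
coerced <- as.character(0)

# x < 0 should therefore be the very same comparison as x < "0",
# whatever the collation order of the current locale happens to be.
same_result <- identical(x < 0, x < coerced)

print(coerced)
print(same_result)
```

The actual TRUE/FALSE pattern of x < 0 still depends on the locale; the snippet only shows that the numeric 0 plays no numeric role in the comparison.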
On 26/02/2020 8:09 p.m., Rolf Turner wrote:
> [...]
> Can anyone explain the rationale to me? Not that it matters a damn.
> Just idle curiosity.

It's doing a string comparison, but ordering will depend on your locale. You can read the ?icuGetCollate help page if you want to spend a lot of time reading a help page. Not sure it'll answer your question, though...

Duncan Murdoch

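One way to see the locale dependence Duncan mentions is to switch collation to the C locale temporarily, where strings are compared by byte value. This is only a sketch: it assumes the environment variable R_ICU_LOCALE is not set (otherwise ICU collation may still apply), and the names old, in_c and tilde_in_c are illustrative.

```r
# Remember the current collation setting so it can be restored afterwards.
old <- Sys.getlocale("LC_COLLATE")

# In the C locale, strings are compared byte by byte.
Sys.setlocale("LC_COLLATE", "C")

x <- c("a", "b", "*", "d", "e")

# "*" has code 42 and "0" has code 48, so only "*" sorts before "0".
in_c <- x < 0
print(utf8ToInt("*"))
print(utf8ToInt("0"))
print(in_c)

# "~" has code 126, so in the C locale it is NOT less than "0".
tilde_in_c <- "~" < 0
print(tilde_in_c)

# Put the original collation setting back.
Sys.setlocale("LC_COLLATE", old)
```

In the C locale the result does follow ASCII order, so a character such as "~" compares greater than "0". In many other locales punctuation is collated ahead of digits, which would be consistent with what Rolf observed.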
Thanks to Marc and Duncan for setting me straight. I guess the piece of the puzzle that I was overlooking is the fact that lexicographic ordering for string comparison depends on locale.

It would also have helped me a bit if I'd done the RTFM thing and looked at ?"<" !!!

Thanks again.

cheers,

Rolf