Hi!

Sorry, I am a beginner in R.

I was not able to find answers to my questions (tried Google, Stack Overflow, etc). Please correct me if anything is wrong here.

When comparing symbols/strings in R, are raw numeric values compared symbol by symbol, starting from the left? If raw numeric values are not used, is there an ASCII / Unicode table where symbols have values/ranking/order, and does R compare those values?

*2) Comparing symbols*
Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct?

# Raw value for "a" = 61
a_raw <- charToRaw("a")
a_raw

# Raw value for "b" = 62
b_raw <- charToRaw("b")
b_raw

# equals TRUE
"a" < "b"

OK, so 61 is less than 62, so it's TRUE. Is this correct?

*3) Comparing strings #1*
"1040" <= "12000"

raw_1040 <- charToRaw("1040")
raw_1040
#31 *30* (comparison happens with the second symbol) 34 30

raw_12000 <- charToRaw("12000")
raw_12000
#31 *32* (comparison happens with the second symbol) 30 30 30

The symbol in the second position is 30, and it is less than 32, so the result is TRUE. Is this correct?

*4) Comparing strings #2*
"1040" <= "10000"

raw_1040 <- charToRaw("1040")
raw_1040
#31 30 *34* (comparison happens with the third symbol) 30

raw_10000 <- charToRaw("10000")
raw_10000
#31 30 *30* (comparison happens with the third symbol) 30 30

The symbol in the third position is 34, which is greater than 30, so the result is FALSE. Is this correct?

*5) Problem - Why does this equal FALSE?*
*"A" < "a"*

41 < 61 # FALSE?

# Raw value for "A" = 41
A_raw <- charToRaw("A")
A_raw

# Raw value for "a" = 61
a_raw <- charToRaw("a")
a_raw

Why is capitalized "A" not less than lowercase "a"? Based on raw values it should be. What am I missing here?

Thanks
Kristjan
https://en.wikipedia.org/wiki/ASCII

There is a table towards the end of the document. Some of the other pieces may be of interest and/or relevant.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Kristjan Kure
Sent: Wednesday, April 13, 2022 10:06 AM
To: r-help at r-project.org
Subject: [R] Symbol/String comparison in R

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
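As a small illustration of the ASCII table mentioned above, the sketch below lists the codes for a few of the characters used in this thread. One detail worth noting: charToRaw() prints bytes in hexadecimal, so the "61" shown for "a" is hex 61, which is decimal 97. The variable names (chars, codes) are only for illustration.

```r
# Characters that appear in the examples in this thread.
chars <- c("A", "a", "b", "0", "1", "2", "4")

codes <- data.frame(
  char = chars,
  # charToRaw() yields raw bytes; as.character() shows them as two-digit hex.
  hex = vapply(chars, function(ch) as.character(charToRaw(ch)),
               character(1), USE.NAMES = FALSE),
  # utf8ToInt() yields the code point as a decimal integer.
  decimal = vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)
)

print(codes)
```

For plain ASCII characters the hex column and the decimal column describe the same number, just in different bases.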
"I was not able to find answers to my questions (tried Google, Stack Overflow, etc). Please correct me if anything is wrong here."

R has an extensive Help system. That should always be your first place to look. In this case, ?"<" (at the R prompt) brings you to the Help page for comparisons (as would ?Comparison, but only if the "C" is in upper case, unfortunately). Among lots of other stuff, it says:

"Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales."
... (+ lots more).

Incidentally, rseek.org and rdrr.io are another couple of good places to look for R documentation.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
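To make the word "lexicographic" in the help text quoted above more concrete, here is a minimal sketch of a left-to-right, byte-by-byte comparison. It only mirrors what one would expect in the "C" locale for plain ASCII strings; it is not how R implements comparison, and compare_bytes is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: compare two ASCII strings byte by byte, left to right.
# Returns -1 if x sorts first, 1 if y sorts first, 0 if they are identical.
compare_bytes <- function(x, y) {
  bx <- as.integer(charToRaw(x))
  by <- as.integer(charToRaw(y))
  n <- min(length(bx), length(by))
  if (n > 0) {
    # Positions (within the shared length) where the bytes differ.
    differs <- which(bx[seq_len(n)] != by[seq_len(n)])
    if (length(differs) > 0) {
      i <- differs[1]
      # The first differing byte decides the result.
      return(sign(bx[i] - by[i]))
    }
  }
  # All shared positions are equal: the shorter string sorts first.
  sign(length(bx) - length(by))
}

compare_bytes("1040", "12000")  # decided at position 2: "0" vs "2"
compare_bytes("1040", "10000")  # decided at position 3: "4" vs "0"
compare_bytes("A", "a")         # byte order puts "A" first
```

In other locales the collating sequence can differ from byte order, which is why "A" < "a" may not agree with this sketch.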
Hello,

This is a locale issue: you are counting on the ASCII table codes, but that is only valid for the "C" locale.

old_loc <- Sys.getlocale("LC_COLLATE")

"A" < "a"
#> [1] FALSE
"A" > "a"
#> [1] TRUE

Sys.setlocale("LC_COLLATE", locale = "C")
#> [1] "C"

"A" < "a"
#> [1] TRUE
"A" > "a"
#> [1] FALSE

Sys.setlocale("LC_COLLATE", old_loc)
#> [1] "Portuguese_Portugal.1252"

Hope this helps,

Rui Barradas
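The save, set, and restore steps in the example above can be wrapped in a small function so the original collation is put back even if an error occurs. This is only a sketch: with_c_collation is a hypothetical name, not a function from base R or any package. The last lines use sort(method = "radix"), which is documented to order character vectors as in the C-locale, as an alternative that does not touch the session locale.

```r
# Hypothetical helper: evaluate an expression with LC_COLLATE set to "C",
# then restore whatever collation was active before.
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # 'expr' is evaluated lazily, so it is only forced here,
  # after the collation has been switched.
  expr
}

before <- Sys.getlocale("LC_COLLATE")
with_c_collation("A" < "a")
after <- Sys.getlocale("LC_COLLATE")

# Sorting in C-locale order without changing the locale.
x <- c("b", "A", "a", "B")
sorted_c <- sort(x, method = "radix")
sorted_c
```

Outside the helper, "A" < "a" still follows whatever collation the session is using.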