Thomas Stewart
2014-May-29 02:25 UTC
[R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)
Can anyone help me understand the following behavior? I want to replace the letter 'X' in the string 'text X' with '≥' (\u226 5 ). The output from gsub is not what I expect. It gives: "text ≥". Now, suppose I want to replace the character '≤' in the string 'text ≤' with '≥'. Then, gsub gives the expected, desired output. What am I missing? Thanks for any insight. -tgs Minimal Working Example: string1 <- "text X"; string1 new_string1 <- gsub("X","\u2265",string1); new_string1 string2 <- "text \u2264"; string2 new_string2 <- gsub("\u2264","\u2265",string2); new_string2 charToRaw(new_string1) charToRaw(new_string2) sessionInfo() ## OUTPUT> string1 <- "text X"; string1[1] "text X"> new_string1 <- gsub("X","\u2265",string1); new_string1[1] "text ≥"> string2 <- "text \u2264"; string2[1] "text ≤"> new_string2 <- gsub("\u2264","\u2265",string2); new_string2[1] "text ≥"> charToRaw(new_string1)[1] 74 65 78 74 20 e2 89 a5> charToRaw(new_string2)[1] 74 65 78 74 20 e2 89 a5> sessionInfo()R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.0.2 [[alternative HTML version deleted]]
David Winsemius
2014-May-29 03:39 UTC
[R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)
On May 28, 2014, at 7:25 PM, Thomas Stewart wrote:> Can anyone help me understand the following behavior? > > I want to replace the letter 'X' in > ?the string ? > 'text X' with '?' (\u226 > ?5 > ). The output from gsub is not what I expect. It gives: "text ???". > > Now, suppose I want to replace the character '?' in > ? the string? > 'text ?' with '?'. Then, gsub gives the expected, desired output. > > ?What am I missing? > > Thanks for any insight. > -tgs > > Minimal Working Example: > > string1 <- "text X"; string1 > new_string1 <- gsub("X","\u2265",string1); new_string1Try this instead:> new_string1 <- gsub("X","\\\u2265",string1); new_string1[1] "text ?" Each "\" needs to be escaped, both the "\" in \u2265 as well as the "\" that escapes it.> nchar("\\")[1] 1> nchar("\\\u2265")[1] 2 You would be well-served by spending effort at reading: ?Quotes -- David.> > string2 <- "text \u2264"; string2 > new_string2 <- gsub("\u2264","\u2265",string2); new_string2 > > charToRaw(new_string1) > charToRaw(new_string2) > > sessionInfo() > > ## OUTPUT > >> string1 <- "text X"; string1 > [1] "text X" > >> new_string1 <- gsub("X","\u2265",string1); new_string1 > [1] "text ???" > >> string2 <- "text \u2264"; string2 > [1] "text ?" > >> new_string2 <- gsub("\u2264","\u2265",string2); new_string2 > [1] "text ?" > >> charToRaw(new_string1) > [1] 74 65 78 74 20 e2 89 a5> charToRaw("\\\u2265")[1] 5c e2 89 a5> >> charToRaw(new_string2) > [1] 74 65 78 74 20 e2 89 a5 > >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) >It was a good idea to post sessionInfo(), but it would have been even better to have posted in plain text.> [[alternative HTML version deleted]] >-- David Winsemius Alameda, CA, USA