Displaying 3 results from an estimated 3 matches for "fgrep_one_bytes".
2015 Mar 02
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings.
Here's an example (must be run on Windows to reproduce the error):
Sys.setlocale("LC_CTYPE", "chinese")
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"
grep("\n", y, fixed = TRUE)
# Error in grep("\n", y, fixed = TRUE) : invalid
2015 Mar 04
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
..., like so:
if (!useBytes && mbcslocale) {
...
} else if (!useBytes && use_UTF8) {
...
} else ...
This can be seen at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692
A similar pattern occurs in the fgrep_one_bytes function, at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L718-L736
I believe that the test order should be reversed; it should test first
for `use_UTF8`, and then for `mbcslocale`. This pattern occurs in a few
places in grep.c. It looks like this:...
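
The excerpt above is truncated, and the elided code is not reproduced here. As a separate illustration of why the test order matters, here is a minimal, self-contained C sketch. It is not code from grep.c: the enum and the two function names are hypothetical, and only the flag names (useBytes, mbcslocale, use_UTF8) and the if/else shape come from the post. It assumes that, in the failing case, both mbcslocale and use_UTF8 are set at the same time (a UTF-8-marked string in a multibyte, non-UTF-8 locale).

```c
#include <stdbool.h>

/* Hypothetical labels for the three branches; these are NOT identifiers
 * from R's grep.c, only stand-ins for "which code path was taken". */
typedef enum { PATH_BYTES, PATH_MBCS, PATH_UTF8 } match_path;

/* Branch order as quoted in the post: the locale test comes first,
 * so the use_UTF8 branch is unreachable whenever mbcslocale is set. */
match_path pick_path_current(bool useBytes, bool mbcslocale, bool use_UTF8)
{
    if (!useBytes && mbcslocale) {
        return PATH_MBCS;
    } else if (!useBytes && use_UTF8) {
        return PATH_UTF8;
    } else {
        return PATH_BYTES;
    }
}

/* Branch order proposed in the post: test use_UTF8 first, then mbcslocale. */
match_path pick_path_proposed(bool useBytes, bool mbcslocale, bool use_UTF8)
{
    if (!useBytes && use_UTF8) {
        return PATH_UTF8;
    } else if (!useBytes && mbcslocale) {
        return PATH_MBCS;
    } else {
        return PATH_BYTES;
    }
}
```

With useBytes false and both other flags true, pick_path_current selects the locale-multibyte path, while pick_path_proposed selects the UTF-8 path. When only one of the two flags is set, or when useBytes is true, both orders agree, so the reordering would only change behaviour in the overlapping case.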
2008 Mar 17
Inconsistency in gsub in R.2.6.2 (PR#10978)
Hi,
May this be an oversight?
R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
> x <- "ab?"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("?","", x))
[1] "unknown"
> Encoding(gsub("?","", x, perl = TRUE))
[1] "latin1"