Displaying 4 results from an estimated 4 matches for "fgrep_one".
2015 Mar 02
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings.
Here's an example (must be run on Windows to reproduce the error):
Sys.setlocale("LC_CTYPE", "chinese")
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"
grep("\n", y, fixed = TRUE)
# Error in grep("\n", y, fixed = TRUE) : invalid
2015 Mar 04
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
...E results in an error.
Sys.setlocale("LC_CTYPE", "chinese")
grep("a", y, fixed = TRUE)
# Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>'
=======================
I believe the problem is in the main/grep.c file, in the fgrep_one
function. It first tests for a multibyte-character locale
(`mbcslocale`), and only then for `use_UTF8`, like so:
if (!useBytes && mbcslocale) {
...
} else if (!useBytes && use_UTF8) {
...
} else ...
This can be seen at
https://github.com/wch/r-source/blo...
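
To make the branch-order concern concrete, here is a minimal sketch of how the quoted `if`/`else if` chain picks a code path when both flags are set. The enum labels and both function names are hypothetical, not taken from R's source, and the reordered variant is only an assumed alternative for comparison, not a confirmed fix.

```c
#include <stdbool.h>

/* Hypothetical labels for the three branches; not names from R's source. */
typedef enum { PATH_MBCS, PATH_UTF8, PATH_BYTES } match_path;

/* Branch order as quoted in the post: mbcslocale is tested first,
 * so the use_UTF8 branch is unreachable whenever mbcslocale is true. */
static match_path select_path_quoted(bool useBytes, bool mbcslocale,
                                     bool use_UTF8)
{
    if (!useBytes && mbcslocale)
        return PATH_MBCS;
    else if (!useBytes && use_UTF8)
        return PATH_UTF8;
    else
        return PATH_BYTES;
}

/* Assumed alternative (for illustration only): test use_UTF8 first,
 * so UTF-8 handling wins even in a multibyte locale. */
static match_path select_path_utf8_first(bool useBytes, bool mbcslocale,
                                         bool use_UTF8)
{
    if (!useBytes && use_UTF8)
        return PATH_UTF8;
    else if (!useBytes && mbcslocale)
        return PATH_MBCS;
    else
        return PATH_BYTES;
}
```

With `useBytes` false and both `mbcslocale` and `use_UTF8` true, `select_path_quoted` returns `PATH_MBCS` while `select_path_utf8_first` returns `PATH_UTF8`. That matches the situation the post describes, where UTF-8 bytes would be interpreted using the locale's multibyte encoding.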
2011 Sep 29
grep and PCRE fun
...regex_t reg;
- int i, j, n, nmatches = 0, ov, rc;
+ int i, j, n, nmatches = 0, ov[3], rc;
int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert;
const char *spat = NULL;
pcre *re_pcre = NULL /* -Wall */;
@@ -882,7 +882,7 @@
if (fixed_opt)
LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0;
else if (perl_opt) {
- if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0)
+ if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0)
INTEGER(ind)[i] = 1;
} else {
if (!use_WC)
2013 May 01
Windows, format.POSIXct and character encodings
Hi all,
In what encoding does format.POSIXct return its output? It doesn't
seem to be utf-8:
Sys.setlocale("LC_ALL", "Japanese_Japan.932")
times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC")
ampm <- format(as.POSIXct(times), format = "%p")
x <- gsub(">", "*", paste(ampm, collapse =