search for: use_utf8

Displaying 7 results from an estimated 7 matches for "use_utf8".

2015 Mar 02
2
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings. Here's an example (must be run on Windows to reproduce the error): Sys.setlocale("LC_CTYPE", "chinese") y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97))) Encoding(y) <- "UTF-8" y # [1] "渗" grep("\n", y, fixed = TRUE) # Error in grep("\n", y, fixed = TRUE) : invalid
2015 Mar 04
0
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
...fixed = TRUE) # Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>' ======================= I believe the problem is in the main/grep.c file, in the fgrep_one function. It tests for a multi-byte character string locale `mbcslocale`, and then for the `use_UTF8`, like so: if (!useBytes && mbcslocale) { ... } else if (!useBytes && use_UTF8) { ... } else ... This can be seen at https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692 A similar pattern occurs in the...
2011 Sep 29
3
grep and PCRE fun
...n, nmatches = 0, ov, rc; + int i, j, n, nmatches = 0, ov[3], rc; int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert; const char *spat = NULL; pcre *re_pcre = NULL /* -Wall */; @@ -882,7 +882,7 @@ if (fixed_opt) LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0; else if (perl_opt) { - if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0) + if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0) INTEGER(ind)[i] = 1; } else { if (!use_WC)
2009 Mar 18
1
sprintf("%d", integer(0)) aborts
...------------------------------------------------------- Index: sprintf.c =================================================================== --- sprintf.c (revision 48148) +++ sprintf.c (working copy) @@ -79,13 +79,13 @@ static R_StringBuffer outbuff = {NULL, 0, MAXELTSIZE}; Rboolean use_UTF8; - outputString = R_AllocStringBuffer(0, &outbuff); - /* grab the format string */ nargs = length(args); format = CAR(args); - if (!isString(format) || length(format) == 0) + if (!isString(format)) error(_("'fmt' is not a non-empty character vector...
2013 May 01
1
Windows, format.POSIXct and character encodings
Hi all, In what encoding does format.POSIXct return its output? It doesn't seem to be utf-8: Sys.setlocale("LC_ALL", "Japanese_Japan.932") times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC") ampm <- format(as.POSIXct(times), format = "%p") x <- gsub(">", "*", paste(ampm, collapse =
2011 Feb 25
0
Named capture in regexp
...ibility, my strategy was to just add more attributes to the results of these functions, as shown above. Attached is the patch and some R code for testing the new features. It works fine for me with no memory problems. However, I noticed that there is some UTF8 handling code, which I did not touch (use_UTF8 is false on my machine). I presume we will need to make some small modifications to get it to work with unicode, but I'm not sure how to do them. Would you consider integrating this patch into the R source code for future releases, so the larger R community can take advantage of this feature?...
2015 Nov 28
0
[patch] Use JIT for PCRE pattern matching
...(errorptr) warning(_("PCRE pattern study error\n\t'%s'\n"), errorptr); @@ -482,7 +486,11 @@ } vmaxset(vmax2); } +#ifdef PCRE_CONFIG_JIT + pcre_free_study(re_pe); +#else pcre_free(re_pe); +#endif pcre_free(re_pcre); } else if (!useBytes && use_UTF8) { /* ERE in wchar_t */ regex_t reg; @@ -867,12 +875,12 @@ warning(_("PCRE pattern compilation error\n\t'%s'\n\tat '%s'\n"), errorptr, spat+erroffset); error(_("invalid regular expression '%s'"), spat); - if (n > 10) { - re_pe =...