thr3ads.net - search: "use

Displaying 7 results from an estimated 7 matches for "use_utf8".

Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015 Mar 02

On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings. Here's an example (must be run on Windows to reproduce the error):

Sys.setlocale("LC_CTYPE", "chinese")
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "?"
grep("\n", y, fixed = TRUE)
# Error in grep("\n", y, fixed = TRUE) : invalid
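The example string is built from three raw bytes that are then marked as UTF-8. As a rough sketch (the helper names `decode_utf8_3` and `is_utf8_continuation` are hypothetical and are not part of R's sources), the bytes can be decoded by hand to see which one is the lead byte and which are continuation bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: decode one three-byte UTF-8
 * sequence. Returns the code point, or -1 if the bytes do not form a valid
 * three-byte sequence. */
long decode_utf8_3(const unsigned char *s)
{
    assert(s != NULL);

    /* The lead byte of a three-byte sequence has the form 1110xxxx. */
    if ((s[0] & 0xF0) != 0xE0)
        return -1;

    /* Both continuation bytes must have the form 10xxxxxx. */
    if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
        return -1;

    long cp = ((long)(s[0] & 0x0F) << 12)
            | ((long)(s[1] & 0x3F) << 6)
            |  (long)(s[2] & 0x3F);

    /* Reject overlong encodings and UTF-16 surrogate code points. */
    if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    return cp;
}

/* Hypothetical helper: a continuation byte (10xxxxxx) can never be the
 * first byte of a UTF-8 character. */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With the bytes from the example, `decode_utf8_3` yields the single code point U+6E17, and `is_utf8_continuation(0x97)` is true: the last byte only makes sense as the tail of the sequence started by 0xe6, not as a character on its own.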

Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015 Mar 04

...fixed = TRUE)
# Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>'

=======================

I believe the problem is in the main/grep.c file, in the fgrep_one function. It tests for a multi-byte character string locale, `mbcslocale`, and then for `use_UTF8`, like so:

if (!useBytes && mbcslocale) {
    ...
} else if (!useBytes && use_UTF8) {
    ...
} else
    ...

This can be seen at https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692

A similar pattern occurs in the...
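A minimal sketch of the branch order described in this post may help. It mirrors only the order of the two tests quoted from fgrep_one, not the real function body; `pick_path` and the `PATH_*` labels are hypothetical names introduced for illustration. It also assumes that `mbcslocale` and `use_UTF8` can both be set at the same time, as would be the case for a string marked UTF-8 in a multi-byte Windows locale.

```c
#include <assert.h>

/* Hypothetical labels for the three code paths; illustrative names only. */
enum fgrep_path { PATH_MBCS, PATH_UTF8, PATH_BYTES };

/* Reproduces only the ORDER of the tests quoted from fgrep_one.
 * This is a sketch, not the actual R implementation. */
enum fgrep_path pick_path(int useBytes, int mbcslocale, int use_UTF8)
{
    if (!useBytes && mbcslocale) {
        /* Tested first: taken whenever the locale is multi-byte,
         * regardless of the value of use_UTF8. */
        return PATH_MBCS;
    } else if (!useBytes && use_UTF8) {
        /* Reached only when mbcslocale is false. */
        return PATH_UTF8;
    } else {
        /* Plain byte-wise comparison. */
        return PATH_BYTES;
    }
}
```

Under that assumption, `pick_path(0, 1, 1)` returns `PATH_MBCS`: when both flags are set, the locale test wins and the `use_UTF8` branch is never reached, which is the ordering the post points to.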

grep and PCRE fun

2011 Sep 29

...n, nmatches = 0, ov, rc;
+ int i, j, n, nmatches = 0, ov[3], rc;
 int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert;
 const char *spat = NULL;
 pcre *re_pcre = NULL /* -Wall */;
@@ -882,7 +882,7 @@
 if (fixed_opt)
 LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0;
 else if (perl_opt) {
- if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0)
+ if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0)
 INTEGER(ind)[i] = 1;
 } else {
 if (!use_WC)

sprintf("%d", integer(0)) aborts

2009 Mar 18

...-------------------------------------------------------
Index: sprintf.c
===================================================================
--- sprintf.c (revision 48148)
+++ sprintf.c (working copy)
@@ -79,13 +79,13 @@
 static R_StringBuffer outbuff = {NULL, 0, MAXELTSIZE};
 Rboolean use_UTF8;
- outputString = R_AllocStringBuffer(0, &outbuff);
- /* grab the format string */
 nargs = length(args);
 format = CAR(args);
- if (!isString(format) || length(format) == 0)
+ if (!isString(format))
 error(_("'fmt' is not a non-empty character vector&...

Windows, format.POSIXct and character encodings

2013 May 01

Hi all,

In what encoding does format.POSIXct return its output? It doesn't seem to be UTF-8:

Sys.setlocale("LC_ALL", "Japanese_Japan.932")
times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC")
ampm <- format(as.POSIXct(times), format = "%p")
x <- gsub(">", "*", paste(ampm, collapse =

Named capture in regexp

2011 Feb 25

...ibility, my strategy was to just add more attributes to the results of these functions, as shown above. Attached is the patch and some R code for testing the new features. It works fine for me with no memory problems. However, I noticed that there is some UTF8 handling code, which I did not touch (use_UTF8 is false on my machine). I presume we will need to make some small modifications to get it to work with unicode, but I'm not sure how to do them. Would you consider integrating this patch into the R source code for future releases, so the larger R community can take advantage of this feature?...

[patch] Use JIT for PCRE pattern matching

2015 Nov 28

...(errorptr)
 warning(_("PCRE pattern study error\n\t'%s'\n"), errorptr);
@@ -482,7 +486,11 @@
 }
 vmaxset(vmax2);
 }
+#ifdef PCRE_CONFIG_JIT
+ pcre_free_study(re_pe);
+#else
 pcre_free(re_pe);
+#endif
 pcre_free(re_pcre);
 } else if (!useBytes && use_UTF8) { /* ERE in wchar_t */
 regex_t reg;
@@ -867,12 +875,12 @@
 warning(_("PCRE pattern compilation error\n\t'%s'\n\tat '%s'\n"), errorptr, spat+erroffset);
 error(_("invalid regular expression '%s'"), spat);
- if (n > 10) {
- re_pe =...
