search for: utf8valid

Displaying 4 results from an estimated 4 matches for "utf8valid".

2013 Sep 09
2
Invalid UTF-8 with gsub(perl=TRUE) and iconv(sub="")
...3e3965"
gsub("a", "", "\U3e3965", perl=TRUE)
# Error in gsub("a", "", "\U3e3965", perl = TRUE) :
#   input string 1 is invalid UTF-8

The error message in the second command seems to come from src/main/grep.c:1640 (in do_gsub):

    if (!utf8Valid(s)) error(("input string %d is invalid UTF-8"), i+1);

utf8Valid() relies on valid_utf8() from PCRE, whose behavior is described in src/extra/pcre/pcre_valid_utf8.c. Even more problematic/interesting is the fact that iconv() does not consider the above character as invalid, as it does...
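For context, here is a minimal sketch of the RFC 3629 rules that PCRE's UTF-8 check is generally described as enforcing:

- at most four bytes per character
- no overlong forms
- no UTF-16 surrogates
- nothing above U+10FFFF

The function name utf8_valid_rfc3629 is hypothetical, and this is not PCRE's or R's code. Under these rules a code point above U+10FFFF, such as U+3E3965, has no valid encoding at all. The legacy (pre-RFC 3629) scheme would have written it as the 5-byte sequence F8 8F A3 A5 A5, which the sketch rejects at the lead byte. Whether R produced exactly those bytes for "\U3e3965" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not PCRE's valid_utf8()): returns true if the
   NUL-terminated string s is well-formed UTF-8 under RFC 3629, i.e. at
   most 4 bytes per character, no overlong forms, no surrogates
   (U+D800..U+DFFF) and no code points above U+10FFFF. */
bool utf8_valid_rfc3629(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    while (*p) {
        unsigned char c = *p;
        size_t n;                            /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for 2nd byte */

        if (c < 0x80) {                      /* plain ASCII */
            p++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0) lo = 0xA0;        /* reject overlong 3-byte forms */
            else if (c == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0) lo = 0x90;        /* reject overlong 4-byte forms */
            else if (c == 0xF4) hi = 0x8F;   /* reject code points > U+10FFFF */
        } else {
            /* stray continuation byte, overlong C0/C1 lead, or F5..FF lead
               (which covers the legacy 5- and 6-byte forms) */
            return false;
        }

        p++;
        for (size_t i = 0; i < n; i++, p++) {
            unsigned char min = (i == 0) ? lo : 0x80;
            unsigned char max = (i == 0) ? hi : 0xBF;
            /* a NUL terminator here is < min, so truncation is caught too */
            if (*p < min || *p > max)
                return false;
        }
    }
    return true;
}
```

With this sketch, utf8_valid_rfc3629("\xF8\x8F\xA3\xA5\xA5") returns false. By contrast, "\xF4\x8F\xBF\xBF" (U+10FFFF, the largest Unicode code point) is accepted.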
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
...for (i = 0, e = environ; *e != NULL; i++, e++)
-           SET_STRING_ELT(ans, i, mkChar(*e));
+       for (i = 0, e = environ; *e != NULL; i++, e++) {
+           cetype_t enc = known_to_be_latin1 ? CE_LATIN1 :
+               known_to_be_utf8 ? CE_UTF8 :
+               CE_NATIVE;
+           if (
+               (utf8locale && !utf8Valid(*e))
+               || (mbcslocale && !mbcsValid(*e))
+           ) enc = CE_BYTES;
+           SET_STRING_ELT(ans, i, mkCharCE(*e, enc));
+       }
 #endif
     } else {
        PROTECT(ans = allocVector(STRSXP, i));
@@ -416,11 +424,14 @@
            if (s == NULL)
                SET_STRING_ELT(ans, j, STRING_ELT(CADR(args), 0));...
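The decision the patch makes for each environment entry can be sketched outside R. The steps are:

1. Pick an encoding tag in the same order as the diff: known Latin-1, then known UTF-8, otherwise native.
2. Downgrade the tag to "bytes" when the entry does not convert cleanly in the current locale.

The enum and function names below are hypothetical stand-ins for R's cetype_t values. The single mbstowcs() test is an assumed approximation that folds the patch's separate utf8Valid() and mbcsValid() conditions into one check.

```c
#include <locale.h>   /* callers are expected to use setlocale() first */
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for R's CE_NATIVE, CE_UTF8, CE_LATIN1, CE_BYTES;
   this sketch deliberately does not include R's headers. */
enum enc_tag { ENC_NATIVE, ENC_UTF8, ENC_LATIN1, ENC_BYTES };

/* Assumed approximation of the validity checks in the patch: true if s
   converts to wide characters without error in the current LC_CTYPE. */
static bool valid_in_locale(const char *s)
{
    return mbstowcs(NULL, s, 0) != (size_t) -1;
}

/* Choose the tag for one "NAME=value" entry: Latin-1 if known, else UTF-8
   if known, else native; in a multibyte locale, an entry that fails the
   validity check is tagged as raw bytes instead. */
enum enc_tag tag_env_entry(const char *entry, bool mbcslocale,
                           bool known_latin1, bool known_utf8)
{
    enum enc_tag enc = known_latin1 ? ENC_LATIN1
                     : known_utf8   ? ENC_UTF8
                     : ENC_NATIVE;
    if (mbcslocale && !valid_in_locale(entry))
        enc = ENC_BYTES;
    return enc;
}
```

The design choice is to tag an invalid entry as bytes rather than drop or re-encode it. This keeps the raw value available while telling later string code not to decode it as text in the session's locale.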
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
...L; i++, e++)
> -           SET_STRING_ELT(ans, i, mkChar(*e));
> +       for (i = 0, e = environ; *e != NULL; i++, e++) {
> +           cetype_t enc = known_to_be_latin1 ? CE_LATIN1 :
> +               known_to_be_utf8 ? CE_UTF8 :
> +               CE_NATIVE;
> +           if (
> +               (utf8locale && !utf8Valid(*e))
> +               || (mbcslocale && !mbcsValid(*e))
> +           ) enc = CE_BYTES;
> +           SET_STRING_ELT(ans, i, mkCharCE(*e, enc));
> +       }
>  #endif
>      } else {
>         PROTECT(ans = allocVector(STRSXP, i));
> @@ -416,11 +424,14 @@
>             if (s == NULL)
>                 S...
2023 Jan 30
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Hello.

SUMMARY:

$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv()"
Error in substring(x, m + 1L) : invalid multibyte string at '<ff>'
$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv('BOOM')"
[1] "\xff"

BACKGROUND:

I launch R through a Son of Grid Engine (SGE) scheduler, where the R...
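A small standalone C check can find which variable trips Sys.getenv() in an environment like the one described here. It scans the environment array with a locale validity test. This is a hypothetical diagnostic, not part of R, and it assumes that mbstowcs() in the chosen locale rejects the same bytes R does. Offending entries are printed with non-ASCII bytes escaped as \xNN, similar to how R shows "\xff".

```c
#include <ctype.h>
#include <locale.h>   /* callers are expected to use setlocale() first */
#include <stdio.h>
#include <stdlib.h>

/* Print s with printable ASCII kept as-is and every other byte shown as
   \xNN, followed by a newline. */
static void print_escaped(const char *s, FILE *out)
{
    for (const unsigned char *p = (const unsigned char *) s; *p; p++) {
        if (*p < 0x80 && isprint(*p))
            fputc(*p, out);
        else
            fprintf(out, "\\x%02x", *p);
    }
    fputc('\n', out);
}

/* Hypothetical diagnostic: count the entries of a NULL-terminated
   environment array (such as POSIX environ) that are not valid multibyte
   strings in the current LC_CTYPE locale. If out is non-NULL, each bad
   entry is also printed in escaped form. */
size_t report_invalid_env(char *const *env, FILE *out)
{
    size_t bad = 0;
    for (char *const *e = env; *e != NULL; e++) {
        if (mbstowcs(NULL, *e, 0) == (size_t) -1) {
            bad++;
            if (out != NULL)
                print_escaped(*e, out);
        }
    }
    return bad;
}
```

A caller would first run setlocale(LC_CTYPE, ""), declare extern char **environ;, and then call report_invalid_env(environ, stdout). Started with BOOM=$'\xFF' LC_ALL=en_US.UTF-8, it should list the BOOM entry, provided that locale is installed.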