search for: valid_utf8

Displaying 2 results from an estimated 2 matches for "valid_utf8".

2020 Apr 04
0
Possible Bug In Validation of UTF-8 Sequences
As per `?intToUtf8`, and in the comments to `valid_utf8`[1], R intends to prevent illegal UTF-8 such as UTF-8 encoded UTF-16 surrogate pairs. `R_nchar`, invoked via `base::nchar`, explicitly validates UTF-8 strings[2], but allows the surrogate: > Encoding('\ud800') [1] "UTF-8" > nchar('\ud800') // should b...
2013 Sep 09
2
Invalid UTF-8 with gsub(perl=TRUE) and iconv(sub="")
..."a", "", "\U3e3965", perl = TRUE) : # input string 1 is invalid UTF-8 The error message in the second command seems to come from src/main/grep.c:1640 (in do_gsub): if (!utf8Valid(s)) error(_("input string %d is invalid UTF-8"), i+1); utf8Valid() relies on valid_utf8() from PCRE, whose behavior is described in src/extra/pcre/pcre_valid_utf8.c. Even more problematic/interesting is the fact that iconv() does not consider the above character as invalid, as it does not replace it when using the sub argument. > iconv("a\U3e3965", sub="") [...