2013 Sep 09
Invalid UTF-8 with gsub(perl=TRUE) and iconv(sub="")
...E, whose behavior is
described in src/extra/pcre/pcre_valid_utf8.c.
Even more problematic is that iconv() does not consider the above
character invalid: it leaves the character in place when the sub
argument is used.
iconv("a\U3e3965", sub="")
# [1] "a\U003e3965"
By contrast, an invalid sequence such as \xff is substituted:
iconv("a\xff", sub="")
# [1] "a"
This makes it difficult to sanitize the string before passing it to
gsub(perl=TRUE). Thus, I'm wondering whether something could be done,
and where. Should ic...
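
One possible workaround, sketched here under stated assumptions and not taken from the original post, is to inspect the raw bytes before calling gsub(perl=TRUE). The helper name has_non_rfc3629_bytes is hypothetical and is not part of base R. It only flags bytes that can never appear in UTF-8 as defined by RFC 3629, plus 0xF4 lead bytes that would encode a code point above U+10FFFF; it is not a full validator (overlong or truncated sequences are not detected). The test string is built from raw bytes on the assumption that the character above is stored as the five-byte sequence F8 8F A3 A5 A5.

```r
# Hypothetical helper, not part of base R: TRUE for each string that
# contains bytes which cannot occur in RFC 3629 UTF-8.
has_non_rfc3629_bytes <- function(x) {
  vapply(x, function(s) {
    b <- as.integer(charToRaw(s))
    n <- length(b)
    # 0xC0, 0xC1 and 0xF5..0xFF never occur in RFC 3629 UTF-8; the last
    # range includes the lead bytes of the old 5- and 6-byte forms.
    forbidden <- any(b == 0xC0 | b == 0xC1 | b >= 0xF5)
    # 0xF4 followed by a byte >= 0x90 would encode a code point > U+10FFFF.
    too_large <- n > 1 && any(b[-n] == 0xF4 & b[-1] >= 0x90)
    forbidden || too_large
  }, logical(1), USE.NAMES = FALSE)
}

# Assumption: F8 8F A3 A5 A5 is the 5-byte form of code point 0x3E3965.
bad5   <- rawToChar(as.raw(c(0x61, 0xF8, 0x8F, 0xA3, 0xA5, 0xA5)))
bad_ff <- rawToChar(as.raw(c(0x61, 0xFF)))        # "a" followed by \xff
ok     <- rawToChar(as.raw(c(0x61, 0xC3, 0xA9)))  # "a" followed by U+00E9

stopifnot(identical(
  has_non_rfc3629_bytes(c(bad5, bad_ff, ok)),
  c(TRUE, TRUE, FALSE)
))
```

Working on bytes via charToRaw() avoids depending on how iconv() or the parser treats the sequence; strings flagged this way could then be rejected or cleaned before being passed to the regular expression functions.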