2014 Jul 28
Parsing and deparsing of escaped unicode characters
...8"
nchar(x) #9, seems OK
cat(deparse(x))
"I like <U+5BFF><U+53F8>"
As a result, the deparsed code does not parse() back into the proper Unicode
characters. I am currently using a regular expression to convert the
output of deparse() into something that parse() (and JSON) supports:
utf8conv <- function(x) {
  # Turn each <U+XXXX> placeholder (4 hex digits only) into a \uXXXX escape
  gsub("<U\\+([0-9A-F]{4})>", "\\\\u\\1", x)
}
> src <- utf8conv(src)
> y <- parse(text=src)[[1]]
> identical(x, y)
[1] TRUE
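A self-contained version of the same round trip, for anyone who wants to try the regex without the original x (which is elided in this excerpt). It assumes the deparsed text looks exactly like the output printed above, so that string is hard-coded here:

```r
# Same regex as utf8conv() above: <U+XXXX> -> \uXXXX (4-digit code points only)
utf8conv <- function(x) {
  gsub("<U\\+([0-9A-F]{4})>", "\\\\u\\1", x)
}

# Deparsed text as printed above, hard-coded for illustration
src <- "\"I like <U+5BFF><U+53F8>\""
fixed <- utf8conv(src)
cat(fixed, "\n")  # "I like \u5BFF\u53F8"

# parse() understands \u escapes, so the literal evaluates to the original characters
y <- parse(text = fixed)[[1]]
identical(y, "I like \u5bff\u53f8")
```

Because of the {4} quantifier, any placeholder with more than four hex digits would be left unchanged; that limitation comes from the regex itself.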
However, this is suboptimal because the extra regex pass adds a substantial performance
overhead on large text. Several things are un...
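One possible way around the deparse-then-regex step, sketched here as an assumption rather than a tested fix, is to build the ASCII-only string literal directly from the string's code points with base R's utf8ToInt(). The helper name ascii_literal() is hypothetical, it handles a single non-NA string, and it has not been benchmarked against utf8conv() on large text. Note that the \UXXXXXXXX form it uses for code points above U+FFFF is valid for R's parse() but not for JSON.

```r
# Hypothetical helper: turn one string into an ASCII-only R string literal.
# Printable ASCII is kept, " and \ are backslash-escaped, everything else
# becomes \uXXXX (BMP) or \UXXXXXXXX (code points above U+FFFF).
ascii_literal <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))
  out <- character(length(cp))

  # printable ASCII except the double quote (34) and backslash (92)
  plain <- cp >= 32 & cp <= 126 & cp != 34 & cp != 92
  out[plain] <- vapply(cp[plain], intToUtf8, character(1))
  out[cp == 34] <- "\\\""
  out[cp == 92] <- "\\\\"

  # control characters and non-ASCII inside the Basic Multilingual Plane
  bmp <- !plain & cp != 34 & cp != 92 & cp <= 0xFFFF
  out[bmp] <- sprintf("\\u%04X", cp[bmp])

  # supplementary-plane characters need the 8-digit \U form
  astral <- cp > 0xFFFF
  out[astral] <- sprintf("\\U%08X", cp[astral])

  paste0("\"", paste(out, collapse = ""), "\"")
}

x <- "I like \u5bff\u53f8"
lit <- ascii_literal(x)
cat(lit, "\n")
identical(parse(text = lit)[[1]], x)
```

The design choice is to escape from the code points rather than rewrite deparse() output, so the result no longer depends on the <U+XXXX> placeholder format.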