Does R support Unicode normalization? For my application, I'd quite
like to test for canonical equivalence (e.g. "n\u0303" is equivalent
to "\u00F1", which is ñ) and ideally convert strings to NFD form.
("\u0303" is the "combining tilde" character.) Is there a package for
this?
The Unicode Normalization FAQ [1] states that "Programs should always
compare canonical-equivalent Unicode strings as equal" so is it even a
bug that "n\u0303" != "\u00F1" in my version of R?
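For reference, here is a minimal reproduction of what I am seeing. It
uses only base R; the variable names are just for illustration, and the
result of the final comparison may depend on the R version.

```r
# Two canonically equivalent spellings of the same letter
decomposed  <- "n\u0303"   # 'n' followed by U+0303 COMBINING TILDE
precomposed <- "\u00F1"    # U+00F1 LATIN SMALL LETTER N WITH TILDE

# The underlying code point sequences differ
utf8ToInt(decomposed)      # 110 771
utf8ToInt(precomposed)     # 241

# The comparison in question; not equal in my version of R
decomposed == precomposed
```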
Allan
[1] see http://www.unicode.org/unicode/faq/normalization.html