Mark van der Loo
2014-Nov-19 09:58 UTC
[Rd] nchar reporting wrong width when zero-space character is present?
Dear list,

If I include the zero-width non-breaking space (\ufeff) in a string, nchar seems to compute the wrong number of columns used by 'cat'.

> x <- "f\ufeffoo"
> x
[1] "f?oo"
> nchar(x,type="width")
[1] 2

I would expect "3" here. Going through the documentation of 'Encoding' and 'encodeString', I don't think this is expected behavior. Am I missing something? If it is a bug I will file a report.

Secondly, the documentation of 'nchar' states that with type='chars' (the default) it returns "the number of human-readable characters". I get:

> nchar(x,type='chars')
[1] 4

I would hardly call the zero-width space human-readable. Also, since for example

> nchar("foo\r")
[1] 4

it is probably more accurate to say that the number of symbols (abstract characters) is counted, noting that some of the symbols in an alphabet represented by an encoding may be invisible (or hardly visible).

Many thanks in advance,
Best,
Mark

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.2
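For anyone wanting to reproduce the comparison, the following is a minimal sketch that puts the three counting modes of nchar side by side for the string from the post. The variable name `counts` is only illustrative. The value reported for type="width" is deliberately not annotated, since it is the quantity in question and may differ between R versions and platforms.

```r
# Sketch: compare the counting modes of nchar() on the string from the post.
x <- "f\ufeffoo"   # 'f', U+FEFF (zero-width no-break space), 'o', 'o'

counts <- c(
  chars = nchar(x, type = "chars"),  # number of code points: 4
  bytes = nchar(x, type = "bytes"),  # bytes in UTF-8 (U+FEFF takes 3): 6
  width = nchar(x, type = "width")   # display columns; may vary by R version
)
print(counts)

# Code points making up the string; 65279 is U+FEFF.
print(utf8ToInt(x))
```

Comparing `chars` with `width` shows directly whether the zero-width character is being counted as occupying a column on a given installation.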