thr3ads.net - R devel - [Rd] nchar reporting wrong width when zero-space character is present? [Nov 2014]


Mark van der Loo

2014-Nov-19 09:58 UTC

[Rd] nchar reporting wrong width when zero-space character is present?

Dear list,

If I include the zero-width non-breaking space (\ufeff) in a string,
nchar(type="width") seems to report the wrong number of columns that
'cat' uses to display it.
> x <- "f\ufeffoo"
> x
[1] "f?oo"
> nchar(x,type="width")
[1] 2

I would expect "3" here. Going through the documentation of 'Encoding'
and 'encodeString', I don't think this is expected behavior. Am I
missing something? If it is a bug, I will file a report.
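One way to see what is going on is to compare the three counting modes of 'nchar' on the same string and to list the code points it contains. This is a minimal sketch; the "width" result is deliberately left without an expected value, because it may differ between R versions and platforms.

```r
# String from the report: "f", U+FEFF (zero-width no-break space), "o", "o"
x <- "f\ufeffoo"

# List the code points, including the one that does not show up on screen
sprintf("U+%04X", utf8ToInt(x))   # "U+0066" "U+FEFF" "U+006F" "U+006F"

nchar(x, type = "chars")  # code points: 4
nchar(x, type = "bytes")  # UTF-8 bytes: 1 + 3 + 1 + 1 = 6
nchar(x, type = "width")  # display columns; reported as 2 above, expected 3
```

The byte count makes the hidden character visible indirectly: U+FEFF takes three bytes in UTF-8, so the string is 6 bytes long even though only three letters appear.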

Secondly, the documentation of 'nchar' states that with type='chars'
(the default) it returns "the number of human-readable characters". I
get:
> nchar(x,type='chars')
[1] 4

I would hardly call the zero-width space human-readable. Also,
control characters such as the carriage return are counted too:
> nchar("foo\r")
[1] 4

so it is probably more accurate to say that the number of symbols
(abstract characters) is counted, noting that some of the symbols in
an alphabet represented by an encoding may be invisible (or hardly
visible).
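The "symbols (abstract characters)" reading can be checked directly: for a UTF-8 string, nchar(type = "chars") agrees with the number of Unicode code points returned by utf8ToInt(), whether or not they are visible. A minimal sketch, assuming UTF-8 input; the helper count_code_points is hypothetical and introduced only for this illustration.

```r
# Hypothetical helper (illustration only): number of Unicode code points
count_code_points <- function(s) length(utf8ToInt(s))

examples <- c("f\ufeffoo", "foo\r")

# utf8ToInt() takes a single string, so apply the helper element-wise
vapply(examples, count_code_points, integer(1))  # 4 and 4
nchar(examples, type = "chars")                  # 4 4

# Removing the invisible U+FEFF leaves only the three visible letters
visible <- gsub("\ufeff", "", "f\ufeffoo", fixed = TRUE)
nchar(visible, type = "chars")                   # 3
```

Both examples count four code points, even though one contains an invisible format character and the other a control character.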


Many thanks in advance,
Best, Mark

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.2
