thr3ads.net - R help - [R] Problem with Windows clipboard and UTF-8 [Oct 2022]


Andrew Hart

2022-Sep-30 13:05 UTC

[R] Problem with Windows clipboard and UTF-8

Hi everyone,

Recently I upgraded to R 4.2.1, which now uses UTF-8 internally as its 
native encoding. Very nice. However, I've discovered that if I use 
writeClipboard to move a string containing accented characters to the 
Windows clipboard and then paste it into another application (e.g. 
Notepad, Eclipse), the accents come out garbled. Here's an example:

writeClipboard("categoría")
Pasting the result into this e-mail message yields
Categor??a

As near as I can tell, the problem has to do with the format parameter 
of writeClipboard. By default, format has a value of 1 (CF_TEXT), which 
tells the clipboard to expect text in the machine's locale. If I set 
format=13 (CF_UNICODETEXT) in the call, the accents transfer to the 
clipboard correctly:

writeClipboard("categoría", format=13)
and the result is
Categoría

It seems that format=13 may be a better default now that R is using 
UTF-8. It would be nice not to have to specify the format every time I 
want to copy text to the clipboard with writeClipboard.

Is writeClipboard supposed to perform any kind of encoding conversion or 
is the format parameter merely informing the clipboard of the kind of 
payload it's being handed?

Btw, with pre-4.2.0 versions of R, this wasn't a problem. I am very much 
in favour of R using some kind of Unicode encoding natively, but this 
wrinkle seems to be something the user shouldn't have to deal with since 
the Windows clipboard is capable of holding Unicode text. Any advice 
would be gratefully received.
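Whether a session is affected depends on R's native encoding. As a minimal sketch (the variable names are only illustrative), base R's l10n_info() reports whether that encoding is UTF-8 and, on Windows only, the code page R is using:

```r
info <- l10n_info()
# TRUE when R's native encoding is UTF-8 (the case on Windows from R 4.2.0)
native_is_utf8 <- isTRUE(info[["UTF-8"]])
native_is_utf8
# Windows only: the code page R is using (65001 is Windows' number for UTF-8);
# this element does not exist on other platforms
if (.Platform$OS.type == "windows") info[["codepage"]]
```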

Thanks,
	Andrew.

Rui Barradas

2022-Sep-30 14:26 UTC


Hello,

I can reproduce this.


C:\Users\ruipb>R -q -e "writeClipboard('categoría'); sessionInfo()"
> writeClipboard('categoría'); sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1

# quoting Andrew: Pasting the result into this e-mail message yields
categor??a



And with the same sessionInfo() output


R -q -e "writeClipboard('categoría', format = 13)"
# <Ctrl+V> paste clipboard here
categoría


Hope this helps,

Rui Barradas


Tomas Kalibera

2022-Oct-13 13:24 UTC


Hi Andrew,

On 9/30/22 15:05, Andrew Hart via R-help wrote:
> Hi everyone,
>
> Recently I upgraded to R 4.2.1 which now uses UTF-8 internally as its 
> native encoding. Very nice. However, I've discovered that if I use 
> writeClipboard to try and move a string containing accented characters 
> to the Windows clipboard and then try and paste that into another 
> application (e.g. notepad, Eclipse, etc.), the accents turn out all 
> garbled. Here's an example:
>
> writeClipboard("categoría")
> Pasting the result into this e-mail message yields
> Categor??a
>
> As near as I can tell, the problem seems to have something to do with 
> the format parameter of writeClipboard. By default, format has a value 
> of 1, which tells the clipboard to receive Text in the machine's 
> locale. If I set format=13 in the call, the accents transfer to the 
> clipboard correctly:
>
> writeClipboard("categoría", format=13)
> and the result is
> Categoría
Ivan Krylov has kindly turned this into a bug report; please see

https://bugs.r-project.org/show_bug.cgi?id=18412

for more details. In short, yes, using format=13 is recommended, but 
please note that this is already documented in ?writeClipboard.
> It seems that format=13 may be a better default now that R is using 
> UTF-8. It would be nice not to have to specify the format every time I 
> want to copy text to the clipboard with writeClipboard.
Yes, I agree, I've changed the default to format=13.
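On a version of R where writeClipboard still defaults to format = 1, a small wrapper avoids passing the format every time. This is a sketch: the name write_clipboard_unicode is hypothetical, and only the format = 13 argument comes from this thread. writeClipboard exists only on Windows, so the helper refuses to run elsewhere.

```r
# Hypothetical helper: always hand the clipboard "Unicode text"
write_clipboard_unicode <- function(str) {
  if (.Platform$OS.type != "windows")
    stop("writeClipboard() is only available on Windows")
  utils::writeClipboard(str, format = 13L)  # 13 = CF_UNICODETEXT
}
```

Usage on Windows: write_clipboard_unicode("categoría"), then paste into another application.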
> Is writeClipboard supposed to perform any kind of encoding conversion 
> or is the format parameter merely informing the clipboard of the kind 
> of payload it's being handed?
>
> Btw, with pre-4.2.0 versions of R, this wasn't a problem. I am very 
> much in favour of R using some kind of Unicode encoding natively, but 
> this wrinkle seems to be something the user shouldn't have to deal 
> with since the Windows clipboard is capable of holding Unicode text. 
> Any advice would be gratefully received.
This is a bit complicated, and more can be found in the response to the 
bug report. In short, the clipboard can hold either "text" (together 
with locale information) or "Unicode text". One can ask Windows for 
either kind of content and Windows will do the conversion: it converts 
from "text" to "Unicode text" using that locale. If the locale is not 
filled in explicitly, it is the current input language (that is, the 
"keyboard" the user has selected at the time of copying to the 
clipboard, e.g. at the time of the writeClipboard call). If that 
locale's encoding doesn't match R's current native encoding and you are 
using "text", characters may be misrepresented. This could have happened 
even before R 4.2.0, but it is more likely from R 4.2.0 onwards, because 
R now uses UTF-8. Going via "Unicode text" resolves the issue, as the 
conversion to/from UTF-16LE is done by readClipboard/writeClipboard 
using R's current native encoding.
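The two paths can be imitated with base R's iconv(), without touching the clipboard. This is only an illustration of the encoding mechanics, not what writeClipboard does internally, and it assumes the input language uses the Windows-1252 (Western European) code page. It shows why "í" turns into two characters under "text" but survives a round trip through UTF-16LE:

```r
x <- "categor\u{ed}a"   # "categoría", stored as UTF-8 in R >= 4.2.0 on Windows

# "text" path (format = 1): R's UTF-8 bytes are placed on the clipboard as-is
utf8_bytes <- charToRaw(enc2utf8(x))
# ... and Windows later decodes them with the input locale's code page
# (assumed here to be CP1252)
as_text <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")
# "í" (bytes C3 AD) becomes "Ã" plus a soft hyphen (U+00AD), often invisible
as_text

# "Unicode text" path (format = 13): the string is converted to UTF-16LE ...
utf16 <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
# ... and decoded back from UTF-16LE, so no code page is involved
as_unicode <- iconv(list(utf16), from = "UTF-16LE", to = "UTF-8")
identical(as_unicode, x)
```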

Users who don't want to deal with these complexities can use the 
higher-level connections interface (?connections, "clipboard").
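A minimal sketch of that route, assuming Windows, where a file("clipboard") connection can be opened for writing; the helper name copy_lines_to_clipboard is hypothetical. Per ?connections, written text is copied to the clipboard only when the connection is closed or flushed, so the connection is closed on exit:

```r
# Hypothetical helper built on the "clipboard" connection
copy_lines_to_clipboard <- function(lines) {
  if (.Platform$OS.type != "windows")
    stop("this sketch assumes Windows")
  con <- file("clipboard", open = "w")
  on.exit(close(con))  # the text reaches the clipboard when con is closed
  writeLines(lines, con)
  invisible(lines)
}
```

Usage on Windows: copy_lines_to_clipboard("categoría"), and readLines("clipboard") to read the text back.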

Best
Tomas
