Displaying 20 results from an estimated 4000 matches similar to: "iconv: embedded nulls when converting to UTF-16"
2016 Feb 16
2
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
If I execute the code from the "?write.table" examples section
x <- data.frame(a = I("a \" quote"), b = pi)
# (omitted code)
write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
the resulting CSV file has a size of 6 bytes which is too short
(truncated):
""",3
The problem seems to be the iconv function:
2016 Feb 23
1
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
Excellent analysis, thank you both for the quick reply!
Is there anything I can do to get the bug fixed in the next version of R
(e. g. filing a bug report at https://bugs.r-project.org/bugzilla3/)?
On Tue, 2016-02-23 at 14:06 +0200, Mikko Korpela wrote:
> On 23.02.2016 11:37, Martin Maechler wrote:
> >>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>
2016 Feb 23
4
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
> Dear R developers
> I think I have found a bug that can be reproduced with two lines of code
> and I am very thankful to get your first assessment or feed-back on my
> report.
> If this is the wrong mailing list or I
2017 Apr 29
2
Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?
"R version 3.4.0 (2017-04-21)" on "x86_64-w64-mingw32" platform
I am using CSVs and other text tables, and text in general (including
regular expressions), on Windows 10.
For me, that means dealing with Windows-1252 and UTF-8 encoding, with UTF-16
and UTF-32 as helpful curiosities.
Something as simple as iconv ("\n", to = "UTF-16") causes an error, due to
2017 May 01
3
Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?
On 30/04/2017 12:23 PM, Duncan Murdoch wrote:
> No, I don't think anyone is working on this.
>
> There's a fairly simple workaround for the UTF-16 and UTF-32 iconv
> issues: don't attempt to produce character vectors, produce raw vectors
> instead. (The "toRaw" argument to iconv() asks for this.) Raw vectors
> can contain embedded nulls. Character vectors
2016 Feb 23
0
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>
> > Dear R developers
> > I think I have found a bug that can be reproduced with two lines of code
> > and I am very thankful to get your first assessment or feed-back
2017 May 02
1
Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?
On 01/05/2017 8:49 PM, Jack Kelley wrote:
> Thanks for looking into this.
>
> A few notes regarding all the UTF encodings on Windows 10 ...
This all stems from the ancient bad decision by Microsoft to translate
LF characters to CR LF when writing text files. R passes 0A or 0A 00 or
0A 00 00 00 to the output routine (part of the C run-time), and it needs
to figure out how many
2016 Feb 24
2
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 23/02/2016 7:06 AM, Mikko Korpela wrote:
> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>
>> > Dear R developers
>> > I think I have found a bug that can be reproduced with two lines of code
2016 Feb 29
1
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
I have just committed your first patch (the strlen() replacement) to
R-devel, and will soon put it in R-patched as well. I won't have time to
look at this again before the 3.2.4 release, so your file.show() patch
isn't going to make it unless someone else gets to it.
There's still a faint chance that I'll do more in R-devel before 3.3.0,
but I think it's best if there were bug
2016 Feb 22
0
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
Dear R developers
I think I have found a bug that can be reproduced with two lines of code
and I am very thankful to get your first assessment or feed-back on my
report.
If this is the wrong mailing list or I did something wrong
(e. g. semi "anonymous" email address to protect my privacy and defend
unwanted spam) please let me know since I am new here.
Thank you very much :-)
J.
2016 Sep 05
2
How to print UTF-8 encoded strings from a C routine to R's output?
Dear R experts,
It seems that Rprintf has to be used to print from a C routine to guarantee
to write to R's output according to
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Printing.
However if a string is UTF-8 encoded, non-ASCII characters (e.g., the
infinity symbol http://www.fileformat.info/info/unicode/char/221e/index.htm)
are misprinted.
Is this an unsupported feature or is
2016 Feb 25
2
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 23.02.2016 14:06, Mikko Korpela wrote:
> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>
>> > Dear R developers
>> > I think I have found a bug that can be reproduced with two lines of code
>>
2016 Feb 24
2
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 24/02/2016 9:55 AM, Mikko Korpela wrote:
> On 24.02.2016 15:47, Duncan Murdoch wrote:
>> On 23/02/2016 7:06 AM, Mikko Korpela wrote:
>>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>>
2016 Feb 24
0
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 24.02.2016 15:47, Duncan Murdoch wrote:
> On 23/02/2016 7:06 AM, Mikko Korpela wrote:
>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>
>>> > Dear R developers
>>> > I think
2016 Feb 24
0
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 24/02/2016 11:16 AM, Duncan Murdoch wrote:
> On 24/02/2016 9:55 AM, Mikko Korpela wrote:
>> On 24.02.2016 15:47, Duncan Murdoch wrote:
>>> On 23/02/2016 7:06 AM, Mikko Korpela wrote:
>>>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
2016 Feb 25
0
iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
On 25.02.2016 11:31, Mikko Korpela wrote:
> On 23.02.2016 14:06, Mikko Korpela wrote:
>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>>>>> nospam at altfeld-im de <nospam at altfeld-im.de>
>>>>>>>> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>
>>> > Dear R developers
>>> > I think I have
2013 Sep 09
2
Invalid UTF-8 with gsub(perl=TRUE) and iconv(sub="")
Hi!
I experience an error with an invalid UTF-8 character passed to
gsub(..., perl=TRUE); the interesting point is that with perl=FALSE (the
default) no error happens. (The character itself was read from an
invalid HTML file.) Illustration of the error:
gsub("a", "", "\U3e3965", perl=FALSE)
# [1] "\U3e3965"
gsub("a", "",
2018 Apr 26
1
embeded R application on Windows prints broken character.
The issue was reported to me for https://github.com/randy3k/rtichoke/issues/50
which is a python program which embeds R and provides a interface to R.
With R 3.5, for reason which i don't understand, when I typed `"a"` in the console
STDOUT got `"\x02\xff\xfea\x03\xff\xfe"` with the extra escaped characters.
I notice that `\x02\xff\xfe` and `\x03\xff\xfe` are encoding
2017 Apr 30
0
Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?
No, I don't think anyone is working on this.
There's a fairly simple workaround for the UTF-16 and UTF-32 iconv
issues: don't attempt to produce character vectors, produce raw vectors
instead. (The "toRaw" argument to iconv() asks for this.) Raw vectors
can contain embedded nulls. Character vectors can't, because
internally, R is using 8 bit C strings, and the
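
The raw-vector workaround mentioned in this snippet can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `x`, `bytes` and `path` are made up here, and "UTF-16LE" is chosen so that no byte-order mark is involved.

```r
# Sketch: ask iconv() for raw bytes instead of a character vector.
# With toRaw = TRUE the result is a list with one raw vector per input string,
# so the embedded nul bytes of UTF-16 never have to live in an R string.
x <- "a\n"
bytes <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(bytes)
# [1] 61 00 0a 00

# Write the bytes through a binary-mode connection so nothing is re-encoded.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(bytes, con)
close(con)
file.size(path)
```

Opening the connection with "wb" rather than "w" is a deliberate choice in this sketch: the bytes are already in the target encoding, so the file layer should not touch them.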
2017 May 02
0
Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?
Thanks for looking into this.
A few notes regarding all the UTF encodings on Windows 10 ...
The default eol for write.csv (via write.table) is "\n" and always gives
as.raw (c (0x0d, 0x0a)), that is, <Carriage Return> <Line Feed> as adjacent
bytes. This is fine for UTF-8 but wrong for UTF-16 and UTF-32.
EXAMPLE: Using UTF-32 for exaggeration (note also that 3 nul bytes are
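
One way to sidestep the line-ending problem described in this snippet is to build the CSV text in memory, choose the line terminator explicitly, and write the converted bytes in binary mode. The following is a hedged sketch under those assumptions, not a fix proposed in the thread; the data frame and the names `lines`, `text`, `bytes` and `path` are hypothetical.

```r
# Sketch: produce a UTF-16LE CSV whose CR LF pairs are encoded as whole
# UTF-16 code units (0d 00 0a 00), not inserted as single bytes afterwards.
x <- data.frame(a = "q", b = 1)

# Capture what write.csv() would print, one element per line.
lines <- capture.output(write.csv(x, row.names = FALSE))

# Join with an explicit CR LF before conversion, so the terminator is
# converted together with the rest of the text.
text <- enc2utf8(paste0(paste(lines, collapse = "\r\n"), "\r\n"))
bytes <- iconv(text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Binary mode ("wb") keeps the C run-time from translating line endings.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(bytes, con)
close(con)
```

Note that "UTF-16LE" writes no byte-order mark; a consumer that expects one would need it prepended to `bytes` before writing.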