Hi!
I'm trying to read individual files from a ZIP archive using the unz()
function. Some of the files contain non-ASCII characters, and I'd like to
avoid unpacking them into a temporary directory.
My problem is that unz() seems to ignore the encoding="latin1" option,
which I need in order to read the non-ASCII characters properly. I can't find
a clear indication in the documentation that this is expected behaviour,
except for the remark that "unz reads (only) single files within zip files,
in binary mode" (and a short comment further below that re-encoding only
works for text connections).
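For now I'm working around it by re-encoding explicitly after reading. This is a minimal sketch, assuming the archive member is plain latin1 text; read_zip_latin1() is a hypothetical helper of my own, not part of R. It reads the member through an unz() connection with readLines() and then converts the strings with iconv() instead of relying on unz()'s encoding= argument:

```r
## Hypothetical helper (not part of R): read one text member of a ZIP
## archive without extracting it, assuming the member is latin1-encoded.
## unz() ignores its encoding= argument, so re-encode explicitly.
read_zip_latin1 <- function(zipfile, member) {
  con <- unz(zipfile, member)   # connection created, not yet opened
  on.exit(close(con))           # destroy the connection when done
  ## readLines() opens the connection for the duration of the call;
  ## the latin1 bytes come back unchanged, with no declared encoding
  raw_lines <- readLines(con, warn = FALSE)
  ## convert latin1 bytes to UTF-8 (result is marked as UTF-8)
  iconv(raw_lines, from = "latin1", to = "UTF-8")
}
```

Usage would be along the lines of read_zip_latin1("archive.zip", "file.txt") (both names made up). An alternative is readLines(con, encoding = "latin1"), which only marks the strings as latin1 without converting them; that still doesn't explain why unz() itself can't do the re-encoding.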
Digging a bit into the source code, I found that the ultimate cause seems to
be this line in the unz_open() C-level function, on line 359 of
src/main/dounzip.c:
> /* set_iconv(); not yet */
Any ideas why this is commented out? The previous lines set up con->text
appropriately and con->encname was set by do_unz(), so I don't see an
obvious reason why the iconv layer can't be added.
I'm working on R 2.11.1, but I have been looking at the current R-devel
source code, so I suspect my problem won't just go away with the next
release. My version details:

                   _
    platform       i386-apple-darwin9.8.0
    arch           i386
    os             darwin9.8.0
    system         i386, darwin9.8.0
    status
    major          2
    minor          11.1
    year           2010
    month          05
    day            31
    svn rev        52157
    language       R
    version.string R version 2.11.1 (2010-05-31)
Best regards,
Stefan Evert
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]