Hi Ben,
I have successfully used Rcompression with archives created in Java. I did not
use the Deflater class for this, but rather GZIPOutputStream
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/GZIPOutputStream.html) in
combination with a Java-based tar archive generator
(http://www.trustice.com/java/tar/) to create gzipped tar files. The trustice
tar API is very similar to the Java ZIP API and is fairly simple to pick up.
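A minimal sketch of the GZIPOutputStream step only, in case it helps. The tar step is left out, and the class name GzipSketch and its method names are made up for illustration; only the java.util.zip classes are real. It works on in-memory byte arrays rather than files to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
public class GzipSketch {

    /** Compress bytes into a complete gzip stream: header, deflate data, CRC-32 trailer. */
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream is what writes the gzip trailer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Read a gzip stream back into the original bytes. */
    public static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "some rows of data, some rows of data".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(original);
        // A gzip stream always begins with the magic bytes 1f 8b.
        System.out.printf("magic bytes: %02x %02x%n", packed[0] & 0xff, packed[1] & 0xff);
        System.out.println("round trip ok: " + Arrays.equals(original, gunzip(packed)));
    }
}
```

Because the output carries the full gzip header and trailer, it is the kind of stream a gzip reader expects, which is different from what Deflater produces.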
Regards
Andrew
www.mango-solutions.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Stabler, Ben
Sent: 06 May 2009 00:41
To: r-help at r-project.org
Subject: [R] Rcompression and Java Deflator
(this may be a duplicate post since I attached a file to the previous
try...sorry about that)
Below are the first few bytes of a zlib-compressed byte array written from Java
with the Deflater class.
> readBin("row_1",raw(),10000000)
[1] 4c 45 50 e2 49 d5 86 bc 48 a1 32 5d 49 9d f5 90 48 e0 14 33 49 8f 54 6a
49 77 c9 48 48 d9 ec 56 47 91 48 f0 47 25 56 ef 47 b8 f5 7b 46 35 25 00 47 73 11
c5 48 6c 8e b9 47 ca 71 92 46 8d dc aa 45 92 0e
I'm trying to read it into R with Rcompression and I can't get it to
work. I think it may be because Java's Deflater class can write the data
without the zlib header and checksum (see the nowrap parameter below). I
can't change the Java creation code. I think uncompress() reads a zlib
stream (with header and checksum) and gunzip() reads a gzip stream (with
header and checksum). Is there a way to read the payload without headers? It
is my understanding that the payload (minus the headers) is the same for gzip
and zlib. The Ruby thread at the bottom seems to be related. Thanks for any
help!
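To illustrate the point about the shared payload, here is a rough Java sketch, not a fix for the R side. It assumes the zlib layout of a 2-byte header, the deflate data, then a 4-byte Adler-32 checksum, with no preset dictionary. The class name ZlibFraming and its methods are hypothetical; Deflater and Inflater are the real java.util.zip classes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of any library.
public class ZlibFraming {

    /** Compress into a zlib stream (header and checksum included). */
    public static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater(); // default constructor keeps the zlib wrapper
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Drop the 2-byte zlib header and 4-byte Adler-32 trailer, leaving the bare payload. */
    public static byte[] stripZlibWrapper(byte[] zlibStream) {
        if (zlibStream.length < 6) {
            throw new IllegalArgumentException("too short to be a zlib stream");
        }
        if ((zlibStream[1] & 0x20) != 0) {
            // FDICT flag set: a dictionary id follows the header, which this sketch ignores.
            throw new IllegalArgumentException("preset dictionary not handled");
        }
        return Arrays.copyOfRange(zlibStream, 2, zlibStream.length - 4);
    }

    /** Inflate a header-less deflate payload. */
    public static byte[] inflateRaw(byte[] rawDeflate) {
        Inflater inflater = new Inflater(true); // nowrap = true: expect no header or checksum
        try {
            // The Inflater Javadoc asks for one extra dummy byte of input in nowrap mode.
            inflater.setInput(Arrays.copyOf(rawDeflate, rawDeflate.length + 1));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a raw deflate payload", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] payload = stripZlibWrapper(zlibCompress(original));
        System.out.println("payload inflates: " + Arrays.equals(original, inflateRaw(payload)));
    }
}
```

The sketch goes from a wrapped stream to a bare payload. Going the other way, so that a zlib or gzip reader accepts the data, would need the checksum of the uncompressed bytes, which is only known after inflating.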
> compressedData = readBin("row_1",raw(),10000000)
> uncompress(compressedData)
Error in uncompress(compressedData) : corrupted compressed (gzip) source
> gunzip(compressedData)
Error in gunzip(compressedData) :
Failed to uncompress the raw data: (-3) incorrect header check
--------------
Java Deflater
public Deflater(int level, boolean nowrap)
Creates a new compressor using the specified compression level. If 'nowrap' is
true then the ZLIB header and checksum fields will not be used in order to
support the compression format used in both GZIP and PKZIP.
Parameters:
level - the compression level (0-9)
nowrap - if true then use GZIP compatible compression
http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/Deflater.html#Deflater(int, boolean)
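A small sketch of what the nowrap flag changes, assuming the writer used new Deflater(level, true). The class name NowrapSketch and its methods are hypothetical. The idea is that the reader must be told about the missing wrapper as well: Inflater(true) reads the output, while an Inflater in its default mode rejects it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of any library.
public class NowrapSketch {

    /** Deflate with or without the zlib header and checksum. */
    public static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Inflate; nowrap must match the setting used when the data was written. */
    public static byte[] inflate(byte[] data, boolean nowrap) {
        Inflater inflater = new Inflater(nowrap);
        try {
            // In nowrap mode the Inflater Javadoc asks for one extra dummy input byte.
            inflater.setInput(nowrap ? Arrays.copyOf(data, data.length + 1) : data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("wrapper mismatch or corrupt data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] raw = deflate(original, true);      // no header, no checksum
        byte[] wrapped = deflate(original, false); // zlib header and Adler-32 checksum
        System.out.printf("first byte raw: %02x, wrapped: %02x%n", raw[0] & 0xff, wrapped[0] & 0xff);
        System.out.println("raw read with nowrap: " + Arrays.equals(original, inflate(raw, true)));
        try {
            inflate(raw, false);
            System.out.println("raw data unexpectedly accepted as zlib");
        } catch (IllegalArgumentException e) {
            System.out.println("raw data rejected as zlib: " + e.getCause().getMessage());
        }
    }
}
```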
--------------
These threads also seem to deal with the same issue:
http://www.groupsrv.com/science/about98918.html
http://www.ruby-forum.com/topic/183400
The Ruby thread says "As could be seen in your first post, you are using
-MAX_WBITS, which enables old (headerless? don't know what it's called)
zlib format, that has no gzip header and no checksum. Maybe you should be using
+MAX_WBITS (the default), which adds necessary header and checksum."
Ben Stabler
Systems Analysis Group
Parsons Brinckerhoff
503.478.2859