I observed some strange results playing around with gzfile() [R-1.3.0,
WinNT 4.0]:
At first
  x <- 1:1000
  write(x, file = "c:/temp.txt")
results in a file of about 4 kB. But
  my.con <- gzfile("c:/temp.gz", open = "w")
  write(x, file = my.con)
  close(my.con)
results in a file of about 16 kB.
I expected a reduction of the size. Can anyone tell me what went
wrong?
Uwe Ligges
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 27 Jun 2001, Uwe Ligges wrote:

> I observed some strange results playing around with gzfile() [R-1.3.0,
> WinNT 4.0]:
>
> At first
>
>   x <- 1:1000
>   write(x, file = "c:/temp.txt")
>
> results in a file of about 4 kB. But
>
>   my.con <- gzfile("c:/temp.gz", open = "w")
>   write(x, file = my.con)
>   close(my.con)
>
> results in a file of about 16 kB.
>
> I expected a reduction of the size. Anyone who can tell me what went
> wrong?

My experiments concur: I do get a 15913 byte file and it is a valid gzip
file.

I've used this much more to read compressed files than write them.
I will take a closer look at the zlib specs when I have time.

Brian

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
I would guess that this is an artifact of the compression algorithm.
Under a unix shell:

> cat tmp1.txt
1 2 3
> ls -l tmp1.txt
-rw-------    1 y0004379 users           5 Juni 27 17:31 tmp1.txt
> gzip tmp1.txt
> ls -l tmp1.txt.gz
-rw-------    1 y0004379 users          34 Juni 27 17:31 tmp1.txt.gz
> ls -l tmp1.txt.bz2
-rw-------    1 y0004379 users          43 Juni 27 17:31 tmp1.txt.bz2

On Windows, R-1.3.0:

> x <- rnorm(10000)
> write(x, file = "c:/temp.txt")
> system("ls -l c:/temp/temp.txt", T)
[1] "a-----       4093 27-Jun-101 16:50 c:/temp/temp.txt"
> my.con <- gzfile("c:/temp/temp.gz", open = "w")
> write(x, file = my.con)
> close(my.con)
> system("ls -l c:/temp/temp.gz", T)
[1] "a-----     193200 27-Jun-101 17:24 c:/temp/temp.gz"

On large files the gzip algorithm is effective!

Peter

On Wed, Jun 27, 2001 at 03:11:37PM +0100, Prof Brian D Ripley wrote:
> My experiments concur: I do get a 15913 byte file and it is a valid gzip
> file.
>
> I've used this much more to read compressed files than write them.
> I will take a closer look at the zlib specs when I have time.

-- 
P.Malewski, Limmerstr.47, 30451 Hannover, 0511-2135008
At work: http://www.MH-Hannover.de 0511 532 3194 / Fax: 0511 532 3190,
P.Malewski at tu-bs.de, peter.malewski at gmx.de, malewski.peter at mh-hannover.de.
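The fixed overhead that the shell session above points at can also be seen from within R. What follows is only a minimal sketch and not part of the original messages: the file names under tempdir() are hypothetical, and the exact compressed size depends on the zlib build, so only the direction of the comparison is relied on. A gzip stream carries at least a 10-byte header and an 8-byte trailer, which already exceeds a payload of a few bytes.

```r
# Write the same tiny payload as plain text and through gzfile(),
# then compare the sizes on disk.
txt_path <- file.path(tempdir(), "tiny.txt")     # hypothetical file names
gz_path  <- file.path(tempdir(), "tiny.txt.gz")

payload <- c("1", "2", "3")

# Plain text: a handful of bytes.
writeLines(payload, txt_path)

# Same lines through a gzip connection.
con <- gzfile(gz_path, open = "w")
writeLines(payload, con)
close(con)

sizes <- file.info(c(txt_path, gz_path))$size
names(sizes) <- c("plain", "gzip")
print(sizes)
```

For input this small the gzip file comes out larger than the plain one, which matches the 5 byte versus 34 byte listing above.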
Here are some more experiments:
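# t1: write() straight to the gzfile connection
zz <- gzfile("t1.gz", "w")
write(1:1000, zz)
close(zz)
# t2: writeLines() on the character vector
zz <- gzfile("t2.gz", "w")
writeLines(as.character(1:1000), zz)
close(zz)
# t3: binary representation via writeBin()
zz <- gzfile("t3.gz", "w")
writeBin(1:1000, zz)
close(zz)
# t4: collect the write() output in the character vector "out" first,
# then send it to the gzfile connection with writeLines()
zz <- textConnection("out", "w")
write(1:1000, zz)
close(zz)
zz <- gzfile("t4.gz", "w")
writeLines(out, zz)
close(zz)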
ls -l
-rw-r--r--    1 ripley   Administ    15913 Jun 27 23:20 t1.gz
-rw-r--r--    1 ripley   Administ     1848 Jun 27 23:20 t2.gz
-rw-r--r--    1 ripley   Administ     1434 Jun 27 23:20 t3.gz
-rw-r--r--    1 ripley   Administ     1856 Jun 27 23:20 t4.gz
All are 3893 bytes uncompressed except t3, which is 4000.  The problem with
the first is that it writes in very small pieces,
1 \n 2 \n 3 \n 4 \n  ...
and as the output is trying for no latency, it has too little
opportunity to compress.
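The uncompressed sizes quoted above can be checked with a little arithmetic. This is a small sketch added for illustration, with variable names of my own choosing: the text forms hold the digits of 1 to 1000 plus one separator (space or newline) per number, and the binary form holds 1000 four-byte integers.

```r
x <- 1:1000

# Digits: 9 one-digit, 90 two-digit, 900 three-digit numbers and "1000".
digit_bytes <- sum(nchar(as.character(x)))

# One separator character (space or newline) follows each number.
text_bytes <- digit_bytes + length(x)

# writeBin() to a raw vector shows the binary size without touching disk.
binary_bytes <- length(writeBin(x, raw()))

print(c(text = text_bytes, binary = binary_bytes))
# text = 3893, binary = 4000
```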
The moral seems to be to write to gzfile connections in moderately-sized
pieces.  It's the one-byte newlines, each written as a separate piece,
that really do the damage here.
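One way to follow that advice is to build the text in memory and pass it to the connection as whole lines, much as the t4 experiment does. The sketch below is illustrative only: the file name is hypothetical, the five-per-line layout simply mimics the default of write() for numbers, and whether a size difference against plain write() still shows up will depend on the R version in use.

```r
x <- 1:1000

# Lay the values out five per line, as write() does by default for numbers.
rows <- matrix(x, ncol = 5, byrow = TRUE)
text_lines <- apply(rows, 1, paste, collapse = " ")

gz_path <- file.path(tempdir(), "pieces.gz")   # hypothetical file name

# Hand complete lines to the connection instead of single fields.
con <- gzfile(gz_path, open = "w")
writeLines(text_lines, con)
close(con)

# Read the file back to confirm that nothing was lost.
con <- gzfile(gz_path, open = "r")
back <- scan(con, quiet = TRUE)
close(con)

print(file.info(gz_path)$size)
```

Comparing file.info(gz_path)$size with the size obtained from write(x, con) on the same machine shows whether the small-piece penalty applies there.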
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595