I observed some strange results playing around with gzfile() [R-1.3.0, WinNT 4.0]:

At first

  x <- 1:1000
  write(x, file = "c:/temp.txt")

results in a file of about 4 kB. But

  my.con <- gzfile("c:/temp.gz", open = "w")
  write(x, file = my.con)
  close(my.con)

results in a file of about 16 kB.

I expected a reduction of the size. Anyone who can tell me what went wrong?

Uwe Ligges

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 27 Jun 2001, Uwe Ligges wrote:

> my.con <- gzfile("c:/temp.gz", open = "w")
> write(x, file = my.con)
> close(my.con)
>
> results in a file of about 16 kB.
>
> I expected a reduction of the size. Anyone who can tell me what went
> wrong?

My experiments concur: I do get a 15913 byte file and it is a valid gzip file.

I've used this much more to read compressed files than write them. I will take a closer look at the zlib specs when I have time.

Brian

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
I would guess that this is an artifact of the compression algorithm. Under a Unix shell:

  > cat tmp1.txt
  1 2 3
  > ls -l tmp1.txt
  -rw------- 1 y0004379 users 5 Juni 27 17:31 tmp1.txt
  > gzip tmp1.txt
  > ls -l tmp1.txt.gz
  -rw------- 1 y0004379 users 34 Juni 27 17:31 tmp1.txt.gz

On Windows, R-1.3.0:

  > x <- rnorm(10000)
  > write(x, file = "c:/temp.txt")
  > system("ls -l c:/temp/temp.txt", T)
  [1] "a----- 4093 27-Jun-101 16:50 c:/temp/temp.txt"
  > my.con <- gzfile("c:/temp/temp.gz", open = "w")
  > write(x, file = my.con)
  > close(my.con)
  > system("ls -l c:/temp/temp.gz", T)
  [1] "a----- 193200 27-Jun-101 17:24 c:/temp/temp.gz"

On large files the gzip algorithm is effective!

Peter

  > ls -l tmp1.txt.bz2
  -rw------- 1 y0004379 users 43 Juni 27 17:31 tmp1.txt.bz2

On Wed, Jun 27, 2001 at 03:11:37PM +0100, Prof Brian D Ripley wrote:

> My experiments concur: I do get a 15913 byte file and it is a valid gzip
> file.

--
P.Malewski, Limmerstr.47, 30451 Hannover, 0511-2135008
At work: http://www.MH-Hannover.de 0511 532 3194 / Fax: 0511 532 3190,
P.Malewski at tu-bs.de, peter.malewski at gmx.de, malewski.peter at mh-hannover.de.
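Peter's point about tiny inputs can be checked from within R. What follows is only a minimal sketch, assuming a reasonably current R installation: the helper names gz_size() and raw_size() and the scratch files from tempfile() are made up for illustration and do not come from the thread. Each input is written with a single writeLines() call, so any difference should come from the fixed cost of the gzip wrapper rather than from how the data is fed to the connection.

```r
# Sketch only: compare the size of the text with the size of the gzip file
# produced by one writeLines() call, for a tiny and a larger input.

# Hypothetical helper: write `lines` to a scratch .gz file, return its size.
gz_size <- function(lines) {
  path <- tempfile(fileext = ".gz")  # scratch file, removed again below
  con <- gzfile(path, "w")
  writeLines(lines, con)             # one call for the whole vector
  close(con)
  size <- file.info(path)$size
  unlink(path)
  size
}

# Hypothetical helper: bytes of the text itself, one newline per line.
raw_size <- function(lines) sum(nchar(lines, type = "bytes") + 1)

small <- "1 2 3"                 # same content as tmp1.txt above
large <- as.character(1:10000)   # a few tens of kB of digits

sizes <- data.frame(
  input = c("small", "large"),
  raw   = c(raw_size(small), raw_size(large)),
  gz    = c(gz_size(small), gz_size(large))
)
print(sizes)
```

The gzip header and trailer alone take more bytes than the five-byte input, so the small file has to grow, while the larger input should shrink. The exact byte counts depend on the zlib build and are not reproduced here.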
Here are some more experiments:

  zz <- gzfile("t1.gz", "w")
  write(1:1000, zz)
  close(zz)

  zz <- gzfile("t2.gz", "w")
  writeLines(as.character(1:1000), zz)
  close(zz)

  zz <- gzfile("t3.gz", "w")
  writeBin(1:1000, zz)
  close(zz)

  zz <- textConnection("out", "w")
  write(1:1000, zz)
  close(zz)
  zz <- gzfile("t4.gz", "w")
  writeLines(out, zz)
  close(zz)

  ls -l
  -rw-r--r-- 1 ripley Administ 15913 Jun 27 23:20 t1.gz
  -rw-r--r-- 1 ripley Administ  1848 Jun 27 23:20 t2.gz
  -rw-r--r-- 1 ripley Administ  1434 Jun 27 23:20 t3.gz
  -rw-r--r-- 1 ripley Administ  1856 Jun 27 23:20 t4.gz

All are 3893 bytes uncompressed except t3, which is 4000.

The problem with the first is that it writes in very small pieces, 1 \n 2 \n 3 \n 4 \n ... and since the output is trying for no latency, the compressor gets too little opportunity to work on each piece. The moral seems to be to write to gzfile connections in moderately-sized pieces. It's the one-byte newline writes that really do the damage here.

On Wed, 27 Jun 2001, Prof Brian Ripley wrote:

> My experiments concur: I do get a 15913 byte file and it is a valid gzip
> file.
>
> I will take a closer look at the zlib specs when I have time.

--
Brian D. Ripley
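The advice to write in moderately-sized pieces could be followed along these lines. This is a hedged sketch rather than a definitive recipe: the variable names (piecewise, single, txt) and the tempfile() paths are illustrative assumptions, and the idea of collapsing everything into one string before a single writeLines() call is just one way to hand the connection a larger piece. The sizes are printed rather than asserted, because the 15913-byte result reported above is for R 1.3.0 and other versions may buffer differently.

```r
# Sketch only: write the same numbers to a gzfile connection in two ways
# and compare the resulting file sizes.
x <- 1:1000

# (a) As in the original question: write() sends many small pieces.
piecewise <- tempfile(fileext = ".gz")  # hypothetical scratch file
con <- gzfile(piecewise, "w")
write(x, con)
close(con)

# (b) Build the whole text in memory first, then write it in one call.
txt <- paste(as.character(x), collapse = "\n")
single <- tempfile(fileext = ".gz")     # hypothetical scratch file
con <- gzfile(single, "w")
writeLines(txt, con)
close(con)

sizes <- c(piecewise = file.info(piecewise)$size,
           single    = file.info(single)$size)
print(sizes)

# Check that the single-piece file reads back as the original numbers.
con <- gzfile(single, "r")
back <- readLines(con)
close(con)

unlink(c(piecewise, single))
```

If the two sizes come out close on a given installation, that suggests its gzfile() connection already buffers small writes; building the text first costs memory in proportion to the output, so for very large data a few thousand lines per call may be a better compromise.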