This might seem like a strange question, but is there any way to compress an R object (such as a matrix) and know its resulting size in bytes? Clearly, I could implement this in the following way (if x is my matrix):

zz <- gzfile(fname, "w")
write.table(x, zz)
close(zz)
file.info(fname)[, "size"]

However, I need to do this for hundreds of thousands of objects, and the overhead in terms of disk access due to the actual file creation is prohibitive. I guess I would like a modified object.size() function that returns the size of the compressed (e.g. gzip) version of the object.

Thanks!

Markus
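(A note for later readers: base R gained memCompress() in R 2.10.0, released after this thread, which compresses a raw vector entirely in memory, and serialize(x, NULL) returns an object's binary representation as a raw vector. A minimal sketch of the in-memory measurement the question asks for; the compressed_size() wrapper name is made up here for illustration:

compressed_size <- function(x) {
  ## binary representation of the object, kept in memory (no file created)
  raw_repr <- serialize(x, connection = NULL)
  ## length of the gzip-compressed raw vector = compressed size in bytes
  length(memCompress(raw_repr, type = "gzip"))
}

compressed_size(matrix(rnorm(1e4), nrow = 100)))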
Prof Brian Ripley
2009-Feb-08 06:43 UTC
[R] compressing data without writing output to file
What do you want the compressed R object to be? (It is not an R object.) The Omegahat package Rcompression may help you, but it returns a raw vector (and that has overheads such as the header: you could use its length if appropriate).

On Sat, 7 Feb 2009, Markus Loecher wrote:

> This might seem like a strange question

It is more than a little imprecise ...

> but is there any way to compress an R object (such as a matrix) and
> know its resulting size in bytes?
> Clearly, I could implement this in the following way (if x is my matrix):
> zz <- gzfile(fname, "w")
> write.table(x, zz)
> close(zz)
> file.info(fname)[, "size"]

Hmm, that calculates the size of a compressed character representation of the object. So do you want the size of the object or of its character representation? object.size() calculates the first.

> However, I need to do this for hundreds of thousands of objects and the
> overhead in terms of disk access due to the actual file creation is
> prohibitive.

The overheads of finding a character representation and of allocating an R object for the result would also be large.

> I guess I would like a modified object.size() function that returns the
> size of the compressed (e.g. gzip) version of the object.

I don't see the point of calculating the size of something you will not use. And anything involving 'hundreds of thousands of objects' is better done in C code. So why not just write a C function to do whatever it is you really want (but have not told us)?

In fact, the way lazy-loading is implemented is pretty close to what you describe -- that uses an on-disk database and is not slow for 100,000 objects.

> Thanks!
>
> Markus

PLEASE do read the posting guide (belatedly) and do not send HTML as you were asked.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
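(To illustrate the distinction drawn above between an object and its character representation, a hedged sketch, again relying on memCompress() from R >= 2.10.0: the compressed size of the write.table()-style text is generally not the same as the compressed size of the serialized object itself.

x <- matrix(rnorm(1e4), nrow = 100)

## character representation, as write.table() would write it to a file
txt <- paste(capture.output(write.table(x)), collapse = "\n")
chr_bytes <- length(memCompress(charToRaw(txt), type = "gzip"))

## binary serialization of the object itself
bin_bytes <- length(memCompress(serialize(x, NULL), type = "gzip"))

c(character = chr_bytes, serialized = bin_bytes))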