Dear All,

I'd like to be able to have R store (in a list component) a compressed data set, and then write it out uncompressed. gzcon and gzfile work in exactly the opposite direction. What would be a good way to handle this?

Details:
----------

We have a package that uses C; part of the C output is a large sparse matrix. This is never manipulated directly by R, but always by the C code. However, we need to store that data somewhere (inside an R object) for further calls to the functions in our package. We'd like to store that matrix as part of the R object (say, as an element of a list). Ideally, it would be stored in as compressed a way as possible. Then, when we need to use that information, it would be decompressed and passed to the C function.

I guess one way to do it is to have C deal with the compression and uncompression (e.g., using zlib or the bzip2 libraries) and then use readBin, etc, from R. But, if I can, I'd like to avoid our C code having to call zlib, etc, so as to make our package easily portable.

Thanks,

R.

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Ramon,

If you are looking for a solution to your specific application (as opposed to a general compression/decompression mechanism), it might be worth checking out the Matrix package, which has facilities for storing and manipulating sparse matrices. Its sparse matrix classes can store a matrix in the triplet representation (i.e. only the indices and values of the non-zero elements), which can reduce storage considerably, depending on the size and degree of sparseness of the matrix.

-Christos

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
> Sent: Thursday, February 28, 2008 1:18 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] compress data on read, decompress on write
>
> Dear All,
>
> I'd like to be able to have R store (in a list component) a
> compressed data set, and then write it out uncompressed.
> gzcon and gzfile work in exactly the opposite direction. What
> would be a good way to handle this?
>
> Details:
> ----------
>
> We have a package that uses C; part of the C output is a
> large sparse matrix. This is never manipulated directly by R,
> but always by the C code. However, we need to store that data
> somewhere (inside an R
> object) for further calls to the functions in our package.
> We'd like to store that matrix as part of the R object (say,
> as an element of a list). Ideally, it would be stored in as
> compressed a way as possible.
> Then, when we need to use that information, it would be
> decompressed and passed to the C function.
>
> I guess one way to do it is to have C deal with the
> compression and uncompression (e.g., using zlib or the bzip2
> libraries) and then use readBin, etc, from R. But, if I can,
> I'd like to avoid our C code having to call zlib, etc, so as
> to make our package easily portable.
>
>
> Thanks,
>
> R.
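
The triplet storage Christos describes could be sketched as follows. This is only an illustration under stated assumptions: it assumes a reasonably recent version of the Matrix package (one that provides `sparseMatrix()` and the `TsparseMatrix` class), and the matrix size, the number of non-zero entries, and the variable names are all made up for the example rather than taken from the package discussed in the thread.

```r
library(Matrix)

# Hypothetical example data: a 1000 x 1000 matrix with 50 non-zero entries.
set.seed(1)
n <- 1000
k <- 50
pos <- sample(n * n, k)      # k distinct linear positions in the matrix
i <- (pos - 1) %% n + 1      # row indices (1-based)
j <- (pos - 1) %/% n + 1     # column indices (1-based)
x <- runif(k)                # non-zero values

# Ordinary dense storage, for comparison.
dense <- matrix(0, n, n)
dense[cbind(i, j)] <- x

# Triplet storage: only the row index, column index and value
# of each non-zero entry are kept.
trip <- as(sparseMatrix(i = i, j = j, x = x, dims = c(n, n)), "TsparseMatrix")

# Compare the memory used by the two representations.
print(object.size(dense), units = "Kb")
print(object.size(trip), units = "Kb")

# The dense matrix can be recovered exactly from the triplet form.
stopifnot(all(as.matrix(trip) == dense))
```

The slots `trip@i`, `trip@j` (both zero-based) and `trip@x` are plain integer and numeric vectors, so in principle they could be stored in a list component and handed to C code later; whether that layout suits the C routines in question is something only the package authors can judge.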
One solution is likely to be the Omegahat package Rcompression. Otherwise, R does have internal facilities for (gzip) compression and decompression (e.g. see the end of src/main/connections.c), and you could make creative use of serialization to do the compression.

On Thu, 28 Feb 2008, Ramon Diaz-Uriarte wrote:

> Dear All,
>
> I'd like to be able to have R store (in a list component) a compressed
> data set, and then write it out uncompressed. gzcon and gzfile work in
> exactly the opposite direction. What would be a good way to handle
> this?
>
> Details:
> ----------
>
> We have a package that uses C; part of the C output is a large sparse
> matrix. This is never manipulated directly by R, but always by the C
> code. However, we need to store that data somewhere (inside an R
> object) for further calls to the functions in our package. We'd like
> to store that matrix as part of the R object (say, as an element of a
> list). Ideally, it would be stored in as compressed a way as possible.
> Then, when we need to use that information, it would be decompressed
> and passed to the C function.
>
> I guess one way to do it is to have C deal with the compression and
> uncompression (e.g., using zlib or the bzip2 libraries) and then use
> readBin, etc, from R. But, if I can, I'd like to avoid our C code
> having to call zlib, etc, so as to make our package easily portable.

As of R 2.7.0 you will be able to make use of zlib on effectively all platforms, since it has a public interface on Windows.

>
> Thanks,
>
> R.
>
> --
> Ramon Diaz-Uriarte
> Statistical Computing Team
> Structural Biology and Biocomputing Programme
> Spanish National Cancer Centre (CNIO)
> http://ligarto.org/rdiaz

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
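
The idea of combining serialization with in-memory compression might look like the sketch below. It relies on `memCompress()` and `memDecompress()`, which are not mentioned in the thread and are only available in versions of R later than the one discussed here, so treat their use as an assumption about the reader's R installation. The vector `big` and the list `obj` are hypothetical stand-ins for the data produced by the C code and for the R object that carries it.

```r
# Hypothetical stand-in for the large, mostly-zero data returned by the C code.
big <- numeric(1e5)
big[seq(1, 1e5, by = 1000)] <- 1

# Compress: serialize to a raw vector, then gzip that vector in memory
# and keep the result as a list component.
obj <- list(compressed = memCompress(serialize(big, NULL), type = "gzip"))

# Decompress: reverse both steps before handing the data back to C.
restored <- unserialize(memDecompress(obj$compressed, type = "gzip"))

# The round trip is lossless, and the stored raw vector is smaller
# than the uncompressed serialization.
stopifnot(identical(restored, big))
stopifnot(length(obj$compressed) < length(serialize(big, NULL)))
```

This keeps all compression on the R side, so the C code never needs to link against zlib itself; the cost is an extra copy of the data each time it is decompressed.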