Does anyone know a way to do the following:

Save a large number of R objects to a file (as save() does) but then read
back only a small named subset of them. As far as I can see, load() reads
back everything.

The context is:

I have an application which will generate a large number of large matrices
(approx 15000 matrices, each of dimension 2000*30). I can generate these
matrices using an R package I wrote, but it requires a large amount of
memory and is slow, so I want to do this only once. However, I then want
to do some subsequent processing, comprising a very large number of runs
in which small (~10) random selections of matrices from the previously
computed set are used for linear modelling. So I need a way to load back
named objects previously saved in a call to save(). I can't see any way
of doing this. Any ideas?

Thanks

Richard Mott

--
----------------------------------------------------
Richard Mott          | Wellcome Trust Centre
tel 01865 287588      | for Human Genetics
fax 01865 287697      | Roosevelt Drive, Oxford OX3 7BN
This may not be quite the answer you're looking for, but I sometimes save
each such object in its own file (usually <object.name>.RData). Then, if
you know which objects you're looking for, you know their names, and can
load the individual files.

Hope this helps,

Matt Wiener

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Richard Mott
Sent: Tuesday, June 14, 2005 9:03 AM
To: r-help at stat.math.ethz.ch
Subject: [R] loading and saving R objects

> Does anyone know a way to do the following:
>
> Save a large number of R objects to a file (as save() does) but then
> read back only a small named subset of them. As far as I can see,
> load() reads back everything. [rest of original message snipped]

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
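[A minimal sketch of this one-file-per-object approach; the object and
file names here are made up for illustration:]

```r
## save each object to its own file, named after the object
big.matrix.42 <- matrix(rnorm(2000 * 30), nrow = 2000)
save(big.matrix.42, file = "big.matrix.42.RData")

## later, in a fresh session, restore just that one object by name;
## load() recreates it under its original name
rm(big.matrix.42)
load("big.matrix.42.RData")
dim(big.matrix.42)   # 2000 30
```

Since the file name matches the object name, knowing which objects you
want tells you exactly which files to load.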
Richard Mott wrote:
> Does anyone know a way to do the following:
>
> Save a large number of R objects to a file (as save() does) but then
> read back only a small named subset of them. As far as I can see,
> load() reads back everything.

Save them to individual files when you generate them?

  for(i in 1:15000){
    m <- generateBigMatrix(i)
    filename <- paste("BigMatrix-", i, ".RData", sep='')
    save(m, file=filename)
  }

Note that load() will always overwrite 'm', so to load a sample of them
in you'll need to do something like this:

  bigSamples <- list()
  for(i in sample(15000, N)){
    filename <- paste("BigMatrix-", i, ".RData", sep='')
    load(filename)
    bigSamples[[i]] <- m
  }

But there may be a more efficient way to build up a big list like that,
I can never remember - get it working, then worry about optimisation.

I hope your filesystem is happy with 15000 files in it. I would dedicate
a folder or directory to just these objects' files, since it otherwise
becomes near impossible to see anything other than the big matrix
files...

Baz
I would suggest saving each object to an individual file with some sort
of systematic file name. That way, you can implement a rudimentary
key-value database and load only the objects you want. You might be
interested in the serialize() and unserialize() functions for this
purpose.

If having ~15000 files is not desirable, then you need a database like
GDBM. If you can live with something simpler, you might take a look at
my 'filehash' package at http://sandybox.typepad.com/software/. It
hasn't been tested much but it may suit your needs.

-roger

Richard Mott wrote:
> Does anyone know a way to do the following:
>
> Save a large number of R objects to a file (as save() does) but then
> read back only a small named subset of them. As far as I can see,
> load() reads back everything. [snip]

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
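[A minimal sketch of the serialize()/unserialize() idea; the file naming
scheme here is just an assumption for illustration:]

```r
## write one object per file through a binary connection
m <- matrix(rnorm(2000 * 30), nrow = 2000)
con <- file("matrix-0001.rds", "wb")
serialize(m, con)
close(con)

## read back only the file(s) you need; unlike load(), you control
## the name the restored object is assigned to
con <- file("matrix-0001.rds", "rb")
m.restored <- unserialize(con)
close(con)
identical(m, m.restored)   # TRUE
```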
On Tue, 14 Jun 2005, Prof Brian Ripley wrote:

> If your file system does not like 15000 files you can always save in a
> DBMS.

Or, switch to a better/more appropriate file system:

http://en.wikipedia.org/wiki/Comparison_of_file_systems

ReiserFS would allow you to store up to about 1.2 million files in a
directory.

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Tuesday, June 14, 2005 10:41 AM
> To: Barry Rowlingson
> Cc: r-help at stat.math.ethz.ch; Richard Mott
> Subject: Re: [R] loading and saving R objects
>
> On Tue, 14 Jun 2005, Barry Rowlingson wrote:
>
> > Richard Mott wrote:
> >> Save a large number of R objects to a file (as save() does) but
> >> then read back only a small named subset of them. [snip]
> >
> > Save them to individual files when you generate them?
> > [code snipped]
> >
> > But there may be a more efficient way to build up a big list like
> > that, I can never remember - get it working, then worry about
> > optimisation.
>
> (Yes, use bigSamples <- vector("list", 15000) first.)
>
> .readRDS/.saveRDS might be a better way to do this, and avoids always
> restoring to "m".
>
> If your file system does not like 15000 files you can always save in a
> DBMS.
>
> I did once look into restoring just some of the objects in a save()ed
> file, but it is not really possible to do so efficiently due to
> sharing between objects.
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road,      +44 1865 272866 (PA)
> Oxford OX1 3TG, UK       Fax: +44 1865 272595
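[A minimal sketch of the .readRDS/.saveRDS suggestion; those were the
dot-prefixed names at the time, and in later R versions the same
functions are exported as saveRDS()/readRDS(), used here:]

```r
m <- matrix(rnorm(2000 * 30), nrow = 2000)

## saveRDS() stores a single object without its name...
saveRDS(m, file = "BigMatrix-1.rds")

## ...so readRDS() returns the object and you choose what to call it;
## nothing in the workspace is silently overwritten
m1 <- readRDS("BigMatrix-1.rds")
identical(m, m1)   # TRUE
```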
On 6/14/05, Richard Mott <rmott at well.ox.ac.uk> wrote:

> Does anyone know a way to do the following:
>
> Save a large number of R objects to a file (as save() does) but then
> read back only a small named subset of them. As far as I can see,
> load() reads back everything. [snip]

Check out the g.data delayed data package on CRAN and the article in
R News 2/3.