> What is the benefit of lazyload DB in this circumstance? I don't see it
> if your .rda files have one data object each and are compressed.
The paradigm we have been following is to save each environment in its
own .rda file, so that after loading the package they can be accessed
automatically with e.g. ls(), get(), or mget(), without explicitly
having to load() each environment into the current workspace. However,
if we build the win32 packages using R CMD INSTALL --build (which AFAIK
is the recommended method), the .rda files all get bundled into an
Rdata.zip file, so they won't be found on the search path unless they
are loaded into the workspace with data().
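
To make the difference concrete, here is a minimal sketch of the two
access patterns. The hgu95av2 package, its hgu95av2GO environment, and
the probe IDs are used purely as representative examples:

    library(hgu95av2)

    ## Source install with one environment per .rda file: the environments
    ## can be fetched by name right after attaching the package.
    ls("package:hgu95av2")             # lists the data environments
    go  <- get("hgu95av2GO")           # fetch one environment by name
    ids <- mget(c("1000_at", "1001_at"), envir = go)

    ## win32 binary built with R CMD INSTALL --build: the .rda files end
    ## up in Rdata.zip, so each object must first be loaded with data().
    data("hgu95av2GO", package = "hgu95av2")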
If the design expectation was to have separate compressed dumps for each
object, does this imply that our metaData packages should be built using
R CMD build --binary, whereas the rest of the packages should be built
using R CMD INSTALL --build?
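
For reference, and to make sure I understand the alternative correctly,
the separate-compressed-dump layout described in the design expectation
quoted below would presumably be prepared along these lines (the object
names are placeholders, and the one-name-per-line filelist format is my
reading of `Writing R Extensions', so please correct me if that is
wrong):

    ## One object per compressed .rda file under data/
    for (nm in c("hgu95av2GO", "hgu95av2SYMBOL")) {
        save(list = nm,
             file = file.path("data", paste(nm, ".rda", sep = "")),
             compress = TRUE)
    }
    ## plus a plain-text data/filelist index, assumed to name one dataset
    ## per line (see `Writing R Extensions' for the exact format)
    writeLines(c("hgu95av2GO", "hgu95av2SYMBOL"),
               file.path("data", "filelist"))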
>
> Do you have a `data/filelist' index in your packages, as suggested by
> 200update.txt and `Writing R Extensions'? The slow examples I have seen
> did not and so were wasting a lot of time preparing indices that could
> have been supplied.
>
> The design expectation was that large data packages would supply an index
> and not use lazyloading for datasets but use separate compressed dumps for
> each object. If there is some reason to change that, please send an RFC
> for the requirements and a design.
>
> Did this not occur during the alpha/beta period for 2.0.0 several months
> ago, or has something in BioC changed since? (I did ascertain that if
> filelist was supplied, the then-current BioC packages installed and loaded
> quickly and smoothly.)
>
> On Tue, 8 Feb 2005, James MacDonald wrote:
>
>
>>Hi all,
>>
>>Bioconductor has several metaData packages that contain quite large
>>data sets. In the past, these data were simply held in the data/
>>directory of the package as .rda files and load()ed as needed.
>>Converting to lazy data loading may have memory and performance
>>advantages, but for the larger metaData packages the installation is
>>painfully slow (it has taken > 30 min to install a large metaData
>>package on a PIII, 933 MHz box running Mandrake 9.2). The vast majority
>>of the time is spent moving datasets to lazyload DB.
>>
>>It takes a long time to build the win32 packages as well, but once the
>>package is built, the installation is quick, so there is no real problem
>>for our end users. So my question is this: is there a mechanism that can
>>be used to pre-build the lazyload DB for source packages, to decrease
>>their installation time?
>
>
--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109