At the Vienna meeting we discussed the problems encountered on some
operating systems when storing many small files in a directory. In
particular the directories $RHOME/library/base/help/,
$RHOME/library/base/R-ex/, and $RHOME/library/base/data/ can take up an
enormous amount of storage on the Macintosh or on Windows systems because
the minimum amount of storage per distinct file is quite large.

Fritz Leisch and I suggested storing the contents of each of these
directories as a single .zip file. This should result in considerable
savings in size with little penalty in access speed.

To implement this we would need code for accessing files within a .zip
archive and for decompressing the files. I know of the files from
Info-Zip, including the zlib sources (compression/decompression) and the
contrib/minizip directory in the zlib source tree. The minizip directory
provides a prototype unzip.c and unzip.h. These would be used to modify
the R internal functions file.exists and file.show so they could look in
the archive as well as in a directory.

Does anyone know of a reason why this would not be a good idea? Does
anyone know of better/more portable/whatever implementations of code to
access files within a .zip archive? The Info-Zip sources are available at
http://www.cdrom.com/pub/infozip/zlib/
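To make the proposal concrete, a minimal sketch of a zip-aware existence
test might look like this in R. The function name zip.file.exists and the
archive name Rhelp.zip are invented here, and the sketch assumes the
Info-Zip unzip executable is on the path:

  ## Hypothetical helper: report whether `name' exists either as a plain
  ## file or as a member of the archive `zipfile'.
  zip.file.exists <- function(name, zipfile = "Rhelp.zip") {
      if (file.exists(name)) return(TRUE)
      if (!file.exists(zipfile)) return(FALSE)
      ## Info-Zip's `unzip -l archive member' exits with status 0 only
      ## when the named member is present in the archive.
      system(paste("unzip -l", zipfile, name)) == 0
  }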
On 25 Mar 1999, Douglas Bates wrote:

> At the Vienna meeting we discussed the problems encountered on some
> operating systems when storing many small files in a directory. In
> particular the directories $RHOME/library/base/help/,
> $RHOME/library/base/R-ex/, and $RHOME/library/base/data/ can take up
> an enormous amount of storage on the Macintosh or on Windows systems
> because the minimum amount of storage per distinct file is quite
> large.

Not just base: lme is at least as bad, and MASS and boot are large too.

> Fritz Leisch and I suggested storing the contents of each of these
> directories as a single .zip file. This should result in considerable
> savings in size with little penalty in access speed.
>
> To implement this we would need code for accessing files within a .zip
> archive and for decompressing the files. I know of the files from
> Info-Zip including the zlib sources (compression/decompression) and
> the contrib/minizip directory in the zlib source tree. The minizip
> directory provides a prototype unzip.c and unzip.h. These would be
> used to modify the R internal functions file.exists and file.show so
> they could look in the archive as well as in a directory.
>
> Does anyone know of a reason why this would not be a good idea? Does
> anyone know of better/more portable/whatever implementations of code
> to access files within a .zip archive? The Info-Zip sources are
> available at http://www.cdrom.com/pub/infozip/zlib/

Those are the ones I was looking at: they do seem portable, if slow.
However, I am not sure that file.exists and file.show should be
overloaded, and for data() you need rather more than those. What is
needed, I think, is a function to extract a file from an archive to a
temporary file, to be passed to file.show or to source or load. There is
a snag in ensuring that temporary files get deleted, but file.show has a
flag for this, and example and data could have an on.exit call to
file.remove.

One important point is that AnIndex and 00Titles are in help, and they
probably should be kept uncompressed (not least as the Perl scripts read
them).

It would be easy to mock this up using scripts. Who is doing this?

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
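A rough sketch of the extract-to-temporary-file helper described above,
again assuming the Info-Zip unzip executable is available; the names
zip.extract and help.zip are invented for illustration:

  ## Extract one member of a zip archive into a temporary directory and
  ## return the path to the extracted copy.
  zip.extract <- function(zipfile, name) {
      exdir <- tempfile("Rzip")
      dir.create(exdir)
      ## -o overwrite without prompting, -q quiet, -d destination dir
      system(paste("unzip -oq", zipfile, name, "-d", exdir))
      file.path(exdir, name)
  }

  ## file.show can delete the temporary copy itself via its flag:
  file.show(zip.extract("help.zip", "lm"), delete.file = TRUE)
  ## example() and data() could instead source/load the extracted file
  ## and tidy up with an on.exit(file.remove(tmp)) call.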
This sounds similar to the Virtual File System (VFS) used by the GNU
Midnight Commander (a file manager programme). VFS allows you to browse
files that are not on your native file system. Among other things, you
can look inside tar and compressed tar files as if they were unpacked.
ZIP file support isn't very good - it's just a shell script interface to
the zip programme - but I suppose this could be done internally like the
"utar" filesystem.

The VFS code comes with the Midnight Commander, but it is supposed to
compile as a standalone library. If this works, and is portable (!), it
might be cleaner to use libvfs than to do this inside R. Or maybe not.
Anyway, here is the URL: http://www.gnome.org/mc/

Martyn

P.S. I once trashed my FAT file system by filling it with many small
files from gnu-win32 ports, so this may be a safety issue as well as a
speed one.
> Date: Fri, 26 Mar 1999 11:44:44 +0000 (GMT)
> From: Nicholas Lee <N.J.Lee@statslab.cam.ac.uk>
>
> Why not just use the approach that some Linux distributions (Debian at
> least) take of gzipping the man pages individually?
>
> It makes it easier and quicker to patch pages and doesn't require
> temporary swap space for uncompressing a whole archive.

(1) The whole point is to avoid having many tiny files. On Windows the
current help files take up ca 37Mb on my VFAT file system, and about 1Mb
in a zip archive. As Doug Bates said at the beginning of this thread:

> At the Vienna meeting we discussed the problems encountered on some
> operating systems when storing many small files in a directory. In
> particular the directories $RHOME/library/base/help/,
> $RHOME/library/base/R-ex/, and $RHOME/library/base/data/ can take up
> an enormous amount of storage on the Macintosh or on Windows systems
> because the minimum amount of storage per distinct file is quite
> large.

(2) You can avoid uncompressing a whole archive with zip (not gzip).

(3) You should not be thinking about patching pages in an R
distribution: whole R distributions are changed often enough. It is the
installed pages (help, R-ex, latex), not the source pages (Rd), that we
are considering putting into a zip archive.

--
Brian D. Ripley, ripley@stats.ox.ac.uk
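Point (2) is what makes the zip route attractive: a single member can be
read out of the archive without unpacking the rest. A small sketch of
doing this from R via the Info-Zip unzip program (the archive name
help.zip and the member name lm are made up):

  ## `unzip -p' writes the named member to standard output, so nothing
  ## else in the archive is ever unpacked onto the file system.
  page <- system("unzip -p help.zip lm", intern = TRUE)
  length(page)   # one element per line of the extracted help page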
> From: Jonathan Rougier <J.C.Rougier@durham.ac.uk>
>
> On Fri, 26 Mar 1999, Martyn Plummer wrote:
>
> > The problem isn't that the help files take up a lot of space,
> > but that certain file systems are extremely inefficient in
> > the use of that space.
>
> This is probably an unrealistic suggestion, but if compression is less
> of a problem than the number of files, why not take the help files
> `onboard' by creating a new class of objects with mode "help", which
> would be lists with components much like the current .Rd files, and
> attach them at the bottom of the search list. Each library would then
> require just a single file of help objects.

But then loading a `package' would take longer and use much more memory,
and memory in R is precious. We have talked about concatenating the
help/R-ex files into a single file per package, and indexing it, but the
zip file route avoids some of the work and helps with compression too.

--
Brian D. Ripley, ripley@stats.ox.ac.uk
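For comparison, a hypothetical sketch of the concatenate-and-index
alternative mentioned above; the file names help.db and help.index, their
format, and the function name get.page are invented purely for
illustration:

  ## `help.db' would hold all pages end to end; `help.index' would map
  ## topic names to byte offsets and lengths within it.
  idx <- read.table("help.index",
                    col.names = c("topic", "offset", "size"))
  get.page <- function(topic) {
      i <- match(topic, idx$topic)
      con <- file("help.db", open = "r")
      on.exit(close(con))
      seek(con, idx$offset[i])        # jump to the start of the page
      readChar(con, idx$size[i])      # read exactly one page
  }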
On Tue, 6 Apr 1999, Martin Maechler wrote:

[discussion moved from make to R-devel]

> >>>>> "BDR" == Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes:
>
> <.....>
>
> BDR> Doing this proved relatively easy and quite fast enough. It will
> BDR> appear in the next Windows version (rw0640, I presume) that is due
> BDR> in a few days. We just use unzip to extract a file and display
> BDR> it. (There is a DLL version of unzip, but it seems unnecessary.)
> BDR> Using zipped help is optional by package, and where selected (when
> BDR> installing the package from source) zips up the text, latex and
> BDR> example files. The space savings on a VFAT16 file system are
> BDR> considerable (about 20Mb on my system).
>
> Brian,
> this will be part of R `proper', not just the Windows extensions,
> right after 0.64, will it?

I am not sure about this. The gain is really only on systems with large
clusters, and that means Windows VFAT16, where small clusters are 4Kb and
clusters can be 32Kb (and are on a > 1Gb disc).

The other issue is licensing. I compiled unzip for Win32 and followed the
minimal allowed distribution as part of the binary distribution. On
source platforms the use of unzip would be optional, but with binary
distributions you would need to ensure unzip is available.

Is there any interest in having this on Unix/Linux platforms? If so I can
very easily change the source code to search zip files if present.

Brian

--
Brian D. Ripley, ripley@stats.ox.ac.uk
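The install-time step described above might amount to little more than
rolling each directory of small installed files into one archive, roughly
as sketched below. The directory names, the archive name Rhelp.zip, and
the variable pkgdir are guesses for illustration, not the actual layout,
and the sketch assumes the Info-Zip zip program is available:

  ## pkgdir: path to the installed package (placeholder).  Replace the
  ## many small files in help/, latex/ and R-ex/ by one archive each.
  for (d in c("help", "latex", "R-ex")) {
      files <- list.files(file.path(pkgdir, d), full.names = TRUE)
      ## -j store just the file names (no paths), -q quiet
      system(paste("zip -jq", file.path(pkgdir, d, "Rhelp.zip"),
                   paste(files, collapse = " ")))
      file.remove(files)
  }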
>>>>> On Tue, 6 Apr 1999 11:22:07 +0100 (BST),
>>>>> Prof Brian D Ripley (PBDR) wrote:

PBDR> Is there any interest in having this on Unix/Linux platforms? If so
PBDR> I can very easily change the source code to search zip files if
PBDR> present.

Yes, I think I want the help system as consistent across platforms as
possible, i.e., using the same mechanisms everywhere. Should be easier to
maintain.

.f