On my Windows 10 laptop I see evidence of the operating system caching
information about recently accessed files. This makes it hard to say how
the speed might be improved. Is there a way to clear this cache?
> system.time(L1 <- size.f.pkg(R.home("library")))
   user  system elapsed
   0.48    2.81   30.42
> system.time(L2 <- size.f.pkg(R.home("library")))
   user  system elapsed
   0.35    1.10    1.43
> identical(L1,L2)
[1] TRUE
> length(L1)
[1] 30
> length(dir(R.home("library"),recursive=TRUE))
[1] 12949
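
Since the first call is much slower than the second, any comparison should at least keep the first (cold) timing apart from the later (warm) ones. A rough sketch of how the same call could be timed several times in a row; time.repeat is a made-up helper name, not an existing function, and it does nothing to clear the operating system's cache:

```r
# Hypothetical helper: call f() n times and return the elapsed time of each call.
# The first value is the "cold" run only if the files were not accessed recently.
time.repeat = function(f, n = 3) {
  sapply(seq_len(n), function(i) system.time(f())[["elapsed"]])
}
```

For example, time.repeat(function() size.f.pkg(R.home("library"))) would give one elapsed time per run, so that candidate changes can be compared on the warm runs only.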
On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
r-help at r-project.org> wrote:
> Dear List Members,
>
>
> I tried to compute the file sizes of each installed package and the
> process is terribly slow.
>
> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>
>
> 1.) Package Sizes
>
>
> system.time({
> x = size.pkg(file=NULL);
> })
> # elapsed time: 509 s !!!
> # 512 Packages; 1.64 GB;
> # R 4.1.1 on MS Windows 10
>
>
> The code for the size.pkg() function is below and the latest version is
> on Github:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
>
>
> Questions:
> Is there a way to get the file size faster?
> It takes long on Windows as well, but of the order of 10-20 s, not 10
> minutes.
> Am I missing something?
>
>
> 1.b.) Alternative
>
> It occurred to me to first read all file sizes and then use tapply or
> aggregate - but I do not see why it should be faster.
>
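
The "read all sizes first, then tapply" idea could look roughly like the sketch below. It is only an illustration, not a measured improvement: size.by.pkg is a made-up name, it uses file.size() (a wrapper for file.info(..., extra_cols = FALSE)$size, available since R 3.2.0) rather than the full file.info(), and it assumes that list.files() returns relative paths separated by "/".

```r
# Hypothetical helper (not part of the original code):
# list all files once, then sum the sizes per top-level folder.
size.by.pkg = function(path = R.home("library")) {
  # One recursive listing; paths are relative to 'path'
  f = list.files(path, all.files = TRUE, recursive = TRUE)
  # Keep only files that sit inside some sub-folder
  f = f[grepl("/", f, fixed = TRUE)]
  # The top-level folder is taken as the package name
  pkg = sub("/.*$", "", f)
  # Sizes in bytes, without the extra columns of file.info()
  sz = file.size(file.path(path, f))
  tapply(sz, pkg, sum)
}
```

Unlike size.f.pkg(), this sketch omits folders that contain no files at all, because they never appear in the listing.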
> Would it be meaningful to benchmark each individual package?
>
> Although I am not very inclined to wait 10 minutes for each new attempt.
>
>
> 2.) Big Packages
>
> Just as a note: there are a few very large packages (in my list of 512
> packages):
>
>     Size (bytes)   Package
> 1   123,566,287   BH
> 2   113,578,391   sf
> 3   112,252,652   rgdal
> 4    81,144,868   magick
> 5    77,791,374   openNLPmodels.en
>
> I suspect that sf & rgdal have a lot of duplicated data structures
> and/or duplicate code and/or duplicated libraries - although I am not an
> expert in the field and did not check the sources.
>
>
> Sincerely,
>
>
> Leonard
>
> ======>
>
> # Package Size:
> size.f.pkg = function(path=NULL) {
>     if(is.null(path)) path = R.home("library");
>     xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
>     size.f = function(p) {
>         p = paste0(path, "/", p);
>         sum(file.info(list.files(path=p, pattern=".",
>             full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
>     }
>     sapply(xd, size.f);
> }
>
> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>     x = size.f.pkg(path=path);
>     x = as.data.frame(x);
>     names(x) = "Size"
>     x$Name = rownames(x);
>     # Order
>     if(sort) {
>         id = order(x$Size, decreasing=TRUE)
>         x = x[id,];
>     }
>     if( ! is.null(file)) {
>         if( ! is.character(file)) {
>             print("Error: Size NOT written to file!");
>         } else write.csv(x, file=file, row.names=FALSE);
>     }
>     return(x);
> }
>
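
One change that may be worth timing is to replace file.info() with file.size(). On Windows, file.info() by default also fills in an "exe" column; my assumption, not something measured here, is that this extra work per file accounts for part of the time. The sketch below keeps the logic of size.f.pkg() and changes only that call (and drops pattern=".", which matches any non-empty file name anyway). The name size.f.pkg2 is made up for illustration.

```r
# Sketch: same logic as size.f.pkg(), but with file.size() instead of file.info().
# file.size(f) is equivalent to file.info(f, extra_cols = FALSE)$size (R >= 3.2.0).
size.f.pkg2 = function(path = NULL) {
  if (is.null(path)) path = R.home("library")
  # Top-level folders, i.e. one per installed package
  xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE)
  size.f = function(p) {
    f = list.files(path = file.path(path, p), full.names = TRUE,
                   all.files = TRUE, recursive = TRUE)
    sum(file.size(f))  # total size in bytes; 0 for an empty folder
  }
  sapply(xd, size.f)
}
```

Comparing system.time(size.f.pkg2()) with system.time(size.f.pkg()) after both have been run once (so that both see a warm cache) should show whether the difference matters on a given machine.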