Hello,

Currently if you install a package twice:

    install.packages("testit")
    install.packages("testit")

R will build the package from source (depending on what OS you're using) twice by default. This becomes especially burdensome when people are using big packages (i.e. lots of dependencies) and someone has a script with:

    install.packages("tidyverse")
    ...
    ... later on down the script
    ...
    install.packages("dplyr")

In this case, "dplyr" is part of the tidyverse and will install twice. As the primary "package manager" for R, install.packages() should not install a package twice (by default) when this can be so easily checked. Indeed, many people resort to writing a few lines of code to filter out already-installed packages.

An r-help post from 2010 proposed a solution to improve the default behavior by adding "force=FALSE" as an API addition to install.packages (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).

Would the R-core devs still consider this proposal?

Josh Bradley
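For readers unfamiliar with the "few lines of code" workaround mentioned above, here is a minimal sketch of what such a filter might look like. The helper names missing_pkgs() and install_if_missing() are hypothetical and not part of R; the sketch assumes that comparing names against utils::installed.packages() is an acceptable test for "already installed" and does not check versions.

```r
# Hypothetical helper (not part of R): which of `pkgs` are not yet installed?
# `installed` defaults to the package names found in the current library paths.
missing_pkgs <- function(pkgs,
                         installed = rownames(utils::installed.packages())) {
  # setdiff() also drops duplicate names in `pkgs`
  setdiff(pkgs, installed)
}

# Hypothetical wrapper (not part of R): install only what is missing.
install_if_missing <- function(pkgs, ...) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) {
    utils::install.packages(todo, ...)
  }
  # Return the names that were actually passed to install.packages()
  invisible(todo)
}
```

With this in place, a script would call install_if_missing(c("tidyverse", "dplyr")) instead of two separate install.packages() calls. Note that the list of installed packages is read once per call, so packages pulled in as dependencies are only noticed on the next call.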
On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> [...]
> Would the R-core devs still consider this proposal?

Whether or not they'd do it, it's easy for you to do it.

    install.packages <- function(pkgs, ..., force = FALSE) {
      if (!force) {
        # Keep only the packages whose namespace cannot be loaded
        pkgs <- Filter(Negate(requireNamespace), pkgs)
      }
      utils::install.packages(pkgs, ...)
    }

You might want to make this more elaborate, e.g. doing update.packages() on the ones that exist. But really, isn't the problem with the script you're using, which could have done a simple test before forcing a slow install?

Duncan Murdoch
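The "more elaborate" variant suggested above, installing what is missing and updating what already exists, could be sketched roughly as follows. The names plan_install() and install_or_update() are hypothetical; the sketch assumes that the oldPkgs argument of utils::update.packages() is a suitable way to restrict the update check to the requested packages, and it forwards `...` only to install.packages().

```r
# Hypothetical helper (not part of R): split `pkgs` into those to install
# and those that are already present and only need an update check.
plan_install <- function(pkgs,
                         installed = rownames(utils::installed.packages()),
                         force = FALSE) {
  pkgs <- unique(pkgs)
  if (force) {
    # force = TRUE reinstalls everything, as install.packages() does today
    return(list(install = pkgs, update = character(0)))
  }
  list(install = setdiff(pkgs, installed),
       update  = intersect(pkgs, installed))
}

# Hypothetical wrapper (not part of R) around the plan above.
install_or_update <- function(pkgs, ..., force = FALSE) {
  plan <- plan_install(pkgs, force = force)
  if (length(plan$install) > 0) {
    utils::install.packages(plan$install, ...)
  }
  if (length(plan$update) > 0) {
    # Assumption: only reinstall existing packages when a newer version exists
    utils::update.packages(oldPkgs = plan$update, ask = FALSE)
  }
  invisible(plan)
}
```

Separating the planning step from the side effects keeps the decision logic easy to inspect without touching the network.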
I could do this...and I have before. This brings up a more fundamental question though. You're asking me to write code that changes the logic of the installation process (i.e. writing my own package installer). Instead of doing that, I would rather integrate that logic into R itself to improve the baseline installation process. This API change would be additive and would not break legacy code.

Package managers like pip (Python), conda (Python), yum (CentOS), apt (Ubuntu), and apk (Alpine) are all "smart" enough to know (by default) when not to download a package again. By proposing this change, I'm essentially asking that R follow some of the same conventions and best practices that other package managers have adopted over the decades.

I assumed this list is used to discuss proposals like this to the R codebase. If I'm on the wrong list, please let me know.

P.S. If this change happened, it would be interesting to study the effect it has on bandwidth across all CRAN mirrors. A significant drop would turn into actual $$ saved.

Josh Bradley

On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> [...]
> Whether or not they'd do it, it's easy for you to do it.
> [...]
If this is the behaviour you are looking for, you might like to try pak (https://pak.r-lib.org)

    # Create a temporary library
    path <- tempfile()
    dir.create(path)
    .libPaths(path)

    pak::pkg_install("scales")
    #> ? Will install 8 packages:
    #>   colorspace (1.4-1), labeling (0.3), munsell (0.5.0), R6 (2.4.0), RColorBrewer
    #>   (1.1-2), Rcpp (1.0.2), scales (1.0.0), viridisLite (0.3.0)
    #>
    #> ? Will download 2 CRAN packages (4.7 MB), cached: 6 (3.69 MB).
    #>
    #> ? Installed colorspace 1.4-1 [139ms]
    #> ? Installed labeling 0.3 [206ms]
    #> ? Installed munsell 0.5.0 [288ms]
    #> ? Installed R6 2.4.0 [375ms]
    #> ? Installed RColorBrewer 1.1-2 [423ms]
    #> ? Installed Rcpp 1.0.2 [472ms]
    #> ? Installed scales 1.0.0 [511ms]
    #> ? Installed viridisLite 0.3.0 [569ms]
    #> ? 1 + 7 pkgs | kept 0, updated 0, new 8 | downloaded 2 (4.7 MB) [2.8s]

    pak::pkg_install("scales")
    #> ? No changes needed
    #> ? 1 + 7 pkgs | kept 7, updated 0, new 0 | downloaded 0 (0 B) [855ms]

    remove.packages(c("Rcpp", "munsell"))
    pak::pkg_install("scales")
    #> ? Will install 2 packages:
    #>   munsell (0.5.0), Rcpp (1.0.2)
    #>
    #> ? All 2 packages (4.88 MB) are cached.
    #>
    #> ? Installed munsell 0.5.0 [75ms]
    #> ? Installed Rcpp 1.0.2 [242ms]
    #> ? 1 + 7 pkgs | kept 6, updated 0, new 2 | downloaded 0 (0 B) [1.5s]

On Fri, Nov 8, 2019 at 1:07 AM Joshua Bradley <jgbradley1 at gmail.com> wrote:
> [...]
> Would the R-core devs still consider this proposal?

--
http://hadley.nz