Hi Gabe,

Keeping track of where a package was installed from would be a nice
feature. However, it wouldn't be as reliable as comparing hashes to
decide whether a package needs re-installation or not.

H.

On 11/8/19 12:37, Gabriel Becker wrote:
> Hi Josh,
>
> There are a few issues I can think of with this. The primary one is that
> CRAN(/Bioconductor) is not the only place one can install packages from. I
> might have version x.y.z of a package installed that was, at the time, a
> development version I got from GitHub, or installed locally, etc. Hell, I
> might have a later devel version but want the CRAN version. Not common,
> sure, but will likely happen often enough that install.packages not doing
> that for me when I tell it to is probably bad.
>
> Currently (though there has been some discussion of changing this) packages
> do not remember where they were installed from, so R wouldn't know if the
> version you have is actually fully the same one on the repository you
> pointed install.packages to or not. If that were changed and we knew that
> we were getting the byte-identical package from the actual same source, I
> think this would be a nice addition, though without it I think it would be
> right a high but not high enough proportion of the time.
>
>> R will build the package from source (depending on what OS you're using)
>> twice by default. This becomes especially burdensome when people are using
>> big packages (i.e. lots of depends) and someone has a script with:
>>
>> install.packages("tidyverse")
>> ...
>> ... later on down the script
>> ...
>> install.packages("dplyr")
>
> I mean, IMHO and as I think Duncan was alluding to, that's straight up an
> error by the script author. I think it's a few of them, actually, but it's at
> least one. An understandable one, sure, but that's still what it is. Scripts
> (which are meant to be run more than once, generally) usually shouldn't
> really be calling install.packages in the first place, but if they do, they
> should certainly not be installing umbrella packages and the packages they
> bring with them separately.
>
> Even having one vectorized call to install.packages where all the packages
> are installed would prevent this issue, including in the case where the
> user doesn't understand the purpose of the tidyverse package. Though the
> installation would still occur every time the script was run.
>
> The last thing to note is that there are at least 2 packages which provide
> a function which does this already (install.load and remotes), so people
> can get this functionality if they need it.
>
> On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <jgbradley1 at gmail.com> wrote:
>
>> I assumed this list is used to discuss proposals like this to the R
>> codebase. If I'm on the wrong list, please let me know.
>
> This is the right place to discuss things like this. Thanks for starting
> the conversation.
>
> Best,
> ~G
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319

I believe introducing a force argument with the backward-compatible
default force=TRUE is a good start, even if we're not ready for making
force=FALSE the default at this point. It would help simplify
quite-common instructions like:

  if (!requireNamespace("BiocManager"))
      install.packages("BiocManager")
  BiocManager::install(...)

to

  install.packages("BiocManager", force=FALSE)
  BiocManager::install(...)

and more so when installing lots of packages conditionally, e.g.

  if (!requireNamespace("foo")) install.packages("foo")
  if (!requireNamespace("bar")) install.packages("bar")
  ...

to

  install.packages(c("foo", "bar", ...), force = FALSE)

Before deciding on making force=FALSE the new default, I think it
would be valuable to play the devil's advocate and explore and
identify all possible downsides of such a default, e.g. breaking
existing instructions, downstream package code that uses
install.packages() internally, and so on.

/Henrik

PS. Although the idea of having update.packages() install missing
packages is not bad, I'm not a fan of it, solely because it risks
installation instructions starting to use update.packages() instead,
which will certainly confuse those who don't know the history (think
require() vs library()).

On Fri, Nov 8, 2019 at 3:11 PM Pages, Herve <hpages at fredhutch.org> wrote:
>
> Hi Gabe,
>
> Keeping track of where a package was installed from would be a nice
> feature. However it wouldn't be as reliable as comparing hashes to
> decide whether a package needs re-installation or not.
>
> H.
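
For readers who want the force=FALSE behaviour today, here is a minimal sketch of a wrapper, under stated assumptions. The helper names missing_packages() and install_missing() are hypothetical and not part of R; install.packages() is called unchanged and no force argument is assumed to exist. The check is by presence in the library only, so it says nothing about versions, hashes, or where a package came from.

```r
# Hypothetical helpers, for illustration only; install.packages() is not modified.

# Return the requested packages that are not found in the installed library.
# 'installed' defaults to the names reported by utils::installed.packages().
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(unique(pkgs), installed)
}

# Install only what is missing, in a single vectorized call.
# Extra arguments are passed through to utils::install.packages().
install_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    utils::install.packages(todo, ...)
  }
  invisible(todo)
}
```

A script would then call install_missing(c("foo", "bar")) once; on a second run nothing is installed because nothing is missing.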

Actually there is one gotcha here: even if a package has not changed
(i.e. same exact hash), there are situations where you want to
reinstall it because a package it depends on has changed. This is
because some of the stuff that gets cached at installation time (e.g.
method tables) can become stale and needs to be resynced.

We sometimes have to deal with this kind of situation in Bioconductor
when we make changes to some infrastructure packages. To keep package
caches from getting out of sync on the user's machine after the user
gets the new version of the infrastructure package, we also bump the
versions of all the reverse deps for which the cache needs to be
resynced. A side effect of the version bumps is to also trigger build
and propagation of new Windows and Mac binaries for the reverse deps
affected by the change, which is good, because they also need to be
rebuilt and reinstalled.

This is an ugly situation but luckily a rare one, and it generally
happens in BioC devel only.

H.

On 11/8/19 15:05, Hervé Pagès wrote:
> Hi Gabe,
>
> Keeping track of where a package was installed from would be a nice
> feature. However it wouldn't be as reliable as comparing hashes to
> decide whether a package needs re-installation or not.
>
> H.
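
The gotcha above can be sketched as a small decision rule: reinstall a package if its own hash differs from the repository copy, or if it directly depends on a package whose hash differs. Everything in this example is made up for illustration: the package names, the hash strings, the deps list, and the function needs_reinstall() are assumptions, and R does not currently store such hashes. Only direct dependencies are considered, and every installed package is assumed to have a repository entry.

```r
# Toy metadata; all names and hashes are invented for illustration.
installed_hash <- c(infra = "a1", pkgA = "b2", pkgB = "c3")
repo_hash      <- c(infra = "a9", pkgA = "b2", pkgB = "c3")

# Direct dependencies of each package (pkgA depends on infra).
deps <- list(infra = character(0), pkgA = "infra", pkgB = character(0))

needs_reinstall <- function(installed_hash, repo_hash, deps) {
  pkgs <- names(installed_hash)
  # Packages whose own contents differ from the repository copy.
  changed <- pkgs[installed_hash[pkgs] != repo_hash[pkgs]]
  # Packages that directly depend on a changed package, even if
  # their own hash is identical.
  stale <- pkgs[vapply(deps[pkgs],
                       function(d) any(d %in% changed),
                       logical(1))]
  union(changed, stale)
}

result <- needs_reinstall(installed_hash, repo_hash, deps)
print(result)
```

Here pkgA is selected although its hash is unchanged, which is the case a pure hash comparison would miss.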

Sounds like a very reasonable approach to me.

H.

On 11/8/19 15:17, Henrik Bengtsson wrote:
> I believe introducing a backward compatible force=TRUE is a good
> start, even if we're not ready for making force=FALSE the default at
> this point.

On 08/11/2019 6:17 p.m., Henrik Bengtsson wrote:
> I believe introducing a backward compatible force=TRUE is a good
> start, even if we're not ready for making force=FALSE the default at
> this point. It would help simplify quite-common instructions like
>
>   if (!requireNamespace("BiocManager"))
>       install.packages("BiocManager")
>   BiocManager::install(...)
>
> to
>
>   install.packages("BiocManager", force=FALSE)
>   BiocManager::install(...)

If simplifying instructions is the goal, it would be even simpler to
just install it unconditionally:

  install.packages("BiocManager")

Unlike dplyr (the original example in this thread), BiocManager is a
tiny package with no compiling needed, so it hardly needs any time to
install.

And as previously mentioned, the backward compatible force=TRUE
wouldn't help with the bad script at all. In fact, the bad script could
be fixed simply by realizing that install.packages("tidyverse") makes
it a bad idea to also include install.packages("dplyr"), because the
former would install dplyr if and only if it was not already
installed.

So it seems to me that fixing the bad script (by deleting one line) is
the solution to the problem, not fixing R with a multistage series of
revisions, tests, etc.

Duncan Murdoch
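
The "delete one line" fix can be illustrated with a small sketch that drops any requested package already pulled in as a dependency of another requested package. The deps list below is a hand-written toy, and drop_redundant() is a hypothetical helper, not an R function. In a real session the dependency information could presumably be obtained from tools::package_dependencies(), which is not called here so that the example needs no network access.

```r
# Toy direct-dependency table, written by hand for illustration.
deps <- list(
  tidyverse = c("dplyr", "ggplot2"),
  dplyr     = c("rlang")
)

# Remove requested packages that another requested package already depends on.
drop_redundant <- function(pkgs, deps) {
  known   <- intersect(pkgs, names(deps))
  covered <- unique(unlist(deps[known], use.names = FALSE))
  setdiff(pkgs, covered)
}

to_install <- drop_redundant(c("tidyverse", "dplyr"), deps)
print(to_install)
```

With this table only "tidyverse" remains, so the script needs a single install.packages() call.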