Dirk Eddelbuettel
2023-Aug-06 21:05 UTC
[Rd] A demonstrated shortcoming of the R package management system
CRAN, by relying on the powerful package management system that is part of R, provides an unparalleled framework for extending R with nearly 20k packages.

We recently encountered an issue that highlights a missing element in the otherwise outstanding package management system. So we would like to start a discussion about enhancing its feature set. As shown below, a mechanism to force reinstallation of packages may be needed.

A demo is included below; it is reproducible in a container. We find the easiest/fastest reproduction is by saving the code snippet below in the current directory as e.g. 'matrixIssue.R' and having it run in a container as

    docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R

This runs in under two minutes, first installing the older Matrix, next installing SeuratObject, and then, by removing the older Matrix, making the (already installed) current Matrix version the default. This simulates a package update for Matrix, which, as the final snippet demonstrates, silently breaks SeuratObject as the cached S4 method Csparse_validate is now missing. So when SeuratObject was installed under Matrix 1.5.1, it becomes unusable under Matrix 1.6.0.

What this shows is that a call to update.packages() will silently corrupt an existing installation. We understand that this was known and addressed at CRAN by rebuilding all binary packages (for macOS and Windows).

But it leaves both users relying on source installation as well as distributors of source packages in a dire situation. It hurt me three times: my default R installation was affected, with unit tests (involving SeuratObject) silently failing. It similarly broke our CI setup at work. And it created a fairly bad headache for the Debian packaging I am involved with (and I surmise it affects other distros similarly).

It would be good to have a mechanism where a package, when being upgraded, could flag that 'more actions are required' by the system (administrator).
We think this example demonstrates that we need such a mechanism to avoid (silently !!) breaking existing installations, possibly by forcing reinstallation of other packages. R knows the package dependency graph and could trigger this, possibly after an 'opt-in' variable the user / admin sets.

One possibility may be to add a new (versioned) field 'Breaks:'. Matrix could then have added 'Breaks: SeuratObject (<= 4.1.3)', preventing an installation of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is present, but permitting an update to Matrix 1.6.0 alongside a new version, say, 4.1.4 of SeuratObject, which could itself have a versioned Depends: Matrix (>= 1.6.0).

Regards, Dirk


## Code example follows. Recommended to run the rocker/r2u container.
## Could also run 'apt update -qq; apt upgrade -y' but not required
## Thanks to my colleague Paul Hoffman for the core of this example

## now have Matrix 1.6.0 because r2u and CRAN remain current but we can install an older Matrix
remotes::install_version('Matrix', '1.5.1')

## we can confirm that we have Matrix 1.5.1
packageVersion("Matrix")

## we now install SeuratObject from source and to speed things up we first install the binary
install.packages("SeuratObject")   # in this container via bspm/r2u as binary
## and then force a source installation (turning bspm off) _while Matrix is at 1.5.1_
if (requireNamespace("bspm", quietly=TRUE)) bspm::disable()
Sys.setenv(PKG_CXXFLAGS='-Wno-ignored-attributes')   # Eigen compilation noise silencer
install.packages('SeuratObject')

## we now remove the Matrix package version 1.5.1 we installed into /usr/local leaving 1.6.0
remove.packages("Matrix")
packageVersion("Matrix")

## and we now run a bit of SeuratObject code that is now broken as Csparse_validate is gone
suppressMessages(library(SeuratObject))
data('pbmc_small')
graph <- pbmc_small[['RNA_snn']]
class(graph)
getClass('Graph')
show(graph)   # this fails

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
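To make the 'Breaks:' idea above a little more concrete, here is a minimal sketch of how such a field might be parsed and checked before an upgrade. Note that 'Breaks:' is not an existing DESCRIPTION field in R, and parse_breaks() and find_breaks_conflicts() are hypothetical names invented for this illustration; the installed versions are passed in as a plain named vector so the sketch runs without touching a real library.

```r
## Hypothetical sketch: 'Breaks:' is NOT an existing DESCRIPTION field, and
## parse_breaks() / find_breaks_conflicts() are illustrative names only.

## Parse a field such as "SeuratObject (<= 4.1.3), foo (< 2.0)" into a data frame
parse_breaks <- function(field) {
    entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
    pattern <- paste0("^([A-Za-z][A-Za-z0-9.]*)[[:space:]]*",
                      "\\([[:space:]]*(<=|<|==|>=|>)[[:space:]]*",
                      "([0-9][0-9.-]*)[[:space:]]*\\)$")
    parts <- regmatches(entries, regexec(pattern, entries))
    ok <- lengths(parts) == 4L            # full match plus three capture groups
    if (!all(ok)) stop("cannot parse: ", paste(entries[!ok], collapse = "; "))
    data.frame(package = vapply(parts, `[`, "", 2L),
               op      = vapply(parts, `[`, "", 3L),
               version = vapply(parts, `[`, "", 4L),
               stringsAsFactors = FALSE)
}

## 'installed' is a named character vector: package name -> installed version.
## Returns the rows of 'breaks' that the current installation would violate.
find_breaks_conflicts <- function(breaks, installed) {
    hit <- vapply(seq_len(nrow(breaks)), function(i) {
        have <- unname(installed[breaks$package[i]])
        if (is.na(have)) return(FALSE)    # not installed, so nothing to break
        d <- utils::compareVersion(have, breaks$version[i])
        switch(breaks$op[i],
               "<=" = d <= 0, "<" = d < 0, "==" = d == 0,
               ">=" = d >= 0, ">" = d > 0)
    }, logical(1))
    breaks[hit, , drop = FALSE]
}

## What a hypothetical Matrix 1.6.0 might have declared
breaks    <- parse_breaks("SeuratObject (<= 4.1.3)")
installed <- c(Matrix = "1.5.1", SeuratObject = "4.1.3")
conflicts <- find_breaks_conflicts(breaks, installed)
if (nrow(conflicts) > 0)
    message("Upgrade would break: ", paste(conflicts$package, collapse = ", "))
```

In this sketch an installer could refuse the upgrade, or schedule a reinstall of the listed packages, whenever the returned data frame is non-empty; which of the two is preferable is exactly the policy question raised above.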
Ben Bolker
2023-Aug-06 23:00 UTC
[Rd] A demonstrated shortcoming of the R package management system
I would support this suggestion. There is a similar binary dependency chain from Matrix → TMB → glmmTMB; we have implemented various checks to make users aware that they need to reinstall from source, and to some extent we've tried to push out synchronous updates (i.e., push an update of TMB to CRAN every time Matrix changes, and an update of glmmTMB after that), but centralized machinery for this would certainly be nice.

FWIW some of the machinery is here: github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295 -- it relies on a Makefile rule that caches the current installed version of TMB: github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295

cheers
Ben Bolker
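The kind of check described in this reply (record the dependency version at build time, compare it when the package is loaded) could look roughly like the following. This is a minimal sketch and not the glmmTMB code linked above; check_built_against() is a hypothetical name, and how the build-time version gets recorded (a Makefile rule, a generated R file, or something else) is left open.

```r
## Hypothetical sketch; check_built_against() is an illustrative name and is
## NOT the glmmTMB implementation.
## 'built_version' is the version of 'dep' recorded when this package was built.
check_built_against <- function(dep, built_version,
                                current_version =
                                    as.character(utils::packageVersion(dep))) {
    if (utils::compareVersion(built_version, current_version) != 0L) {
        warning(sprintf(paste("built against %s %s but %s %s is installed;",
                              "consider reinstalling from source"),
                        dep, built_version, dep, current_version),
                call. = FALSE)
        return(invisible(FALSE))
    }
    invisible(TRUE)
}

## Typical use would be from a package's .onLoad(), for example
##   .onLoad <- function(libname, pkgname) check_built_against("TMB", "<recorded>")
## Here we pass both versions explicitly so the sketch runs anywhere:
same  <- check_built_against("Matrix", "1.6.0", "1.6.0")
stale <- suppressWarnings(check_built_against("Matrix", "1.5.1", "1.6.0"))
```

The check only warns; it cannot repair anything, which is why a centralized mechanism would still be preferable to each package carrying its own copy.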
Ivan Krylov
2023-Aug-07 13:15 UTC
[Rd] A demonstrated shortcoming of the R package management system
On Sun, 6 Aug 2023 16:05:03 -0500, Dirk Eddelbuettel <edd at debian.org> wrote:

> One possibility may be to add a new (versioned) field 'Breaks:'.
> Matrix could then have added 'Breaks: SeuratObject (<= 4.1.3)'
> preventing an installation of Matrix 1.6.0 when SeuratObject 4.1.3
> (or earlier) is present, but permitting an update to Matrix 1.6.0
> alongside a new version, say, 4.1.4 of SeuratObject which could
> itself have a versioned Depends: Matrix (>= 1.6.0).

I wouldn't entirely agree that Matrix 1.6.0 breaks SeuratObject 4.1.3, given that it's still possible to install first Matrix 1.6.0 and then SeuratObject 4.1.3. The breakage definitely exists, but not on the source package level.

It may also not be easy for the package developer to notice, while performing reverse dependency checks, that a binary package has been broken in time to add such a notice to their package. The recommended way to do that is tools::check_packages_in_dir(), which works on source packages.

Would it help to reframe the problem in terms of binary packages acquiring dependency constraints that are more strict than those of the corresponding source packages?

If a package imports S4 classes from another package and thus ends up caching their definitions, R could compute a hash of the classes being imported, store it together with the installed package and complain noisily if the hash doesn't match later at load time. This could be used to detect such problems automatically (but could also result in false positives!).

This is not the only way a binary package could accidentally depend on internals of another binary package. I remember reading about (but cannot find it now!) some packages "importing" a function from ggplot2 (I think) by assigning it into their namespace:

foo <- ggplot2::useful_function

This worked for quite a while, but later broke because ggplot2::useful_function called an internal function which ceased to exist in a new version of ggplot2. This is arguably a bug and probably even harder to track, but are there any other ways to catch a "binary dependency" for a package?

-- 
Best regards,
Ivan
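The hashing idea in this message can be illustrated with a small sketch. Everything here is an assumption made for illustration: class_fingerprint() is a hypothetical helper, R itself does nothing of the sort today, and the choice of what goes into the hash (slot names and types, superclasses, the deparsed validity function) is only one possible definition of "the class changed".

```r
## Hypothetical sketch; class_fingerprint() is an illustrative name, and what
## gets hashed (slots, superclasses, validity) is an assumption of this example.
library(methods)

class_fingerprint <- function(class_name) {
    def   <- getClass(class_name)
    slots <- getSlots(class_name)               # named vector: slot -> class
    desc  <- c(paste(names(slots), slots, sep = ":"),
               paste0("contains:", names(def@contains)),
               deparse(getValidity(def)))       # "NULL" when no validity is set
    tmp <- tempfile()
    on.exit(unlink(tmp))
    writeLines(desc, tmp)
    unname(tools::md5sum(tmp))                  # base R only, no extra packages
}

## "Install time": the dependency defines a class and we store its fingerprint
setClass("ToyGraph", representation(weights = "numeric"))
stored <- class_fingerprint("ToyGraph")

## "After an upgrade": the dependency changes the class definition
setClass("ToyGraph", representation(weights = "numeric", directed = "logical"))
if (!identical(stored, class_fingerprint("ToyGraph")))
    message("cached class definition is stale; reinstall dependent packages")
```

A fingerprint like this changes on any edit to the validity function, including harmless ones, which is one concrete source of the false positives mentioned above.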
Hadley Wickham
2023-Aug-08 13:34 UTC
[Rd] A demonstrated shortcoming of the R package management system
Hi Dirk,

Do you think it's worth also/instead considering a fix to S4 to avoid this caching issue in future R versions? (This is top of mind for me as we consider the design of S7, and I recently made a note to ensure we avoid similar problems there: github.com/RConsortium/OOP-WG/issues/317)

Hadley

-- 
hadley.nz