Dirk Eddelbuettel
2023-Aug-06  21:05 UTC
[Rd] A demonstrated shortcoming of the R package management system
CRAN, by relying on the powerful package management system that is part of R,
provides an unparalleled framework for extending R with nearly 20k packages.
We recently encountered an issue that highlights a missing element in the
otherwise outstanding package management system. So we would like to start a
discussion about enhancing its feature set. As shown below, a mechanism to
force reinstallation of packages may be needed.
A demo is included below; it is reproducible in a container. We find the
easiest/fastest reproduction is to save the code snippet below in the
current directory as, e.g., 'matrixIssue.R' and run it in a container
as
   docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R
  
This runs in under two minutes: it first installs the older Matrix, next
installs SeuratObject, and then removes the older Matrix, making the
(already installed) current Matrix version the default. This simulates a
package update for Matrix which, as the final snippet demonstrates, silently
breaks SeuratObject as the cached S4 method Csparse_validate is now missing.
So when SeuratObject was installed under Matrix 1.5.1, it becomes unusable
under Matrix 1.6.0.
What this shows is that a call to update.packages() will silently corrupt an
existing installation.  We understand that this was known and addressed at
CRAN by rebuilding all binary packages (for macOS and Windows).
But it leaves both users relying on source installation and distributors
of source packages in a dire situation. It hurt me three times: my default
R installation was affected, with unit tests (involving SeuratObject)
silently failing. It similarly broke our CI setup at work. And it created a
fairly bad headache for the Debian packaging I am involved with (and I
surmise it affects other distributions similarly).
It would be good to have a mechanism where a package, when being upgraded,
could flag that 'more actions are required' by the system
(administrator).
We think this example demonstrates that we need such a mechanism to avoid
(silently !!) breaking existing installations, possibly by forcing
reinstallation of other packages.  R knows the package dependency graph and
could trigger this, possibly gated by an opt-in variable that the user /
admin sets.
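Since R already exposes the dependency graph, a user or admin can approximate the missing step by hand. A minimal sketch, assuming only base R's `tools::package_dependencies()` (which accepts a `db` matrix shaped like the one returned by `available.packages()`); the helper name `installed_reverse_deps()` and the small package database in the demo are made up for illustration and simplify the real dependency fields:

```r
## Hypothetical helper (not part of R): which *installed* packages directly
## depend on 'pkg' via Depends, Imports or LinkingTo, and may therefore need
## a source reinstall after 'pkg' has been upgraded?
installed_reverse_deps <- function(pkg,
                                   db = utils::available.packages(),
                                   installed = rownames(utils::installed.packages())) {
    rdeps <- tools::package_dependencies(pkg, db = db,
                                         which = c("Depends", "Imports", "LinkingTo"),
                                         reverse = TRUE)[[pkg]]
    ## as.character() turns a NULL (no reverse deps) into character(0)
    sort(intersect(as.character(rdeps), installed))
}

## Offline demo with a made-up, simplified package database (no network needed).
db <- cbind(Package   = c("Matrix", "SeuratObject", "TMB", "glmmTMB"),
            Depends   = NA,
            Imports   = c(NA, "Matrix", "Matrix", "TMB"),
            LinkingTo = c(NA, NA, NA, "TMB"))
installed_reverse_deps("Matrix", db = db,
                       installed = c("Matrix", "SeuratObject", "glmmTMB"))

## On a real system, after update.packages() has upgraded Matrix:
## install.packages(installed_reverse_deps("Matrix"), type = "source")
```

This only follows direct reverse dependencies; `package_dependencies()` also accepts `recursive = TRUE` for chains such as Matrix → TMB → glmmTMB, at the cost of reinstalling more packages than may strictly be needed.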
One possibility may be to add a new (versioned) field 'Breaks:'. Matrix
could then have added 'Breaks: SeuratObject (<= 4.1.3)', preventing an
installation of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is
present, but permitting an update to Matrix 1.6.0 alongside a new version,
say, 4.1.4 of SeuratObject, which could itself have a versioned Depends:
Matrix (>= 1.6.0).
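To make the proposal concrete, here is a minimal sketch of how an installer might evaluate such a field before upgrading. It assumes a comma-separated list of `pkg (op version)` entries modelled on the existing `Depends:` syntax; the field itself and the helpers `parse_breaks()` and `broken_by()` are hypothetical, and version comparison relies on base R's `package_version()`:

```r
## Sketch only: 'Breaks:' is a *hypothetical* DESCRIPTION field, and
## parse_breaks() / broken_by() are illustrative helpers, not R functions.
parse_breaks <- function(field) {
    entries <- trimws(strsplit(field, ",")[[1]])
    pattern <- "^([[:alnum:].]+)[[:space:]]*\\(([<>=]+)[[:space:]]*([^)]+)\\)$"
    stopifnot(all(grepl(pattern, entries)))   # every entry must carry a version
    data.frame(package = sub(pattern, "\\1", entries),
               op      = sub(pattern, "\\2", entries),
               version = trimws(sub(pattern, "\\3", entries)),
               stringsAsFactors = FALSE)
}

## Return the installed packages that the upgrade would break.
## 'installed' is a named character vector of versions, e.g. built from
## installed.packages()[, "Version"].
broken_by <- function(breaks, installed) {
    hits <- character()
    for (i in seq_len(nrow(breaks))) {
        pkg <- breaks$package[i]
        if (!pkg %in% names(installed)) next
        stopifnot(breaks$op[i] %in% c("<", "<=", "==", ">=", ">"))
        cmp <- match.fun(breaks$op[i])
        if (cmp(package_version(installed[[pkg]]),
                package_version(breaks$version[i])))
            hits <- c(hits, pkg)
    }
    hits
}

## Matrix 1.6.0 declaring 'Breaks: SeuratObject (<= 4.1.3)':
breaks <- parse_breaks("SeuratObject (<= 4.1.3)")
broken_by(breaks, c(SeuratObject = "4.1.3"))   # non-empty: refuse the upgrade
broken_by(breaks, c(SeuratObject = "4.1.4"))   # empty: upgrade may proceed
```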
Regards,  Dirk
## Code example follows. Recommended to run in the rocker/r2u container.
## Could also run 'apt update -qq; apt upgrade -y' but not required
## Thanks to my colleague Paul Hoffman for the core of this example

## we now have Matrix 1.6.0 because r2u and CRAN remain current, but we can install an older Matrix
remotes::install_version('Matrix', '1.5.1')

## we can confirm that we have Matrix 1.5.1
packageVersion("Matrix")

## we now install SeuratObject from source; to speed things up we first install the binary
install.packages("SeuratObject")   # in this container via bspm/r2u as binary
## and then force a source installation (turning bspm off) _while Matrix is at 1.5.1_
if (requireNamespace("bspm", quietly = TRUE)) bspm::disable()
Sys.setenv(PKG_CXXFLAGS = '-Wno-ignored-attributes')   # Eigen compilation noise silencer
install.packages('SeuratObject')

## we now remove the Matrix package version 1.5.1 we installed into /usr/local, leaving 1.6.0
remove.packages("Matrix")
packageVersion("Matrix")

## and we now run a bit of SeuratObject code that is now broken as Csparse_validate is gone
suppressMessages(library(SeuratObject))
data('pbmc_small')
graph <- pbmc_small[['RNA_snn']]
class(graph)
getClass('Graph')
show(graph) # this fails
-- 
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Ben Bolker
2023-Aug-06  23:00 UTC
[Rd] A demonstrated shortcoming of the R package management system
I would support this suggestion.  There is a similar binary 
dependency chain from Matrix → TMB → glmmTMB; we have implemented 
various checks to make users aware that they need to reinstall from 
source, and to some extent we've tried to push out synchronous updates 
(i.e., push an update of TMB to CRAN every time Matrix changes, and an 
update of glmmTMB after that), but centralized machinery for this would 
certainly be nice.
   FWIW some of the machinery is here: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295
-- it relies on a Makefile rule that caches the currently installed 
version of TMB: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295
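A load-time guard along these lines might look as follows. This is a hedged sketch of the general idea (record the dependency's version when the package is built, compare it with `packageVersion()` when the package is loaded), not the code behind the glmmTMB links; `check_built_against()` and `.TMB_built_with` are hypothetical names, and the versions in the demo are made up:

```r
## Hypothetical helper: warn when a package was built against a different
## version of 'pkg' than the one installed now. 'built_with' would be
## recorded at build time (e.g. by a Makefile rule); here it is passed in.
check_built_against <- function(pkg, built_with,
                                current = as.character(utils::packageVersion(pkg))) {
    same <- package_version(built_with) == package_version(current)
    if (!same)
        warning(sprintf("this package was built against %s %s, but %s %s is installed; ",
                        pkg, built_with, pkg, current),
                "consider reinstalling it from source", call. = FALSE)
    invisible(same)
}

## Typical use from a package's .onLoad(), where .TMB_built_with is a
## (hypothetical) string written into the package when it was built:
## .onLoad <- function(libname, pkgname)
##     check_built_against("TMB", built_with = .TMB_built_with)

check_built_against("TMB", built_with = "1.0", current = "2.0")  # made-up versions; warns
```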
   cheers
    Ben Bolker
Ivan Krylov
2023-Aug-07  13:15 UTC
[Rd] A demonstrated shortcoming of the R package management system
On Sun, 6 Aug 2023 16:05:03 -0500, Dirk Eddelbuettel <edd at debian.org> wrote:

> One possibility may be to add a new (versioned) field 'Breaks:'.
> Matrix could then have added 'Breaks: SeuratObject (<= 4.1.3)'
> preventing an installation of Matrix 1.6.0 when SeuratObject 4.1.3
> (or earlier) is present, but permitting an update to Matrix 1.6.0
> alongside a new version, say, 4.1.4 of SeuratObject which could
> itself have a versioned Depends: Matrix (>= 1.6.0).

I wouldn't entirely agree that Matrix 1.6.0 breaks SeuratObject 4.1.3, given that it's still possible to install Matrix 1.6.0 first and then SeuratObject 4.1.3. The breakage definitely exists, but not at the source package level.

It may also not be easy for the package developer to notice, while performing reverse dependency checks, that they have broken a binary package, in time to add such a notice to their package. The recommended way to run those checks is tools::check_packages_in_dir(), which works on source packages.

Would it help to reframe the problem in terms of binary packages acquiring dependency constraints that are stricter than those of the corresponding source packages?

If a package imports S4 classes from another package and thus ends up caching their definitions, R could compute a hash of the classes being imported, store it together with the installed package, and complain noisily if the hash doesn't match later at load time. This could be used to detect such problems automatically (but could also result in false positives!).

This is not the only way a binary package could accidentally depend on internals of another binary package. I remember reading about (but cannot find it now!) some packages "importing" a function from ggplot2 (I think) by assigning it into their namespace:

foo <- ggplot2::useful_function

This worked for quite a while, but later broke because ggplot2::useful_function called an internal function which ceased to exist in a new version of ggplot2. This is arguably a bug and probably even harder to track, but are there any other ways to catch a "binary dependency" for a package?

-- 
Best regards,
Ivan
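Ivan's hashing idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `class_hash()` and `stale_classes()` are hypothetical helpers, the hash is an MD5 sum (via `tools::md5sum()`) of the `serialize()`d class definition, and the "install" and "upgrade" steps are simulated by defining and then redefining a class in the global environment:

```r
## Sketch of the hashing idea; class_hash() and stale_classes() are
## hypothetical helpers, not part of R or the methods package.
library(methods)

## Hash a class definition by serializing it to a temporary file and
## taking the file's MD5 sum.
class_hash <- function(class) {
    f <- tempfile()
    on.exit(unlink(f))
    writeBin(serialize(getClass(class), connection = NULL), f)
    unname(tools::md5sum(f))
}

## Compare hashes stored at "install time" with the current definitions
## and return the names of classes whose definitions have changed.
stale_classes <- function(stored) {
    current <- vapply(names(stored), class_hash, character(1))
    names(stored)[current != stored]
}

## Demo: record the hash, then simulate an upgrade by redefining the class.
setClass("Base", representation(x = "numeric"))
stored <- c(Base = class_hash("Base"))
setClass("Base", representation(x = "numeric", y = "character"))
stale <- stale_classes(stored)
if (length(stale))
    message("cached class definitions out of date for: ", paste(stale, collapse = ", "))
```

Because the serialized definition also records details such as the R version that wrote it and the list of known subclasses, this naive hash would produce exactly the kind of false positives Ivan mentions; a practical version would hash a narrower, normalised view of the class.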
Hadley Wickham
2023-Aug-08  13:34 UTC
[Rd] A demonstrated shortcoming of the R package management system
Hi Dirk,

Do you think it's worth also/instead considering a fix to S4 to avoid this caching issue in future R versions? (This is top of mind for me as we consider the design of S7, and I recently made a note to ensure we avoid similar problems there: https://github.com/RConsortium/OOP-WG/issues/317)

Hadley

-- 
http://hadley.nz

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel