On Wed, 1 May 2024 16:57:18 +0000
"Howard, Tim G (DEC) via R-help" <r-help at r-project.org>
wrote:
> Is this real?
Yes, but with a giant elephant in the room that many are overlooking.
It has actually always been much worse.
Until R-4.4.0, there used to be a way for readRDS() to return an
unevaluated "promise object". When you access the returned value, the
code attached to the promise object is evaluated. Starting with
R-4.4.0, this particular ability is now forbidden. One particular
attack is now prevented, but the whole class of attacks is still
fundamentally impossible to avoid. The resulting increase in safety is
very small.
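
As a harmless illustration of what "evaluated on access" means, here is
a minimal sketch of an ordinary promise created with delayedAssign().
It does not involve readRDS() or any file at all; the variable names
are made up for the example.

```r
# Flag that records whether the promise code has run yet.
evaluated <- FALSE

# Create a promise: the braces are stored, not run, at this point.
delayedAssign("value", {
  evaluated <- TRUE
  42
})

before <- evaluated   # still FALSE, nothing has been evaluated

# Merely using the variable forces the promise and runs its code.
result <- value + 1

after <- evaluated    # now TRUE
```

The point is only that the code runs when the value is first used, not
when the object is created.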
The R data files, both those produced by save() and opened by load(),
and those produced by saveRDS() and readRDS(), contain _internal_
object state. The code processing those objects trusts the internal
object state, because it has no other alternative, no other source of
state. This is true of all of base R, CRAN and Bioconductor.
Many R objects contain executable code. For example, many saved models
contain -- as part of this internal state that gets stored inside *.rds
and *.RData files -- executable expressions that produce model matrices
from data frames. It is trivial for any aspiring attacker to take such
an object and replace the model expression with one that would take over
your system. When you perform ordinary R operations on the doctored
object, the attacker-provided "model expression" instead does whatever
the attacker wants.
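
To make this concrete, here is a small sketch using lm() and the
built-in mtcars data. The replacement expression is a deliberately
harmless stand-in chosen for this example; update() is just one of the
ordinary operations that evaluate stored code.

```r
# Fit an ordinary linear model on a built-in data set.
fit <- lm(mpg ~ wt, data = mtcars)

# Round-trip through the same format that saveRDS()/readRDS() use.
fit2 <- unserialize(serialize(fit, NULL))

# The restored object carries unevaluated code as part of its state.
stored_is_code <- is.call(fit2$call)       # the call that built the model
terms_is_code  <- is.language(fit2$terms)  # the model formula

# Stand-in for tampering: swap the stored call for a harmless expression.
fit2$call <- quote(paste("evaluated", "stored call"))

# update() trusts the stored call and evaluates it.
res <- update(fit2)
```

Here res ends up holding the string produced by the substituted
expression rather than a refitted model, which is all that is needed to
show that the stored call is trusted.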
The above was just one example of "trusting the internal state". An
attacker can come up with similar attacks for ALTREP objects, 'glue'
strings and a lot of other features of R, without ever touching
promises (the topic of CVE-2024-27322) or exploiting parser
vulnerabilities.
One safe way to move forward is to set aside a strict subset of the R
Data Serialization format that cannot be used to create any executable
code or touch potentially vulnerable state (such as ALTREP, I think) and
reject all other features of RDS. Yes, this abandons the ability to
save model objects and many other great features of R serialization,
including those that make 'parallel' clusters possible. (But we trust
our clusters and should use regular serialize() with them.) I've been
working on this today; it's very raw, not even a package yet, and it
doesn't even read some of my data correctly, but I believe it's a
secure way forward: https://codeberg.org/aitap/unserializeData
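
To give a rough idea of what a "data-only" subset could look like, here
is a sketch of a whitelist check written at the R level. It is not the
code from the repository above, and it is not a safe substitute for it:
it inspects an object that is already in memory, whereas the idea
described above is to refuse unwanted features while reading the
serialized stream. The names plain_types and is_plain_data are
hypothetical.

```r
# Types assumed here to be "plain data": atomic vectors, NULL and lists.
plain_types <- c("NULL", "logical", "integer", "double",
                 "complex", "character", "raw", "list")

# Recursively check that an object and its attributes are plain data.
# Note: ALTREP objects report their ordinary type from typeof(), so a
# check like this cannot see them.
is_plain_data <- function(x) {
  if (!(typeof(x) %in% plain_types) || isS4(x)) return(FALSE)
  # Attributes (names, class, levels, ...) must be plain data too.
  for (a in attributes(x)) {
    if (!is_plain_data(a)) return(FALSE)
  }
  # For lists, check every element.
  if (typeof(x) == "list") {
    for (el in x) {
      if (!is_plain_data(el)) return(FALSE)
    }
  }
  TRUE
}

ok_df     <- is_plain_data(mtcars)                       # data frame
ok_factor <- is_plain_data(factor(c("a", "b")))          # factor
ok_model  <- is_plain_data(lm(mpg ~ wt, data = mtcars))  # holds a call
ok_fun    <- is_plain_data(list(a = 1, f = function(x) x))
```

Data frames and factors pass, while the model object and the list
holding a function are rejected because they contain language objects
or closures.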
--
Best regards,
Ivan