Henrik Bengtsson
2018-Oct-31 02:51 UTC
[Rd] PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()
SUMMARY: I'm proposing that R assert that 'connection' options have not changed since first created such that R will produce the following error:> fh <- file("a.txt", open = "w+") > cat("hello\n", file = fh) > close(fh)> fh2 <- file("b.txt", open = "w+") > cat("world\n", file = fh2)> cat("hello again\n", file = fh)Error in cat("hello again\n", file = fh) : invalid connection (non-existing 'conn_id') Note that, currently in R, the latter silently writes to 'b.txt' - not 'a.txt' (for more details, see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81). BACKGROUND: In R, connections are indexed by their (zero-based) row indices in the table of available connections. For example,> fh <- file("a.txt", open = "w") > showConnections(all = TRUE)description class mode text isopen can read can write 0 "stdin" "terminal" "r" "text" "opened" "yes" "no" 1 "stdout" "terminal" "w" "text" "opened" "no" "yes" 2 "stderr" "terminal" "w" "text" "opened" "no" "yes" 3 "a.txt" "file" "w" "text" "opened" "no" "yes"> con <- getConnection(3) > identical(con, fh)[1] TRUE ISSUE: The problem with the current design/implementation where connections are referred to by their index (only), is that (i) the table of connections changes over time and (ii) connection indices are recycled. Because a `connection` object holds the connection row index, it means that *the actual underlying connection that a `connection` object refers to may change over its lifetime*. SUGGESTION: Make use of the 'Rconn' struct field 'id', which is unique, to assert that the 'connection' object used is referring to the original/expected connection. The 'id' field is available via attribute 'conn_id' part of a 'connection' object. PATCH: See attached 'connection.patch' file (or https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222). The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function, which looks up a connection by its index *and* the 'id' field. This function is backward compatible with R_GetConnection(), which looks up a connection by its index (only). In addition, R_GetConnection2() also accepts 'sConn' of type integer, which the looks up the connection similar to how the internal getConnection() function does it. Comment: The patch is just one of many alternatives. Hopefully, it helps clarify what I'm suggesting. It passes 'make check' and I've tested it on a few packages of mine that make heavy use of different types of connections. In addition to "overridden" connections, the patch protects against invalid 'connection':s that have been serialized, e.g.> fh2 <- file("b.txt", open = "w+") > saveRDS(fh2, file = "fh2.rds") > fh3 <- readRDS("fh2.rds") > attr(fh2, "conn_id")<pointer: 0x78>> attr(fh3, "conn_id")<pointer: (nil)> #<== NIL because external pointer was lost when serialized> isOpen(fh2)[1] TRUE> isOpen(fh3)Error in isOpen(fh3) : invalid connection ('conn_id' is NULL) This is useful, when for instance 'connection':s are (incorrectly) passed to background R sessions (e.g. PSOCK cluster nodes). SEE ALSO: * More details of the above are scribbled down on https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81 * R-devel post 'closeAllConnections() can really mess things up', 2016-10-30, https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html All the best, Henrik -------------- next part -------------- A non-text attachment was scrubbed... Name: connections.patch Type: text/x-patch Size: 15539 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20181030/d2a946b2/attachment.bin>
Henrik Bengtsson
2019-Feb-09 00:47 UTC
[Rd] PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()
Bumping this thread in the hope to catch the attention from R core. As I try to argue in my original post, given the existing internal structure of connections, I don't think it's too hard to add protection against the use of corrupted R connection. Writing to corrupted connection is a mistake that currently may pass silently while corrupting a non-intended target. Henrik On Tue, Oct 30, 2018, 19:51 Henrik Bengtsson <henrik.bengtsson at gmail.com wrote:> SUMMARY: > > I'm proposing that R assert that 'connection' options have not changed > since first created such that R will produce the following error: > > > fh <- file("a.txt", open = "w+") > > cat("hello\n", file = fh) > > close(fh) > > > fh2 <- file("b.txt", open = "w+") > > cat("world\n", file = fh2) > > > cat("hello again\n", file = fh) > Error in cat("hello again\n", file = fh) : > invalid connection (non-existing 'conn_id') > > Note that, currently in R, the latter silently writes to 'b.txt' - not > 'a.txt' (for more details, see > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81). > > > BACKGROUND: > > In R, connections are indexed by their (zero-based) row indices in the > table of available connections. For example, > > > fh <- file("a.txt", open = "w") > > showConnections(all = TRUE) > description class mode text isopen can read can write > 0 "stdin" "terminal" "r" "text" "opened" "yes" "no" > 1 "stdout" "terminal" "w" "text" "opened" "no" "yes" > 2 "stderr" "terminal" "w" "text" "opened" "no" "yes" > 3 "a.txt" "file" "w" "text" "opened" "no" "yes" > > con <- getConnection(3) > > identical(con, fh) > [1] TRUE > > > ISSUE: > > The problem with the current design/implementation where connections > are referred to by their index (only), is that > > (i) the table of connections changes over time and > (ii) connection indices are recycled. > > Because a `connection` object holds the connection row index, it means > that *the actual underlying connection that a `connection` object > refers to may change over its lifetime*. > > > SUGGESTION: > > Make use of the 'Rconn' struct field 'id', which is unique, to assert > that the 'connection' object used is referring to the > original/expected connection. The 'id' field is available via > attribute 'conn_id' part of a 'connection' object. > > > PATCH: > > See attached 'connection.patch' file (or > > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222 > ). > The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function, > which looks up a connection by its index *and* the 'id' field. This > function is backward compatible with R_GetConnection(), which looks up > a connection by its index (only). In addition, R_GetConnection2() also > accepts 'sConn' of type integer, which the looks up the connection > similar to how the internal getConnection() function does it. > > Comment: The patch is just one of many alternatives. Hopefully, it > helps clarify what I'm suggesting. It passes 'make check' and I've > tested it on a few packages of mine that make heavy use of different > types of connections. > > In addition to "overridden" connections, the patch protects against > invalid 'connection':s that have been serialized, e.g. > > > fh2 <- file("b.txt", open = "w+") > > saveRDS(fh2, file = "fh2.rds") > > fh3 <- readRDS("fh2.rds") > > attr(fh2, "conn_id") > <pointer: 0x78> > > attr(fh3, "conn_id") > <pointer: (nil)> #<== NIL because external pointer was lost when > serialized > > isOpen(fh2) > [1] TRUE > > isOpen(fh3) > Error in isOpen(fh3) : invalid connection ('conn_id' is NULL) > > This is useful, when for instance 'connection':s are (incorrectly) > passed to background R sessions (e.g. PSOCK cluster nodes). > > > SEE ALSO: > > * More details of the above are scribbled down on > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81 > * R-devel post 'closeAllConnections() can really mess things up', > 2016-10-30, > https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html > > All the best, > > Henrik >[[alternative HTML version deleted]]