When I originally implemented connections in R 1.2.0, I followed the model
in the 'Green Book' closely. There were a number of features that forced
a particular implementation, and one was getConnection() that allows one
to recreate a connection object from a number.

I am wondering if anyone makes use of this, and if so for what?

It would seem closer to the R philosophy to have connection objects that
get garbage collected when no R object refers to them. This would allow
for example

    readLines(gzfile("foo.gz"))

which currently leaks a connection slot as the connection cannot be closed
(except via closeAllConnections() or getConnection()) without an R object
being returned.

The correct usage currently is

    readLines(con <- gzfile("foo.gz")); close(con)

which is a little awkward but more importantly seems little understood.

Another issue is that the current connection objects can be saved and
restored but refer to a global table that is session-specific so they lose
their meaning (and perhaps gain an unintended one).

What I suspect is that very few users are aware of the Green Book
description and so we have freedom to make some substantial changes
to the implementation. Both issues suggest that connection objects should
be based on external pointers (which did not exist way back in 1.2.0).

[I know there is a call to getConnection in package gtools, but the return
value is unused!]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

In a previous version of the 'filehash' package, the 'filehashDB1' class
had a slot for an open connection corresponding to the database file. I
quickly learned that if the R object ever got removed or reassigned I was
left hanging with an open file connection.

If I remember correctly, I resorted to creating an environment in the R
object which stored the connection number for the database file
connection. Then I registered a finalizer for that environment which
grabbed the connection via 'getConnection' and then closed the connection.

I eventually abandoned this approach since it was error-prone and I often
ran into strange, difficult-to-reproduce situations where the R object
representing the database had been removed but the file connection was
still open because garbage collection had not yet occurred.

I would have very much preferred a system where the file connection was
automatically closed once any references to it were gone.

-roger

On 5/30/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> [...]

-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
> When I originally implemented connections in R 1.2.0, I followed the model
> in the 'Green Book' closely. There were a number of features that forced
> a particular implementation, and one was getConnection() that allows one
> to recreate a connection object from a number.
>
> I am wondering if anyone makes use of this, and if so for what?

I don't see any uses of it in the Bioconductor package sources.

> It would seem closer to the R philosophy to have connection objects that
> get garbage collected when no R object refers to them. This would allow
> for example
>
>     readLines(gzfile("foo.gz"))

I think this would be a nice improvement, as it matches what many people
already assume happens and also what some other languages do (in
particular, Python).

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

Prof Brian Ripley wrote:
> When I originally implemented connections in R 1.2.0, I followed the model
> in the 'Green Book' closely. There were a number of features that forced
> a particular implementation, and one was getConnection() that allows one
> to recreate a connection object from a number.

[...]

> Another issue is that the current connection objects can be saved and
> restored but refer to a global table that is session-specific so they lose
> their meaning (and perhaps gain an unintended one).
>
> What I suspect is that very few users are aware of the Green Book
> description and so we have freedom to make some substantial changes
> to the implementation. Both issues suggest that connection objects should
> be based on external pointers (which did not exist way back in 1.2.0).

Sounds great! I would also like to see the following interface (all or in
parts) added for working with connections from C. This is an update to the
patch I created here:

http://wiki.r-project.org/rwiki/doku.php?id=developers:r_connections_api

    /* Acting upon a connection */
    void   R_CloseConnection(SEXP);
    int    R_VfprintfConnection(SEXP, const char *format, va_list ap);
    int    R_FgetcConnection(SEXP);
    double R_SeekConnection(SEXP, double where, int origin, int rw);
    void   R_TruncateConnection(SEXP);
    int    R_FlushConnection(SEXP);
    size_t R_ReadConnection(SEXP, void *buf, size_t size, size_t n);
    size_t R_WriteConnection(SEXP, const void *buf, size_t size, size_t n);

    /* Querying a connection */
    Rboolean R_ConnectionIsText(SEXP);
    Rboolean R_ConnectionIsOpen(SEXP);
    Rboolean R_ConnectionCanRead(SEXP);
    Rboolean R_ConnectionCanWrite(SEXP);
    Rboolean R_ConnectionCanSeek(SEXP);
    Rboolean R_ConnectionIsBlocking(SEXP);

    /* Prototypes for new connections created in C */
    typedef Rboolean (*Rc_open)(void *private);
    typedef void     (*Rc_close)(void *private);
    typedef void     (*Rc_destroy)(void *private); /* when closing connection */
    typedef int      (*Rc_vfprintf)(void *private, const char *, va_list);
    typedef int      (*Rc_fgetc)(void *private);
    typedef double   (*Rc_seek)(void *private, double, int, int);
    typedef void     (*Rc_truncate)(void *private);
    typedef int      (*Rc_fflush)(void *private);
    typedef size_t   (*Rc_read)(void *, size_t, size_t, void *private);
    typedef size_t   (*Rc_write)(const void *, size_t, size_t, void *private);

    /* Create a Connection */
    SEXP R_NewConnection(char *class, char *description, char *mode,
                         Rboolean blocking, Rc_open, Rc_close, Rc_destroy,
                         Rc_vfprintf, Rc_fgetc, Rc_seek, Rc_truncate,
                         Rc_fflush, Rc_read, Rc_write, void *private);

    /* Swap out the standard C streams. More exotic, but it may clean up
       the messy R_ConsoleFile, R_Outputfile, WriteConsole(),
       WriteConsoleEx(), etc... confusion. */
    Rboolean R_RegisterStdinConnection(SEXP scon);
    Rboolean R_RegisterStdoutConnection(SEXP scon);
    Rboolean R_RegisterStderrConnection(SEXP scon);

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner

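To make the callback-plus-private-pointer pattern in the proposal above more concrete, here is a minimal, self-contained sketch in plain C. It does not use R's headers or the proposed R_NewConnection(); every name in it (toy_conn, mem_state, toy_mem_connection, toy_read, toy_close) is hypothetical and chosen only for illustration. It shows a read-only, in-memory "connection" whose read and close callbacks receive their state through an opaque pointer, following the argument order of the proposed Rc_read typedef.

```c
#include <stddef.h>
#include <string.h>

/* Callback types modelled on the proposed Rc_read / Rc_close typedefs:
   each callback receives the connection's state as an opaque pointer.
   All names here are hypothetical, not part of R. */
typedef size_t (*toy_read_fn)(void *buf, size_t size, size_t n, void *priv);
typedef void   (*toy_close_fn)(void *priv);

/* A "connection" is just a table of callbacks plus the private state. */
typedef struct {
    toy_read_fn  read;
    toy_close_fn close;
    void        *priv;
} toy_conn;

/* Private state for a read-only, in-memory connection. */
typedef struct {
    const char *data;
    size_t      len;
    size_t      pos;
    int         is_open;
} mem_state;

/* Copy up to n whole items of `size` bytes; return the items copied. */
static size_t mem_read(void *buf, size_t size, size_t n, void *priv)
{
    mem_state *st = priv;
    if (!st->is_open || size == 0) return 0;
    size_t avail = (st->len - st->pos) / size;  /* whole items remaining */
    size_t items = n < avail ? n : avail;
    memcpy(buf, st->data + st->pos, items * size);
    st->pos += items * size;
    return items;
}

/* Mark the connection closed; later reads return 0 items. */
static void mem_close(void *priv)
{
    ((mem_state *)priv)->is_open = 0;
}

/* Build a connection over a NUL-terminated string (caller owns both). */
toy_conn toy_mem_connection(mem_state *st, const char *text)
{
    st->data = text;
    st->len = strlen(text);
    st->pos = 0;
    st->is_open = 1;
    toy_conn con = { mem_read, mem_close, st };
    return con;
}

/* Generic entry points: callers never touch the private state directly. */
size_t toy_read(toy_conn *con, void *buf, size_t size, size_t n)
{
    return con->read(buf, size, n, con->priv);
}

void toy_close(toy_conn *con)
{
    con->close(con->priv);
}
```

A caller would create a mem_state on the stack, pass it to toy_mem_connection() together with the text, and then use only toy_read() and toy_close(). The point of the design is that code using the connection never needs to know what kind of state sits behind the pointer, which is presumably what would let new connection types be written in C without changes to R itself.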
Prof Brian Ripley wrote:
> When I originally implemented connections in R 1.2.0, I followed the model
> in the 'Green Book' closely. There were a number of features that forced
> a particular implementation, and one was getConnection() that allows one
> to recreate a connection object from a number.

I'm currently using connections (socketConnection(), etc.) and I first want
to *thank you* for this nice work. (imho, it's so much simpler than the
underlying C/C++ stuff.)

> I am wondering if anyone makes use of this, and if so for what?

I use getConnection(). In the context in which I use it, the number of the
connection is known a priori, so getConnection() is an easy way to access
the connection from the functions that need it. I do not, however, claim
this is the best way to proceed.

> It would seem closer to the R philosophy to have connection objects that
> get garbage collected when no R object refers to them. This would allow
> for example
> ... readLines(con <- gzfile("foo.gz")); close(con)
> which is a little awkward but more importantly seems little understood.

There could be (or was) the same debate in C/C++. It may be just a matter
of education about not forgetting to close previously opened doors!

> What I suspect is that very few users are aware of the Green Book
> description and so we have freedom to make some substantial changes
> to the implementation. Both issues suggest that connection objects should
> be based on external pointers (which did not exist way back in 1.2.0).

I'm not skilled enough to offer any advice here, but from a simple user's
point of view, I just hope it can continue to be as simple and practical
as it is today. And I renew my thanks for the existing tool (and also the
rest!).

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
> [I know there is a call to getConnection in package gtools, but
> the return value is unused!]

Speaking of connections and gtools, I've been wondering if the gtools
setTCPNoDelay function's method of getting the Unix file descriptor of a
socket connection is safe. It's basically doing this:

    d <- as.integer(socketConnection("localhost", 80)[1])

Is that depending on an implementation detail that may change?

I think it's useful to allow C extensions to set OS specific socket
options on connection objects. But perhaps Jeffrey Horner's proposal
addresses this need.

- Steve
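
For context on what a C extension would do once it has a descriptor, here is a minimal POSIX sketch of setting and reading back TCP_NODELAY. It assumes the socket's file descriptor is already in hand; how to obtain it safely from an R connection object is exactly the open question above and is not addressed here. The function names set_tcp_nodelay and get_tcp_nodelay are hypothetical and are not the gtools implementation.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable (on == 0) TCP_NODELAY on an open TCP
   socket. Returns 0 on success, -1 on failure (errno is set by
   setsockopt). Hypothetical helper, not taken from gtools. */
int set_tcp_nodelay(int fd, int on)
{
    int flag = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag);
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_tcp_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof flag;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0)
        return -1;
    return flag != 0;
}
```

Both helpers take a plain integer descriptor, so they depend on the descriptor staying valid for the duration of the call. If connections become garbage-collected objects, a C-level accessor on the connection itself would be a more robust way to reach the descriptor than coercing the R object to an integer.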