Jeffrey Horner
2006-Mar-30 22:34 UTC
[Rd] Writing character vectors with embedded nulls to a connection
Is this possible? I've tried both writeChar() and writeBin() to no avail.

My goal is to serialize(ascii=FALSE) an object to a connection but determine the size of the serialized object beforehand:

  sobject <- serialize(object,NULL,ascii=FALSE)
  len <- nchar(sobject)
  #
  # run some code here to notify listener on other end of connection
  # how many bytes I'm getting ready to send
  #
  writeChar(sobject,con)

The other option is to serialize twice:

  len <- nchar(serialize(object,NULL,ascii=FALSE))
  #
  # run some code here to notify listener on other end of connection
  # how many bytes I'm getting ready to send
  #
  serialize(object,con,ascii=FALSE)

Object stores, like memcache (http://danga.com/memcached/), need to know object sizes before storing. RDBMSs which support large objects (CLOBs or BLOBs) don't necessarily need to know object sizes beforehand, but they do have max column size limits which must be honored.

BTW, readChar() can read strings with embedded nulls; I figured writeChar() should be able to write them.

--
Jeffrey Horner       Computer Systems Analyst         School of Medicine
615-322-8606         Department of Biostatistics   Vanderbilt University
Prof Brian Ripley
2006-Mar-31 07:38 UTC
[Rd] Writing character vectors with embedded nulls to a connection
I think you should be using a raw type to hold such data in R. It is not intentional that readChar handles embedded nuls (and in fact it might not in an MBCS).

As ?serialize says

     For 'serialize', 'NULL' unless 'connection=NULL', when the result
     is stored in the first element of a character vector (but is not a
     normal character string unless 'ascii = TRUE' and should not be
     processed except by 'unserialize').

so you have been told this is not intended to work as you tried.

serialize predates the raw type, or it would have made use of it. In these days of MBCS character strings it is increasingly unsafe to use them to hold anything other than valid character data.

On Thu, 30 Mar 2006, Jeffrey Horner wrote:

> Is this possible? I've tried both writeChar() and writeBin() to no avail.
>
> My goal is to serialize(ascii=FALSE) an object to a connection but
> determine the size of the serialized object beforehand:
>
> sobject <- serialize(object,NULL,ascii=FALSE)
> len <- nchar(sobject)
> #
> # run some code here to notify listener on other end of connection
> # how many bytes I'm getting ready to send
> #
> writeChar(sobject,con)
>
> The other option is to serialize twice:
>
> len <- nchar(serialize(object,NULL,ascii=FALSE))
> #
> # run some code here to notify listener on other end of connection
> # how many bytes I'm getting ready to send
> #
> serialize(object,con,ascii=FALSE)
>
> Object stores, like memcache (http://danga.com/memcached/), need to know
> object sizes before storing. RDBMSs which support large objects (CLOBs
> or BLOBs) don't necessarily need to know object sizes beforehand, but
> they do have max column size limits which must be honored.
>
> BTW, readChar() can read strings with embedded nulls; I figured
> writeChar() should be able to write them.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
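
For readers finding this thread later, the raw-vector approach suggested above might look roughly like the sketch below. It assumes a current R release, where serialize() with connection = NULL returns a raw vector; that is not the behaviour of the version discussed in this thread, which returned a character vector. The 4-byte big-endian length header, the example object, and the use of rawConnection() as a stand-in for the real socket are all illustrative assumptions, not anything the posters specified.

```r
# Hypothetical payload; any R object would do.
object <- list(id = 1L, values = c(1.5, 2.5), label = "example")

# Assumption: current R, where connection = NULL yields a raw vector.
payload <- serialize(object, NULL, ascii = FALSE)
len <- length(payload)  # byte count, known before anything is sent

# Sender side. rawConnection() stands in for the real connection.
con <- rawConnection(raw(0), open = "wb")
writeBin(len, con, size = 4L, endian = "big")  # assumed length header
writeBin(payload, con)                         # raw bytes, nuls included
sent <- rawConnectionValue(con)
close(con)

# Receiver side: read the header, then exactly that many bytes.
inp <- rawConnection(sent, open = "rb")
n <- readBin(inp, what = "integer", n = 1L, size = 4L, endian = "big")
body <- readBin(inp, what = "raw", n = n)
close(inp)

restored <- unserialize(body)
```

Because the payload is raw rather than character, length() gives the byte count directly and writeBin() writes embedded nuls unchanged, so the object is serialized only once.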