Hi,

I am trying to figure out a way to allocate a string SEXP so that gc() won't ever collect it.

Here is a little background. Suppose I want to write a .Call-callable function that returns the same value on each call, say mkChar("foo"):

    SEXP getFoo() {
        return mkChar("foo");
    }

The above implementation doesn't take advantage of the fact that mkChar("foo") could be computed only once, after which the function would return the pre-computed value. So the question is how to create this pre-computed value.

The closest thing I could find in the sources is R_NaString, but I was not able to trace how it comes about.

Thanks,
Vadim

P.S. I was able to solve a similar problem with symbols. If I need a symbol "foo", I do

    static SEXP FooSymbol = install("foo");

and then use FooSymbol instead of install("foo").
Thanks Duncan! R_PreserveObject will do.

One thought: wouldn't it make sense to modify R_PreserveObject to return its argument? This would allow things like

    static SEXP fooSexp = R_PreserveObject(mkChar("foo"));

and would also make R_PreserveObject more similar to Rf_protect(). There should be no problem with backward compatibility, at least none that I could see.

Thanks,
Vadim

> -----Original Message-----
> From: Duncan Temple Lang [mailto:duncan@wald.ucdavis.edu]
> Sent: Tuesday, April 12, 2005 12:52 PM
> To: Vadim Ogranovich
> Cc: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] How allocate STRSXP outside of gc
>
> Look at R_PreserveObject to see if it will do what you want.
> It certainly used to.
>
> SEXP getFoo()
> {
>     static SEXP val = NULL;
>
>     if(!val) {
>         val = mkChar("foo");
>         R_PreserveObject(val);
>     }
>
>     return(val);
> }
>
> You may want to have a routine that is called when the
> package is unloaded that calls R_ReleaseObject().
>
> Alternatively, store the object in a package's namespace
> environment and it won't be gc'ed.
>
> D.
>
> Vadim Ogranovich wrote:
> > [...]
>
> --
> Duncan Temple Lang                duncan@wald.ucdavis.edu
> Department of Statistics          work: (530) 752-4782
> 371 Kerr Hall                     fax:  (530) 752-7099
> One Shields Ave.
> University of California at Davis
> Davis, CA 95616, USA
On Tue, Apr 12, 2005 at 12:31:03PM -0700, Vadim Ogranovich wrote:
> [...]
> The above implementation doesn't take advantage of the fact that
> mkChar("foo") could be pre-computed only once, and then the function
> would return the pre-computed value. So the question is how to create
> this precomputed value.
>
> The closest thing I could find in the sources is R_NaString, but I was
> not able to trace down how it comes about.

To stay unaffected by R's memory management, it may be best not to use a SEXP for storing the pre-computed result at all. Rather, use a static variable "private" to your code, as in

    SEXP getFoo()
    {
        static char *foo = NULL;

        if (foo == NULL)
        {
            foo = the_difficult_to_compute_value_of_foo();
        }
        return mkChar(foo);
    }

This way, getFoo still invokes mkChar each time, but in your scenario that overhead might be negligible compared to the actual computation of the foo value.

Best regards, Jan
--
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk@cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
mkChar is a rather expensive call, since it allocates a new R object. For example, when reading character data from a file, it is often advantageous to first try to look up an already-made R string and to call mkChar only if the lookup fails. That is, the overhead of the lookup is usually smaller than that of mkChar.

> -----Original Message-----
> From: r-devel-bounces@stat.math.ethz.ch
> [mailto:r-devel-bounces@stat.math.ethz.ch] On Behalf Of Jan T. Kim
> Sent: Wednesday, April 13, 2005 3:44 AM
> To: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] How allocate STRSXP outside of gc
>
> [...]
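The lookup-before-allocate idea described above (check for an already-made string, and construct a new one only on a miss) can be sketched in plain C. This is a minimal illustration under stated assumptions, not code from the thread or from R itself: StringCache, cache_lookup, cache_free, make_string and TABLE_SIZE are hypothetical names, and make_string is a stand-in for the expensive constructor (mkChar in the discussion). In a real R package the cached values would be SEXPs and would additionally need to be kept reachable, for example via R_PreserveObject.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024  /* bucket count (hypothetical); must be a power of two */

/* One cached string; entries that share a bucket form a linked list. */
typedef struct Entry {
    char *str;
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[TABLE_SIZE];
    size_t n_created;  /* how many times the expensive constructor ran */
} StringCache;

/* djb2 string hash, reduced to a bucket index. */
static size_t hash_str(const char *s)
{
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h & (TABLE_SIZE - 1);
}

/* Stand-in for the expensive constructor (mkChar in the thread). */
static char *make_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Return the cached copy of s, constructing it only on the first request.
   Returns NULL if memory allocation fails. */
const char *cache_lookup(StringCache *cache, const char *s)
{
    assert(cache != NULL && s != NULL);

    size_t idx = hash_str(s);
    for (Entry *e = cache->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;  /* hit: nothing new is allocated */

    /* Miss: pay the construction cost once and remember the result. */
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->str = make_string(s);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    e->next = cache->buckets[idx];
    cache->buckets[idx] = e;
    cache->n_created++;
    return e->str;
}

/* Release every cached string and reset the cache to its empty state. */
void cache_free(StringCache *cache)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        Entry *e = cache->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e->str);
            free(e);
            e = next;
        }
        cache->buckets[i] = NULL;
    }
    cache->n_created = 0;
}
```

A caller would start from a zero-initialised cache (StringCache cache = {0};), call cache_lookup for every field read from the file, and call cache_free when done. Whether this pays off depends on how often values repeat: each hit costs one hash and at least one strcmp, so data with few repeated strings gains little.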
Yes, and space sharing also improves speed, since gc() does not need to collect as many objects.

I thought about more efficient formats for my data, but:

* ASCII is ubiquitous. You have grep, head, perl, etc. to work with the files.
* AFAIK, there is no industry-standard binary format with a mature supporting C library (especially when the data needs to be compressed). I considered HDF and netcdf.
* The programs that collect my data store it in ASCII. It is advantageous to be able to read it directly from the original files. (I have about 200G of these, compressed.)
* C code was able to read the data at a decent speed; it was R's overhead that was causing problems. One source was mkChar, the other was how chars are read from a connection. I detailed my findings in a message to r-devel.

I tried to see if I could improve the original R code for IO, but for various reasons decided that I wouldn't be able to accomplish this. In the end I decided to write a custom R IO package, which came close to the speed of raw C code (the difference is largely due to the lookup overhead).

Thanks,
Vadim

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk]
> Sent: Thursday, April 14, 2005 12:02 AM
> To: Vadim Ogranovich
> Cc: Jan T. Kim; r-devel@stat.math.ethz.ch
> Subject: RE: [Rd] How allocate STRSXP outside of gc
>
> On Wed, 13 Apr 2005, Vadim Ogranovich wrote:
>
> > mkChar is a rather expensive call since it allocates a new R object.
> > For example in reading char data from a file it is often advantageous
> > to first try to look up an already made R string and only then use
> > mkChar. That is, the overhead of the lookup is usually smaller than
> > that of mkChar.
>
> Yes (and that is one reason why scan in 2.1.0 uses lookups,
> space sharing being the other), but both are really fast and
> this only comes into play with hundreds of millions of items.
> (On my machine mkChar takes about 200 ns, hardly `rather
> expensive'.) And if you have that much data, why not store
> it in a more efficient format?
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Yes, HDF5 had this promise at the time I looked at it, but it was not there yet. I don't know the current status. Judging from your e-mail, they've delivered.

Thank you for pointing to NCO. I didn't know about it.

> -----Original Message-----
> From: Jeffrey Horner [mailto:jeff.horner@vanderbilt.edu]
> Sent: Thursday, April 14, 2005 11:19 AM
> To: Vadim Ogranovich
> Cc: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] How allocate STRSXP outside of gc
>
> Vadim Ogranovich wrote:
> [...]
> > * AFAIK, there is no industry standard binary format and a mature
> > supporting C-library (especially when the data needs to be compressed).
> > I considered HDF and netcdf.
> [...]
>
> Interesting. I just finished reading a little about HDF's new
> format HDF5 and their web documentation claims it's flexible
> enough to store compressed or chunked data:
>
> http://hdf.ncsa.uiuc.edu/whatishdf5.html
>
> Also, you mentioned that you like line-oriented ASCII files
> since many UNIX utilities work with them, but have you
> considered NCO, a collection of UNIX utilities for processing
> netcdf files:
>
> http://nco.sourceforge.net/
>
> --
> Jeffrey Horner       Computer Systems Analyst         School of Medicine
> 615-322-8606         Department of Biostatistics   Vanderbilt University