Hi all,

Could anyone point me to one or more examples in the R sources of a C
function that is called without knowing in advance what will be the
length (say) of the output vector?

To make myself clearer, we have a C function that computes
probabilities until their sum gets "close enough" to 1. Hence, the
number of probabilities is not known in advance.

I would like to have an idea what is the best way to handle this
situation in R.

Thanks in advance!

---
  Vincent Goulet, Associate Professor
  École d'actuariat
  Université Laval, Québec
  Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca

Salut Vincent,

On 6 June 2007 at 13:17, Vincent Goulet wrote:
| Could anyone point me to one or more examples in the R sources of a C
| function that is called without knowing in advance what will be the
| length (say) of the output vector?
|
| To make myself clearer, we have a C function that computes
| probabilities until their sum gets "close enough" to 1. Hence, the
| number of probabilities is not known in advance.
|
| I would like to have an idea what is the best way to handle this
| situation in R.

I haven't been on the 'Dirk tells everybody to use RcppTemplate' soapbox
lately, so let me (drumroll) suggest using RcppTemplate.

So if you were to consider C++, which can be adopted incrementally relative
to C, then you could collect your data in a self-growing, automagically
managed STL vector, pass it to RcppTemplate, and you're done.

I have a local .deb package for r-cran-rcpptemplate I can send you (and which
I should upload to Debian), or you can just use it from source on CRAN. You
want the source for the examples anyway.

If C++ is not an option, then I typically just hunt for decent examples that
do similar stuff. BDR's RODBC is usually a good source -- he also does not
know ex ante how long the return sets are.

It's not _that_ hard, just tedious. Once you know the length of your data,
you allocate a new R vector of that length for the results and copy the
elements into it.

As an example, see the following from my rdieharder package (that is pending
an upload to CRAN), where C as opposed to C++ was a given constraint. I wrote
this in a hurry and I may well have missed something that would make it a tad
better -- but as indicated above, I'd rather be doing it in C++ anyway.

In the code, element [1] is a vector of possibly varying size:

    /* vector of size three: [0] is scalar ks_pv, [1] is pvalues vec, [2] name */
    PROTECT(result = allocVector(VECSXP, 3));

    /* alloc scalar, and set it */
    PROTECT(pv = allocVector(REALSXP, 1));
    REAL(pv)[0] = testptr->ks_pvalue;

    /* alloc vector of the now-known length, and copy the values into it */
    PROTECT(vec = allocVector(REALSXP, testptr->psamples));
    for (i = 0; i < testptr->psamples; i++) {
        REAL(vec)[i] = testptr->pvalues[i];
    }

    /* alloc name string, and set it */
    PROTECT(name = allocVector(STRSXP, 1));
    SET_STRING_ELT(name, 0, mkChar(dtestptr->name));

    /* insert scalar, vector and name */
    SET_VECTOR_ELT(result, 0, pv);
    SET_VECTOR_ELT(result, 1, vec);
    SET_VECTOR_ELT(result, 2, name);

    UNPROTECT(4);
    return result;

and testptr->psamples holds the unknown 'N', the vector length.

And yes, we need better cookbook examples for all this.

Hth, Dirk

--
Hell, there are no rules here - we're trying to accomplish something.
                                                  -- Thomas A. Edison

Vincent Goulet wrote:
> Hi all,
>
> Could anyone point me to one or more examples in the R sources of a C
> function that is called without knowing in advance what will be the
> length (say) of the output vector?
>
> To make myself clearer, we have a C function that computes
> probabilities until their sum gets "close enough" to 1. Hence, the
> number of probabilities is not known in advance.
>

Hi Vincent,

Let's say you want to write a function

    get_matches(const char * pattern, const char * x)

that will find all the occurrences of string 'pattern' in string 'x'
and "return" their positions in the form of an array of integers.
Of course you don't know in advance how many occurrences you're going to
find. One possible strategy is to:

  - Add an extra arg to 'get_matches' for storing the positions and make
    'get_matches' return the number of matches (i.e. the length of
    *pos_ptr):

        int get_matches(int **pos_ptr, const char * pattern, const char * x)

    Note that pos_ptr is a pointer to an int pointer.

  - In get_matches(): use a local array of ints and start with an
    arbitrary initial size for it:

        int get_matches(...)
        {
            int *tmp_pos, tmp_size, old_size, npos = 0;

            tmp_size = /* some initial guess of the number of matches */;
            tmp_pos = (int *) S_alloc((long) tmp_size, sizeof(int));
            ...

    Then start searching for matches and, every time you find one, store
    its position in tmp_pos[npos] and increase npos.
    When tmp_pos is full (npos == tmp_size), realloc with:

            ...
            old_size = tmp_size;
            tmp_size = 2 * old_size; /* there are many different strategies for this */
            tmp_pos = (int *) S_realloc((char *) tmp_pos, (long) tmp_size,
                                        (long) old_size, sizeof(int));
            ...

    Note that there is no need to check that the calls to S_alloc() or
    S_realloc() were successful, because these functions raise an error
    and end the call to .Call if they fail. In that case R frees the memory
    currently allocated (as it does on any error or user interrupt).

    When you are done, just return with:

            ...
            *pos_ptr = tmp_pos;
            return npos;
        }

  - Call get_matches with:

        int *pos, npos;
        npos = get_matches(&pos, pattern, x);

Note that memory allocation took place in 'get_matches' but now you need to
decide how and when the memory pointed to by 'pos' will be freed.
In the R environment, this can be addressed by using exclusively transient
storage allocation
(http://cran.r-project.org/doc/manuals/R-exts.html#Transient) as we did in
get_matches(), so the allocated memory will be automatically reclaimed at the
end of the call to .C or .Call.
Of course, the integers stored in pos have to be moved to a "safe" place
before .Call returns. Typically this will be done with something like:

    SEXP Call_get_matches(...)
    {
        ...
        npos = get_matches(&pos, pattern, x);
        /* copy the transient buffer into an R vector of the exact length */
        PROTECT(pos_sxp = NEW_INTEGER(npos));
        memcpy(INTEGER(pos_sxp), pos, npos * sizeof(int));
        UNPROTECT(1);
        return pos_sxp; /* end of call to .Call */
    }

There are many variations around this. One of them is to "share" pos and npos
between get_matches and its caller by making them global variables (in this
case it is recommended to use 'static' in their declarations, but this
requires that get_matches and its caller are in the same .c file).

Hope this helps.

H.

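To make the doubling strategy above concrete, here is a minimal, self-contained sketch of get_matches() in plain C. It is only an illustration under stated assumptions: malloc()/realloc() stand in for S_alloc()/S_realloc() so that it compiles outside R, which means allocation failures have to be checked by hand and the caller has to free() the result. The initial size of 4, the -1 error return and the use of strstr() (which also reports overlapping matches) are choices made for this sketch, not part of the original outline.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find every (possibly overlapping) occurrence of `pattern` in `x`.
 * On success, *pos_ptr points to a malloc'ed array of 0-based start
 * positions and the number of matches is returned; the caller must
 * free() the array.  Returns -1 if memory allocation fails. */
int get_matches(int **pos_ptr, const char *pattern, const char *x)
{
    int tmp_size = 4;            /* arbitrary initial guess */
    int npos = 0;
    int *tmp_pos, *grown;
    const char *hit;

    assert(strlen(pattern) > 0); /* an empty pattern would match everywhere */

    tmp_pos = (int *) malloc((size_t) tmp_size * sizeof(int));
    if (tmp_pos == NULL)
        return -1;

    for (hit = strstr(x, pattern); hit != NULL;
         hit = strstr(hit + 1, pattern)) {
        if (npos == tmp_size) {  /* buffer full: double its size */
            grown = (int *) realloc(tmp_pos,
                                    2 * (size_t) tmp_size * sizeof(int));
            if (grown == NULL) {
                free(tmp_pos);
                return -1;
            }
            tmp_pos = grown;
            tmp_size *= 2;
        }
        tmp_pos[npos++] = (int) (hit - x);
    }

    *pos_ptr = tmp_pos;
    return npos;
}
```

Inside a .Call entry point the positions would then be copied into a NEW_INTEGER(npos) vector, as in Call_get_matches() above; with the malloc-based variant the buffer would additionally need a free() after the copy.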
On Wed, 6 Jun 2007, Vincent Goulet wrote:

> Could anyone point me to one or more examples in the R sources of a C
> function that is called without knowing in advance what will be the
> length (say) of the output vector?
>
> To make myself clearer, we have a C function that computes
> probabilities until their sum gets "close enough" to 1. Hence, the
> number of probabilities is not known in advance.
>
> I would like to have an idea what is the best way to handle this
> situation in R.

I think you will want to use the .Call() interface, not .C().
Then you can expand the output vector as you see fit.
E.g., the following sets the output vector length to 0
and adds 1 to it each time it needs to. This works in
both R and Splus 8.0.

In Splus, SET_LENGTH(vec, len) reserves some extra space (above
the 'len' requested items) so that future calls to expand
it a bit don't always copy it. In R, each call to SET_LENGTH
appears to copy the input vector, so you probably want to add
extra logic to reserve some extra space and finally trim it
down to the precise size you want.

    #include <R.h>
    #include <Rdefines.h>

    /* .Call("unknownReturnLength", 0.02) : return
     * sequence of random uniforms, quitting when
     * you see the first one below 0.02.
     */
    SEXP unknownReturnLength(SEXP prob_p)
    {
        int retval_length ; /* long preferred in S-PLUS */
        double prob, r ;
        SEXP retval ;

        retval_length = 0 ;
        /* How does PROTECT interact with SET_LENGTH?  In R,
         * SET_LENGTH(x, n) expands to x = lengthgets(x, n), so the
         * PROTECT below covers only the initial vector, not the
         * resized copies; R-only code should use PROTECT_WITH_INDEX
         * here and REPROTECT after each SET_LENGTH. */
        PROTECT(retval = NEW_NUMERIC(retval_length)) ;
        prob = asReal(prob_p);
        GetRNGstate();
        while((r=unif_rand()) >= prob) {
            double *oldptr = NUMERIC_POINTER(retval) ;
            retval_length++ ;
            SET_LENGTH(retval, retval_length) ;
            if (oldptr != NUMERIC_POINTER(retval))
                Rprintf("expanding retval from %d to %d moved it\n",
                        retval_length-1, retval_length) ;
            NUMERIC_POINTER(retval)[retval_length-1] = r ;
        }
        PutRNGstate();
        UNPROTECT(1) ; /* balance the PROTECT above */
        return retval ;
    }

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."
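The "reserve some extra space and finally trim it down" logic can be sketched independently of the R API. The following plain C example applies it to the original question, collecting probabilities until their sum is within a tolerance of 1. Everything specific here is an assumption made for illustration: the function name probs_until_one(), the geometric distribution p_k = q(1 - q)^k, the initial reservation of 4 and the doubling rule. It uses malloc()/realloc() rather than R vectors, so it is a sketch of the bookkeeping only, not of how SET_LENGTH behaves.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill a growable buffer with geometric probabilities
 * p_k = q * (1 - q)^k, k = 0, 1, ..., until their running sum is
 * within `tol` of 1.  Returns the number of probabilities stored in
 * *out (a malloc'ed array owned by the caller), or 0 on allocation
 * failure, in which case *out is NULL. */
size_t probs_until_one(double q, double tol, double **out)
{
    size_t cap = 4, n = 0;       /* arbitrary initial reservation */
    double p = q, total = 0.0;
    double *buf, *tmp;

    assert(q > 0.0 && q < 1.0 && tol > 0.0 && tol < 1.0);
    *out = NULL;

    buf = (double *) malloc(cap * sizeof(double));
    if (buf == NULL)
        return 0;

    /* The p > 0.0 test stops the loop if the terms underflow to zero
     * before the tolerance can be reached. */
    while (1.0 - total > tol && p > 0.0) {
        if (n == cap) {          /* reservation used up: double it */
            tmp = (double *) realloc(buf, 2 * cap * sizeof(double));
            if (tmp == NULL) {
                free(buf);
                return 0;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[n++] = p;
        total += p;
        p *= 1.0 - q;            /* next geometric term */
    }

    /* Trim the reservation down to the n elements actually used;
     * if shrinking fails, the larger buffer is still valid. */
    if (n > 0 && n < cap) {
        tmp = (double *) realloc(buf, n * sizeof(double));
        if (tmp != NULL)
            buf = tmp;
    }

    *out = buf;
    return n;
}
```

In a .Call routine the n values would then be copied into a NEW_NUMERIC(n) result and the temporary buffer freed, so the R vector is allocated exactly once at its final length.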