Paul Johnson
2012-Dec-10 06:51 UTC
[Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?
I'm continuing my work on finding speedups in generalized inverse
calculations in some simulations. It leads me back to .C and .Call,
and some questions I've never been able to answer for myself. It may
be I can push some calculations to LAPACK or C BLAS, which is why I
realized again that I don't understand the call-by-reference or
call-by-value semantics of .Call.

Why aren't users of .Call encouraged to "const" their arguments, and
why doesn't .Call do this for them (if we really believe in return by
value)?

R Gentleman's R Programming for Bioinformatics is the most
understandable treatment I've found on .Call. It appears to me .Call
leaves "wiggle room" where there should be none. Here's Gentleman on
p. 185. "For .Call and .External, the return value is an R object (the
C functions must return a SEXP), and for these functions the values
that were passed are typically not modified. If they must be
modified, then making a copy in R, prior to invoking the C code, is
necessary."

I *think* that means:

.Call allows return by reference, BUT we really wish users would not
use it. Users can damage R data structures that are pointed to unless
they really truly know what they are doing on the C side. ??

This seems dangerous. Why allow return by reference at all?

On p. 197, there's a similar comment "Any function that has been
invoked by either .External or .Call will have all of its arguments
protected already. You do not need to protect them. .... [T]hey were
not duplicated and should be treated as read-only values."

"should be ... read-only" concerns me. They are "protected" in the
garbage collector sense, but they are not protected from "return by
reference" damage. Right?

Why doesn't the documentation recommend that function writers mark
arguments to C functions as const? Isn't that what the return by
value policy would suggest?

Here's a troublesome example in R src/main/array.c:

/* DropDims strips away redundant dimensioning information. */
/* If there is an appropriate dimnames attribute the correct */
/* element is extracted and attached to the vector as a names */
/* attribute. Note that this function mutates x. */
/* Duplication should occur before this is called. */

SEXP DropDims(SEXP x)
{
    SEXP dims, dimnames, newnames = R_NilValue;
    int i, n, ndims;

    PROTECT(x);
    dims = getAttrib(x, R_DimSymbol);
    [... SNIP ....]
    setAttrib(x, R_DimNamesSymbol, R_NilValue);
    setAttrib(x, R_DimSymbol, R_NilValue);
    setAttrib(x, R_NamesSymbol, newnames);
    [... SNIP ....]

    return x;
}

Well, at least there's a warning comment with that one. But it does
show .Call allows call by reference.

Why allow it, though? DropDims could copy x, modify the copy, and
return it.

I wondered why DropDims bothers to return x at all. We seem to be
using modify and return by reference there.

I also wondered why x is PROTECTED, actually. It's an argument, so
wasn't it automatically protected? Is it no longer protected after
the function starts modifying it?

Here's an example with similar usage in Writing R Extensions, section
5.10.1 "Calling .Call". It protects the arguments a and b (needed
??), then changes them.

#include <R.h>
#include <Rdefines.h>

SEXP convolve2(SEXP a, SEXP b)
{
    R_len_t i, j, na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter "a" in calling code */
    PROTECT(b = AS_NUMERIC(b));
    na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
    PROTECT(ab = NEW_NUMERIC(nab));
    xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
    xab = NUMERIC_POINTER(ab);
    for(i = 0; i < nab; i++) xab[i] = 0.0;
    for(i = 0; i < na; i++)
        for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return(ab);
}

Doesn't

    PROTECT(a = AS_NUMERIC(a));

alter the data structure "a" both inside the C function and in the
calling R code as well? And, if a was PROTECTED automatically, could
we do without that PROTECT()?

pj

--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504         Center for Research Methods
University of Kansas              University of Kansas
http://pj.freefaculty.org         http://quant.ku.edu
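On the last question, whether `PROTECT(a = AS_NUMERIC(a));` alters "a" in the calling code, a plain-C sketch may help. C passes the handle by value, so assigning to the parameter only rebinds the local variable. The sketch below does not use R's headers: `toy_obj`, `TOY`, `toy_as_numeric` and `toy_twice` are hypothetical stand-ins invented for illustration. It assumes the coercion returns its argument unchanged when the type already matches and a newly allocated object otherwise, which is, as far as I know, how coerceVector behaves and would explain why the result needs its own PROTECT.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an R object; NOT the real SEXPREC layout. */
typedef struct toy_obj {
    int is_numeric;   /* 1 if the payload already counts as "numeric" */
    double value;
} toy_obj;
typedef toy_obj *TOY; /* handle type, playing the role of SEXP */

/* Hypothetical analogue of AS_NUMERIC: returns x itself when no
 * coercion is needed, otherwise a newly allocated object. */
static TOY toy_as_numeric(TOY x)
{
    if (x->is_numeric)
        return x;
    TOY fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    fresh->is_numeric = 1;
    fresh->value = x->value;
    return fresh;
}

/* Analogue of the first lines of convolve2. */
static double toy_twice(TOY a)
{
    TOY orig = a;
    a = toy_as_numeric(a);   /* rebinds the LOCAL handle only */
    double result = 2.0 * a->value;
    if (a != orig)
        free(a);             /* the toy has no garbage collector */
    return result;
}
```

After a call such as `toy_twice(handle)`, the caller's `handle` still points at the original object and that object's fields are unchanged; only the copy of the pointer inside the function was reassigned. In real R code the fresh object would be reclaimed by the garbage collector instead of `free`.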
Simon Urbanek
2012-Dec-10 19:05 UTC
[Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?
On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:

> I'm continuing my work on finding speedups in generalized inverse
> calculations in some simulations. It leads me back to .C and .Call,
> and some questions I've never been able to answer for myself. It may
> be I can push some calculations to LAPACK or C BLAS, that's why I
> realized again I don't understand the call by reference or value
> semantics of .Call
>
> Why aren't users of .Call encouraged to "const" their arguments, and
> why doesn't .Call do this for them (if we really believe in return by
> value)?

Because there is a difference between the *data* part of the SEXP and
the object itself. The internal structure of the object may need to
be modified in a call to the R API (e.g. the NAMED reference count is
increased when you assign it). You can't flag the data part as const
separately, so you have to use a non-const SEXP.

> R Gentleman's R Programming for Bioinformatics is the most
> understandable treatment I've found on .Call. It appears to me .Call
> leaves "wiggle room" where there should be none. Here's Gentleman on
> p. 185. "For .Call and .External, the return value is an R object (the
> C functions must return a SEXP), and for these functions the values
> that were passed are typically not modified. If they must be
> modified, then making a copy in R, prior to invoking the C code, is
> necessary."
>
> I *think* that means:
>
> .Call allows return by reference, BUT we really wish users would not
> use it. Users can damage R data structures that are pointed to unless
> they really truly know what they are doing on the C side. ??
>
> This seems dangerous. Why allow return by reference at all?

Because it is completely legal to do things like

SEXP last(SEXP bar)
{
    /* return the last element of a non-empty list without copying it */
    if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
        return VECTOR_ELT(bar, LENGTH(bar) - 1);
    Rf_error("sorry, I only work on lists");
    return R_NilValue; /* not reached: Rf_error does not return */
}

There is no point in duplicating the element.

> On p. 197, there's a similar comment "Any function that has been
> invoked by either .External or .Call will have all of its arguments
> protected already. You do not need to protect them. .... [T]hey were
> not duplicated and should be treated as read-only values."
>
> "should be ... read-only" concerns me. They are "protected" in the
> garbage collector sense,

Yes

> but they are not protected from "return by reference" damage. Right?

There is no "return by reference damage". The only problem is if you
modify input arguments while someone else holds a reference to them,
but there is no way in C to prevent that while still allowing them to
be useful. Note that it is legal to modify input arguments if there
are no other references to them.

Cheers,
Simon
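The two points in the reply (const cannot be applied to the data part through the handle type, and modifying an argument is only a problem when someone else holds a reference) can be sketched in plain C. This is an illustration under stated assumptions, not R's implementation: `toy_vec`, `TOYVEC`, the `refs` field and every `toy_*` function are hypothetical names, with `refs` standing in loosely for the NAMED bookkeeping mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector; NOT the real SEXPREC layout. */
typedef struct toy_vec {
    int refs;      /* hypothetical sharing count, in the spirit of NAMED */
    size_t len;
    double *data;
} toy_vec;
typedef toy_vec *TOYVEC;   /* handle type, playing the role of SEXP */

/* 'const TOYVEC v' means 'toy_vec *const v': only the local handle is
 * const, so writes through it still compile and still take effect. */
static void toy_set_first(const TOYVEC v, double val)
{
    assert(v->len > 0);
    v->data[0] = val;
}

/* Deep copy with a fresh, unshared header. */
static TOYVEC toy_duplicate(const toy_vec *v)
{
    TOYVEC copy = malloc(sizeof *copy);
    assert(copy != NULL);
    copy->refs = 0;
    copy->len = v->len;
    /* allocate at least one element so a NULL result means failure */
    copy->data = malloc((v->len ? v->len : 1) * sizeof *copy->data);
    assert(copy->data != NULL);
    memcpy(copy->data, v->data, v->len * sizeof *copy->data);
    return copy;
}

/* Release a vector obtained from toy_duplicate. */
static void toy_free(TOYVEC v)
{
    free(v->data);
    free(v);
}

/* Scale in place only when nobody else refers to v; otherwise scale
 * a duplicate and leave the shared original untouched. */
static TOYVEC toy_scale(TOYVEC v, double k)
{
    TOYVEC out = (v->refs > 0) ? toy_duplicate(v) : v;
    for (size_t i = 0; i < out->len; i++)
        out->data[i] *= k;
    return out;
}
```

With a vector whose `refs` is positive, `toy_scale` returns a new object and the original data keep their values; with `refs` equal to zero it returns the same handle, modified in place. `toy_set_first` shows why a `const` qualifier on the handle would not enforce read-only data.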