Pavel N. Krivitsky
2012-May-04 17:42 UTC
[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).
Dear R-devel,

While tracking down some hard-to-reproduce bugs in a package I maintain, I stumbled on a behavior change between R 2.15.0 and the current R-devel (or SVN trunk).

In 2.15.0 and earlier, if you passed a 0-length vector of the right mode (e.g., double(0) or integer(0)) as one of the arguments in a .C() call with DUP=TRUE (the default), the C routine would be passed NULL (the C pointer, not R NULL) in the corresponding argument. The current development version instead passes it a pointer to what appears to be the memory location immediately following the SEXP that holds the metadata for the argument. If the argument has length 0, this is often memory belonging to a different R object. (DUP=FALSE in 2.15.0 appears to have the same behavior as R-devel.)

The .C() documentation and Writing R Extensions don't explicitly specify a behavior for 0-length vectors, so I don't know if this change is intentional, or whether it was a side-effect of the following news item:

    .C() and .Fortran() do less copying: arguments which are raw, logical, integer, real or complex vectors and are unnamed are not copied before the call, and (named or not) are not copied after the call. Lists are no longer copied (they are supposed to be used read-only in the C code).

Was the change in the empty vector behavior intentional?

It seems to me that standardizing on the behavior of giving the C routine NULL is safer, more consistent with other memory-related routines, and more convenient: whereas dereferencing a NULL pointer is an immediate (and therefore easily traced) segfault, dereferencing an invalid pointer that is nevertheless in the general memory area allocated to the program often causes subtle errors down the line; R_alloc asked to allocate 0 bytes returns NULL, at least on my platform; and the C routine can easily check if a pointer is NULL, but with the R-devel behavior, the programmer has to add an explicit way of telling that an empty vector was passed.
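One "explicit way of telling that an empty vector was passed" is to hand the length to the routine as a separate argument and let that length alone decide whether the data pointer is ever dereferenced. The following is only a minimal sketch in plain C, written in the all-pointer-arguments, void-return style used by .C() routines; the name `sum_ints` and its arguments are hypothetical and are not part of the attached test case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine in the .C() calling style: every argument is a
 * pointer and the result is written back through `out`.
 * The length *n, not the value of x, decides whether x is dereferenced,
 * so it does not matter whether an empty vector arrives as NULL or as
 * some other address. */
void sum_ints(const int *x, const int *n, int *out)
{
    assert(n != NULL && out != NULL);
    int total = 0;
    for (int i = 0; i < *n; i++) {   /* body never runs when *n == 0 */
        total += x[i];
    }
    *out = total;
}
```

From R, such a routine would presumably be called along the lines of `.C("sum_ints", as.integer(x), length(x), out = integer(1))$out`, so the C side never has to inspect the pointer value to detect an empty input.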
I've attached a small test case (dotC_NULL.* files) that shows the difference. The C file should be built with R CMD SHLIB, and the R file calls the functions in the library with a variety of arguments. Output I get from running

    R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R

on R 2.15.0, R trunk, and R trunk with my patch (described below) are attached.

The attached patch (dotC_NULL.patch) against the current trunk (affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE (i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE case. It does so by checking if an argument --- if it's of mode raw, integer, real, or complex --- to a .C() or .Fortran() call has length 0, and, if so, sets the pointer to be passed to NULL and then skips the copying of the C routine's changes back to the R object for that argument. The additional computing cost should be negligible (i.e., checking if vector length equals 0 and break-ing out of a switch statement if so).

The patch appears to work, at least for my package, and R CMD check passes for all recommended packages (on my 64-bit Linux system), but this is my first time working with R's internals, so handle with care.

Best,
Pavel Krivitsky

-------------- next part --------------
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

R_alloc asked to allocate 1 byte:
Pointer to output from R_alloc() of 1 bytes: 0x211c470.
Return value: [1] 1

R_alloc asked to allocate 0 bytes:
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element:
Pointer to arg: 0x2123b00.
Return value: [1] 0

Integer vector with 0 elements:
Pointer to arg: (nil).
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE:
Pointer to arg: 0x2132940.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE:
Pointer to arg: 0x2134a80.
Return value: integer(0)

-------------- next part --------------
R Under development (unstable) (2012-05-04 r59314)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

R_alloc asked to allocate 1 byte:
Pointer to output from R_alloc() of 1 bytes: 0x1e56270.
Return value: [1] 1

R_alloc asked to allocate 0 bytes:
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element:
Pointer to arg: 0x1e60db0.
Return value: [1] 0

Integer vector with 0 elements:
Pointer to arg: 0x1e75188.
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE:
Pointer to arg: 0x1e6ad90.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE:
Pointer to arg: 0x1e7dc10.
Return value: integer(0)

-------------- next part --------------
R Under development (unstable) (2012-05-04 r59314)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

R_alloc asked to allocate 1 byte:
Pointer to output from R_alloc() of 1 bytes: 0x27495c0.
Return value: [1] 1

R_alloc asked to allocate 0 bytes:
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element:
Pointer to arg: 0x2754100.
Return value: [1] 0

Integer vector with 0 elements:
Pointer to arg: (nil).
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE:
Pointer to arg: 0x275e0e0.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE:
Pointer to arg: (nil).
Return value: integer(0)

-------------- next part --------------
sessionInfo()
cat("\n")

dyn.load("dotC_NULL.so")

run_test<-function(desc,Cfun,args){
  cat(desc,"\n")
  out <- do.call(".C",c(list(Cfun),args))
  cat("Return value: ")
  print(out[[1]])
  cat("\n")
}

run_test("R_alloc asked to allocate 1 byte:", "R_alloc_test",list(nbytes=as.integer(1)))
run_test("R_alloc asked to allocate 0 bytes:", "R_alloc_test",list(nbytes=as.integer(0)))
run_test("Integer vector with 1 element:", "dotC_NULL",list(arg=integer(1)))
run_test("Integer vector with 0 elements:", "dotC_NULL",list(arg=integer(0)))
run_test("Integer vector with 1 element and DUP=FALSE:", "dotC_NULL",list(arg=integer(1), DUP=FALSE))
run_test("Integer vector with 0 elements and DUP=FALSE:", "dotC_NULL",list(arg=integer(0), DUP=FALSE))

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dotC_NULL.patch
Type: text/x-patch
Size: 2473 bytes
Desc:
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20120504/ab68095c/attachment.bin>
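The non-NULL addresses reported for 0-length vectors are what one would expect whenever an object stores its data directly after a metadata header. The toy model below is a sketch of that general idea only; `toy_vec` and its helper functions are hypothetical names, and the layout is an assumption for illustration, not R's actual SEXP structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a vector object: a metadata header followed directly
 * by the data. This is NOT R's real SEXP layout, only an illustration. */
typedef struct {
    size_t length;   /* number of elements */
    int    data[];   /* C99 flexible array member, placed right after the header */
} toy_vec;

/* Allocate a toy vector with room for n ints; returns NULL on failure. */
toy_vec *toy_vec_new(size_t n)
{
    toy_vec *v = malloc(sizeof(toy_vec) + n * sizeof(int));
    if (v != NULL) {
        v->length = n;
    }
    return v;
}

/* The "data pointer" a callee would receive for this vector. */
int *toy_vec_data(toy_vec *v)
{
    assert(v != NULL);
    return v->data;
}
```

In this model, `toy_vec_data()` on a vector of length 0 returns a valid-looking, non-NULL address just past the header. Forming that pointer is fine; reading or writing through it touches memory that was never allocated for the vector.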
Pavel N. Krivitsky
2012-May-06 19:54 UTC
[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).
Oops... Forgot to attach the dotC_NULL.c, the C source file for the test case.

Pavel Krivitsky
Pavel N. Krivitsky
2012-May-07 19:16 UTC
[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0). --- Missing C file.
Hi,

It looks like I didn't forget to attach it after all, but R-devel strips C source code files. Remove the ".txt" from the attached file to compile the test case.

Best,
Pavel

-------------- next part --------------
#include <R.h>

/* Print the address that .C() passes for the argument. */
void dotC_NULL(int *arg){
  Rprintf("Pointer to arg: %p.\n",(void *) arg);
}

/* Print the address returned by R_alloc() for a request of *nbytes bytes. */
void R_alloc_test(int *nbytes){
  char *p = (char *) R_alloc(*nbytes,sizeof(char));
  Rprintf("Pointer to output from R_alloc() of %d byte(s): %p.\n",*nbytes,(void *) p);
}
Prof Brian Ripley
2012-May-17 09:46 UTC
[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).
On 04/05/2012 18:42, Pavel N. Krivitsky wrote:
> Dear R-devel,
>
> While tracking down some hard-to-reproduce bugs in a package I maintain, I stumbled on a behavior change between R 2.15.0 and the current R-devel (or SVN trunk).
>
> In 2.15.0 and earlier, if you passed a 0-length vector of the right mode (e.g., double(0) or integer(0)) as one of the arguments in a .C() call with DUP=TRUE (the default), the C routine would be passed NULL (the C pointer, not R NULL) in the corresponding argument. The current

Where did you get that from? The documentation says it passes an (e.g.) double* pointer to a copy of the data area of the R vector. There is no change in the documented behaviour .... Now, of course a zero-length area can be at any address, but none is stated anywhere that I am aware of.

> development version instead passes it a pointer to what appears to be memory location immediately following the SEXP that holds the metadata for the argument. If the argument has length 0, this is often memory belonging to a different R object. (DUP=FALSE in 2.15.0 appears to have the same behavior as R-devel.)
>
> .C() documentation and Writing R Extensions don't explicitly specify a behavior for 0-length vectors, so I don't know if this change is intentional, or whether it was a side-effect of the following news item:
>
>     .C() and .Fortran() do less copying: arguments which are raw, logical, integer, real or complex vectors and are unnamed are not copied before the call, and (named or not) are not copied after the call. Lists are no longer copied (they are supposed to be used read-only in the C code).
>
> Was the change in the empty vector behavior intentional?
>
> It seems to me that standardizing on the behavior of giving the C routine NULL is safer, more consistent with other memory-related routines, and more convenient: whereas dereferencing a NULL pointer is an immediate (and therefore easily traced) segfault, dereferencing an

That's not true, in general.

> invalid pointer that is nevertheless in the general memory area allocated to the program often causes subtle errors down the line; R_alloc asked to allocate 0 bytes returns NULL, at least on my platform;

Again, undocumented and should not be relied on.

> and the C routine can easily check if a pointer is NULL, but with the R-devel behavior, the programmer has to add an explicit way of telling that an empty vector was passed.

It's no different from any other vector length: it is easy for careless programmers to read/write off the ends of the allocated area, and this is why in R-devel we have an option to check for that (and of course also what valgrind is good at finding in an instrumented version of R).

> I've attached a small test case (dotC_NULL.* files) that shows the difference. The C file should be built with R CMD SHLIB, and the R file calls the functions in the library with a variety of arguments. Output I get from running
>     R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R
> on R 2.15.0, R trunk, and R trunk with my patch (described below) are attached.
>
> The attached patch (dotC_NULL.patch) against the current trunk (affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE (i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE case. It does so by checking if an argument --- if it's of mode raw, integer, real, or complex --- to a .C() or .Fortran() call has length 0, and, if so, sets the pointer to be passed to NULL and then skips the copying of the C routine's changes back to the R object for that argument. The additional computing cost should be negligible (i.e., checking if vector length equals 0 and break-ing out of a switch statement if so).
>
> The patch appears to work, at least for my package, and R CMD check passes for all recommended packages (on my 64-bit Linux system), but this is my first time working with R's internals, so handle with care.

That's easy: we will not be changing this. In particular, the new checks I refer to above rely on passing the address of an in-process memory area with guard bytes.

> Best,
> Pavel Krivitsky

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
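The reply mentions checks that rely on passing the address of a memory area with guard bytes. The sketch below shows the general guard-byte technique in plain C. It is not R's implementation: the names `guarded_alloc`, `guards_intact` and `guarded_free`, the guard width and the fill pattern are all assumptions chosen for illustration. It does show why such a scheme needs a real address even for a zero-length payload, since the guards are located relative to that address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* bytes of padding on each side (arbitrary choice) */
#define GUARD_BYTE 0xA5   /* fill pattern (arbitrary choice) */

/* Allocate n payload bytes surrounded by guard regions and return a pointer
 * to the payload. The result is non-NULL even for n == 0, unless malloc fails. */
unsigned char *guarded_alloc(size_t n)
{
    unsigned char *base = malloc(n + 2 * GUARD_LEN);
    if (base == NULL) {
        return NULL;
    }
    memset(base, GUARD_BYTE, GUARD_LEN);                  /* guard before */
    memset(base + GUARD_LEN, 0, n);                       /* zeroed payload */
    memset(base + GUARD_LEN + n, GUARD_BYTE, GUARD_LEN);  /* guard after */
    return base + GUARD_LEN;
}

/* Return 1 if both guard regions are intact, 0 if something wrote outside
 * the n payload bytes. */
int guards_intact(const unsigned char *payload, size_t n)
{
    assert(payload != NULL);
    const unsigned char *before = payload - GUARD_LEN;
    const unsigned char *after  = payload + n;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (before[i] != GUARD_BYTE || after[i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* Release a block obtained from guarded_alloc(). */
void guarded_free(unsigned char *payload)
{
    if (payload != NULL) {
        free(payload - GUARD_LEN);
    }
}
```

With this layout, a callee that writes even one element into a zero-length argument lands in the trailing guard, and the caller can detect it after the call. A NULL pointer would leave nothing to check.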
Pavel N. Krivitsky
2012-Jul-01 21:17 UTC
[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).
On Sat, 2012-05-26 at 14:15 -0500, Dirk Eddelbuettel wrote:
> On 26 May 2012 at 14:00, Simon Urbanek wrote:
> | [...] the real answer is use .Call() instead.
>
> Maybe Kurt could add something to that extent to the R FAQ ?

Since it looks like the 0-length -> invalid pointer behavior is here to stay, I want to second this request. It took me a long time to track the mysterious behavior of my package down to this issue, and I think it would help future developers to have the behavior documented, even if the documentation only said that the behavior is undefined.

Thanks,
Pavel Krivitsky