Andrew Piskorski
2014-Apr-18 13:55 UTC
[Rd] Why did R 3.0's resolveNativeRoutine remove full-search ability?
In versions of R prior to 3.0, by default .C and .Call would find the requested C function regardless of which shared library it was located in. You could use the PACKAGE argument to restrict the search to a specific library, but doing so was not necessary for it to work.

R 3.0 introduced a significant change to that behavior; from the NEWS file:

  CHANGES IN R 3.0.0:
  PERFORMANCE IMPROVEMENTS:
    * A foreign function call (.C() etc) in a package without a PACKAGE
      argument will only look in the first DLL specified in the
      NAMESPACE file of the package rather than searching all loaded
      DLLs. A few packages needed PACKAGE arguments added.

That is not merely a performance improvement; it is a significant change in functionality. Now, when R code in my package foo tries to call C code located in bar.so, it fails with a "not resolved from current namespace (foo)" error. It works if I change all my uses of .C and .Call to pass a PACKAGE="bar" argument. OK, I can make that change in my code, no big deal.

What surprises me, though, is that there appears to be no way to invoke the old (and very conventional Unix-style) "I don't want to specify where the function is located, just keep searching until you find it" behavior. Is there really no way to do that, and if so, why not?

Comparing the R sources on the 3.1 vs. 2.15 branches, it looks as if this is due to some simple changes to resolveNativeRoutine in "src/main/dotcode.c". Specifically, the newer code adds this:

  errorcall(call, "\"%s\" not resolved from current namespace (%s)",
            buf, ns);

And removes these lines:

  /* need to continue if the namespace search failed */
  *fun = R_FindSymbol(buf, dll.DLLname, symbol);
  if (*fun) return args;

Is that extra call to R_FindSymbol really all that's necessary to invoke the old "keep searching" behavior? Would it be a good idea to provide an optional way of finding a native routine regardless of where it's located, perhaps via an optional PACKAGE=NA argument to .C, .Call, etc.?
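To make the difference between the two lookup policies concrete, here is a toy model in C (the language of dotcode.c). It is a hypothetical sketch, not R's code: the Dll struct, the loaded table, and the functions resolve_search_all and resolve_restricted are invented for illustration; the package names foo and bar follow the example above; and a NULL return stands in for the error R would raise. The one R-specific detail it encodes, that PACKAGE = "" is now an error, comes from the .Call help page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of native-symbol lookup. This is NOT R's
   resolveNativeRoutine; each "DLL" is just a name plus exported symbol names. */
typedef struct {
    const char *name;
    const char *symbols[4]; /* NULL-terminated list of exported names */
} Dll;

/* Illustrative load state: package foo's own DLL, then bar's DLL. */
static const Dll loaded[] = {
    { "foo", { "foo_init", NULL } },
    { "bar", { "bar_compute", NULL } },
};
static const size_t n_loaded = sizeof loaded / sizeof loaded[0];

/* Does this DLL export a symbol with exactly this name? */
static int exports(const Dll *d, const char *sym) {
    for (size_t i = 0; d->symbols[i] != NULL; i++)
        if (strcmp(d->symbols[i], sym) == 0) return 1;
    return 0;
}

/* Pre-3.0 style: search every loaded DLL in order; the first match wins. */
static const char *resolve_search_all(const char *sym) {
    assert(sym != NULL);
    for (size_t i = 0; i < n_loaded; i++)
        if (exports(&loaded[i], sym)) return loaded[i].name;
    return NULL;
}

/* 3.0 style: look only in the DLL named by PACKAGE or, when PACKAGE is
   absent (NULL here), in the calling package's first NAMESPACE DLL. */
static const char *resolve_restricted(const char *sym, const char *ns_dll,
                                      const char *package) {
    assert(sym != NULL && ns_dll != NULL);
    /* PACKAGE = "" is documented as an error in R 3.0+; modelled as NULL. */
    if (package != NULL && package[0] == '\0') return NULL;
    const char *target = (package != NULL) ? package : ns_dll;
    for (size_t i = 0; i < n_loaded; i++)
        if (strcmp(loaded[i].name, target) == 0 && exports(&loaded[i], sym))
            return loaded[i].name;
    return NULL; /* a real implementation would raise an error here */
}
```

With this table, resolve_search_all("bar_compute") returns "bar", while resolve_restricted("bar_compute", "foo", NULL) returns NULL, the analogue of the "not resolved from current namespace (foo)" error. Passing "bar" as the package argument finds the symbol again, mirroring the PACKAGE="bar" workaround.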
And now I see that help(".Call") says:

  'PACKAGE = ""' used to be accepted (but was undocumented): it is
  now an error.

I assume passing PACKAGE="" used to invoke the same "keep searching" behavior as not passing any PACKAGE argument at all. So apparently the removal of functionality was intentional. I'd like to better understand why. Why should that be an error? Or, said another way, why has traditional Unix-style symbol resolution been banned from use with .C and .Call?

--
Andrew Piskorski <atp at piskorski.com>
Simon Urbanek
2014-Apr-18 14:29 UTC
Andrew,

On Apr 18, 2014, at 9:55 AM, Andrew Piskorski <atp at piskorski.com> wrote:
> [...]
> Or said another way, why has traditional Unix-style symbol resolution
> been banned from use with .C and .Call ?

I cannot speak for the author, but a very strong argument is to prevent (symbol) namespace issues. If you cannot even say where the symbol comes from, you have absolutely no way of knowing that the symbol you get has anything to do with the symbol you intended to get, because you could get any random symbol in any shared object that may or may not have anything to do with your code. Note that even you as the author of the code have no control over the namespace, so although you intended this to work, loading some other package can break your code, and in a fatal manner, since this will typically lead to a segfault.

Do you have any strong use case for allowing this, given how dangerous it is? Ever since symbol registration has been made easy, it's much more efficient and safe to use symbols directly instead.

Cheers,
Simon
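The collision argument can be illustrated with another toy model in C. Everything here is hypothetical: the Symbol and Dll structs, the functions find_anywhere and find_in, and the package names foo and baz are invented for illustration and are not R's API. Two DLLs export a routine with the same name, and an unqualified, first-match search returns whichever DLL comes first in the search order, so loading an unrelated package can silently change which function foo's code ends up calling. In this sketch both routines share a signature, so the wrong one merely returns a wrong answer; with mismatched arguments in real code the outcome is typically a crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not R's API: a routine type and named exports. */
typedef int (*routine_fn)(int);

typedef struct {
    const char *name; /* exported symbol name; NULL marks the end */
    routine_fn fn;
} Symbol;

typedef struct {
    const char *dll_name;
    const Symbol *symbols;
} Dll;

static int foo_compute(int x) { return x + 1; }   /* what package foo meant */
static int baz_compute(int x) { return x * 100; } /* unrelated, same name */

static const Symbol foo_syms[] = { { "compute", foo_compute }, { NULL, NULL } };
static const Symbol baz_syms[] = { { "compute", baz_compute }, { NULL, NULL } };

/* Search order before and after an unrelated package "baz" is loaded. */
static const Dll only_foo[]  = { { "foo", foo_syms } };
static const Dll baz_first[] = { { "baz", baz_syms }, { "foo", foo_syms } };

/* Unqualified lookup: walk the DLLs in order; the first matching name wins. */
static routine_fn find_anywhere(const Dll *dlls, size_t n, const char *sym) {
    for (size_t i = 0; i < n; i++)
        for (const Symbol *s = dlls[i].symbols; s->name != NULL; s++)
            if (strcmp(s->name, sym) == 0) return s->fn;
    return NULL;
}

/* Qualified lookup: consult only the DLL the caller names. */
static routine_fn find_in(const Dll *dlls, size_t n, const char *dll,
                          const char *sym) {
    assert(dlls != NULL && dll != NULL && sym != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(dlls[i].dll_name, dll) == 0)
            return find_anywhere(&dlls[i], 1, sym);
    return NULL;
}
```

Here find_anywhere(only_foo, 1, "compute") returns foo_compute, but find_anywhere(baz_first, 2, "compute") returns baz_compute even though foo's code did not change. find_in(baz_first, 2, "foo", "compute") still returns foo_compute, which is loosely what a PACKAGE argument or a registered symbol gives you: the library is fixed by the caller rather than discovered at call time.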