Mikael Jagan
2022-Sep-02 12:51 UTC
[Rd] R_check_class_etc(x, valid) is "slow" when 'valid' contains class(x)
On 2022-09-02 5:30 am, Tomas Kalibera wrote:> Hi Mikael, > > On 8/28/22 01:13, Mikael Jagan wrote: >> R_check_class_etc(x, valid) spends a nontrivial amount of time finding >> an environment 'rho' containing the definition of class(x), evaluating >> (in R, not C) methods::.classEnv(class(x)). >> >> It then returns the result of R_check_class_and_super(x, valid, rho). >> But R_check_class_and_super() does not use 'rho' at all in the trivial >> case where class(x) is found in 'valid'. > right, that could be improved. Do you have an example which exhibits the > problem, have you found this by profiling something?I benchmarked .Call("R_etc", x) for x <- new("dgTMatrix"), after compiling and loading the following: ``` #include <Rinternals.h> SEXP R_etc(SEXP x) { static const char *valid[] = {"dgCMatrix", "dgRMatrix", "dgTMatrix", ""}; return ScalarInteger(R_check_class_etc(x, valid)); } ``` R-devel used ~2 microseconds while my patched version of R used ~0.2 microseconds. So, not a bottleneck by any means. On the other hand, 'Matrix' and probably other packages do call R_check_class_etc() quite often, hence my report.>> >> My feeling is that this can be improved.? I am happy to contribute a patch, >> if it would be considered by R-core. > > Both R_check_class_etc and R_check_class_and_super are unfortunately exported, > the former is used a lot in packages (even though they are not mentioned in > Writing R Extensions, so actually shouldn't be used in packages). Anyway, it > would be easier if we could preserve their interface and behavior. > > Maybe we could support rho==NULL in R_check_class_and_super, the environment > would be looked up in that case when needed. R_check_class_etc would simply only > call R_check_class_and_super with that argument. I see that > R_check_class_and_super uses asChar() on the class attribute, while > R_check_class_etc does not currently for looking for the environment, but I > assume doing that in both cases should not matter (and it would have to be tested). > > So this would be a trivial change, but if you wanted to create a minimal patch, > I will be happy to have a look.I've implemented essentially what you've described; see attached. Mikael> > Best > Tomas > > >> Mikael >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch.diff URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20220902/7e17c4be/attachment.ksh>
Tomas Kalibera
2022-Sep-08 13:23 UTC
[Rd] R_check_class_etc(x, valid) is "slow" when 'valid' contains class(x)
On 02/09/2022 14:51, Mikael Jagan wrote:> > > On 2022-09-02 5:30 am, Tomas Kalibera wrote: >> Hi Mikael, >> >> On 8/28/22 01:13, Mikael Jagan wrote: >>> R_check_class_etc(x, valid) spends a nontrivial amount of time finding >>> an environment 'rho' containing the definition of class(x), evaluating >>> (in R, not C) methods::.classEnv(class(x)). >>> >>> It then returns the result of R_check_class_and_super(x, valid, rho). >>> But R_check_class_and_super() does not use 'rho' at all in the trivial >>> case where class(x) is found in 'valid'. >> right, that could be improved. Do you have an example which exhibits >> the problem, have you found this by profiling something? > I benchmarked .Call("R_etc", x) for x <- new("dgTMatrix"), after > compiling > and loading the following: > > ``` > #include <Rinternals.h> > > SEXP R_etc(SEXP x) { > ??? static const char *valid[] = {"dgCMatrix", "dgRMatrix", > "dgTMatrix", ""}; > ??? return ScalarInteger(R_check_class_etc(x, valid)); > } > ``` > > R-devel used ~2 microseconds while my patched version of R used ~0.2 > microseconds.? So, not a bottleneck by any means.? On the other hand, > 'Matrix' and probably other packages do call R_check_class_etc() quite > often, hence my report.I see, ideally one would have a real scenario with a real performance issue leading to this. With a micro-benchmark, one could come up with pretty much arbitrary speedup, so it is harder to judge whether an optimization is actually worth the effort, the risk of introducing bugs and sometimes the added complexity.>>> >>> My feeling is that this can be improved.? I am happy to contribute a >>> patch, >>> if it would be considered by R-core. >> >> Both R_check_class_etc and R_check_class_and_super are unfortunately >> exported, the former is used a lot in packages (even though they are >> not mentioned in Writing R Extensions, so actually shouldn't be used >> in packages). Anyway, it would be easier if we could preserve their >> interface and behavior. >> >> Maybe we could support rho==NULL in R_check_class_and_super, the >> environment would be looked up in that case when needed. >> R_check_class_etc would simply only call R_check_class_and_super with >> that argument. I see that R_check_class_and_super uses asChar() on >> the class attribute, while R_check_class_etc does not currently for >> looking for the environment, but I assume doing that in both cases >> should not matter (and it would have to be tested). >> >> So this would be a trivial change, but if you wanted to create a >> minimal patch, I will be happy to have a look. > > I've implemented essentially what you've described; see attached.Thanks, I've restored the original behavior with pkg == R_NilValue (note rho was R_GlobalEnv), I've used C NULL to mark the special mode when "rho" should be found, and I made some small tweaks. I will add after more testing, if the results are ok. Best Tomas> > Mikael > >> >> Best >> Tomas >> >> >>> Mikael >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel