Dirk Eddelbuettel
2016-Dec-20 16:56 UTC
[Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c
On 20 December 2016 at 17:40, Martin Maechler wrote:
| >>>>> Steve Bronder <sbronder at stevebronder.com>
| >>>>> on Tue, 20 Dec 2016 01:34:31 -0500 writes:
|
| > Thanks Henrik, this is very helpful! I will try this out on our tests and
| > see if gcDLLs() has a positive effect.
|
| > mlr currently has tests broken down by learner type, such as classification,
| > regression, forecasting, clustering, etc. There are 83 classifiers alone,
| > so even when loading and unloading across learner types we can still hit
| > the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
| > maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
| > Bischl to make sure I am representing the issue well.
|
| This came up *here* in May 2015
| and then May 2016 ... did you not find it when googling?
|
| Hint: Use
|   site:stat.ethz.ch MAX_NUM_DLLS
| as search string in Google, so it will basically only search the
| R mailing list archives.
|
| Here's the start of that thread:
|
|   https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html
|
| There was not a clear conclusion back then, notably as
| Prof Brian Ripley noted that 100 had already been an increase
| and that a large number of loaded DLLs decreases look-up speed.
|
| OTOH (I think others have noted that) a large number of DLLs
| only penalizes those who *do* load many, and we should probably
| increase it.
|
| Your use case of "hyper packages" which load many others
| simultaneously is somewhat convincing to me... in so far as the
| general feeling is that memory should be cheap and limits should
| not be low.
|
| (In spite of Brian Ripley's good reasons against it, I'd still
| aim for a *dynamic*, i.e. automatically increased list here).

Yes. Start with 10 or 20, add 10 as needed. Still fast in the 'small N'
case and no longer a road block for the 'big N' case required by mlr et al.

As a C++ programmer, I am now going to hug my std::vector and quietly retreat.

Dirk

| Martin Maechler
|
| > Regards,
|
| > Steve Bronder
| > Website: stevebronder.com
| > Phone: 412-719-1282
| > Email: sbronder at stevebronder.com
|
| > On Tue, Dec 20, 2016 at 1:04 AM, Henrik Bengtsson <
| > henrik.bengtsson at gmail.com> wrote:
|
| >> One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
| >> packages don't unload their DLLs when they are being unloaded themselves.
| >> In other words, there may be left-over DLLs just sitting there doing
| >> nothing but occupying space. You can remove these using:
| >>
| >>   R.utils::gcDLLs()
| >>
| >> Maybe that will help you get through your tests (as long as you're
| >> unloading packages). gcDLLs() will look at base::getLoadedDLLs() and
| >> its content and compare to loadedNamespaces() and unregister any
| >> "stray" DLLs that remain after corresponding packages have been
| >> unloaded.
| >>
| >> I think it would be useful if R CMD check would also check that DLLs
| >> are unregistered when a package is unloaded
| >> (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/29), but of
| >> course, someone needs to write the code / a patch for this to happen.
| >>
| >> /Henrik
| >>
| >> On Mon, Dec 19, 2016 at 6:01 PM, Steve Bronder
| >> <sbronder at stevebronder.com> wrote:
| >> > This is a request to increase MAX_NUM_DLLS in Rdynload.c from 100 to
| >> > 500.
| >> >
| >> > On line 131 of Rdynload.c, changing
| >> >
| >> > #define MAX_NUM_DLLS 100
| >> >
| >> > to
| >> >
| >> > #define MAX_NUM_DLLS 500
| >> >
| >> > In development of the mlr package, there have been several episodes in
| >> > the past where we have had to break up unit tests because of the "maximum
| >> > number of DLLs reached" error. This error has been an inconvenience that
| >> > is going to keep happening as the package continues to grow. Is there more
| >> > than meets the eye with this error or would everything be okay if the
| >> > above line changes? Would that have a larger effect in other parts of R?
| >> >
| >> > As R grows, we are likely to see more 'meta-packages' such as the
| >> > Hadley-verse, caret, mlr, etc. need an increasing number of DLLs loaded
| >> > at any point in time to conduct effective unit tests. If MAX_NUM_DLLS is
| >> > set to 100 for a very particular reason then I apologize, but if it is
| >> > possible to increase MAX_NUM_DLLS it would at least make the testing at
| >> > mlr much easier.
| >> >
| >> > I understand you are all very busy and thank you for your time.
| >> >
| >> > Regards,
| >> >
| >> > Steve Bronder
| >> > Website: stevebronder.com
| >> > Phone: 412-719-1282
| >> > Email: sbronder at stevebronder.com
| >> >
| >> > ______________________________________________
| >> > R-devel at r-project.org mailing list
| >> > https://stat.ethz.ch/mailman/listinfo/r-devel

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
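The "start with 10 or 20, add 10 as needed" idea can be sketched roughly as follows. This is only an illustration in C++, not R's actual code: Rdynload.c is C, and `DllEntry`, `DllTable`, the default sizes, and the linear `find` are all hypothetical names and choices made for the sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-DLL record; the real structure in
// Rdynload.c is not reproduced here.
struct DllEntry {
    std::string name;
};

// A table with no compile-time maximum: it reserves a small number of
// slots up front and adds a fixed number of slots whenever it is full.
class DllTable {
public:
    explicit DllTable(std::size_t initial = 20, std::size_t step = 10)
        : step_(step) {
        entries_.reserve(initial);
    }

    // Append an entry, enlarging the reserved storage by `step_` slots
    // only when every reserved slot is already in use.
    void add(const std::string& name) {
        if (entries_.size() == entries_.capacity()) {
            entries_.reserve(entries_.capacity() + step_);
        }
        entries_.push_back(DllEntry{name});
    }

    // Linear scan (an assumption of this sketch): the cost grows with the
    // number of entries, so only sessions that load many DLLs pay for it.
    const DllEntry* find(const std::string& name) const {
        for (const DllEntry& e : entries_) {
            if (e.name == name) return &e;
        }
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }
    std::size_t capacity() const { return entries_.capacity(); }

private:
    std::size_t step_;
    std::vector<DllEntry> entries_;
};
```

With the defaults, a session that loads a handful of DLLs never grows the table at all, while one that loads 150 simply triggers a few extra reallocations instead of hitting an error.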
Hi, Dirk:

On 12/20/2016 10:56 AM, Dirk Eddelbuettel wrote:
> Yes. Start with 10 or 20, add 10 as needed. Still fast in the 'small N'
> case and no longer a road block for the 'big N' case required by mlr et al.
>
> As a C++ programmer, I am now going to hug my std::vector and quietly retreat.

May I humbly request a translation of "std::vector" for people like me who
are not familiar with C++?

I got the following:

> install.packages('std')
Warning in install.packages :
  package ‘std’ is not available (for R version 3.3.2)

Thanks,
Spencer Graves
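For readers who do not use C++: `std::vector` is not an R package, which is why `install.packages('std')` finds nothing. It is the resizable array type from the C++ standard library (`std` is the library's namespace). A minimal sketch of what makes it relevant here, with `first_n` being a made-up function name for illustration:

```cpp
#include <cstddef>
#include <vector>

// Build a vector holding 0, 1, ..., n-1 without declaring any maximum
// size in advance.
std::vector<int> first_n(std::size_t n) {
    std::vector<int> v;  // starts empty
    for (std::size_t i = 0; i < n; ++i) {
        // push_back appends one element; the vector enlarges its own
        // storage whenever it runs out of room.
        v.push_back(static_cast<int>(i));
    }
    return v;
}
```

The contrast with a fixed-size table such as one bounded by `MAX_NUM_DLLS` is that no "maximum reached" condition exists; the container grows on demand.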
See inline.

On Tue, Dec 20, 2016 at 12:14 PM, Spencer Graves <spencer.graves at prodsyse.com> wrote:

> Hi, Dirk:
>
> On 12/20/2016 10:56 AM, Dirk Eddelbuettel wrote:
>
>> On 20 December 2016 at 17:40, Martin Maechler wrote:
>> | >>>>> Steve Bronder <sbronder at stevebronder.com>
>> | >>>>> on Tue, 20 Dec 2016 01:34:31 -0500 writes:
>> |
>> | > Thanks Henrik, this is very helpful! I will try this out on our
>> | > tests and see if gcDLLs() has a positive effect.
>> |
>> | > mlr currently has tests broken down by learner type, such as
>> | > classification, regression, forecasting, clustering, etc. There are
>> | > 83 classifiers alone, so even when loading and unloading across
>> | > learner types we can still hit the MAX_NUM_DLLS error, meaning we'll
>> | > have to break them down further (or maybe we can be clever with
>> | > gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd Bischl to make sure I
>> | > am representing the issue well.
>> |
>> | This came up *here* in May 2015
>> | and then May 2016 ... did you not find it when googling?
>> |
>> | Hint: Use
>> |   site:stat.ethz.ch MAX_NUM_DLLS
>> | as search string in Google, so it will basically only search the
>> | R mailing list archives.

I did not know this and apologize. I starred this email so I can use it
next time I have a question or request. I did find (and left a comment on)
the Stack Overflow question where you left an answer on this topic:
http://stackoverflow.com/a/37021455/2269255

>> | Here's the start of that thread:
>> |
>> |   https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html
>> |
>> | There was not a clear conclusion back then, notably as
>> | Prof Brian Ripley noted that 100 had already been an increase
>> | and that a large number of loaded DLLs decreases look-up speed.
>> |
>> | OTOH (I think others have noted that) a large number of DLLs
>> | only penalizes those who *do* load many, and we should probably
>> | increase it.

Am I correct in understanding that the decrease in lookup speed only
happens when a large number of DLLs are loaded? If so, this is an expected
cost of having many DLLs, and one that I, and I would guess other
developers, would be willing to pay to have more DLLs available. If
increasing MAX_NUM_DLLS would increase R's fixed memory footprint by a
significant amount, then I think that's a reasonable argument against the
increase in MAX_NUM_DLLS.

>> | Your use case of "hyper packages" which load many others
>> | simultaneously is somewhat convincing to me... in so far as the
>> | general feeling is that memory should be cheap and limits should
>> | not be low.

It should also be pointed out that even in the case of "hyper packages"
like mlr, this is only an issue during unit testing. I wonder if there is
some middle ground here? Would it be difficult to have a compile flag that
would change the value of MAX_NUM_DLLS when compiling R from source? I
believe this would allow us to increase MAX_NUM_DLLS when testing in Travis
and Jenkins while keeping the same footprint for regular users.

>> | (In spite of Brian Ripley's good reasons against it, I'd still
>> | aim for a *dynamic*, i.e. automatically increased list here).
>>
>> Yes. Start with 10 or 20, add 10 as needed. Still fast in the 'small N'
>> case and no longer a road block for the 'big N' case required by mlr et
>> al.

This would be nice! Though my concern is the R-core team's time. This is
the best answer, but I don't feel comfortable requesting it because I can't
help with this and do not want to take up R-core's time without a very
significant reason.

Unit testing for a meta-package is a particular case, though I think an
important one which will impact R over the long term. The answers from
least to most complex are something like:

1. Do nothing
2. Increase MAX_NUM_DLLS
3. Compiler flag for MAX_NUM_DLLS (I actually have no reference for how
   difficult this would be)
4. Change to dynamic loading

I'm requesting (2) because I think it's a simple short-term answer until
someone has time to sit down and work out (4).

Steve Bronder
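Option (3) above, a compile-time override, could look roughly like this. It is a generic preprocessor idiom, valid in both C and C++, and not a description of R's build system: whether R's configure or Makefiles would pass such a definition through is not established in this thread, and `can_load_another` is a hypothetical helper added only to show the macro in use.

```cpp
// Default limit, used unless the build supplies its own value, for
// example by passing -DMAX_NUM_DLLS=500 on the compiler command line.
#ifndef MAX_NUM_DLLS
#define MAX_NUM_DLLS 100
#endif

// Hypothetical helper: true while at least one table slot is still free.
inline bool can_load_another(int loaded_count) {
    return loaded_count < MAX_NUM_DLLS;
}
```

A CI build compiled with the larger value would get more slots for its test runs, while a default build keeps the limit of 100 and the same fixed-size table.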