Roman Levenstein
2008-May-17 16:48 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
Hi, There is a discussion thread on llvm-commits list about a possibility of using custom memory allocators for STL to improve the performance and reduce the memory pressure of STL containers, e.g. std::set. I thought that this discussion may be interesting for a wider audience and therefore I re-post the messages on llvm-dev as well. It would be interesting to hear what others think about - custom allocators, - reasons for current inefficiency of many STL containers on Evan's/Chris configs and - possibility of using 3rd party libs like Boost (or their parts) in LLVM Thanks, - Roman ----- Forwarded Mail ----> From: Roman Levenstein <romix.llvm at googlemail.com> > To: CVS Commit Messages for LLVM repository <llvm-commits at cs.uiuc.edu> > Send: Friday, 16. May 2008, 19:20:29 > Subject: Re: [llvm-commits] Speeding up RegAllocLinearScan on big test-cases > > Hi, > > 2008/5/7 Evan Cheng : > > Can we hook up the llvm pool allocator to std::set and use it for the > > register allocator? It's simple and it made a huge difference on Mac > > OS X when we switched all LiveInterval VNInfo allocations to it. > > > > Evan > > Yes. We can hook up the llvm pool allocator to std::set. I have a working > implementation. > > > > On May 7, 2008, at 1:24 AM, Roman Levenstein wrote: > > > > > Hi, > > > > > > 2008/5/7 Bill Wendling : > > >> On Tue, May 6, 2008 at 3:31 PM, Evan Cheng > > >> wrote: > > >>> On May 6, 2008, at 6:13 AM, Roman Levenstein > > >> > > >> > > >>>> But one thing about std::set that could be eventually interesting > > >>>> at > > >>>> many places > > >>>> is the following: > > >>>> - in many situations we know the maximal size of the set in > > >>>> advance. > > >>>> For example, > > >>>> in this patch, the set can contain at most all live intervals. In > > >>>> the scheduler, > > >>>> the availableQueue can contain at most all SUnits. It means that > > >>>> if we would > > >>>> be able to allocate the memory for the maximum possible number of > > >>>> elements > > >>>> in advance, then there is no need for any additional memory > > >>>> allocation. > > >>>> > > >>>> - I think, a custom STL allocator could be written that could do > > >>>> exactly this. It would > > >>>> reserve memory for the maximum number of elements (of the equal > > >>>> size?)and > > >>>> maintain a free list of cells. Then we can have a very efficient > > >>>> allocation and sets > > >>>> that do no produce to much malloc/free pressure. The same idea can > > >>>> be used > > >>>> also for some other STL containers. > > >>>> > > >>>> What do you think? > > >>> > > >>> I am not sure. I have little experience with custom STL allocators. > > >>> Perhaps others will chime in. For now, using std::set is fine. > > >>> > > >> I created custom allocators before. They aren't that bad. You just > > >> have to get the functionality correct (the allocators I created > > >> called > > >> a malloc function that already had this functionality). The major > > >> caveat is that if you don't have a well-tested memory manager > > >> available, this can lead to nasty bugs. I would stick with std::set > > >> unless it becomes a major albatross around our necks. > > > > > > I have a lot of experience with custom allocators. I extensively used > > > them in the optimizing C compiler that I have written few years ago > > > for an embedded target. Initially I was using simple malloc/free and > > > new/delete. Then I moved to GC, but at the end I switched to > > > custom memory allocators. I can only save, that they had very positive > > > impact on the performance of my compiler. > > > > > > Custom memory allocators are not such a black art, as it may seem at > > > first glance, actually. And there are quite well proven allocators. > > > Usually, they > > > are not that complex and it is rather easy to see if they are correct. > > > Normally, they are used for node-based STL containers or for most > > > typical nodes created by the compiler (e.g. SDNode, SDOperand, > > > SUnit, etc). > > > And they can really speed-up things, e.g. if they use pool > > > allocation or > > > segregated storage or if they free all the objects at once. > > > For example, imagine that we use such an allocator for SUnit nodes. > > > Then > > > it may reserve memory for N SUnit objects and very quickly allocate > > > it. > > > Once scheduling is over, all such objects can be freed at once, just > > > by > > > cleaning/deleting the custom allocator. > > > > > > I'm currently experimenting with 4-5 different custom STL allocators > > > I have found on the Internet. Once I have representative figures to > > > compare them again STL's standard allocators and after I cleaned up the > code, > > > I'll report about it to this mailing list. > > OK. So, I've tested 6 allocators. One of them is the standard STL > allocator. Other allocators > we found by me on the Internet and made STL-compliant via a special > templatized adapter class. > I don't want to go in detail at this moment, since the code is not > quite polished yet, but I'd > mention that bump_allocator is an STL-compliant version of LLVM's pool > allocator. > > My test checks their performance for allocating node-based containers, > i.e. std::list and std::set. > I try to insert 1000000 nodes into the list and set using different allocators. > While doing it, I observe the following picture: > > ***Tests with *** > Sort (ss_allocator):0.517465 > Sort (fsb_allocator):0.693605 > Sort (bump_allocator):0.5398639999 > Sort (fastalloc_allocator):0.5254200001 > Sort (boost_fastalloc_allocator):0.520713 > Sort (default allocator):0.631207 > > ***Tests with *** > Insertion (ss_allocator):0.8642740001 > Insertion (fsb_allocator):0.932031 > Insertion (bump_allocator):0.9571639999 > Insertion (fast_allocator):0.950616 > Insertion (boost_fast_allocator):0.9666030001 > Insertion (default_allocator):1.210076 > > So, we can see, that performance-wise the difference is not that huge. > But if we look at the number of new/delete calls, then it is quite different: > 1) without STL standard allocator - Total of only 847(!!!) mallocs for > all of the allocators together, while adding 1000000 nodes for each > of them. > 2) with STL standard allocator - Total of 2000628 mallocs for all of > the allocators together, while adding 1000000 nodes for each of them. > > So, the standard allocator of STL produces a huge number of > new/delete calls. And other allocators reduce it > by almost 4 orders of magnitude. But, as mentioned before, it DOES NOT > result in a big performance difference on > my Ubuntu/Linux/x86 machine, which indicates that mallocs are very > efficient here. But for you it seems to be very different... > > So the question is: why does STL allocator perform so poorly on > PowerPC that you seem to use? > Could it be that the malloc or STL implementation of your OS/compiler > is particularly bad? > Would it be possible for you to try using a custom malloc > implementation that would replace that > system malloc? E.g. could you try to link LLVM with Doug Lea's malloc, > which is a standard implementation on > the Linux/x86 systems? Or do you have other explanations for this? > > BTW, boost seems to provide a nice pool allocator. And it is > "production grade" and "bullet-proof" compared > to many others. Would it be too bad, if LLVM would use it? This does > not mean that the whole boost should > be available. There is a special tool provided with boost that > extracts only those subsets of it, that are required. > And boost has a BSD-like license. > > -Roman > _______________________________________________ > llvm-commits mailing list > llvm-commits at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits__________________________________________________________ Gesendet von Yahoo! Mail. Dem pfiffigeren Posteingang. http://de.overview.mail.yahoo.com
Jon Harrop
2008-May-17 17:27 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
On Saturday 17 May 2008 17:48:58 Roman Levenstein wrote:> It would be interesting to hear what others think about > - custom allocators,Unless you have very strong evidence that a specific change will provide considerable benefits without introducing problems anywhere else, I would strongly suggest forgetting about custom STL allocators and focussing on something more productive (i.e. features rather than optimizations). If you do go ahead with it, make sure it can be removed easily and keep checking that it remains worthwhile as the project develops. One of the last things I did before I ditched C++ completely was to waste a lot of time Greenspunning garbage collectors in the form of custom STL allocators in a last desperate attempt to get adequate performance out of C++. My efforts were completely fruitless. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e
Barry Kelly
2008-May-17 17:34 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
Roman Levenstein <romixlev at yahoo.com> wrote:> There is a discussion thread on llvm-commits list about a possibility of using custom memory allocators for STL to improve the performance and reduce the memory pressure of STL containers, e.g. std::set. > I thought that this discussion may be interesting for a wider audience and therefore I re-post the messages on llvm-dev as well. > > It would be interesting to hear what others think about > - custom allocators,"Normal" memory allocators are a serious pessimization for typical compiler usage patterns and will waste a fair amount of time and a non-trivial percentage of space pointlessly keeping track of and freeing tiny lumps of memory. Compilers tend to (1) incrementally allocate largeish chunks of temporary storage for the duration of a deep, probably recursive, call tree, and want to free it all in one go at the end (with this pattern being itself recursive), and (2) allocate memory with affinity for a particular compilation unit, assuming they are doing as much as possible (parsing, compiling, assembling, linking, etc.) in memory for efficiency, such that it can in turn be freed en-masse when that compilation unit is finally through. Pattern (1) wants a stack-like allocator, with an API resembling { void *Allocate(size_t); size_t Mark(); void Reset(size_t mark); void Free(); }. Pattern (2) wants something more simple: { void *Allocate(int heapID, size_t); void Free(int heapID); } (assuming a non-OO API; eliminate heapID if the thing is an object). Pattern (2) can be implemented in terms of a generalized pattern (1). -- Barry -- http://barrkel.blogspot.com/
Kenneth Boyd
2008-May-17 18:59 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
Roman Levenstein wrote:> Hi, > > There is a discussion thread on llvm-commits list about a possibility of using custom memory allocators for STL to improve the performance and reduce the memory pressure of STL containers, e.g. std::set. > I thought that this discussion may be interesting for a wider audience and therefore I re-post the messages on llvm-dev as well. > > It would be interesting to hear what others think about > - custom allocators, >Only as a build option, and must include performance comparisons against the default allocator in automated testing.> - reasons for current inefficiency of many STL containers on Evan's/Chris configs and > - possibility of using 3rd party libs like Boost (or their parts) in LLVM >I'd consider forking from Boost at least: * static asserts in conjunction with the type traits library and std::numeric_limits * concept_check * the enable_if library allowing conditional definition of templates, again in conjunction with the type traits library and std:::numeric_limits. I believe GCC uses a forked version of Boost for the first two points, in its STL. (At least, the comments suggest this.) Kenneth Boyd
Dominic Hamon
2008-May-18 01:28 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
Roman Levenstein wrote:> - possibility of using 3rd party libs like Boost (or their parts) in LLVM >There is a thread elsewhere on this mailing list illustrating how important it is for the maintainers of LLVM to keep LLVM usable in a commercial environment. As such, I would strongly recommend avoiding Boost as it has a bad name in some quarters, regardless of its license, for including work that is not safe for commercial users to take on. Ie, there are so many contributors, and their contribution tracking has been poor in the past, that business affairs departments in commercial companies, and their associated patent lawyers, are unable to determine how much of Boost is truly the authors' own work, and how much is borrowed. Given the delicate relationship between the commercial sector and open source, adding Boost usage to LLVM will harm the commercial sectors view of this product. Dominic
Óscar Fuentes
2008-May-18 02:05 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
Dominic Hamon <dom.hamon at gmail.com> writes: [snip]> Boost as it has a bad name in some quarters, regardless of its license, > for including work that is not safe for commercial users to take on. Ie, > there are so many contributors, and their contribution tracking has been > poor in the past, that business affairs departments in commercial > companies, and their associated patent lawyers, are unable to determine > how much of Boost is truly the authors' own work, and how much is borrowed.Isn't this applicable to LLVM itself? Just asking. [snip] -- Oscar
me22
2008-May-18 02:41 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
On Sat, May 17, 2008 at 9:28 PM, Dominic Hamon <dom.hamon at gmail.com> wrote:> There is a thread elsewhere on this mailing list illustrating how > important it is for the maintainers of LLVM to keep LLVM usable in a > commercial environment. As such, I would strongly recommend avoiding > Boost as it has a bad name in some quarters, regardless of its license, > for including work that is not safe for commercial users to take on. Ie, > there are so many contributors, and their contribution tracking has been > poor in the past, that business affairs departments in commercial > companies, and their associated patent lawyers, are unable to determine > how much of Boost is truly the authors' own work, and how much is borrowed. > > Given the delicate relationship between the commercial sector and open > source, adding Boost usage to LLVM will harm the commercial sectors view > of this product. >Do you have a source for that opinion? I've never heard that view of Boost.
Chris Lattner
2008-May-18 04:38 UTC
[LLVMdev] Forward: Discussion about custom memory allocators for STL
On May 17, 2008, at 9:48 AM, Roman Levenstein wrote:> There is a discussion thread on llvm-commits list about a > possibility of using custom memory allocators for STL to improve the > performance and reduce the memory pressure of STL containers, e.g. > std::set. > I thought that this discussion may be interesting for a wider > audience and therefore I re-post the messages on llvm-dev as well. > > It would be interesting to hear what others think about > - custom allocators,I think custom allocators can be a great thing, as long as they are specific to instances of containers, not a general replacement for malloc/free. That way they can be used on demand when profiling shows they are important. Ideally, I would be able to do something like: MyRegion R; std:set<x> (R); std::map<y,z> (R); etc, and have the nodes for both containers share the same pool. We already have some things like this that we use in clang, check out llvm/support/allocators.h. It is not the standard STL allocator interface though, we need a simple adaptor.> - reasons for current inefficiency of many STL containers on Evan's/ > Chris configs andThe reason is pretty obvious when you know what to look for. On the mac, std::allocator<t> just calls malloc/free. On linux and apparently everywhere else, std::allocator<T> is implemented in terms of a strange pool allocator. [IMO, the mac way of doing things is the right way to go, but we can discuss philosophy elsewhere. Here we will focus on the ramifications of this reality] The end result of this is that std::set/map on the mac ends up calling malloc/free for each node in the red/black tree, and it allocates a metric butt load of nodes (one per each item inserted). Further, each query of the set incurs traversals of the tree. In the case of the mac, because malloc/free is used, the allocations are spread all throughout the heap and traversing a tree touches all sorts of random pages of memory in very cache-unfriendly ways. I think the performance delta of mac vs linux is primarily due to this locality issue, not the raw performance of malloc on the two systems. On linux (for example) the custom implementation of std::set happens to get lucky, and generally gives better locality than mapping directly to malloc/free, just because only a very small subset of the objects on the heap go through std::allocator (f.e. none of the SDNodes or LLVM IR objects go through std::allocator). This implicit partitioning of the heap leads to accidental improved performance, but the performance is still pretty bad if you do have multiple node-based containers live at once (though that is rare in llvm). I did a whole phd thesis about partitioning memory objects, so I know that partitioning data structures into their own regions can be a big win :). It just happens that std::allocator on linux gets lucky on this for LLVM, since we generally have few STL containers live at once (and when we beat on them, we beat on them heavily).> - possibility of using 3rd party libs like Boost (or their parts) in > LLVMWe have already imported pieces of boost. The general rule is that it is ok to use specific pieces, but they have to be converted to the LLVM style and way of doing things. Also, things with huge chains of dependencies sometimes have to be simplified. -Chris
Apparently Analagous Threads
- [LLVMdev] Forward: Discussion about custom memory allocators for STL
- [LLVMdev] Forward: Discussion about custom memory allocators for STL
- [LLVMdev] Forward: Discussion about custom memory allocators for STL
- [LLVMdev] Forward: Discussion about custom memory allocators for STL
- [LLVMdev] Forward: Discussion about custom memory allocators for STL