Michael Lewis
2013-Aug-02  16:26 UTC
[LLVMdev] Assorted notes on garbage collection with LLVM
Hi all, I've been working recently on a precise garbage collector which runs alongside native code JITted by LLVM. Today marks the first time the GC has passed its entire test suite as well as extensive soak tests in non-trivial programs. It's been an interesting and educational process, to say the least, and I've run into quite a few things that might be useful to know for others tackling similar projects. I've compiled my notes on the subject here for posterity: https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme I'd be happy to field any questions about the adventure as well, although that probably belongs off-list. Thanks, - Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/a64d1f53/attachment.html>
Thanks for sharing. If there is anything which you feel would be relevant to add to our our official documentation, please feel free to send patches (or new documents <http://llvm.org/docs/SphinxQuickstartTemplate.html> if you have a lot to say). -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/a9d7ad96/attachment.html>
Very good to see developments in LLVM+GC! I wrote a high-level virtual machine with accurate garbage collection (supporting multiple user threads) back in 2008. Fearing LLVMs GC support at that time, I used a naïve shadow stack, pushing local references as they appeared (from function arguments, load instructions and allocations) onto a thread-local stack and batch popping at function returns by resetting the shadow stack pointer. I put a GC safe point at every functions entry point. HLVM prohibits loops (uses recursion instead with tail call elimination) so no other GC safe points are needed. Thanks to this very simple design, the single threaded implementation was about 20 lines of code and the multithreading-capable one was about 100 lines of code. My benchmarks showed that absolute performance was very good (better than Java, MLton, OCaml and Haskell) and, in fact, just 15% slower than C++ in this case: http://flyingfrogblog.blogspot.co.uk/2010/01/hlvm-on-ray-tracer-language-com parison.html and scalability was also very good: http://flyingfrogblog.blogspot.co.uk/2010/01/naive-parallelism-with-hlvm.htm l I believe your approach is potentially faster but substantially more complicated so I would regard it as an optimization over the very simple approach that I used. Therefore, it would be very interesting and valuable see the performance advantages of your approach. Have you done any benchmarks? Cheers, Jon. From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Michael Lewis Sent: 02 August 2013 17:27 To: LLVMdev at cs.uiuc.edu Subject: [LLVMdev] Assorted notes on garbage collection with LLVM Hi all, I've been working recently on a precise garbage collector which runs alongside native code JITted by LLVM. Today marks the first time the GC has passed its entire test suite as well as extensive soak tests in non-trivial programs. It's been an interesting and educational process, to say the least, and I've run into quite a few things that might be useful to know for others tackling similar projects. I've compiled my notes on the subject here for posterity: https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme I'd be happy to field any questions about the adventure as well, although that probably belongs off-list. Thanks, - Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/e4bd7f1c/attachment.html>
Manuel Jacob
2013-Aug-02  22:55 UTC
[LLVMdev] Assorted notes on garbage collection with LLVM
Hi Mike, On 2013-08-02 18:26, Michael Lewis wrote:> I've been working recently on a precise garbage collector which runs > alongside native code JITted by LLVM. Today marks the first time the > GC has passed its entire test suite as well as extensive soak tests in > non-trivial programs.I'm glad to hear that you solved the issues you described in your last mail. How did you solve them? Was that because of the stack-coloring bug? There are some things in the code linked to in the Wiki page I do not understand yet. Where do the negative stack offsets come from? The stack root walking code in my project doesn't need to check for negative stack offsets. What are bookmarks? What's the purpose of the TriggerGarbageCollection function? Have you considered using a hash map in GarbageWorker() that maps safe points to lists of GCRoots? I think this would simplify the code and reduce the overhead of stack root walking. Are you interested in improving the code generator's support for garbage collection? I posted some thoughts in a mail with the Subject "New ideas about how to improve garbage collection support". That would also solve the stack-coloring bug. -Manuel
Michael Lewis
2013-Aug-03  03:31 UTC
[LLVMdev] Assorted notes on garbage collection with LLVM
On Fri, Aug 2, 2013 at 1:28 PM, Sean Silva <chisophugis at gmail.com> wrote:> Thanks for sharing. If there is anything which you feel would be relevant to > add to our our official documentation, please feel free to send patches (or > new documents <http://llvm.org/docs/SphinxQuickstartTemplate.html> if you > have a lot to say).Thanks - I'll set aside some time over the weekend to put together some docs. On Fri, Aug 2, 2013 at 3:18 PM, Jon Harrop <jonathandeanharrop at googlemail.com> wrote:> I believe your approach is potentially faster but substantially more > complicated so I would regard it as an optimization over the very simple > approach that I used. Therefore, it would be very interesting and valuable > see the performance advantages of your approach. Have you done any > benchmarks?I've done some *very* quick and dirty tests, but I'm still honing in on getting my frontend to generate good IR, so that the optimizer passes can really work their magic. There's still also a lot of low-hanging fruit to optimize in the GC itself. I have a basic raytracer (funny that you chose a similar test!) which renders a shaded sphere floating over a plane, with simple shadowing and lighting. On my current hardware, the tracer runs at 600x600 pixels and roughly 19 frames per second (~52 ms per frame) with garbage collection fully enabled. A comparable implementation in C++ runs at about 23 FPS, so something like a 16% difference in speed. Keep in mind that the Epoch language implementation has a lot of work yet to go - I'm fully confident I can narrow that margin appreciably by introducing key features to the language like greater control over memory layout and locality of reference. I also want to build in some heuristics to the Epoch runtime to avoid triggering GC stack/heap traversals as often as they are currently done. That should gain a reasonable amount of performance by itself. Implementing a moving/compacting collector and some other goodies should go even further (since I can do cheap allocations instead of basically relying on malloc under the hood), and once the optimizations are really cranking, I think I can get to within a couple percent of my Visual C++ build of the raytracer. On Fri, Aug 2, 2013 at 3:55 PM, Manuel Jacob <me at manueljacob.de> wrote:> I'm glad to hear that you solved the issues you described in your last > mail. How did you solve them? Was that because of the stack-coloring bug?As it turns out almost all of my headaches were caused by the stack-coloring issue. The basic approach described on the Epoch wiki page seems to work fine as long as stack slots are not merged.> Where do the negative stack offsets come from? The stack root walking code > in my project doesn't need to check for negative stack offsets.This is interesting to hear... all I know is that LLVM hands me negative offsets for some emitted functions. I have not yet ascertained the cause for this.> What are bookmarks?Bookmarks are a hack to get around FPO optimizations. Suppose we have a call stack that looks like this: Epoch function A [FPO disabled] External function 1 [FPO enabled] External function 2 [FPO enabled] Epoch function B [FPO disabled] Epoch function C [FPO disabled] If we trigger GC in function A, we have to walk the stack, but FPO can potentially cause the two stack frames for the external code to "vanish" - depending on how the optimization is performed, we might end up getting a frame pointer that points into C instead of B. The bookmark system just records that we "exited" the Epoch-controlled stack frame in function B, during the invocation of External Function 2. When A goes to run GC, it verifies that B's stack frame is walked regardless of where the FPO-mangled stack leads us. Since B has FPO disabled, we know it will walk correctly into C's stack frame.> What's the purpose of the TriggerGarbageCollection function?For verification, the Epoch GC doesn't behave like a standard production GC, that is, it will collect aggressively instead of only when under memory pressure, or generationally, etc. TriggerGarbageCollection is a simple shim invoked from JITted Epoch code which causes a collection cycle to occur. I also use this shim to capture the current stack pointer and hand it off to the garbage worker routines. This is the best way I could think of to capture the stack pointer accurately without relying on weird hacks like taking the address of a local variable, etc.> Have you considered using a hash map in GarbageWorker() that maps safe > points to lists of GCRoots? I think this would simplify the code and reduce > the overhead of stack root walking.Yep! This is in fact one of the next optimizations I plan on tackling. It should be a solid win both in terms of code cleanliness and performance, as you observed.> Are you interested in improving the code generator's support for garbage > collection? I posted some thoughts in a mail with the Subject "New ideas > about how to improve garbage collection support". That would also solve the > stack-coloring bug.I'd love to work on getting GC faster and more accessible, but I'm afraid the internals of the code generator are still a bit out of scope of my personal project (which is the development of a novel language). Since I only have so much free time to spare, it's a tough call whether to invest that time in LLVM or in my own project. That said, I would be more than happy to contribute back to the LLVM project as soon as I am confident that I have something of value to offer. For now the best I can provide is some guidance on how to use the existing facilities; I'd need to spend a lot more time in the LLVM core to feel comfortable suggesting changes there. - Mike