Hello llvm-dev! My colleages and I are currently evaluating llvm's suitability as a JIT compiler interfacing with a precise, relocating garbage collector. While we couldn't find code or writeups that deal with the issues specific to this design goal, it is entirely possible that we may have missed something; we would appreciate references to relevant code or writeups that people on this list may be aware of. As an example, one issue that makes this non-trivial is that llvm (as far as we know) is free to manufacture pointers to locations _inside_ objects, something referred to as a "derived pointer" in some places. Since these pointers need to be updated in sync with the objects they point into, a relocating GC needs to be aware of them; and the runtime needs to be able to read off which registers and stack slots hold such pointers at every safepoint. We've looked into llvm's existing GC support, and the mechanism it provides does not seem to help in this use case. This email is deliberately terse, but we are more than happy to get into details about the approaches we've considered as the discussion progresses. Pointers to existing work related to this or similar issues is especially welcome. Thanks! -- Sanjoy
Rafael Espíndola
2013-Oct-24 21:50 UTC
[LLVMdev] Interfacing llvm with a precise, relocating GC
On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote:> Hello llvm-dev! > > My colleages and I are currently evaluating llvm's suitability as a > JIT compiler interfacing with a precise, relocating garbage collector. > While we couldn't find code or writeups that deal with the issues > specific to this design goal, it is entirely possible that we may have > missed something; we would appreciate references to relevant code or > writeups that people on this list may be aware of.This would be hard. Currently what we have support for is a non-moving GC where all the roots are in memory. Adding support for a non-moving gc with register roots would not be too hard and might be possible to reuse some of the recent stackmap work. For a moving GC you would probably have to change how we represent pointer arithmetic in the selection dag and MI. It would be quiet a big change. CCIng Andy and Patrick since they might have an idea of how much work that would be and what the costs and benefits for LLVM are. Also to note is that there are plans to move away from selection dag, so it might be good to sync this work with whatever we end up using instead. Cheers, Rafael
Andrew Trick
2013-Oct-24 21:56 UTC
[LLVMdev] Interfacing llvm with a precise, relocating GC
On Oct 24, 2013, at 2:50 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:> On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote: >> Hello llvm-dev! >> >> My colleages and I are currently evaluating llvm's suitability as a >> JIT compiler interfacing with a precise, relocating garbage collector. >> While we couldn't find code or writeups that deal with the issues >> specific to this design goal, it is entirely possible that we may have >> missed something; we would appreciate references to relevant code or >> writeups that people on this list may be aware of. > > > This would be hard. Currently what we have support for is a non-moving > GC where all the roots are in memory. Adding support for a non-moving > gc with register roots would not be too hard and might be possible to > reuse some of the recent stackmap work. > > For a moving GC you would probably have to change how we represent > pointer arithmetic in the selection dag and MI. It would be quiet a > big change. CCIng Andy and Patrick since they might have an idea of > how much work that would be and what the costs and benefits for LLVM > are.100% agreement.> Also to note is that there are plans to move away from selection dag, > so it might be good to sync this work with whatever we end up using > instead.FYI: when this was talked about, I heard mention that GEPs should be lowered early in the IR->MI pipeline. I didn’t hear any ideas that would make derived pointer tracking easier. -Andy
Philip Reames
2013-Oct-25 18:50 UTC
[LLVMdev] Interfacing llvm with a precise, relocating GC
On 10/24/13 2:50 PM, Rafael Espíndola wrote:> On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote: >> Hello llvm-dev! >> >> My colleages and I are currently evaluating llvm's suitability as a >> JIT compiler interfacing with a precise, relocating garbage collector. >> While we couldn't find code or writeups that deal with the issues >> specific to this design goal, it is entirely possible that we may have >> missed something; we would appreciate references to relevant code or >> writeups that people on this list may be aware of. > > This would be hard. Currently what we have support for is a non-moving > GC where all the roots are in memory. Adding support for a non-moving > gc with register roots would not be too hard and might be possible to > reuse some of the recent stackmap work.Agreed, I think all the mechanisms are either in tree already, or shortly to be so. There's still a fair amount of work required to make it all work together, but the task seems approachable.> For a moving GC you would probably have to change how we represent > pointer arithmetic in the selection dag and MI. It would be quiet a > big change. CCIng Andy and Patrick since they might have an idea of > how much work that would be and what the costs and benefits for LLVM > are. > > Also to note is that there are plans to move away from selection dag, > so it might be good to sync this work with whatever we end up using > instead.Ouch. I hadn't realized that GEPs were desugared to integer arithmetic that early. That does seem like it would be a problem. Thank you for pointing this out. Assuming we had a scheme to avoid/solve this specific issue, are you aware of any other ones? Can you point me to any previous discussion of the selection dag migration? This isn't a topic I usually follow on the list. A quick search didn't find the conversations you're referencing. Philip
Sanjoy: This document which I wrote several years ago may be of some use to you: Building a Stack Crawler in LLVM<https://docs.google.com/document/d/1-ws0KYo47S0CgqpwkjfWDBJ8wFhW_0UYKxPIJ0TyKrQ/edit?usp=sharing&authkey=COD8_LcL> I have successfully implemented a copying collector using LLVM. I did not implement support for interior pointers, however I have a number of ideas on how to approach it. The language that I was implementing was similar to C# in that classes were divided into "reference" types and "value" types. A pointer to a reference type on the heap always pointed to the start of an allocation, whereas a pointer to a value type was always an interior pointer. (In other words, a value type could never exist by itself in the heap, it had to be embedded within some reference type). This means that the compiler could always know whether a pointer was an internal pointer or not. Thus, pointers to reference types were just machine-level pointers, whereas pointers to value types consisted of a pointer+offset, with the pointer part pointing to the start of a heap allocation. Combined with the fact that pointers to value types were relatively rare, this allowed internal pointers with a minimum of overhead. Now, having said all that, I feel compelled to give a few warnings. Part of the reason I abandoned this project was because of limitations in LLVM's garbage collection intrinsics, which I have written about extensively on this list. The current llvm.gcroot strategy requires the frontend to be very complex, generate highly inefficient code, and that code is mostly unoptimizable since LLVM's optimizers generally won't touch a value that has been marked as a GC root. Worse, the support for GC in the LLVM community is fairly low - the garbage collection intrinsics in LLVM have not been updated or improved in the 7 years of my following the project, and there's been very little discussion of GC on the mailing list (I do a search for the word "collect" in the LLVM archives about once a month, which is how this thread came to my attention.) Most of the people working on/with LLVM are working with non-GC languages, or with languages that have simple enough memory models (e.g. "everything is an atom") that the existing intrinsics are sufficient. There are also a few people who have gotten around the problems by defining their own stack frames instead of using the LLVM intrinsics. There have been numerous proposals over the years for better GC intrinsics, but nothing has come out of these discussions so far. There's a good reason for this: improving the GC support would require a major commitment, since all of the backend code generators and optimizers would potentially be affected. My current favorite GC proposal involves annotating types - that is, to define a new kind of derived type that is essentially a 2-tuple consisting of a base type + GC metadata (the second argument to llvm.gcroot). This means that "root-ness" would be a property of a type rather than a value, which means that the rootness could automatically be propagated to intermediate values or SSA values through optimization without the frontend having to do a lot of spilling and reloading of values. (Plus having the ability to associate metadata with types might be useful for other things besides GC.) On Thu, Oct 24, 2013 at 2:32 PM, Sanjoy Das <sanjoy at azulsystems.com> wrote:> Hello llvm-dev! > > My colleages and I are currently evaluating llvm's suitability as a > JIT compiler interfacing with a precise, relocating garbage collector. > While we couldn't find code or writeups that deal with the issues > specific to this design goal, it is entirely possible that we may have > missed something; we would appreciate references to relevant code or > writeups that people on this list may be aware of. > > As an example, one issue that makes this non-trivial is that llvm (as > far as we know) is free to manufacture pointers to locations _inside_ > objects, something referred to as a "derived pointer" in some places. > Since these pointers need to be updated in sync with the objects they > point into, a relocating GC needs to be aware of them; and the runtime > needs to be able to read off which registers and stack slots hold such > pointers at every safepoint. We've looked into llvm's existing GC > support, and the mechanism it provides does not seem to help in this > use case. > > This email is deliberately terse, but we are more than happy to get > into details about the approaches we've considered as the discussion > progresses. Pointers to existing work related to this or similar > issues is especially welcome. > > Thanks! > > -- Sanjoy > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131028/6d4966f4/attachment.html>
Hi Talin, Thank you for your response! Since you mention that you don't implement derived pointers, I'm a little confused about how your approach solves the issue I brought up. It seems to me that unless you plan on pinning some objects, support for derived pointers isn't optional. I'm not talking about derived pointers that the front-end introduces (which can be tracked) but the ones that are introduced by llvm during IR level optimizations or, worse, during and after instruction selection. To be sure that we're talking about the same thing, as an example, consider the loop (in pseudo llvm IR): for (int i = 0 to (length - 1)) { total += a[i]; safepoint(); // This thread may be stopped here, and `a' // may be relocated. } llvm can transform this loop to for (int *i = &a[0] to &a[length - 1]) { total += *i; safepoint(); // llvm has now introduced an additional // derived pointer, `i'. }>From llvm's point of view this is a valid transformation; but it endsup creating a new pointer the GC has to be aware of, and needs to be relocated in sync with a. Thanks! -- Sanjoy
Apparently Analagous Threads
- [LLVMdev] Interfacing llvm with a precise, relocating GC
- [LLVMdev] Interfacing llvm with a precise, relocating GC
- [LLVMdev] Interfacing llvm with a precise, relocating GC
- [LLVMdev] Interfacing llvm with a precise, relocating GC
- [LLVMdev] Interfacing llvm with a precise, relocating GC