Chris Lattner wrote:
> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>
>> With the increasing number of LLVM-based VMs and other projects, I
>> suspect that the desire for more comprehensive garbage collection
>> support in LLVM is only going to increase.
>
> Absolutely!
>
>> Part of the reason why there isn't more direct support for GC is the
>> theory that there is no such thing as a one-size-fits-all collector.
>> The argument goes that a really efficient collector design requires
>> detailed knowledge of the object model of the language being compiled.
>
> Yes, you do need to have some knowledge about the object model.
> However, it would be perfectly reasonable for LLVM to support and
> include multiple different collectors for different classes of language.
>
>> On the other hand, it is possible to make a counter-argument to this
>> theory that goes like this: The Java VM has been used to implement a
>> large number of front-end languages efficiently, without requiring a
>> special garbage collector for each language.
>
> Most importantly to me, the takeaway from Java is that just having
> something that works "well enough" is really important and helps
> bootstrap a lot of projects, which can then take the "last 10% of
> performance" as an optimization opportunity, instead of being blocked
> from even starting with LLVM.
>
> I'd claim that JavaVM really isn't a good way to implement a lisp vm
> or something like that. However, the perf delta induced by the Java
> VM may just *not matter* in the big picture. At least with LLVM, a
> Lisp implementation could be brought up on an "OOP GC" and switched to
> something more custom as the project develops.
>
>> It also seems to me that even radically different collector designs
>> could utilize some common building blocks for heap management, work
>> queuing, and so on.
>
> Yes.
>
>> Of course, there is always a danger when creating libraries of the
>> "ivory tower syndrome", putting a lot of effort into components that
>> don't actually get used. This is why it would be even better to
>> create a standard, high performance collector for LLVM that actually
>> uses these methods. <many good thoughts trimmed>
>
> What you see in LLVM right now is really only the second step of the
> planned GC evolution. The first step was very minimal, but useful for
> bridging to other existing collectors. The second step was Gordon's
> (significant!) extensions to the system which allowed him to tie in
> the Ocaml collector and bring some more sanity to codegen.
>
> While people object to adding high level features to LLVM, high level
> and language-specific features are *great* in llvm as long as they are
> cleanly separable. I would *love* to see a composable collection of
> different GC subroutines with clean interfaces built on LLVM "assembly
> language" GC stuff.
>
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,
>    finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
>    there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.
>    Ideally there would only be a couple of high-level collectors that
>    can be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on
>    the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening
> today, we just need leadership in the area to drive it. To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a simple
> GC and language and get the whole thing working top to bottom. From
> there, the various pieces can be generalized out etc. This ensures
> that there is always a *problem being solved* and something that works
> and is testable.
>
> One of the annoying reasons that the GC stuff is only halfway fleshed
> out is that I was working on an out of tree project (which of course
> got forgotten about when I left) when developing the GC intrinsics, so
> there is no fully working example in public.
>
> -Chris
>
> ps. Code generation for the GC intrinsics can be improved
> significantly. We can add new intrinsics that don't pin things to the
> stack, update optimizations, and do many other things if people
> started using the GC stuff seriously.

So I guess what would be helpful for me is a roadmap that defines more
clearly (a) what parts you plan to build in LLVM (beyond what is already
there), (b) what parts you would like to have contributed, and (c) what
parts you definitely want to keep external. In particular, I'd like to
get a clearer picture of the shapes of the various pieces and their
roles.

For example, I mentioned the "stop the world" function - however since
LLVM defines no primitives for creating threads or synchronizing between
them, it's hard to see how this could be part of LLVM proper. On the
other hand, a sibling project (like vmkit or clang) could probably make
a more restrictive set of assumptions, such as the existence of either
POSIX or Windows threading models or some analog of those being
available. "Stop the world" is implementable in terms of those threading
primitives if you assume that mutator threads use sync points.

Thus, it seems to me that the proper home for such a function would be
in a sibling project outside of LLVM proper. At the same time, however,
core LLVM could benefit from having an implementation of this, in that
it could guide the design of the IR by providing more concrete use cases
for things like inserting sync points in generated code and such.

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Talin,

First off, thanks for generating such an interesting discussion on
garbage collection!

> For example, I mentioned the "stop the world" function - however since
> LLVM defines no primitives for creating threads or synchronizing between
> them, it's hard to see how this could be part of LLVM proper.

So LLVM should not be aware of "stop the world" stuff. It's really a
matter of your collector. What LLVM should be aware of is the position
of safe points in the code. That's the case with Gordon's GCStrategy
class. However, to implement "stop the world", runtimes should be able
to add runtime-specific code at these safe points. If you could, then
adding a cooperative, stop-the-world GC would be straightforward. Just
poll a variable during the safe-point. When a collection is required,
the collector updates that variable and waits for all threads to poll
it.

After Gordon's *fantastic* work on GC interfaces in LLVM, I think that's
the one thing missing in the design. The other intrinsics let you
generate code that should be GC-independent and only need GC
implementations. So you can start right away writing your collector(s) :)

> On the other hand, a sibling project (like vmkit or clang) could
> probably make a more restrictive set of assumptions, such as the
> existence of either POSIX or Windows threading models or some analog
> of those being available. "Stop the world" is implementable in terms
> of those threading primitives if you assume that mutator threads use
> sync points.

That's right!

> Thus, it seems to me that the proper home for such a function would be
> in a sibling project outside of LLVM proper. At the same time, however,
> core LLVM could benefit from having an implementation of this, in that
> it could guide the design of the IR by providing more concrete use cases
> for things like inserting sync points in generated code and such.

There is a runtime directory in llvm, maybe that's where memory and
threading examples could be added? However, a real-world GC is no piece
of cake and way out of the scope of LLVM. So I like the idea of a
sibling project. The runtime directory should just contain simple
implementations.

Nicolas
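
The polling scheme described above (mutators poll a variable at each safe
point, the collector sets it and waits for every thread to poll) can be
sketched in C++ as follows. This is only an illustration under stated
assumptions, not code from LLVM or vmkit: SafepointCoordinator, poll,
stop_the_world and run_demo are hypothetical names, and the set of
mutator threads is assumed to be fixed for the lifetime of the
coordinator.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical runtime-side helper; nothing here is an LLVM API.
class SafepointCoordinator {
public:
  explicit SafepointCoordinator(int mutators) : mutators_(mutators) {
    assert(mutators > 0);
  }

  // What the runtime would like emitted at each safe point.
  // The fast path is a single load of the polled variable.
  void poll() {
    if (stop_requested_.load(std::memory_order_acquire))
      park();
  }

  // Collector side: set the variable, wait until every mutator has
  // polled it and parked, run `collect`, then release the mutators.
  template <class Collect>
  void stop_the_world(Collect collect) {
    std::unique_lock<std::mutex> lock(mu_);
    stop_requested_.store(true);
    cv_.wait(lock, [&] { return parked_ == mutators_; });
    collect();  // every mutator is blocked in park() here
    stop_requested_.store(false);
    parked_ = 0;
    ++epoch_;
    cv_.notify_all();
  }

private:
  void park() {
    std::unique_lock<std::mutex> lock(mu_);
    if (!stop_requested_.load()) return;  // request already withdrawn
    const unsigned my_epoch = epoch_;
    ++parked_;
    cv_.notify_all();  // let the collector re-check its count
    cv_.wait(lock, [&] { return epoch_ != my_epoch; });
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::atomic<bool> stop_requested_{false};
  int mutators_;
  int parked_ = 0;      // guarded by mu_
  unsigned epoch_ = 0;  // guarded by mu_; bumped once per collection
};

// Runs `mutators` worker threads and `collections` pauses. Returns true
// if no mutator was observed making progress during any pause.
bool run_demo(int mutators, int collections) {
  SafepointCoordinator sp(mutators);
  std::atomic<bool> done{false};
  std::atomic<long> ticks{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < mutators; ++i)
    threads.emplace_back([&] {
      while (!done.load()) {
        ticks.fetch_add(1);  // stand-in for mutator work
        sp.poll();           // safe point
      }
    });
  bool world_was_stopped = true;
  for (int i = 0; i < collections; ++i)
    sp.stop_the_world([&] {
      const long before = ticks.load();
      std::this_thread::yield();
      world_was_stopped = world_was_stopped && before == ticks.load();
    });
  done.store(true);
  for (auto &t : threads) t.join();
  return world_was_stopped;
}
```

Calling run_demo(4, 3) starts four mutators and pauses them three times.
A real runtime would also have to handle threads that start, exit, or
block in native code, which this sketch deliberately leaves out.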
Nicolas Geoffray wrote:
>> For example, I mentioned the "stop the world" function - however since
>> LLVM defines no primitives for creating threads or synchronizing between
>> them, it's hard to see how this could be part of LLVM proper.
>
> So LLVM should not be aware of "stop the world" stuff. It's really a
> matter of your collector. What LLVM should be aware of is the position
> of safe points in the code. That's the case with Gordon's GCStrategy
> class. However, to implement "stop the world", runtimes should be able
> to add runtime-specific code at these safe points. If you could, then
> adding a cooperative, stop-the-world GC would be straightforward. Just
> poll a variable during the safe-point. When a collection is required,
> the collector updates that variable and waits for all threads to poll
> it.

Yes, I agree. That's exactly what we need for Shark, which uses Sun's
Java VM.

Andrew.
On Mar 2, 2009, at 05:20, Nicolas Geoffray wrote:
> So LLVM should not be aware of "stop the world" stuff. It's really a
> matter of your collector. What LLVM should be aware of is the position
> of safe points in the code. That's the case with Gordon's GCStrategy
> class. However, to implement "stop the world", runtimes should be
> able to add runtime-specific code at these safe points. If you
> could, then adding a cooperative, stop-the-world GC would be
> straightforward. Just poll a variable during the safe-point. When a
> collection is required, the collector updates that variable and
> waits for all threads to poll it.
>
> After Gordon's *fantastic* work on GC interfaces in LLVM, I think
> that's the one thing missing in the design.

Generous of you, Nicolas. :) There are actually several blocking
features missing, and injecting code at safe points is one of them.

http://llvm.org/docs/GarbageCollection.html#collector-algos

Major outstanding work that I think blocks entire classes of users from
getting started:

- Emit code at safe points, as you observed. This is necessary for
  supporting threading, as you point out.
- Add safe points for loops that do not otherwise have them. Prevent
  (for (;;) { }) from blocking collection.
- Hooks for injecting stack maps into the JIT. Something akin to the
  AsmWriter's GC printer registry.

SSA values as roots, liveness analysis, register maps, and derived
pointer maps are code quality improvements.

> The other intrinsics let you generate code that should be
> GC-independent and only need GC implementations. So you can start
> right away writing your collector(s) :)

Definitely so!

-- Gordon
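
To make the loop item in the list above concrete, here is a small C++
sketch of what a poll on a loop back edge amounts to. It is written by
hand purely for illustration; gc_requested, enter_safepoint,
safepoint_poll and sum_to are hypothetical names, and the slow path only
records that it ran, where a real runtime would park the thread until
the collection finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical runtime state; these names are illustrative, not LLVM APIs.
std::atomic<bool> gc_requested{false};
int safepoints_taken = 0;

// Slow path. A real runtime would block here until the collector is done;
// this sketch only counts the visit and withdraws the request.
void enter_safepoint() {
  ++safepoints_taken;
  gc_requested.store(false);
}

// Fast path: one load and a rarely taken branch.
inline void safepoint_poll() {
  if (gc_requested.load(std::memory_order_acquire))
    enter_safepoint();
}

// A call-free loop. Without the poll on the back edge, nothing in the
// body would ever reach a safe point, so a pending collection would have
// to wait for the loop to finish (or forever, for an infinite loop).
std::uint64_t sum_to(std::uint64_t n) {
  std::uint64_t total = 0;
  for (std::uint64_t i = 1; i <= n; ++i) {
    total += i;
    safepoint_poll();  // stands in for code emitted on the loop back edge
  }
  return total;
}
```

If gc_requested is set before sum_to runs, the loop reaches the slow
path on its first iteration and then continues, so the result is
unchanged while the collector gets its chance to run.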
On Feb 27, 2009, at 9:29 PM, Talin wrote:
> So I guess what would be helpful for me is a roadmap that defines more
> clearly (a) what parts you plan to build in LLVM (beyond what is
> already there), (b) what parts you would like to have contributed, and
> (c) what parts you definitely want to keep external. In particular,
> I'd like to get a clearer picture of the shapes of the various pieces
> and their roles.

I don't have a specific roadmap, because this is unfortunately not
something that I will be working on in the foreseeable future. I don't
think that there is any specific part that makes sense to keep external
to the project; ideally the code would just be factored well. I would be
perfectly fine with an initial implementation of a GC library being 100%
specific to a very narrow domain. Given an initial implementation,
pieces can be factored out later as additional clients are added.

> For example, I mentioned the "stop the world" function - however since
> LLVM defines no primitives for creating threads or synchronizing
> between them, it's hard to see how this could be part of LLVM proper.

That just means that the basics should be added first :). The absence
of those sorts of routines shouldn't be seen as lack of desire to have
them.

You mentioned in another email that you are not certain how to compose
the various pieces of an allocator together. I think that a policy-based
design that uses template instantiation (à la 'Modern C++ Design')
would make sense here, considering that the overhead of every piece is
potentially extremely important.

-Chris
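
As a rough illustration of the policy-based composition suggested above,
the following C++ sketch builds a bump-pointer region from two policies
chosen at compile time. None of it comes from LLVM: MallocSource,
FailWhenFull, ThrowWhenFull and BumpRegion are hypothetical names, and a
real collector would need many more policy axes (object layout, barriers,
growth) than the two shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Policy: where raw memory comes from.
struct MallocSource {
  static void *acquire(std::size_t bytes) { return std::malloc(bytes); }
  static void release(void *p) { std::free(p); }
};

// Policies: what to do when the region cannot satisfy a request.
struct FailWhenFull {
  static void *on_exhausted(std::size_t) { return nullptr; }
};
struct ThrowWhenFull {
  static void *on_exhausted(std::size_t) { throw std::bad_alloc(); }
};

// Bump-pointer region assembled from policies. Every policy call is a
// static call on a template parameter, so the compiler can inline it and
// the composition adds no virtual dispatch.
template <class Source, class Exhaustion,
          std::size_t Align = alignof(std::max_align_t)>
class BumpRegion {
public:
  explicit BumpRegion(std::size_t capacity)
      : base_(static_cast<char *>(Source::acquire(capacity))),
        capacity_(base_ ? capacity : 0) {}
  ~BumpRegion() { Source::release(base_); }
  BumpRegion(const BumpRegion &) = delete;
  BumpRegion &operator=(const BumpRegion &) = delete;

  void *allocate(std::size_t bytes) {
    // Round the request up to a multiple of Align; the first check
    // catches wraparound for absurdly large requests.
    const std::size_t rounded = (bytes + Align - 1) / Align * Align;
    if (rounded < bytes || rounded > capacity_ - used_)
      return Exhaustion::on_exhausted(bytes);
    void *p = base_ + used_;
    used_ += rounded;
    return p;
  }

  std::size_t used() const { return used_; }

  // Discard everything at once, e.g. after survivors were copied out.
  void reset() { used_ = 0; }

private:
  char *base_;
  std::size_t capacity_;
  std::size_t used_ = 0;
};
```

A nursery that reports exhaustion to its caller would be
BumpRegion<MallocSource, FailWhenFull>, while swapping in ThrowWhenFull
changes the failure behaviour without touching the allocation fast path.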