thr3ads.net - llvm dev - [LLVMdev] Garbage collection [Mar 2009]

Talin

2009-Feb-28 05:29 UTC

[LLVMdev] Garbage collection

Chris Lattner wrote:
> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>   
>> With the increasing number of LLVM-based VMs and other projects, I
>> suspect that the desire for more comprehensive garbage collection
>> support in LLVM is only going to increase.
>>     
>
> Absolutely!
>
>   
>> Part of the reason why there isn't more direct support for GC is the
>> theory that there is no such thing as a one-size-fits-all collector.
>> The argument goes that a really efficient collector design requires
>> detailed knowledge of the object model of the language being compiled.
>>     
>
> Yes, you do need to have some knowledge about the object model.   
> However, it would be perfectly reasonable for LLVM to support and  
> include multiple different collectors for different classes of language.
>
>   
>> On the other hand, it is possible to make a counter-argument to this
>> theory that goes like this: The Java VM has been used to implement a
>> large number of front-end languages efficiently, without requiring a
>> special garbage collector for each language.
>>     
>
> Most importantly to me, the takeaway from Java is that just having  
> something that works "well enough" is really important and helps
> bootstrap a lot of projects, which can then take the "last 10% of  
> performance" as an optimization opportunity, instead of being blocked
> from even starting with LLVM.
>
> I'd claim that JavaVM really isn't a good way to implement a lisp vm
> or something like that.  However, the perf delta induced by the Java
> VM may just *not matter* in the big picture.  At least with LLVM, a
> Lisp implementation could be brought up on an "OOP GC" and switched to
> something more custom as the project develops.
>
>   
>> It also seems to me that even radically different collector designs
>> could utilize some common building blocks for heap management, work
>> queuing, and so on.
>>     
>
> Yes.
>
>   
>> Of course, there is always a danger when creating libraries of the
>> "ivory tower syndrome", putting a lot of effort into components that
>> don't actually get used. This is why it would be even better to create
>> a standard, high performance collector for LLVM that actually uses
>> these methods.  <many good thoughts trimmed>
>>     
>
> What you see in LLVM right now is really only the second step of the  
> planned GC evolution.  The first step was very minimal, but useful for  
> bridging to other existing collectors.  The second step was Gordon's  
> (significant!) extensions to the system which allowed him to tie in  
> the Ocaml collector and bring some more sanity to codegen.
>
> While people object to adding high level features to LLVM, high level  
> and language-specific features are *great* in llvm as long as they are  
> cleanly separable.  I would *love* to see a composable collection of  
> different GC subroutines with clean interfaces built on LLVM "assembly
> language" GC stuff.
>
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,  
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)  
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.   
> Ideally there would only be a couple of high-level collectors that can  
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on  
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening  
> today; we just need leadership in the area to drive it.  To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a simple
> GC and language and getting the whole thing working top to bottom.  From
> there, the various pieces can be generalized out etc.  This ensures  
> that there is always a *problem being solved* and something that works  
> and is testable.
>
> One of the annoying reasons that the GC stuff is only halfway fleshed  
> out is that I was working on an out of tree project (which of course  
> got forgotten about when I left) when developing the GC intrinsics, so  
> there is no fully working example in public.
>
> -Chris
>
> ps. Code generation for the GC intrinsics can be improved  
> significantly.  We can add new intrinsics that don't pin things to the
> stack, update optimizations, and do many other things if people  
> started using the GC stuff seriously.
>

So I guess what would be helpful for me is a roadmap that defines more
clearly (a) what parts you plan to build in LLVM (beyond what is already
there), (b) what parts you would like to have contributed, and (c) what
parts you definitely want to keep external. In particular, I'd like to
get a clearer picture of the shapes of the various pieces and their roles.

For example, I mentioned the "stop the world" function; however, since
LLVM defines no primitives for creating threads or synchronizing between
them, it's hard to see how this could be part of LLVM proper. On the
other hand, a sibling project (like vmkit or clang) could probably make
a more restrictive set of assumptions, such as the existence of either
POSIX or Windows threading models or some analog of those being
available. "Stop the world" is implementable in terms of those threading
primitives if you assume that mutator threads use sync points.

Thus, it seems to me that the proper home for such a function would be 
in a sibling project outside of LLVM proper. At the same time, however, 
core LLVM could benefit from having an implementation of this, in that 
it could guide the design of the IR by providing more concrete use cases 
for things like inserting sync points in generated code and such.

Nicolas Geoffray

2009-Mar-02 10:20 UTC

[LLVMdev] Garbage collection

Hi Talin,

First off, thanks for generating such an interesting discussion on
garbage collection!
> For example, I mentioned the "stop the world" function; however, since
> LLVM defines no primitives for creating threads or synchronizing between
> them, it's hard to see how this could be part of LLVM proper.
So LLVM should not be aware of "stop the world" stuff. It's really a
matter of your collector. What LLVM should be aware of is the position of
safe points in the code. That's the case with Gordon's GCStrategy class.
However, to implement "stop the world", runtimes should be able to add
runtime-specific code at these safe points. If you could, then adding a 
cooperative, stop-the-world GC would be straightforward. Just poll a 
variable during the safe-point. When a collection is required, the 
collector updates that variable and waits for all threads to poll it.
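
A minimal sketch of that handshake, assuming C++11 threads, where std::atomic, std::mutex and std::condition_variable stand in for the POSIX or Windows threading primitives a runtime would use. The class SafepointController and its methods are hypothetical names invented for illustration, not an LLVM or runtime API; poll() plays the role of the check a compiler would inject at each safe point, and the sketch assumes a fixed number of mutator threads that keep reaching their polls.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical cooperative stop-the-world handshake (not an LLVM API).
// Assumes a fixed number of mutator threads that keep reaching poll().
class SafepointController {
public:
    explicit SafepointController(int mutators) : mutators_(mutators) {}

    // Fast path emitted at every safe point: one load of the shared flag.
    void poll() {
        if (stop_requested_.load(std::memory_order_acquire)) park();
    }

    // Collector: raise the flag, then wait until every mutator has parked.
    void stop_the_world() {
        std::unique_lock<std::mutex> lock(mu_);
        stop_requested_.store(true, std::memory_order_release);
        all_parked_.wait(lock, [this] { return parked_ == mutators_; });
    }

    // Collector: clear the flag and release the parked mutators.
    void resume_the_world() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_requested_.store(false, std::memory_order_release);
            parked_ = 0;
            ++epoch_;
        }
        resumed_.notify_all();
    }

    int parked() {
        std::lock_guard<std::mutex> lock(mu_);
        return parked_;
    }

private:
    // Slow path: report in, then block until the collector resumes us.
    void park() {
        std::unique_lock<std::mutex> lock(mu_);
        if (!stop_requested_.load(std::memory_order_relaxed)) return;  // already resumed
        const unsigned long my_epoch = epoch_;
        ++parked_;
        all_parked_.notify_one();
        resumed_.wait(lock, [&] { return epoch_ != my_epoch; });
    }

    const int mutators_;
    std::atomic<bool> stop_requested_{false};
    std::mutex mu_;
    std::condition_variable all_parked_, resumed_;
    int parked_ = 0;
    unsigned long epoch_ = 0;
};

// Usage: two mutators spin doing "work"; while the world is stopped, none
// of them can make progress. Returns true if that held.
bool demo_stop_the_world() {
    SafepointController ctl(2);
    std::atomic<bool> done{false};
    std::atomic<long> work{0};
    auto mutator = [&] {
        while (!done.load()) {
            work.fetch_add(1);
            ctl.poll();  // safe point
        }
    };
    std::thread t1(mutator), t2(mutator);
    ctl.stop_the_world();
    assert(ctl.parked() == 2);
    const long frozen = work.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // "collect"
    const bool no_progress = (work.load() == frozen);
    ctl.resume_the_world();
    done.store(true);
    t1.join();
    t2.join();
    return no_progress;
}
```

Resetting parked_ in resume_the_world() keeps a quick second stop from counting threads that have not yet woken from the first. A production runtime would also have to account for threads blocked in native code, which never reach a poll.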

After Gordon's *fantastic* work on GC interfaces in LLVM, I think that's
the one thing missing in the design. The other intrinsics let you 
generate code that should be GC-independent and only need GC 
implementations. So you can start right away writing your collector(s) :)
> On the 
> other hand, a sibling project (like vmkit or clang) could probably make 
> a more restrictive set of assumptions, such as the existence of either 
> POSIX or Windows threading models or some analog of those being 
> available. "Stop the world" is implementable in terms of those threading
> primitives if you assume that mutator threads use sync points.
>   
That's right!
> Thus, it seems to me that the proper home for such a function would be 
> in a sibling project outside of LLVM proper. At the same time, however, 
> core LLVM could benefit from having an implementation of this, in that 
> it could guide the design of the IR by providing more concrete use cases 
> for things like inserting sync points in generated code and such.
>   
There is a runtime directory in llvm; maybe that's where memory and
threading examples could be added? However, a real-world GC is no piece 
of cake and way out of the scope of LLVM. So I like the idea of a 
sibling project. The runtime directory should just contain simple 
implementations.

Nicolas

Andrew Haley

2009-Mar-02 10:31 UTC

[LLVMdev] Garbage collection

Nicolas Geoffray wrote:
>> For example, I mentioned the "stop the world" function; however, since
>> LLVM defines no primitives for creating threads or synchronizing between
>> them, it's hard to see how this could be part of LLVM proper.
> 
> So LLVM should not be aware of "stop the world" stuff. It's really a
> matter of your collector. What LLVM should be aware of is the position of
> safe points in the code. That's the case with Gordon's GCStrategy class.
> However, to implement "stop the world", runtimes should be able to add
> runtime-specific code at these safe points. If you could, then adding a 
> cooperative, stop-the-world GC would be straightforward. Just poll a 
> variable during the safe-point. When a collection is required, the 
> collector updates that variable and waits for all threads to poll it.
Yes, I agree.  That's exactly what we need for Shark, which uses Sun's
Java VM.

Andrew.

Gordon Henriksen

2009-Mar-02 15:48 UTC

[LLVMdev] Garbage collection

On Mar 2, 2009, at 05:20, Nicolas Geoffray wrote:
> So LLVM should not be aware of "stop the world" stuff. It's really a
> matter of your collector. What LLVM should be aware of is the position
> of safe points in the code. That's the case with Gordon's GCStrategy
> class. However, to implement "stop the world", runtimes should be
> able to add runtime-specific code at these safe points. If you  
> could, then adding a cooperative, stop-the-world GC would be  
> straightforward. Just poll a variable during the safe-point. When a  
> collection is required, the collector updates that variable and  
> waits for all threads to poll it.
>
> After Gordon's *fantastic* work on GC interfaces in LLVM, I think  
> that's the one thing missing in the design.
Generous of you, Nicolas. :) There are actually several blocking  
features missing, and injecting code at safe points is one of them.

     http://llvm.org/docs/GarbageCollection.html#collector-algos

Major outstanding work that I think blocks entire classes of users
from getting started:

   - Emit code at safe points, as you observed.
     This is necessary for supporting threading, as you point out.

   - Add safe points for loops that do not otherwise have them.
     Prevent (for (;;) { }) from blocking collection.

   - Hooks for injecting stack maps into the JIT.
     Something akin to the AsmWriter's GC printer registry.
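
For the second item, the idea is that a pass would place a poll on a loop's back edge so that even a call-free loop reaches a safe point on every iteration. The C++ below is a hand-written sketch of what such generated code might look like, not the output of any existing LLVM pass; g_stop_requested, g_threads_stopped and the function names are hypothetical, and the slow path simply spins instead of blocking on a condition variable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical runtime state: the collector sets the flag, and threads
// that reach a safe point bump the counter while they wait.
std::atomic<bool> g_stop_requested{false};
std::atomic<int> g_threads_stopped{0};

// Slow path of the poll: acknowledge the request and spin until the
// collector withdraws it.
void safepoint_slow_path() {
    g_threads_stopped.fetch_add(1);
    while (g_stop_requested.load(std::memory_order_acquire))
        std::this_thread::yield();
    g_threads_stopped.fetch_sub(1);
}

// Without a poll, a loop such as `for (;;) { ++x; }` contains no call and
// therefore no safe point, so a collector waiting on this thread would
// wait forever. Here a poll sits on the back edge. `keep_running` exists
// only so the example can terminate.
std::uint64_t spin_with_backedge_poll(const std::atomic<bool>& keep_running) {
    std::uint64_t x = 0;
    while (keep_running.load(std::memory_order_relaxed)) {
        ++x;                                                   // loop body
        if (g_stop_requested.load(std::memory_order_acquire))  // inserted poll
            safepoint_slow_path();
    }
    return x;
}

// Collector side: request a stop, wait for the spinning thread to reach
// its back-edge poll, then let it go. Returns the iterations it ran.
std::uint64_t demo_stop_spinning_thread() {
    std::atomic<bool> keep_running{true};
    std::uint64_t iterations = 0;
    std::thread t([&] { iterations = spin_with_backedge_poll(keep_running); });
    g_stop_requested.store(true, std::memory_order_release);
    while (g_threads_stopped.load() < 1) std::this_thread::yield();
    assert(g_threads_stopped.load() == 1);  // the world is stopped here
    g_stop_requested.store(false, std::memory_order_release);
    keep_running.store(false);
    t.join();
    return iterations;
}
```

Polling on every iteration costs a load and a branch per trip; a possible refinement is to poll only every N iterations, trading a longer worst-case pause for a faster loop.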

SSA values as roots, liveness analysis, register maps, and derived  
pointer maps are code quality improvements.
> The other intrinsics let you generate code that should be
> GC-independent and only need GC implementations. So you can start right
> away writing your collector(s) :)
Definitely so!

— Gordon

Chris Lattner

2009-Mar-09 04:14 UTC

[LLVMdev] Garbage collection

On Feb 27, 2009, at 9:29 PM, Talin wrote:
> So I guess what would be helpful for me is a roadmap that defines
> more clearly (a) what parts you plan to build in LLVM (beyond what is
> already there), (b) what parts you would like to have contributed, and
> (c) what parts you definitely want to keep external. In particular, I'd
> like to get a clearer picture of the shapes of the various pieces and
> their roles.
I don't have a specific roadmap, because this is unfortunately not  
something that I will be working on in the foreseeable future.  I  
don't think that there is any specific part that makes sense to keep  
external to the project; the code would just ideally be factored
well.  I would be perfectly fine with an initial implementation of a GC
library being 100% specific to a very narrow domain.  Given an initial
implementation, pieces can be factored out later as additional clients  
are added.
> For example, I mentioned the "stop the world" function; however, since
> LLVM defines no primitives for creating threads or synchronizing
> between them, it's hard to see how this could be part of LLVM proper.
That just means that the basics should be added first :).  The absence  
of those sorts of routines shouldn't be seen as lack of desire to have  
them.

You mentioned in another email that you are not certain how to compose  
the various pieces of an allocator together.  I think that a policy-based
design that uses template instantiation (ala 'modern C++
design') would make sense here, considering that the overhead of every  
piece is potentially extremely important.
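
As a rough illustration of that kind of composition, the sketch below builds an allocator from two template parameters: a space policy (a bump-pointer arena) and an object-header policy. Every name in it (GcAllocator, BumpArena, NoHeader, MarkWordHeader) is hypothetical and invented for the example; the point is only that, because the pieces are bound at compile time, each policy call can be inlined rather than going through virtual dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// All names here are hypothetical, for illustration only.

// Space policy: bump-pointer allocation out of a fixed, inline arena.
template <std::size_t ArenaBytes>
class BumpArena {
public:
    void* allocate(std::size_t bytes) {
        constexpr std::size_t kAlign = alignof(std::max_align_t);
        bytes = (bytes + kAlign - 1) & ~(kAlign - 1);  // round up to alignment
        if (ArenaBytes - used_ < bytes) return nullptr;  // arena exhausted
        void* p = buffer_ + used_;
        used_ += bytes;
        return p;
    }
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[ArenaBytes];
    std::size_t used_ = 0;
};

// Header policies: how much per-object metadata the object model needs.
struct NoHeader {
    static constexpr std::size_t size = 0;
    static void init(void*) {}
};
struct MarkWordHeader {
    static constexpr std::size_t size = sizeof(std::uintptr_t);
    static void init(void* h) { ::new (h) std::uintptr_t(0); }  // starts unmarked
};

// The allocator is assembled from policies at compile time, so each policy
// call is an ordinary, inlinable function call with no virtual dispatch.
// Note: a non-empty header reduces payload alignment below max_align_t.
template <class Space, class Header>
class GcAllocator {
public:
    void* allocate(std::size_t payload_bytes) {
        void* raw = space_.allocate(Header::size + payload_bytes);
        if (raw == nullptr) return nullptr;  // a real GC would collect and retry
        Header::init(raw);
        return static_cast<unsigned char*>(raw) + Header::size;
    }
    const Space& space() const { return space_; }

private:
    Space space_;
};

// Usage: two 24-byte objects, each with a mark word (8 bytes on a typical
// 64-bit target), occupy 32 bytes apiece in the arena.
std::size_t demo_bytes_used() {
    GcAllocator<BumpArena<1024>, MarkWordHeader> alloc;
    void* a = alloc.allocate(24);
    void* b = alloc.allocate(24);
    assert(a != nullptr && b != nullptr && a != b);
    return alloc.space().used();
}
```

Swapping MarkWordHeader for NoHeader, or BumpArena for another space policy, changes the generated code without touching GcAllocator itself, which is the sort of replaceable policy component listed as item 2 earlier in the thread.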

-Chris
