Mark Shannon
2009-Feb-27 17:56 UTC
[LLVMdev] Why LLVM should NOT have garbage collection intrinsics
Gordon Henriksen wrote:
> Hi Mark,
>
> I don't think anyone will dispute that it's easier to hack up a shadow
> stack (or plug into a conservative collector) to get up and running
> with GC. That is absolutely the route to go if portability trumps
> performance.

Why? LLVM is all about portability AND performance.

> If you review the mailing list history, I think you'll also find that
> developers who do care about performance have been disappointed with
> the impact of using a shadow stack, either managed with LLVM
> intrinsics or by hand. Even the current state of LLVM GC (static stack
> maps) is a significant performance improvement, but it absolutely does
> require support from the code generator. Return addresses must be
> mapped to stack maps, and only the code generator knows where return
> addresses lie and how the stack frame is laid out.

I agree that the code-generator should provide information about
stack-layout, and it must be possible to inform the optimisation passes
that certain memory locations may be moved. But information about stack
layout is useful for things other than GC and would be useful for
interactive debugging as well. Intrinsics should be named for their
function, not for their presumed usage.

> The ultimate endgoal is to support schemes with still-lower execution
> overhead. The next step for LLVM GC would be elimination of the reload
> penalty for using GC intrinsics with a copying collector. This, again,
> requires that the code generator perform bookkeeping for GC pointers.

Elimination of the reload penalty is impossible, unless the GC can be
informed about traceable objects in registers.

> I'm not sure where such vociferous concern on this subject arises. All
> the extant collector plugins I'm aware of operate in conjunction with
> the target-independent framework and require exactly zero code within
> each target backend.

No collector plugins actually use gcread/gcwrite, since there are no
generational collectors for llvm (as yet).

According to the documentation
http://llvm.org/docs/GarbageCollection.html#runtime
the GC interface is "a work in progress".

The semantics of llvm.gcroot are vague:
"At compile-time, the code generator generates information to allow the
runtime to find the pointer at GC safe points."

Vague, ill-specified interfaces are worse than none.

Fundamentally, implementers of new back-ends shouldn't have to worry
about GC, and implementers of GC algorithms should not have to delve
into the internals of the back-end.

Mark.
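
For readers unfamiliar with the shadow-stack approach discussed in this message, the following is a minimal C++ sketch of the idea, not LLVM's actual implementation. Every name in it (`Object`, `ShadowFrame`, `FrameGuard`, `shadowStack`, `markRoots`, `mutatorExample`) is hypothetical. It shows the mutator registering the addresses of its pointer-holding locals on a side stack, so that a collector can find roots without any knowledge of the machine stack layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical; this is not LLVM's runtime interface.
struct Object { bool marked = false; };

// One frame per active function: the addresses of the local slots
// that hold GC pointers.
struct ShadowFrame {
    std::vector<Object**> roots;
};

// The shadow stack: a side stack maintained by the mutator itself.
inline std::vector<ShadowFrame*>& shadowStack() {
    static std::vector<ShadowFrame*> stack;
    return stack;
}

// Pushes a frame on construction and pops it on destruction.
class FrameGuard {
public:
    explicit FrameGuard(ShadowFrame& f) : frame_(f) {
        shadowStack().push_back(&frame_);
    }
    ~FrameGuard() {
        assert(shadowStack().back() == &frame_);
        shadowStack().pop_back();
    }
    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;
private:
    ShadowFrame& frame_;
};

// Collector side: visit every registered root slot and mark its target.
inline std::size_t markRoots() {
    std::size_t visited = 0;
    for (ShadowFrame* frame : shadowStack())
        for (Object** slot : frame->roots)
            if (*slot) { (*slot)->marked = true; ++visited; }
    return visited;
}

// A mutator function with two pointer locals registered as roots.
inline std::size_t mutatorExample(Object* a, Object* b) {
    Object* x = a;
    Object* y = b;
    ShadowFrame frame;
    frame.roots = {&x, &y};
    FrameGuard guard(frame);
    return markRoots();  // stands in for a safe point that triggers a collection
}
```

Because the collector sees only the addresses of `x` and `y`, both must live in memory across the safe point and be reloaded afterwards if the collector moves objects. That is the reload cost the thread refers to.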
Gordon Henriksen
2009-Feb-27 18:13 UTC
[LLVMdev] Why LLVM should NOT have garbage collection intrinsics
On Feb 27, 2009, at 12:56, Mark Shannon wrote:
> Gordon Henriksen wrote:
>
>> The ultimate endgoal is to support schemes with still-lower
>> execution overhead. The next step for LLVM GC would be elimination
>> of the reload penalty for using GC intrinsics with a copying
>> collector. This, again, requires that the code generator perform
>> bookkeeping for GC pointers.
>
> Elimination of the reload penalty is impossible, unless the GC can
> be informed about traceable objects in registers.

Exactly.

>> I'm not sure where such vociferous concern on this subject arises.
>> All the extant collector plugins I'm aware of operate in
>> conjunction with the target-independent framework and require
>> exactly zero code within each target backend.
>
> No collector plugins actually use gcread/gcwrite, since there are
> no generational collectors for llvm (as yet).
>
> According to the documentation
> http://llvm.org/docs/GarbageCollection.html#runtime
> The GC interface is "a work in progress"

The "runtime interface" is a historical artifact. LLVM does not impose
a runtime library on its users. I wouldn't have a problem deleting all
mention of it, since LLVM does not impose a contract on the runtime.

> The semantics of llvm.gcroot are vague:
> "At compile-time, the code generator generates information to allow
> the runtime to find the pointer at GC safe points."
>
> Vague, ill-specified interfaces are worse than none.

There's nothing ill-defined about the semantics of gcroot except
insofar as GC code generation is pluggable.

> Fundamentally, implementers of new back-ends shouldn't have to worry
> about GC, and implementers of GC algorithms should not have to delve
> into the internals of the back-end.

This is precisely the current state of affairs.

-- Gordon
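
The quoted gcroot description ("information to allow the runtime to find the pointer at GC safe points") corresponds to the static stack maps mentioned earlier in the thread. As a rough illustration only, the C++ sketch below models such a map as a table from return address to the frame offsets that hold GC pointers. The names `StackMapEntry`, `StackMapTable`, `lookupSafePoint` and `rootsInFrame`, and the table layout itself, are assumptions made for this example and do not describe the format LLVM emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical layout: what a code generator might record per safe point.
struct StackMapEntry {
    std::size_t frameSize;                    // size of the frame in bytes
    std::vector<std::ptrdiff_t> rootOffsets;  // byte offsets of GC pointer slots
};

// Safe points are identified by the return address of the call.
using StackMapTable = std::map<std::uintptr_t, StackMapEntry>;

// Returns the entry for a return address, or nullptr if it is not a safe point.
inline const StackMapEntry* lookupSafePoint(const StackMapTable& table,
                                            std::uintptr_t returnAddress) {
    auto it = table.find(returnAddress);
    return it == table.end() ? nullptr : &it->second;
}

// Translates the recorded offsets into the addresses of the root slots
// inside one concrete frame.
inline std::vector<void**> rootsInFrame(const StackMapEntry& entry,
                                        unsigned char* frameBase) {
    std::vector<void**> slots;
    for (std::ptrdiff_t off : entry.rootOffsets) {
        assert(off >= 0 &&
               static_cast<std::size_t>(off) + sizeof(void*) <= entry.frameSize);
        slots.push_back(reinterpret_cast<void**>(frameBase + off));
    }
    return slots;
}
```

Unlike a shadow stack, nothing is pushed or popped at run time here; the cost moves to the code generator, which must know the return addresses and the frame layout in order to emit the table.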
Mark Shannon
2009-Mar-01 10:41 UTC
[LLVMdev] Why LLVM should NOT have garbage collection intrinsics
Gordon Henriksen wrote:
> The "runtime interface" is a historical artifact. LLVM does not impose
> a runtime library on its users. I wouldn't have a problem deleting all
> mention of it, since LLVM does not impose a contract on the runtime.

Excellent, I found it somewhat unhelpful!

>> The semantics of llvm.gcroot are vague:
>> "At compile-time, the code generator generates information to allow
>> the runtime to find the pointer at GC safe points."
>>
>> Vague, ill-specified interfaces are worse than none.
>
> There's nothing ill-defined about the semantics of gcroot except
> insofar as GC code generation is pluggable.

Sorry, but "At compile-time, the code generator generates information
to allow the runtime to find the pointer at GC safe points." does not
really say anything. No one could possibly implement this
"specification".

Sorry about all my negative comments. I would like to implement a
generational collector for llvm, but I cannot do so in a portable way.

So, here is a suggestion:
Call the GC 'intrinsics' something else ("extrinsics"?), and provide
low-level intrinsics so that the GC calls gcroot, gcread and gcwrite
can be converted to GC-free LLVM code in a GC-lowering pass:

    IR+GC -> | GC Lowering pass | -> IR

rather than the current:

    IR+GC -> | Backend lowering pass(es) | -> SelectionDAG

Read and write barriers can already be written in llvm-IR. It is the
marking of roots that is the problem. Given that any new
intrinsics/instructions are an additional burden on all back-ends, I'm
not going to propose particular ones, but it seems that they are needed.

By the way, I think that adding a GC pointer type is an unnecessary
burden on the back-ends; front-ends really should be able to handle
this.

The current trio of gcroot, gcread and gcwrite is OK, BUT GC
implementations should be able to translate them to llvm-IR so that the
optimisers and back-ends can do their jobs without worrying about GC
details.

As an aside, I think that debug info can be treated in a similar way:

    IR+debug -> | Debug lowering pass | -> IR

After all, both debug and GC require similar things, that is,
information about the location of stack variables (and possibly,
register variables) and the machine location of points in code (for
line numbering or gc-safe points).

If intrinsics/instructions to do the above can be implemented then I
will port my generational, copying collector to LLVM *and* maintain it
for as long as possible.

Mark.
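
The point that write barriers need no special back-end support can be illustrated with a small C++ sketch of a generational write barrier written as ordinary code. This is an assumption-laden illustration, not the collector mentioned in the message: the `Obj` layout, the `old` flag, `rememberedSet` and `writeField` are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical object layout with a single pointer field.
struct Obj {
    bool old = false;      // true once the object is in the old generation
    Obj* field = nullptr;
};

// Old objects that may point into the young generation.
inline std::unordered_set<Obj*>& rememberedSet() {
    static std::unordered_set<Obj*> set;
    return set;
}

// Write barrier: records old-to-young stores, then performs the store.
// It is plain code that an ordinary optimiser and back-end can handle.
inline void writeField(Obj* holder, Obj* value) {
    assert(holder != nullptr);
    if (holder->old && value != nullptr && !value->old)
        rememberedSet().insert(holder);
    holder->field = value;
}
```

A young-generation collection would then treat the objects in `rememberedSet()` as additional roots. Finding the roots on the stack remains the part that cannot be expressed this way, which is the gap the message describes.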