Mark Shannon
2009-Feb-27 11:38 UTC
[LLVMdev] Why LLVM should NOT have garbage collection intrinsics
Hi,

I realise this might be a bit controversial ;)

Suppose I am writing a VM (such as VMKit), or a VM toolkit, and I want to add a generational GC.

If I want to use the llvm.gcwrite intrinsic for my write barrier, then I need to write a GC and then implement the gcwrite intrinsic for each and *every* backend.

Now, if I don't use the intrinsic, I need to write my write barrier *once*, in llvm IR. All I need is a nop intrinsic, plus a guarantee that all objects collectable by the GC are reachable from some global variable. This ensures that the optimisation phases know that they cannot rely on memory objects not moving at GC safe points.

I have a *copying* collector that works with llvm JITted code, so I know that this works :)

In fact, this leads to a more general point: ANY intrinsic that is not guaranteed to be implemented by ALL backends is useless, since a front-end that uses llvm to target multiple architectures MUST avoid them.

Mark.
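For illustration, a write barrier that is written once and needs no backend support could look roughly like the following. This is a card-marking sketch in C rather than llvm IR, and it is not the code from the collector mentioned above: the heap array, the 512-byte card size, and every function name are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: an array of pointer-sized slots.
   None of these names are LLVM APIs. */
enum { CARD_SHIFT = 9, HEAP_SLOTS = 8192 };   /* one card covers 2^9 = 512 bytes */

static void *heap[HEAP_SLOTS];
static unsigned char card_table[(HEAP_SLOTS * sizeof(void *)) >> CARD_SHIFT];

/* Map an address inside the toy heap to the index of the card covering it. */
static size_t card_index(const void *addr) {
    uintptr_t offset = (uintptr_t)addr - (uintptr_t)heap;
    assert(offset < sizeof heap);             /* address must lie in the heap */
    return (size_t)(offset >> CARD_SHIFT);
}

/* The write barrier: perform the pointer store, then mark the card that
   holds the written slot as dirty so a young-generation collection knows
   to rescan it for old-to-young pointers. */
static void write_barrier(void **slot, void *new_value) {
    *slot = new_value;
    card_table[card_index(slot)] = 1;
}

/* Query used by the (hypothetical) collector when scanning dirty cards. */
static int card_is_dirty(const void *addr) {
    return card_table[card_index(addr)] != 0;
}
```

A front-end would emit a call to (or inline) something like `write_barrier(&obj->field, value)` for every pointer store into the heap. Because the barrier is ordinary code, it is compiled by whichever backend is in use, which is the portability argument made above.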
Mark Shannon
2009-Feb-27 17:56 UTC
[LLVMdev] Why LLVM should NOT have garbage collection intrinsics
Gordon Henriksen wrote:
> Hi Mark,
>
> I don't think anyone will dispute that it's easier to hack up a shadow
> stack (or plug into a conservative collector) to get up and running
> with GC. That is absolutely the route to go if portability trumps
> performance.

Why? LLVM is all about portability AND performance.

> If you review the mailing list history, I think you'll also find that
> developers who do care about performance have been disappointed with
> the impact of using a shadow stack, either managed with LLVM
> intrinsics or by hand. Even the current state of LLVM GC (static stack
> maps) is a significant performance improvement, but it absolutely does
> require support from the code generator. Return addresses must be
> mapped to stack maps, and only the code generator knows where return
> addresses lie and how the stack frame is laid out.

I agree that the code-generator should provide information about stack-layout, and it must be possible to inform the optimisation passes that certain memory locations may be moved. But information about stack layout is useful for things other than GC and would be useful for interactive debugging as well. Intrinsics should be named for their function, not for their presumed usage.

> The ultimate end goal is to support schemes with still-lower execution
> overhead. The next step for LLVM GC would be elimination of the reload
> penalty for using GC intrinsics with a copying collector. This, again,
> requires that the code generator perform bookkeeping for GC pointers.

Elimination of the reload penalty is impossible, unless the GC can be informed about traceable objects in registers.

> I'm not sure where such vociferous concern on this subject arises. All
> the extant collector plugins I'm aware of operate in conjunction with
> the target-independent framework and require exactly zero code within
> each target backend.

No collector plugins actually use gcread/gcwrite, since there are no generational collectors for llvm (as yet).
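To make the shadow stack and the reload penalty discussed above concrete, here is a minimal hand-managed shadow stack in C. It is only a sketch under stated assumptions: the fixed-size root array, the two toy "spaces", and all function names are hypothetical and are not taken from LLVM or from any existing collector plugin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow stack: an explicit list of the addresses of local
   variables that hold GC pointers. Not an LLVM API. */
enum { SHADOW_STACK_MAX = 256 };
static void **shadow_stack[SHADOW_STACK_MAX];
static size_t shadow_top = 0;

/* Register the address of a local pointer variable as a GC root. */
static void push_root(void **root) {
    assert(shadow_top < SHADOW_STACK_MAX);
    shadow_stack[shadow_top++] = root;
}

/* Unregister the n most recently pushed roots (done at function exit). */
static void pop_roots(size_t n) {
    assert(n <= shadow_top);
    shadow_top -= n;
}

/* At a safe point, a copying collector visits every root and overwrites
   it with the object's new address. */
typedef void *(*relocate_fn)(void *);
static void visit_roots(relocate_fn relocate) {
    for (size_t i = 0; i < shadow_top; i++) {
        void **slot = shadow_stack[i];
        if (*slot != NULL)
            *slot = relocate(*slot);
    }
}

/* Toy stand-in for a copying collector: moves an object from from_space
   to the same index in to_space and returns its new address. */
static long from_space[4], to_space[4];
static void *toy_relocate(void *p) {
    long *obj = p;
    ptrdiff_t idx = obj - from_space;   /* p must point into from_space */
    to_space[idx] = *obj;
    return &to_space[idx];
}
```

After `visit_roots` runs, any copy of the pointer that the compiled code kept in a register is stale, so the code has to reload it from the root slot in memory. That reload is the penalty referred to above; avoiding it would require telling the collector which registers hold traceable pointers.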
According to the documentation (http://llvm.org/docs/GarbageCollection.html#runtime), the GC interface is "a work in progress".

The semantics of llvm.gcroot are vague: "At compile-time, the code generator generates information to allow the runtime to find the pointer at GC safe points."

Vague, ill-specified interfaces are worse than none.

Fundamentally, implementers of new back-ends shouldn't have to worry about GC, and implementers of GC algorithms should not have to delve into the internals of the back-end.

Mark.