thr3ads.net - llvm dev - [LLVMdev] Why LLVM should NOT have garbage collection intrinsics [Mar 2009]

Mark Shannon

2009-Feb-27 17:56 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Gordon Henriksen wrote:
> Hi Mark,
>
> I don't think anyone will dispute that it's easier to hack up a shadow
> stack (or plug into a conservative collector) to get up and running
> with GC. That is absolutely the route to go if portability trumps
> performance.
Why? LLVM is all about portability AND performance.
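
For reference, the shadow-stack approach mentioned above can be sketched roughly as follows. This is a toy Python model, not LLVM's implementation: `ShadowStack`, `push_frame`, `pop_frame` and `roots` are hypothetical names, and a real shadow stack would be a chain of frame records maintained by the compiled code itself.

```python
class ShadowStack:
    """Toy model of a shadow stack: an explicit side stack of GC root slots."""

    def __init__(self):
        self.frames = []  # one list of root slots per active function call

    def push_frame(self, num_roots):
        # Function prologue: register this call's root slots with the collector.
        frame = [None] * num_roots
        self.frames.append(frame)
        return frame

    def pop_frame(self):
        # Function epilogue: unregister the slots again.
        self.frames.pop()

    def roots(self):
        # The collector finds live roots by walking the side stack;
        # it needs no knowledge of the machine stack layout.
        return [ref for frame in self.frames for ref in frame if ref is not None]


stack = ShadowStack()
outer = stack.push_frame(2)
outer[0] = "obj_a"
inner = stack.push_frame(1)
inner[0] = "obj_b"
live_during_inner_call = stack.roots()
stack.pop_frame()
live_after_return = stack.roots()
```

The push and pop on every call are what make the scheme portable, and they are also the per-call overhead that the performance complaints refer to.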
> 
> If you review the mailing list history, I think you'll also find that  
> developers who do care about performance have been disappointed with  
> the impact of using a shadow stack, either managed with LLVM  
> intrinsics or by hand. Even the current state of LLVM GC (static stack  
> maps) is a significant performance improvement—but it absolutely does  
> require support from the code generator. Return addresses must be  
> mapped to stack maps, and only the code generator knows where return  
> addresses lie and how the stack frame is laid out.
I agree that the code-generator should provide information about 
stack-layout, and it must be possible to inform the optimisation passes 
that certain memory locations may be moved.

But information about stack layout is useful for things other than GC 
and would be useful for interactive debugging as well.
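
To make the idea concrete, a static stack map can be pictured as a table keyed by return address. The sketch below is a Python model under assumed names (`STACK_MAPS`, `find_roots`, the addresses and the frame contents are all made up for illustration); it is not the format LLVM emits.

```python
# Hypothetical stack maps, as a code generator might emit them:
# return address -> offsets (from the frame base) of the tracked slots.
STACK_MAPS = {
    0x4010: [0, 2],  # call site in some function f: slots 0 and 2 are tracked
    0x4058: [1],     # call site in some function g: slot 1 is tracked
}


def find_roots(frames, stack_maps):
    """Collect tracked values from a list of (return_address, slots) frames."""
    roots = []
    for return_address, slots in frames:
        # Only the code generator knows the frame layout; the runtime
        # just looks it up by return address.
        for offset in stack_maps[return_address]:
            roots.append(slots[offset])
    return roots


# Two active frames; untracked slots hold plain integers.
frames = [
    (0x4010, ["obj_a", 42, "obj_b"]),
    (0x4058, [7, "obj_c"]),
]
found = find_roots(frames, STACK_MAPS)
```

Nothing in the table itself is specific to garbage collection: it only records which slots are of interest at which code address.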

Intrinsics should be named for their function, not for their presumed usage.
> 
> The ultimate endgoal is to support schemes with still-lower execution  
> overhead. The next step for LLVM GC would be elimination of the reload  
> penalty for using GC intrinsics with a copying collector. This, again,  
> requires that the code generator perform bookkeeping for GC pointers.
Elimination of the reload penalty is impossible, unless the GC can be 
informed about traceable objects in registers.
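
The reload penalty can be illustrated with a toy model. The Python sketch below assumes a trivial copying collector (`copying_collect` is a hypothetical name) that moves every rooted object once and ignores shared or nested objects; it only shows why a pointer cached in a register goes stale across a safe point.

```python
def copying_collect(heap, root_slots):
    """Toy copying collector: moves each rooted object and updates its root slot."""
    new_heap = {}
    next_addr = 1000  # hypothetical start of to-space
    for i, old_addr in enumerate(root_slots):
        new_heap[next_addr] = heap[old_addr]
        root_slots[i] = next_addr  # the collector can only fix slots it knows about
        next_addr += 1
    return new_heap


heap = {1: "payload"}
root_slots = [1]          # stack slot registered as a root
register = root_slots[0]  # copy of the pointer held in a register

heap = copying_collect(heap, root_slots)  # safe point: the object moves

stale = register not in heap  # the register copy was not updated
register = root_slots[0]      # the reload: re-read the pointer from the root slot
value = heap[register]
```

Avoiding the reload would require the collector to update `register` directly, which is the point about informing the GC of traceable objects in registers.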
> 
> I'm not sure where such vociferous concern on this subject arises. All
> the extant collector plugins I'm aware of operate in conjunction with  
> the target-independent framework and require exactly zero code within  
> each target backend.
No collector plugins actually use gcread/gcwrite, since there are no
generational collectors for LLVM (as yet).

According to the documentation
(http://llvm.org/docs/GarbageCollection.html#runtime),
the GC interface is "a work in progress".

The semantics of llvm.gcroot are vague:
"At compile-time, the code generator generates information to allow the 
runtime to find the pointer at GC safe points."

Vague, ill-specified interfaces are worse than none.

Fundamentally, implementers of new back-ends shouldn't have to worry 
about GC, and implementers of GC algorithms should not have to delve 
into the internals of the back-end.

Mark.

Gordon Henriksen

2009-Feb-27 18:13 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

On Feb 27, 2009, at 12:56, Mark Shannon wrote:
> Gordon Henriksen wrote:
>
>> The ultimate endgoal is to support schemes with still-lower  
>> execution overhead. The next step for LLVM GC would be elimination  
>> of the reload penalty for using GC intrinsics with a copying  
>> collector. This, again, requires that the code generator perform  
>> bookkeeping for GC pointers.
>
> Elimination of the reload penalty is impossible, unless the GC can  
> be informed about traceable objects in registers.
Exactly.
>> I'm not sure where such vociferous concern on this subject arises.
>> All the extant collector plugins I'm aware of operate in  
>> conjunction with the target-independent framework and require  
>> exactly zero code within each target backend.
>
> No collector plugins actually use gcread/gcwrite, since there are
> no generational collectors for llvm (as yet).
>
> According to the documentation
> http://llvm.org/docs/GarbageCollection.html#runtime
> The GC interface is "a work in progress"
The "runtime interface" is a historical artifact. LLVM does not impose
a runtime library on its users. I wouldn't have a problem deleting all  
mention of it, since LLVM does not impose a contract on the runtime.
> The semantics of llvm.gcroot are vague:
> "At compile-time, the code generator generates information to allow
> the runtime to find the pointer at GC safe points."
>
> Vague, ill-specified interfaces are worse than none.
There's nothing ill-defined about the semantics of gcroot except  
insofar as GC code generation is pluggable.
> Fundamentally, implementers of new back-ends shouldn't have to worry  
> about GC, and implementers of GC algorithms should not have to delve  
> into the internals of the back-end.

This is precisely the current state of affairs.

— Gordon

Mark Shannon

2009-Mar-01 10:41 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Gordon Henriksen wrote:
> The "runtime interface" is a historical artifact. LLVM does not impose
> a runtime library on its users. I wouldn't have a problem deleting all
> mention of it, since LLVM does not impose a contract on the runtime.

Excellent, I found it somewhat unhelpful!

>> The semantics of llvm.gcroot are vague:
>> "At compile-time, the code generator generates information to allow
>> the runtime to find the pointer at GC safe points."
>>
>> Vague, ill-specified interfaces are worse than none.
>
> There's nothing ill-defined about the semantics of gcroot except
> insofar as GC code generation is pluggable.

Sorry, but "At compile-time, the code generator generates information to
allow the runtime to find the pointer at GC safe points." does not
really say anything.
No one could possibly implement this "specification".

Sorry about all my negative comments. I would like to implement a
generational collector for LLVM, but I cannot do so in a portable way.

So, here is a suggestion:

Call the GC 'intrinsics' something else ("extrinsics"?), and provide
low-level intrinsics so that the GC calls (gcroot, gcread and gcwrite)
can be converted to GC-free LLVM code in a GC-lowering pass:

IR+GC -> | GC Lowering pass | -> IR

Rather than the current:

IR+GC -> | Backend lowering pass(es) | -> SelectionDAG

Read and write barriers can already be written in llvm-IR.
It is the marking of roots that is the problem.
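
As an illustration of why barriers need no special support, a generational write barrier is just a comparison and a store. The sketch below is a Python model with hypothetical names (`Obj`, `remembered_set`, `write_barrier`), and a remembered set is only one possible design (card marking is another); the same logic could be expressed with ordinary loads, compares and stores in IR.

```python
class Obj:
    def __init__(self, generation):
        self.generation = generation  # "young" or "old"
        self.fields = {}


remembered_set = set()  # old objects that may point into the young generation


def write_barrier(obj, field, value):
    """Store value into obj.field, recording old-to-young pointers."""
    if (isinstance(value, Obj)
            and obj.generation == "old"
            and value.generation == "young"):
        remembered_set.add(obj)
    obj.fields[field] = value


old = Obj("old")
young = Obj("young")
write_barrier(young, "parent", old)  # young -> old: nothing to record
write_barrier(old, "child", young)   # old -> young: must be remembered
```

A minor collection can then treat the objects in `remembered_set` as extra roots instead of scanning the whole old generation.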

Given that any new intrinsics/instructions are an additional burden on 
all back-ends, I'm not going to propose particular ones, but it seems 
that they are needed.

By the way, I think that adding a GC pointer type is an unnecessary
burden on the back-ends; front-ends really should be able to handle
this.

The current trio of gcroot, gcread and gcwrite is OK, BUT GC 
implementations should be able to translate them to llvm-IR so that the 
optimisers and back-ends can do their jobs without worrying about GC 
details.
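
The shape of such a translation can be sketched on a toy instruction list. Everything here is an assumption made for illustration: the tuple-based instruction format, the `lower_gc` function and the `remember_if_old_to_young` helper are hypothetical, and gcroot is deliberately passed through unchanged because what it should lower to is exactly the open question.

```python
def lower_gc(instructions):
    """Rewrite toy GC operations into plain toy instructions."""
    lowered = []
    for op, *args in instructions:
        if op == "gcread":
            dest, obj, field = args
            # No read barrier in this hypothetical collector: a plain load.
            lowered.append(("load", dest, obj, field))
        elif op == "gcwrite":
            value, obj, field = args
            # Write barrier expanded into an ordinary call followed by a store.
            lowered.append(("call", "remember_if_old_to_young", obj, value))
            lowered.append(("store", value, obj, field))
        else:
            # Everything else, including gcroot, passes through unchanged.
            lowered.append((op, *args))
    return lowered


program = [
    ("gcroot", "slot0"),
    ("gcwrite", "v", "obj", "next"),
    ("gcread", "r", "obj", "next"),
    ("add", "t", "r", "1"),
]
lowered = lower_gc(program)
```

After the pass, the barrier operations are ordinary calls, loads and stores that later passes could handle without GC-specific knowledge.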

As an aside, I think that debug info can be treated in a similar way:

IR+debug -> | Debug lowering pass | ->  IR

After all both debug and GC require similar things, that is, information 
about the location of stack variables (and possibly, register variables)
and the machine location of points in code (for line numbering or 
gc-safe points).

If intrinsics/instructions to do the above can be implemented then I 
will port my generational, copying collector to LLVM *and* maintain it 
for as long as possible.

Mark.
