thr3ads.net - llvm dev - [LLVMdev] Why LLVM should NOT have garbage collection intrinsics[MESSAGE NOT SCANNED] [Mar 2009]

Mark Shannon

2009-Mar-01 10:41 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Gordon Henriksen wrote:
> The "runtime interface" is a historical artifact. LLVM does not impose
> a runtime library on its users. I wouldn't have a problem deleting all
> mention of it, since LLVM does not impose a contract on the runtime.

Excellent, I found it somewhat unhelpful!

>> The semantics of llvm.gcroot are vague:
>> "At compile-time, the code generator generates information to allow
>> the runtime to find the pointer at GC safe points."
>>
>> Vague, ill-specified interfaces are worse than none.
>
> There's nothing ill-defined about the semantics of gcroot except
> insofar as GC code generation is pluggable.

Sorry, but "At compile-time, the code generator generates information to
allow the runtime to find the pointer at GC safe points." does not
really say anything.
No one could possibly implement this "specification".

Sorry about all my negative comments. I would like to implement a
generational collector for llvm, but I cannot do so in a portable way.

So, here is a suggestion:

Call the GC 'intrinsics' something else ("extrinsics"?), and provide
low-level intrinsics so that the GC calls gcroot, gcread and gcwrite
can be converted to GC-free LLVM code in a GC-lowering pass.

IR+GC -> | GC Lowering pass | -> IR

Rather than the current:

IR+GC -> | Backend lowering pass(es) | -> SelectionDAG

Read and write barriers can already be written in llvm-IR.
It is the marking of roots that is the problem.
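
To make the barrier point concrete, here is a minimal sketch of a card-marking write barrier of the kind a generational collector might use, written in C rather than llvm-IR for readability. The heap layout, the card size and every name in it (heap, card_table, gc_write and so on) are assumptions made for illustration; none of it is LLVM API. Nothing in it needs compiler support, which is why barriers are the easy half of the problem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap: a flat array of pointer-sized slots, split into
 * fixed-size "cards". All sizes here are arbitrary illustration values. */
enum { HEAP_WORDS = 4096, CARD_WORDS = 64, CARD_COUNT = HEAP_WORDS / CARD_WORDS };

static void *heap[HEAP_WORDS];
static unsigned char card_table[CARD_COUNT]; /* 1 = card may hold an old-to-young pointer */

/* Map a slot inside the heap to the index of the card that contains it. */
static size_t card_index(void **slot)
{
    ptrdiff_t off = slot - heap;
    assert(off >= 0 && off < HEAP_WORDS);
    return (size_t)off / CARD_WORDS;
}

/* Write barrier: perform the store, then mark the containing card dirty so
 * a minor collection can rescan just the dirty cards instead of the heap. */
static void gc_write(void **slot, void *value)
{
    *slot = value;
    card_table[card_index(slot)] = 1;
}

/* Query used by the collector (and by tests) to see whether a card is dirty. */
static int card_is_dirty(void **slot)
{
    return card_table[card_index(slot)];
}
```

A front-end would emit a call to something like gc_write (or inline its body) wherever it stores a pointer into a heap object.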

Given that any new intrinsics/instructions are an additional burden on 
all back-ends, I'm not going to propose particular ones, but it seems 
that they are needed.

By the way, I think that adding a GC pointer type is an unnecessary
burden on the back-ends; front-ends really should be able to handle
this.

The current trio of gcroot, gcread and gcwrite is OK, BUT GC 
implementations should be able to translate them to llvm-IR so that the 
optimisers and back-ends can do their jobs without worrying about GC 
details.

As an aside, I think that debug info can be treated in a similar way:

IR+debug -> | Debug lowering pass | ->  IR

After all, both debug info and GC require similar things: information
about the location of stack variables (and possibly register variables)
and the machine location of points in code (for line numbering or
gc-safe points).

If intrinsics/instructions to do the above can be implemented then I 
will port my generational, copying collector to LLVM *and* maintain it 
for as long as possible.

Mark.

Gordon Henriksen

2009-Mar-01 15:32 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

On 2009-03-01, at 05:41, Mark Shannon wrote:
> Gordon Henriksen wrote:
>
>>> The semantics of llvm.gcroot are vague: "At compile-time, the code
>>> generator generates information to allow the runtime to find the  
>>> pointer at GC safe points."
>>>
>>> Vague, ill-specified interfaces are worse than none.
>>
>> There's nothing ill-defined about the semantics of gcroot except  
>> insofar as GC code generation is pluggable.
>>
>
> Sorry, but "At compile-time, the code generator generates  
> information to allow the runtime to find the pointer at GC safe  
> points." does not really say anything. No one could possibly  
> implement this "specification".

llvm.gcroot is an interface to a runtime library (or binary format)
only through the mediation of the GC plugin, so the exact front-to- 
back behavior is undefined, yes. Likewise, the 'add' instruction does  
not specify by what machine instruction the addition will be  
performed, yet it is not vague.

What is communicated to the plugins themselves through the presence of  
a llvm.gcroot call is detailed lower in the document, in the  
Implementing a GC Plugin section.

This is abstract and complex, but not imprecise. Still, if you'd like  
to propose improved wording for GarbageCollection.html that makes this  
clearer for you, I'd be happy to incorporate it.

> I would like to implement a generational collector for llvm, but I
> cannot do so in a portable way.

You'll certainly need to map roots on the stack and use write barriers.

• shadow-stack is an easy, portable way to bring up root discovery.  
You can switch to static stack maps later (with the requirement that  
your runtime be able to crawl the machine stack, which is out-of-scope  
for LLVM unless Talin makes some progress with his GC building blocks).

• As you observe, your write barrier can be written in LLVM IR without  
the use of the llvm.gcwrite intrinsic if you so desire. Otherwise, you  
can perform the IR-to-IR transform to eliminate llvm.gcwrite using the  
performCustomLowering hook.
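
For readers unfamiliar with the shadow-stack approach in the first bullet, the idea can be sketched in a few lines of C. This is a simplified illustration under assumed names (shadow_frame, root_chain, push_frame, visit_roots); it is not the layout LLVM's shadow-stack plugin actually emits. Each function links a small frame describing its roots onto a global chain on entry and unlinks it on exit, so the collector can find every root without crawling the machine stack.

```c
#include <assert.h>
#include <stddef.h>

/* One shadow frame per active function. Hypothetical layout. */
struct shadow_frame {
    struct shadow_frame *next; /* frame of the caller */
    size_t num_roots;          /* number of root slots in this frame */
    void **roots;              /* root slots, living in the function's own locals */
};

/* Head of the chain: the innermost active frame. */
static struct shadow_frame *root_chain = NULL;

/* Called in a function prologue to register its root slots. */
static void push_frame(struct shadow_frame *f, void **roots, size_t n)
{
    f->next = root_chain;
    f->num_roots = n;
    f->roots = roots;
    root_chain = f;
}

/* Called in a function epilogue to unregister the innermost frame. */
static void pop_frame(void)
{
    assert(root_chain != NULL);
    root_chain = root_chain->next;
}

typedef void (*root_visitor)(void **slot, void *ctx);

/* Collector side: visit every registered root slot, innermost frame first. */
static void visit_roots(root_visitor visit, void *ctx)
{
    for (struct shadow_frame *f = root_chain; f != NULL; f = f->next)
        for (size_t i = 0; i < f->num_roots; i++)
            visit(&f->roots[i], ctx);
}

/* Example visitor: count the slots that currently hold a non-null pointer. */
static void count_non_null(void **slot, void *ctx)
{
    if (*slot != NULL)
        ++*(size_t *)ctx;
}
```

The cost is the extra push and pop on every call and the fact that roots must live in memory, which is the overhead static stack maps are meant to remove.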

What else is blocking you?

> By the way, I think that adding a GC pointer type is an unnecessary
> burden on the back-ends; front-ends really should be able to
> handle this.

I don't see how these points mesh together into a single concern, but
I can address your three points:

• On your concern for "burdening the backends." You've used this straw
man before. LLVM backends are not monoliths; they actually share a
great deal of code. All current GC changes were made in the shared  
codebase, making the cost-to-implement and maintain O(1), not O(N). I  
see no reason this would change in the future.

• On whether the back-end need be involved. As I've already discussed  
WRT stack layout, the back-end must be involved because only it knows  
how stack frames and code are laid out. I'd like to reemphasize that  
back ends can and do introduce or delete both control flow and calls  
in the program--adding safe points which are not (as such) represented  
in LLVM IR. Thus, the front-end cannot know even the set of all safe  
points (much less their locations). From that, it follows that  
liveness and stack maps cannot be computed by the front-end, precisely  
because they need be computed at said unknown safe points. Finally, I  
hope it's abundantly obvious that register maps are impossible for the  
front-end to compute.

• On whether a GC pointer type is necessary in the IR. 'llvm.gcroot'  
as it stands today basically makes GC pointer manipulation code opaque  
to all of LLVM's optimizations, both front- and back-end. (The root  
alloca 'escapes', so mem2reg can't hack on it, and all is lost.) The
generated code is full of redundant memory operations as a result.  
There's true redundancy, and there's the redundancy required when  
passing a safe point without register maps. Allowing SSA values to be  
GC roots directly (rather than merely pointing to roots) would enable  
improvements in this area.
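
As a rough picture of the stack maps discussed in the second bullet, the following C sketch shows what a runtime might do with them. The record format and all names (safe_point, find_safe_point, frame_roots) are hypothetical; real formats are chosen by the GC plugin. The point it illustrates is that both the return addresses and the slot offsets are only known once code and stack frames have been laid out, which is why the table has to come from the code generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack map record, one per safe point. */
struct safe_point {
    uintptr_t return_address; /* machine address identifying the safe point */
    size_t num_roots;         /* number of live root slots at this point */
    const size_t *slot_index; /* word offsets of those slots from the frame base */
};

/* Find the record for a return address; NULL means it is not a safe point. */
static const struct safe_point *
find_safe_point(const struct safe_point *map, size_t count, uintptr_t ra)
{
    for (size_t i = 0; i < count; i++)
        if (map[i].return_address == ra)
            return &map[i];
    return NULL;
}

/* Write the addresses of one frame's live root slots into out (at most cap
 * of them) and return how many were written. */
static size_t
frame_roots(const struct safe_point *sp, void **frame_base, void ***out, size_t cap)
{
    size_t n = 0;
    assert(sp != NULL);
    for (size_t i = 0; i < sp->num_roots && n < cap; i++)
        out[n++] = &frame_base[sp->slot_index[i]];
    return n;
}
```

A collector would call these once per frame while walking the machine stack; that walk itself is the part described above as out of scope for LLVM.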

> The current trio of gcroot, gcread and gcwrite is OK, BUT GC
> implementations should be able to translate them to llvm-IR so that  
> the optimisers and back-ends can do their jobs without worrying  
> about GC details.

GC plugins can already eliminate GC intrinsics prior to code  
generation, for cases where the GC scheme can be represented as pure  
IR. In the tree, shadow-stack is implemented as a pure IR transform.

http://llvm.org/docs/GarbageCollection.html#custom

But this is not the interesting case, since it has limitations as I  
discussed above (and could've been written without the intrinsics in  
the first place).

— Gordon

Mark Shannon

2009-Mar-01 19:11 UTC

[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Gordon Henriksen wrote:
> You'll certainly need to map roots on the stack and use write barriers.
> 
> • shadow-stack is an easy, portable way to bring up root discovery.  
> You can switch to static stack maps later (with the requirement that  
> your runtime be able to crawl the machine stack, which is out-of-scope  
> for LLVM unless Talin makes some progress with his GC building blocks).

This is the crux of my argument:
Without the ability to traverse the stack in a portable way, the only 
way I can write a portable GC is to avoid the llvm intrinsics.
Therefore, they are useless and should be removed.

However, if the ability to traverse the stack is added to llvm then most 
of my objections to the intrinsics disappear.

> • As you observe, your write barrier can be written in LLVM IR without
> the use of the llvm.gcwrite intrinsic if you so desire. Otherwise, you  
> can perform the IR-to-IR transform to eliminate llvm.gcwrite using the  
> performCustomLowering hook.
> 
> What else is blocking you?
> 
Nothing, except the lack of stack traversal code ;)

Once the portable stack-traversal code is available, I'll port my GC, as 
promised.

Thanks for taking the time to discuss this.
You've just about convinced me that the intrinsics should stay, but I 
still think the interface to the GC subsystem is (currently) a bit of a 
mess.

Mark.
