thr3ads.net - llvm dev - [LLVMdev] Adding safe-point code generation [Jun 2009]

If this information is useful, please help other people find it:
Share via:

Jeffrey Yasskin

2009-Jun-22 05:49 UTC

[LLVMdev] Adding safe-point code generation

Hi all,

I need to add thread-switching support to Unladen Swallow's JIT, and
LLVM's safe point support looks like a good way to get code into all
the right places. However,
http://llvm.org/docs/GarbageCollection.html#collector-algos points out
that there's no way to emit code at safe points yet, and there are no
loop safe points at all. So I'll be trying to implement them.

Is there anything I should know before starting?

One way to do this might be to add a FunctionPass to
LLVMTargetMachine::addCommonCodeGenPasses() alongside
createGCLoweringPass(), which would insert user-defined code for safe
points. Unfortunately, code inserted there would stick around in the
IR after the machine code was emitted, and if the function were JITted
again, we'd get duplicate safe points.

Another way to do it might be to add a MachineFunction pass next to
createGCMachineCodeAnalysisPass() (or instead of it), which could emit
appropriate MachineInstructions to implement the safe point. This, of
course, forces safe points to be written in MachineInstructions
instead of IR instructions, which isn't ideal.

Another way might be to run a pass over the IR inserting
llvm.safepoint() calls, which could be implemented as a function in
the module. Then we'd want a MachineFunction pass to inline this for
us during codegen. The llvm.safepoint() calls could be easily
identified and removed if the IR needs to be re-used.

Thoughts?

Thanks,
Jeffrey

Nicolas Geoffray

2009-Jun-22 07:57 UTC

head link

[LLVMdev] Adding safe-point code generation

Hi Jeffrey,

Jeffrey Yasskin wrote:> Hi all,
>
> I need to add thread-switching support to Unladen Swallow's JIT, and
> LLVM's safe point support looks like a good way to get code into all
> the right places. However,
> http://llvm.org/docs/GarbageCollection.html#collector-algos points out
> that there's no way to emit code at safe points yet, and there are no
> loop safe points at all. So I'll be trying to implement them.
>
>   
That's great!
> Is there anything I should know before starting?
>
> One way to do this might be to add a FunctionPass to
> LLVMTargetMachine::addCommonCodeGenPasses() alongside
> createGCLoweringPass(), which would insert user-defined code for safe
> points. Unfortunately, code inserted there would stick around in the
> IR after the machine code was emitted, and if the function were JITted
> again, we'd get duplicate safe points.
>
>   
Agree. I guess, a flag on a function saying that GC/Thread points have 
already been inserted would make it.
> Another way to do it might be to add a MachineFunction pass next to
> createGCMachineCodeAnalysisPass() (or instead of it), which could emit
> appropriate MachineInstructions to implement the safe point. This, of
> course, forces safe points to be written in MachineInstructions
> instead of IR instructions, which isn't ideal.
>
>   
I don't think that's worth the pain, if it's only to avoid
duplicated
safe points.
> Another way might be to run a pass over the IR inserting
> llvm.safepoint() calls, which could be implemented as a function in
> the module. Then we'd want a MachineFunction pass to inline this for
> us during codegen. The llvm.safepoint() calls could be easily
> identified and removed if the IR needs to be re-used.
>
>   
I like this one. I think that's the best way to go. The safe points 
"pass" in the GC already computes the safe points (except for
"for
loops") , so we've got a base. I don't think llvm.safepoint needs
to be
a function, a function pass could replace the intrinsic with instructions.

Thanks for working on this!

Nicolas
> Thoughts?
>
> Thanks,
> Jeffrey
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Gordon Henriksen

2009-Jun-23 11:18 UTC

head link

[LLVMdev] Adding safe-point code generation

Hi Jeffrey,

On 2009-06-22, at 01:49, Jeffrey Yasskin wrote:
> I need to add thread-switching support to Unladen Swallow's JIT, and  
> LLVM's safe point support looks like a good way to get code into all  
> the right places. However,
http://llvm.org/docs/GarbageCollection.html#collector-algos
>  points out that there's no way to emit code at safe points yet, and  
> there are no loop safe points at all. So I'll be trying to implement  
> them.
>
> Is there anything I should know before starting?
Sounds like you've got the right idea.
> One way to do this might be to add a FunctionPass to  
> LLVMTargetMachine::addCommonCodeGenPasses() alongside  
> createGCLoweringPass(), which would insert user-defined code for  
> safe points. Unfortunately, code inserted there would stick around  
> in the IR after the machine code was emitted, and if the function  
> were JITted again, we'd get duplicate safe points.
Unfortunately, I don't believe this is workable. It would make this  
work much easier if it were.
> Another way to do it might be to add a MachineFunction pass next to  
> createGCMachineCodeAnalysisPass() (or instead of it), which could  
> emit appropriate MachineInstructions to implement the safe point.  
> This, of course, forces safe points to be written in  
> MachineInstructions instead of IR instructions, which isn't ideal.
I think this is the way to go, though it's also the most involved. My  
primary rationale is that code generation can hack on the CFG, even  
introducing loops where there were none expressed in the IR. It could  
be that I'm being unnecessarily pessimistic on this point, though.

As a specific example of the code generator hacking on the CFG, take  
atomic operations which expand to loops on architectures which use  
load-reserved/store-conditional to implement these primitives. It may  
not be necessary or desirable to add safe points to these loops, but  
perhaps should be preferred on the basis o correctness.

As another example, consider a 64-bit integer divide on a 32-bit  
architecture expanding to a libcall. Some, but perhaps not all,  
collection algorithms would want to emit safe point code for this  
call, but it simply does not exist in the IR to instrument.

Also, code injection of the form 'give me 8 bytes of noops at each  
safe point' and 'insert a cold instruction sequence at the end of the  
function' are best expressed at the machine code level. Safe points  
are hot code and unusual, target-specific techniques are regularly  
used with them if you survey the literature, so a design which  
accommodates that reality is preferred, even though hacking on the  
MachineFunction representation is less pleasant than the IR.

One element of this design that is desirable from a design perspective  
is that it preserves the original IR. Chris has said that it's a long- 
term goal of LLVM to not mangle the Function during code generation,  
and this moves in that direction instead of regressing.
> Another way might be to run a pass over the IR inserting  
> llvm.safepoint() calls, which could be implemented as a function in  
> the module. Then we'd want a MachineFunction pass to inline this for  
> us during codegen. The llvm.safepoint() calls could be easily  
> identified and removed if the IR needs to be re-used.
I see this as fairly equivalent to the first option.

Also, regardless, stop point markers (a label is actually generated)  
need to survive as such into the MachineFunction representation else  
we'll never be able to generate a register map.

Hope that helps,
Gordon

P.S. There's an interesting circularity which I may not have accounted  
for in the original design: If code is injected at each safe point,  
and a call instruction is injected, do we need to generate another  
safe point for that call? Clearly, the expansion of a safe point  
cannot be recursive with itself; but I think that we should allow  
generating a register map at the return address of that call, just as  
some collectors may want to instrument the libcall case discussed above.

Actually, this distinction between safe points for inserting code and  
safe points for frame maps is probably is a critical design issue for  
your use case. Our current definition of a safe point is at the return  
address of a call instruction, which is precisely what's required to  
call the stack. This is NOT the location where you want to add a call  
to your runtime.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090623/6c07a73a/attachment.html>

Gordon Henriksen

2009-Jun-23 13:59 UTC

head link

[LLVMdev] Adding safe-point code generation

On 2009-06-23, at 07:18, Gordon Henriksen wrote:
> Actually, this distinction between safe points for inserting code  
> and safe points for frame maps is probably is a critical design  
> issue for your use case. Our current definition of a safe point is  
> at the return address of a call instruction, which is precisely  
> what's required to call the stack. This is NOT the location where  
> you want to add a call to your runtime.
“... precisely what's required to walk the stack ...”, rather.

— Gordon

Jeffrey Yasskin

2009-Jun-23 19:18 UTC

head link

[LLVMdev] Adding safe-point code generation

On Tue, Jun 23, 2009 at 4:18 AM, Gordon Henriksen <gordonhenriksen at
me.com>wrote:
> Hi Jeffrey,
> On 2009-06-22, at 01:49, Jeffrey Yasskin wrote:
>
> I need to add thread-switching support to Unladen Swallow's JIT, and
LLVM's
> safe point support looks like a good way to get code into all the right
> places. However,
> http://llvm.org/docs/GarbageCollection.html#collector-algos points
> out that there's no way to emit code at safe points yet, and there are
> no loop safe points at all. So I'll be trying to implement them.
>
> Is there anything I should know before starting?
>
>
> Sounds like you've got the right idea.
>
> One way to do this might be to add a FunctionPass
> to LLVMTargetMachine::addCommonCodeGenPasses()
> alongside createGCLoweringPass(), which would insert user-defined code for
> safe points. Unfortunately, code inserted there would stick around in the
IR
> after the machine code was emitted, and if the function were JITted again,
> we'd get duplicate safe points.
>
>
> Unfortunately, I don't believe this is workable. It would make this
work
> much easier if it were.
>
> Another way to do it might be to add a MachineFunction pass next
> to createGCMachineCodeAnalysisPass() (or instead of it), which could
> emit appropriate MachineInstructions to implement the safe point. This,
> of course, forces safe points to be written in MachineInstructions instead
> of IR instructions, which isn't ideal.
>
>
> I think this is the way to go, though it's also the most involved. My
> primary rationale is that code generation can hack on the CFG, even
> introducing loops where there were none expressed in the IR. It could be
> that I'm being unnecessarily pessimistic on this point, though.
>
> As a specific example of the code generator hacking on the CFG, take atomic
> operations which expand to loops on architectures which use
> load-reserved/store-conditional to implement these primitives. It may not
be
> necessary or desirable to add safe points to these loops, but perhaps
should
> be preferred on the basis o correctness.
>
As another example, consider a 64-bit integer divide on a
32-bit> architecture expanding to a libcall. Some, but perhaps not all, collection
> algorithms would want to emit safe point code for this call, but it simply
> does not exist in the IR to instrument.
>
I'm worried that it'll be necessary for some garbage collectors and hurt
performance too much for others. Should it be configurable? I don't need to
insert safe points in such loops and calls for my use, so I'd try to make it
extensible, but then leave support for these loops and calls to someone who
actually needs them.

Also, code injection of the form 'give me 8 bytes of noops at each
safe> point' and 'insert a cold instruction sequence at the end of the
function'
> are best expressed at the machine code level. Safe points are hot code and
> unusual, target-specific techniques are regularly used with them if you
> survey the literature, so a design which accommodates that reality is
> preferred, even though hacking on the MachineFunction representation is
less
> pleasant than the IR.
>
I'd like to allow target-specific techniques, but also allow
target-independent techniques, and the target-independent techniques are
more important for what I'm actually doing. Now that I've looked into
MachineInstructions more, it looks like it's impossible to use them in a
target-independent way.
>
> One element of this design that is desirable from a design perspective is
> that it preserves the original IR. Chris has said that it's a long-term
goal
> of LLVM to not mangle the Function during code generation, and this moves
in
> that direction instead of regressing.
>
I definitely like that feature.

Another way might be to run a pass over the IR inserting
llvm.safepoint()> calls, which could be implemented as a function in the module. Then
we'd
> want a MachineFunction pass to inline this for us during codegen. The
> llvm.safepoint() calls could be easily identified and removed if the IR
> needs to be re-used.
>
>
> I see this as fairly equivalent to the first option.
>
I don't, because the user can control how they lower llvm.safepoint(). For
uses like mine, where I'd rather write the safe point in IR, I can implement
it as a function (if I have a selectiondag-level inliner). For more advanced
garbage collectors, they can implement it as a custom lowering in their
target. The biggest downside of an llvm.safepoint intrinsic would be that it
couldn't capture loops and calls introduced by lowering, unless those
lowerings explicitly introduced new safepoints.
>
> Also, regardless, stop point markers (a label is actually generated) need
> to survive as such into the MachineFunction representation else we'll
never
> be able to generate a register map.
>
> Hope that helps,
> Gordon
>
>
> P.S. There's an interesting circularity which I may not have accounted
for
> in the original design: If code is injected at each safe point, and a call
> instruction is injected, do we need to generate *another* safe point for
> that call? Clearly, the expansion of a safe point cannot be recursive with
> itself; but I think that we should allow generating a register map at the
> return address of that call, just as some collectors may want to instrument
> the libcall case discussed above.
>
> Actually, this distinction between safe points for inserting code and safe
> points for frame maps is probably is a critical design issue for your use
> case. Our current definition of a safe point is at the return address of a
> call instruction, which is precisely what's required to walk the stack.
This
> is NOT the location where you want to add a call to your runtime.
>
I think you're saying that we need two kinds of safe points: points at which
to insert code, and points that the GCFunctionInfo::iterator traverses. The
code-inserting safe points could insert traversed safe points.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090623/57aa3d10/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Jun 2009 - [LLVMdev] Adding safe-point code generation

[LLVMdev] Adding safe-point code generation

[LLVMdev] Adding safe-point code generation

[LLVMdev] Adding safe-point code generation

[LLVMdev] Adding safe-point code generation

[LLVMdev] Adding safe-point code generation

Possibly Parallel Threads