thr3ads.net - llvm dev - [LLVMdev] Representing a safepoint as an instruction in the x86 backend? [Feb 2014]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2014-Feb-26 01:32 UTC

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

I've got a pseudo instruction with some tricky semantics I need help 
figuring out how to encode properly.  For those interested, this is to 
support fully relocating garbage collection.  I'm going to try to 
express the requirements clearly so that you don't need to understand 
the use case in detail.

My end goal is to capture a list of registers and/or stack offsets for a 
list of virtual registers (known explicitly inside SelectionDAGBuilder) 
at the PC following a call instruction.  In particular, I need to be 
able to update all physical copies of these virtual registers.  I've 
decided to approach this by introducing a new psuedo instruction with 
these semantics, but if anyone has an alternate approach they'd 
recommend, I'm open to that too.

Here's the semantics of my 'instruction':
- It must immediately follow a call instruction, before any return value 
copies, or frame manipulation.  Not every call has a following safepoint.
- It has a variable number of arguments (virtual registers).  All can be 
both read and written.
- It can handle any combination of stack locations and registers. 
Ideally, it should not effect register allocation.

The approach I've taken to date is based in part on the work done for 
PATCHPOINT.  Here's what I've done:
- Introduced a SAFEPOINT psuedo instruction
- Reverse engineered the CALLSEQ_* series of nodes to insert my node 
after the CALL node in both glue and chain sequences.  (BTW, is there 
any documentation on the call sequence?  I think I've reverse engineered 
it correctly, but I'm not completely sure.)
- Introduced folding logic in foldMemoryOperand (analogous to 
PATCHPOINT, but which marks both load and store) -- this is where my 
problem currently lies
- Inserted code during MCInstLower to record the statepoint

The problem with this is that a reload from a stack slot will sometimes 
be inserted between the CALL and the SAFEPOINT.  This is problematic 
since we are no longer recording the list of locations at the site of 
the call itself.  If the recorded information is used during the 
lifetime of the subroutine call, the wrong locations would be updated.  
That would be "bad".

The reason for this is that the folding logic only applies if there's a 
single use of the physical register.  If there's more than one use, it's
assumed to be cheaper to reload than to perform two folded operations 
against memory.  (I don't know if this is true always, but more 
importantly for me, it breaks my intended semantics.)

Does anyone know of a way to avoid the fold step to begin with?  I'd 
really like the register allocation to not give preference to register 
uses for this instruction.  If a virtual register is already in the 
stack, it shouldn't attempt to reload before this instruction. I haven't
been able to find the appropriate hook for this.

I can go ahead and hack the folding code to unconditionally fold into 
SAFEPOINTs and move the load after the SAFEPOINT, but that feels like an 
utter hack.  Before going down the road, does anyone have a better 
suggestion?

I'm very open to suggestions here.  If I'm taking the wrong approach or 
something sounds like it doesn't work the way I've described, please 
point it out.  I will freely admit this is my first serious endeavour 
into the x86 backend and that I'm learning as I go.

Philip

Note: For the moment, this is all x86 specific.  Most of it could be 
made architecture independent without too much effort.

Juergen Ributzka

2014-Feb-27 21:08 UTC

head link

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

Hi Philip,

wouldn’t it be easier to adapt the patchpoint intrinsic to allow it to take also
a function address/global address as target argument. We are already tapping
into the target-specific call lowering, so the call would be lowered exactly the
same way. By doing this you don’t have to worry about anything getting inserted
between your CALL and SAFEPOINT instruction. I know the GLUE flag is supposed to
keep things together in the call sequence, but I think it is very fragile to
rely on this.

On Feb 25, 2014, at 5:32 PM, Philip Reames <listmail at philipreames.com>
wrote:
> The reason for this is that the folding logic only applies if there's a
single use of the physical register.  If there's more than one use, it's
assumed to be cheaper to reload than to perform two folded operations against
memory.  (I don't know if this is true always, but more importantly for me,
it breaks my intended semantics.)
> 
Yes, that is true and the code would break if you remove that constraint,
because it would delete the instruction without checking if there are still any
uses remaining - although this should be easy to fix.
> Does anyone know of a way to avoid the fold step to begin with?  I'd
really like the register allocation to not give preference to register uses for
this instruction.  If a virtual register is already in the stack, it
shouldn't attempt to reload before this instruction. I haven't been able
to find the appropriate hook for this.
I thought Andy did something about this for patchpoints and stackmaps.
Didn't we handled that in the target-specific frame index elimination?

-Juergen

Andrew Trick

2014-Feb-27 22:42 UTC

head link

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

On Feb 27, 2014, at 1:08 PM, Juergen Ributzka <juergen at apple.com>
wrote:
> wouldn’t it be easier to adapt the patchpoint intrinsic to allow it to take
also a function address/global address as target argument. We are already
tapping into the target-specific call lowering, so the call would be lowered
exactly the same way. By doing this you don’t have to worry about anything
getting inserted between your CALL and SAFEPOINT instruction. I know the GLUE
flag is supposed to keep things together in the call sequence, but I think it is
very fragile to rely on this.

GLUE is only for keeping things together during ISEL because SelectionDAG does
not represent dependencies on named resources like physical registers.

Juergen is right about using patchpoint. Part of the reason that we have two
intrinsics, stackmap and patchpoint[1], is that patchpoint allows you to embed a
call instruction. A pseudo instruction is the only way to “glue” operations
through the entire code generator (let’s ignore MachineBundles for now).

We always meant to have patchpoint take a function address. I think there are
just a few FIXMEs to take care of to make it work.

-Andy

[1] I would really prefer to have only one patchpoint intrinsic that can be used
for stack maps too, but the semantics for reserving encoding space don’t quite
match up.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140227/aa14b346/attachment.html>

Andrew Trick

2014-Feb-27 22:49 UTC

head link

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

On Feb 25, 2014, at 5:32 PM, Philip Reames <listmail at philipreames.com>
wrote:
> I've got a pseudo instruction with some tricky semantics I need help
figuring out how to encode properly.  For those interested, this is to support
fully relocating garbage collection.  I'm going to try to express the
requirements clearly so that you don't need to understand the use case in
detail.
> 
> My end goal is to capture a list of registers and/or stack offsets for a
list of virtual registers (known explicitly inside SelectionDAGBuilder) at the
PC following a call instruction.  In particular, I need to be able to update all
physical copies of these virtual registers.  I've decided to approach this
by introducing a new psuedo instruction with these semantics, but if anyone has
an alternate approach they'd recommend, I'm open to that too.
> 
> Here's the semantics of my 'instruction':
> - It must immediately follow a call instruction, before any return value
copies, or frame manipulation.  Not every call has a following safepoint.
> - It has a variable number of arguments (virtual registers).  All can be
both read and written.
> - It can handle any combination of stack locations and registers. Ideally,
it should not effect register allocation.
> 
> The approach I've taken to date is based in part on the work done for
PATCHPOINT.  Here's what I've done:
> - Introduced a SAFEPOINT psuedo instruction
> - Reverse engineered the CALLSEQ_* series of nodes to insert my node after
the CALL node in both glue and chain sequences.  (BTW, is there any
documentation on the call sequence?  I think I've reverse engineered it
correctly, but I'm not completely sure.)
> - Introduced folding logic in foldMemoryOperand (analogous to PATCHPOINT,
but which marks both load and store) -- this is where my problem currently lies
> - Inserted code during MCInstLower to record the statepoint
> 
> The problem with this is that a reload from a stack slot will sometimes be
inserted between the CALL and the SAFEPOINT.  This is problematic since we are
no longer recording the list of locations at the site of the call itself.  If
the recorded information is used during the lifetime of the subroutine call, the
wrong locations would be updated.  That would be "bad".
> 
> The reason for this is that the folding logic only applies if there's a
single use of the physical register.  If there's more than one use, it's
assumed to be cheaper to reload than to perform two folded operations against
memory.  (I don't know if this is true always, but more importantly for me,
it breaks my intended semantics.)
> 
> Does anyone know of a way to avoid the fold step to begin with?  I'd
really like the register allocation to not give preference to register uses for
this instruction.  If a virtual register is already in the stack, it
shouldn't attempt to reload before this instruction. I haven't been able
to find the appropriate hook for this.
> 
> I can go ahead and hack the folding code to unconditionally fold into
SAFEPOINTs and move the load after the SAFEPOINT, but that feels like an utter
hack.  Before going down the road, does anyone have a better suggestion?
Anything we do to avoid the reload is purely an optimization. There is no way to
guarantee it. I think you really need your pseudo instruction to include the
call, which is exactly what patchpoint was designed for.

That said, it would be nice to be more aggressive in folding reloads into
stackmaps/patchpoints. It sounds like you are already making use of the
foldPatchPoint logic that I added to avoid the reloads. We just need some higher
level logic to handle more cases. I’m not sure how hard it will be. See
InlineSpiller::foldMemoryOperand.

-Andy
> 
> I'm very open to suggestions here.  If I'm taking the wrong
approach or something sounds like it doesn't work the way I've
described, please point it out.  I will freely admit this is my first serious
endeavour into the x86 backend and that I'm learning as I go.
> 
> Philip
> 
> Note: For the moment, this is all x86 specific.  Most of it could be made
architecture independent without too much effort.

Philip Reames

2014-Feb-28 00:56 UTC

head link

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

On 02/27/2014 01:08 PM, Juergen Ributzka wrote:> Hi Philip,
>
> wouldn’t it be easier to adapt the patchpoint intrinsic to allow it to take
also a function address/global address as target argument. We are already
tapping into the target-specific call lowering, so the call would be lowered
exactly the same way. By doing this you don’t have to worry about anything
getting inserted between your CALL and SAFEPOINT instruction. I know the GLUE
flag is supposed to keep things together in the call sequence, but I think it is
very fragile to rely on this.Let me rephrase your suggestion slightly to make sure I understood it.  
You're suggesting that I replace the CALL node with a PATCHPOINT node 
which includes both the actual call arguments and the stack map 
arguments.  This could be done by extending the existing IR level 
patchpoint intrinsic, but could also be done at the SelectionDAGBuilder 
level.  This pseudo instruction would be lowered to the actual call in 
MCInstLower.

Thinking about it, I think this would work.  Originally, I'd thought 
there was an issue with distinguishing true arguments vs stack map 
arguments, but it looks like the PATCHPOINT code in recordPatchPoint 
already knows how to do that.  Was that added later?  I don't remember 
seeing this.
>   
> On Feb 25, 2014, at 5:32 PM, Philip Reames <listmail at
philipreames.com> wrote:
>
>> The reason for this is that the folding logic only applies if
there's a single use of the physical register.  If there's more than one
use, it's assumed to be cheaper to reload than to perform two folded
operations against memory.  (I don't know if this is true always, but more
importantly for me, it breaks my intended semantics.)
>>
> Yes, that is true and the code would break if you remove that constraint,
because it would delete the instruction without checking if there are still any
uses remaining - although this should be easy to fix.
Agreed.  On both parts.>
>> Does anyone know of a way to avoid the fold step to begin with? 
I'd really like the register allocation to not give preference to register
uses for this instruction.  If a virtual register is already in the stack, it
shouldn't attempt to reload before this instruction. I haven't been able
to find the appropriate hook for this.
> I thought Andy did something about this for patchpoints and stackmaps.
Didn't we handled that in the target-specific frame index elimination?It's incomplete.  Or at least it was when used with the scheme I 
originally described.  I'll have to see how it fits after moving to the 
CALL replacement scheme.

Philip

Philip Reames

2014-Feb-28 01:04 UTC

head link

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

On 02/27/2014 02:49 PM, Andrew Trick wrote:> On Feb 25, 2014, at 5:32 PM, Philip Reames <listmail at
philipreames.com> wrote:
>> Does anyone know of a way to avoid the fold step to begin with? 
I'd really like the register allocation to not give preference to register
uses for this instruction.  If a virtual register is already in the stack, it
shouldn't attempt to reload before this instruction. I haven't been able
to find the appropriate hook for this.
>>
>> I can go ahead and hack the folding code to unconditionally fold into
SAFEPOINTs and move the load after the SAFEPOINT, but that feels like an utter
hack.  Before going down the road, does anyone have a better suggestion?
> Anything we do to avoid the reload is purely an optimization. There is no
way to guarantee it. I think you really need your pseudo instruction to include
the call, which is exactly what patchpoint was designed for.I'm probably going to move in that direction, but even with the 
patchpoint implementation this isn't strictly true.

If my stack map has more arguments than physical registers, it's 
mandatory that the patchpoint not require a load.  Since writing my 
original email, I think I've figured out how to address this in the 
register allocator.  I'll try to clean this up and send a patch.

Consider the following pseudo code:
(ptr1, ..., ptr200) = initialize_pointers();
call some_function() // safepoint needed here
use(ptr1, ..., ptr200) //more than one use to prevent folding

I'm not entirely sure the above applies to the patchpoint 
implementation.  I know it definitely applied to my previous approach.  
I'll try to create a test case using only patchpoint and share it.
>
> That said, it would be nice to be more aggressive in folding reloads into
stackmaps/patchpoints. It sounds like you are already making use of the
foldPatchPoint logic that I added to avoid the reloads. We just need some higher
level logic to handle more cases. I’m not sure how hard it will be. See
InlineSpiller::foldMemoryOperand.If you have suggestions for specific missing cases, let me know.

p.s. I'll need to extend PATCHPOINT semantics in at least minor ways.  
Every argument is both readable and *writeable*.  This is different than 
the existing intrinsics.

Philip

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Feb 2014 - [LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

Possibly Parallel Threads