thr3ads.net - llvm dev - [llvm-dev] imm COPY generated by PHI elim not propagated [Nov 2019]

Ryan Taylor via llvm-dev

2019-Nov-20 16:47 UTC

[llvm-dev] imm COPY generated by PHI elim not propagated

I was looking at writing a pass that runs after PHI elimination to do this.
For now I'm just trying to dump the reaching-def MIs, but I get lots of
"no live segments" issues. I have included addRequired and addPreserved for
LiveIntervals, as well as setPreservesAll().

-Ryan
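
For reference, the rewrite being discussed can be sketched on a toy instruction list. This is only an illustration in Python, not LLVM code: the `(dest, opcode, operand)` tuples and the helper names `fold_imm_copies` and `drop_dead_defs` are hypothetical, and the sketch assumes a single straight-line block, so the reaching definition is simply the most recent write to a register.

```python
from typing import Dict, Iterable, List, Tuple, Union

# Toy instruction: (dest, opcode, operand). A hypothetical model, not LLVM MIR.
Inst = Tuple[str, str, Union[int, str]]


def fold_imm_copies(block: List[Inst]) -> List[Inst]:
    """Turn `d = COPY s` into `d = MOV imm` when the reaching def of s is `MOV imm`."""
    known_imm: Dict[str, int] = {}  # register -> immediate it currently holds
    out: List[Inst] = []
    for dest, op, src in block:
        if op == "COPY" and src in known_imm:
            op, src = "MOV", known_imm[src]
        # Record what dest holds now, or forget a stale immediate.
        if op == "MOV" and isinstance(src, int):
            known_imm[dest] = src
        else:
            known_imm.pop(dest, None)
        out.append((dest, op, src))
    return out


def drop_dead_defs(block: List[Inst], live_out: Iterable[str]) -> List[Inst]:
    """Backward scan that removes defs nobody reads (no side effects modelled)."""
    live = set(live_out)
    kept: List[Inst] = []
    for dest, op, src in reversed(block):
        if dest not in live:
            continue  # result is never used, so drop the instruction
        live.discard(dest)
        if isinstance(src, str):
            live.add(src)  # the source register is read here
        kept.append((dest, op, src))
    return kept[::-1]


block = [
    ("vgpr1", "MOV", 0),
    ("vgpr20", "COPY", "vgpr1"),
    ("vgpr30", "COPY", "vgpr1"),
    ("vgpr40", "COPY", "vgpr1"),
    ("vgpr50", "COPY", "vgpr1"),
]
folded = drop_dead_defs(fold_imm_copies(block),
                        live_out=["vgpr20", "vgpr30", "vgpr40", "vgpr50"])
```

On the example from the original post this leaves four `MOV 0` instructions and no def of `vgpr1`, which matches the "4 v_mov_b32 instead of 5" count mentioned in the quoted thread. A real pass would need reaching-definition information across blocks, which is where LiveIntervals comes in.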

On Fri, Nov 15, 2019 at 2:58 PM Quentin Colombet <qcolombet at apple.com>
wrote:
> You could do it after RA and before rewrite, when you still have the live
> intervals around.
>
> On Nov 15, 2019, at 11:16 AM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>
> This would require getting the reaching definition which requires live
> intervals analysis.
>
>
> On Thu, Nov 14, 2019 at 12:15 PM Quentin Colombet <qcolombet at apple.com> wrote:
>
>> That sounds like the folding could be done when you expand the copy in
>> expand pseudo after regalloc.
>>
>> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew <
>> Matthew.Arsenault at amd.com> wrote:
>> >
>> > In this case the load imm is foldable into the copy, once converted to
>> > a mov. Directly folding this would give 4 v_mov_b32 instead of the 5
>> > produced currently.
>> >
>> > -Matt
>> >
>> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via llvm-dev"
>> > <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:
>> >
>> >    Hi Ryan,
>> >
>> >    Unless you can fold your immediate directly into an instruction, it is
>> >    actually not profitable to propagate it. Indeed, you will end up with a
>> >    bunch of load imm instead of reusing a register that already holds this
>> >    value.
>> >
>> >    The way it works right now is, if holding this value in a register is
>> >    too expensive, i.e., it triggers a spill, then we rematerialize the
>> >    immediate instead of holding a register for it.
>> >
>> >    Cheers,
>> >    -Quentin
>> >
>> >> On Nov 13, 2019, at 7:36 AM, Ryan Taylor via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> I have some code such that:
>> >>
>> >> vgpr1 = mov 0
>> >> branch bb
>> >> bb:
>> >> PHI vgpr2 = vgpr1, ….
>> >> PHI vgpr3 = vgpr1, ….
>> >> PHI vgpr4 = vgpr1, ….
>> >> PHI vgpr5 = vgpr1, ….
>> >>
>> >> PHI node elimination is generating copies for all these PHIs (and
>> >> hoisting them) as such:
>> >>
>> >> vgpr1 = 0
>> >> vgpr20 = COPY vgpr1 // old vgpr2
>> >> vgpr30 = COPY vgpr1 // old vgpr3
>> >> vgpr40 = COPY vgpr1 // old vgpr4
>> >> vgpr50 = COPY vgpr1 // old vgpr5
>> >>
>> >> I expect the zero to get propagated in a later phase but it's not. I
>> >> was looking at adding immediate folding to the register coalescer but
>> >> this doesn't really seem like the right place. Any suggestions?
>> >>
>> >> I'm sort of surprised that other targets haven't run into this issue.
>> >>
>> >> -Ryan
>> >>
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Quentin Colombet via llvm-dev

2019-Nov-20 17:40 UTC

[llvm-dev] imm COPY generated by PHI elim not propagated

I think doing that before register allocation (and thus right after PHI
elimination) is too early.

Doing it that early loses the opportunity to coalesce all the copies.
In other words, a bunch of copies may be worse than just a few load
immediates, but one load immediate reused thanks to copy coalescing is
better than a few load immediates.

My 2c ;)
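
The trade-off described above can be put into numbers with a small counting model. This is a sketch under simplifying assumptions, not a statement about what the register coalescer actually achieves: it counts instructions only, ignores register pressure and rematerialization, and the function names are hypothetical.

```python
def insts_if_propagated_early(num_copies: int) -> int:
    """Every COPY becomes its own load immediate; the original def is then dead."""
    return num_copies


def insts_after_coalescing(num_copies: int, num_coalesced: int) -> int:
    """One load immediate, plus one COPY for each copy the coalescer could not remove."""
    if not 0 <= num_coalesced <= num_copies:
        raise ValueError("num_coalesced must be between 0 and num_copies")
    return 1 + (num_copies - num_coalesced)


# The example from this thread has four copies of the same immediate.
for coalesced in range(5):
    print(coalesced,
          insts_after_coalescing(4, coalesced),
          insts_if_propagated_early(4))
```

In this model, waiting for the coalescer wins as soon as it removes more than one copy, ties when it removes exactly one, and loses only when it removes none. How many copies can really be coalesced depends on whether the live ranges interfere, which the model does not capture.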

Ryan Taylor via llvm-dev

2019-Nov-20 19:17 UTC

[llvm-dev] imm COPY generated by PHI elim not propagated

Quentin,

Can you be more specific, please? If I add the pass in addPostRegAlloc, llc
crashes after Slot index numbering with:

LLVM ERROR: Invalid global physical register

I'm not sure how this makes sense.

-Ryan

