thr3ads.net - llvm dev - [llvm-dev] LSR [Apr 2017]

If this information is useful, please help other people find it:
Share via:

Jonas Paulsson via llvm-dev

2017-Apr-11 13:52 UTC

[llvm-dev] LSR

>> Has anyone any idea on how to best handle this? Can LSR
"split" an IV
>> to use an extra register? Or would this need to be done in a target 
>> specific pass?
>
> When you say "an extra address register" would this imply LSR
adding
> an additional PHI?
>
>  -Hal
>Yes, that would have worked well at least in this type of loop. Can LSR 
do this?

I experimented with adding a check for 12 bit offsets distance in 
isProfitableIncrement() (checking against all members of the chain), 
which resulted in several chains being produced by LSR, instead of just 
one. The chains that now formed now had immediate offsets that were 
close to each other, so that they should result in addresses with 12 bit 
offsets. But, to my disappointment, LSR did not handle these different 
chains by generating new PHI-nodes for each of them (or by skipping 
those that ended up with just one store in the chain), but instead it 
still output the stores in the same way as before.

I also see that LSR is thinking in terms of increments between the 
memory accesses. In the loop I am working with it's disappointing to see 
that before each memory access, the base address is loaded into 
register, and then the offset is added, and then the access, which is 3 
instructions. It should have been just an add/sub after the previous 
access before the memory access, per LSRs intentions. I wonder where 
this is supposed to be handled: In some sort of target pre-isel pass 
that chains the GEPs? Or is this just folded more often on other targets?

/Jonas

Hal Finkel via llvm-dev

2017-Apr-12 14:11 UTC

head link

[llvm-dev] LSR

On 04/11/2017 08:52 AM, Jonas Paulsson wrote:>
>>> Has anyone any idea on how to best handle this? Can LSR
"split" an
>>> IV to use an extra register? Or would this need to be done in a 
>>> target specific pass?
>>
>> When you say "an extra address register" would this imply LSR
adding
>> an additional PHI?
>>
>>  -Hal
>>
> Yes, that would have worked well at least in this type of loop. Can 
> LSR do this?
No, LSR won't add new PHIs. This is a long-standing deficiency (and 
also, in part, prevents it from properly handling pre/post-inc 
addressing modes, which motives some target-specific passes such as 
lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).
>
> I experimented with adding a check for 12 bit offsets distance in 
> isProfitableIncrement() (checking against all members of the chain), 
> which resulted in several chains being produced by LSR, instead of 
> just one. The chains that now formed now had immediate offsets that 
> were close to each other, so that they should result in addresses with 
> 12 bit offsets. But, to my disappointment, LSR did not handle these 
> different chains by generating new PHI-nodes for each of them (or by 
> skipping those that ended up with just one store in the chain), but 
> instead it still output the stores in the same way as before.
>
> I also see that LSR is thinking in terms of increments between the 
> memory accesses. In the loop I am working with it's disappointing to 
> see that before each memory access, the base address is loaded into 
> register, and then the offset is added, and then the access, which is 
> 3 instructions. It should have been just an add/sub after the previous 
> access before the memory access, per LSRs intentions. I wonder where 
> this is supposed to be handled: In some sort of target pre-isel pass 
> that chains the GEPs? Or is this just folded more often on other targets?
As I recall, it does not do this now (although this is also needed for 
handling pre/post-inc addressing modes properly).

  -Hal
>
> /Jonas
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Jonas Paulsson via llvm-dev

2017-Apr-18 05:28 UTC

head link

[llvm-dev] LSR

Hi Hal,>
> No, LSR won't add new PHIs. This is a long-standing deficiency (and 
> also, in part, prevents it from properly handling pre/post-inc 
> addressing modes, which motives some target-specific passes such as 
> lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).
>I do find in this regression that gcc manages to use four address 
registers, while llvm uses just one, with a lot of extra address 
building instructions as a result, where the gcc loop looks very clean. 
I suspect that this could be a needed feature. Would you recommend doing 
this (splitting PHIs) as a "LSRPrep" pass, perhaps in the target, or 
would you try to extend LSR itself?
>>
>>
>> I also see that LSR is thinking in terms of increments between the 
>> memory accesses. In the loop I am working with it's disappointing
to
>> see that before each memory access, the base address is loaded into 
>> register, and then the offset is added, and then the access, which is 
>> 3 instructions. It should have been just an add/sub after the 
>> previous access before the memory access, per LSRs intentions. I 
>> wonder where this is supposed to be handled: In some sort of target 
>> pre-isel pass that chains the GEPs? Or is this just folded more often 
>> on other targets?
>
> As I recall, it does not do this now (although this is also needed for 
> handling pre/post-inc addressing modes properly).
>Same question - It might be simpler to do a separate post-LSR GEP 
handling (in CodeGenPrepare, perhaps?), but I suspect it would also be 
possible to extend LSR to do this instead?

/Jonas

llvm dev - Apr 2017 - LSR

[llvm-dev] LSR

[llvm-dev] LSR

[llvm-dev] LSR