thr3ads.net - llvm dev - [LLVMdev] "Machine LICM" for Constants? [Mar 2012]

Matt Johnson

2012-Mar-07 18:38 UTC

[LLVMdev] "Machine LICM" for Constants?

Hi All,
     I work on a backend for a target similar to Mips, where large 
immediates are loaded into registers with 2 instructions, 1 to load the 
MSBits and 1 to load the LSBits.  I've noticed a recurring pattern 
where, despite low register pressure, these constants will be 
rematerialized in every iteration of a loop, rather than being hoisted.  
Here's an example using the mips-unknown-unknown target and Clang/LLVM 
HEAD.  From newlib's implementation of strncat:

#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
while (!DETECTNULL (*aligned_s1))
   aligned_s1++;

This loop gets lowered under -O3 to:

$BB0_5:
   lui $3, 32896
   lui $7, 65278
   ori $3, $3, 32896 ###### Materialize 0x80808080
   lw  $8, 4($2)
   nop
   and $9, $8, $3
   ori $7, $7, 65279 ###### Materialize -(0x01010101)
   addiu $2, $2, 4
   xor $3, $9, $3
   addu  $7, $8, $7
   and $3, $3, $7
   beq $3, $zero, $BB0_5
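
As a cross-check on the immediates in the listing above, here is a minimal C++ sketch of how a 32-bit constant splits into the 16-bit halves that a lui/ori pair carries. The names ImmHalves, splitImm32 and materialize are hypothetical helpers made up for this illustration; they are not taken from LLVM or from any backend.

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower 16-bit halves of a 32-bit constant, as carried by a
// lui/ori pair. Hypothetical helper type, not from any LLVM header.
struct ImmHalves {
  uint16_t hi; // immediate operand of lui
  uint16_t lo; // immediate operand of ori
};

// Split a 32-bit value into its upper and lower halves.
constexpr ImmHalves splitImm32(uint32_t value) {
  return ImmHalves{static_cast<uint16_t>(value >> 16),
                   static_cast<uint16_t>(value & 0xFFFFu)};
}

// Recombine the halves the way lui followed by ori would:
// lui places hi in the upper 16 bits, ori fills in the lower 16 bits.
constexpr uint32_t materialize(ImmHalves h) {
  return (static_cast<uint32_t>(h.hi) << 16) | h.lo;
}
```

For 0x80808080 both halves are 0x8080 (32896), and for -(0x01010101), which is 0xFEFEFEFF as an unsigned 32-bit value, the halves are 0xFEFE (65278) and 0xFEFF (65279), matching the lui and ori operands in the loop body.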


There are a ton of unused caller-saved registers in this small function, 
so I expected the constant materialization to be hoisted out of the 
tight loop.  I'm still learning about the new register allocator and am 
not immediately able to make sense of its debug output (and the 
'problem' may be elsewhere in any case).  I'm happy to post the results
of -debug-only=regalloc if they're useful.

Is my desire to hoist the constants out of the loop reasonable?  Is 
there something I can do (hints or passes in my backend, clang/opt flag, 
etc.) to make this happen today?  If not, what is the root cause?  Maybe 
there's no way to hoist things out of a loop once IR is lowered into a 
SelectionDAG?

Thanks,
Matt

Evan Cheng

2012-Mar-07 20:45 UTC


Yes machine-licm can and should hoist constant materialization instructions out
of the loop. If it's not doing that, it's probably because the target is
not modeling the instruction correctly. I would walk through
MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can
also try compiling the same bitcode for a target like ARM or X86 as a
comparison.

Evan


Matt Johnson

2012-Mar-08 02:28 UTC


Thanks for the tip!  I looked into it, and the problem as of SVN HEAD
appears to be that the lui and ori instructions in Mips are considered
cheap (1-cycle def-use latency) by MachineLICM::IsCheapInstruction(),
but are not trivially rematerializable because their register operands
are not always available.  This makes MachineLICM::IsProfitableToHoist()
return false, preventing the hoist even though
MachineLICM::IsLoopInvariantInst() returns true.

The comment in IsProfitableToHoist() is:

// If the instruction is cheap, only hoist if it is re-materilizable [sic].
// LICM will increase register pressure. It's probably not worth it if the
// instruction is cheap.

The function then proceeds to actually *estimate* register pressure for 
non-cheap instructions to determine whether or not to hoist them.
This heuristic seems reasonable, but doesn't seem to do the right thing 
in this case.  Hacking the instruction itineraries to make the 
instructions not seem cheap doesn't seem like the right answer either.  
I'm guessing the motivation for this heuristic is that, in a loop with 
many possible hoists, some cheap and some expensive, we would prefer to 
hoist the expensive ones rather than wasting all our register slack on 
the cheap ones.
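
To make the behavior described above concrete, here is a toy C++ model of the decision as I understand it from the comment and the surrounding code. It is a simplified sketch, not the actual MachineLICM implementation: ToyInst, its fields, toyProfitableToHoist and the regSlack parameter are all invented for illustration.

```cpp
#include <cassert>

// Toy stand-in for the instruction properties discussed in this thread.
// These fields are illustrative assumptions, not members of any LLVM class.
struct ToyInst {
  bool loopInvariant;    // operands do not change inside the loop
  bool cheap;            // e.g. 1-cycle def-use latency
  bool rematerializable; // trivially rematerializable
};

// regSlack: number of registers assumed to be free across the loop
// (a made-up input standing in for the real register pressure estimate).
bool toyProfitableToHoist(const ToyInst &inst, int regSlack) {
  if (!inst.loopInvariant)
    return false;
  // Cheap instructions are hoisted only when rematerializable; the
  // register pressure estimate is never consulted for them.
  if (inst.cheap)
    return inst.rematerializable;
  // Non-cheap instructions are hoisted when the estimate leaves room.
  return regSlack > 0;
}
```

In this model a cheap, non-rematerializable instruction such as the lui or ori here is rejected however large regSlack is, which is the behavior seen in the strncat loop.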

Is there another way to accomplish this goal while still performing the 
hoist in situations where register pressure is low enough?  Say, 
considering the instructions in a loop for hoisting in descending order 
of cost, rather than in program order?
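
A rough C++ sketch of the ordering idea, under heavy simplifying assumptions: every hoisted value is assumed to occupy exactly one register for the whole loop, and the cost field is an assumed per-iteration cost. Candidate and pickHoists are hypothetical names, and nothing here reflects how MachineLICM is actually structured.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A loop-invariant instruction that could be hoisted (toy model).
struct Candidate {
  std::string name;
  int cost; // assumed per-iteration cost, e.g. latency in cycles
};

// Pick hoists in descending order of cost until the register budget
// runs out. Assumes each hoisted value ties up exactly one register.
std::vector<std::string> pickHoists(std::vector<Candidate> cands,
                                    int freeRegs) {
  // stable_sort keeps program order among candidates of equal cost.
  std::stable_sort(cands.begin(), cands.end(),
                   [](const Candidate &a, const Candidate &b) {
                     return a.cost > b.cost;
                   });
  std::vector<std::string> hoisted;
  for (const Candidate &c : cands) {
    if (freeRegs <= 0)
      break;
    hoisted.push_back(c.name);
    --freeRegs;
  }
  return hoisted;
}
```

With a small budget the expensive candidates win the registers; with a large budget, as in the strncat loop, the cheap constant materializations get hoisted too.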

Note that ARM gets around this by creating a pseudo-instruction for
32-bit immediate loads (MOVi32imm), rather than putting a pattern
directly in ARMInstrInfo.td.  This fused instruction *is* 
rematerializable (since it defines the entire register), even though 
either of the two half-register instructions by themselves cannot be.  
This is one way my target and Mips could hack around the problem, but 
for my target at least it has the disadvantage of having to add an 
ExpandPseudo pass to my backend and put logic in C++ that seems (IMO) to 
belong in TableGen.
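
For illustration only, the expansion step for such a fused pseudo could look roughly like the C++ sketch below. It works on plain strings rather than on MachineInstrs, and expandLoadImm32 is a hypothetical name; a real ExpandPseudo pass would build instructions through the backend's own APIs, which are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Expand a hypothetical fused "load 32-bit immediate" pseudo into a
// lui/ori pair, printed in MIPS-style assembly syntax.
std::vector<std::string> expandLoadImm32(const std::string &reg,
                                         uint32_t imm) {
  const uint32_t hi = imm >> 16;     // upper half, loaded by lui
  const uint32_t lo = imm & 0xFFFFu; // lower half, merged in by ori
  return {
      "lui " + reg + ", " + std::to_string(hi),
      "ori " + reg + ", " + reg + ", " + std::to_string(lo),
  };
}
```

The point is that the pseudo takes only an immediate and defines the whole register, so it can be treated as a single rematerializable unit until this late expansion splits it in two.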

-Matt
