Thanks for the tip! I looked into it and it looks like the problem as
of SVN HEAD is that the lui and ori instructions in Mips are considered
cheap (1-cycle def-use latency) by MachineLICM::IsCheapInstruction(),
but are not trivially materializable because their register operands are
not always available. This makes MachineLICM::IsProfitableToHoist()
return false, preventing the hoist even though
MachineLICM::IsLoopInvariantInst() returns true.
The comment in IsProfitableToHoist() is:
// If the instruction is cheap, only hoist if it is re-materilizable
[sic]. LICM
// will increase register pressure. It's probably not worth it if the
// instruction is cheap.
The function then proceeds to actually *estimate* register pressure for
non-cheap instructions to determine whether or not to hoist them.
This heuristic seems reasonable, but doesn't seem to do the right thing
in this case. Hacking the instruction itineraries to make the
instructions not seem cheap doesn't seem like the right answer either.
I'm guessing the motivation for this heuristic is that, in a loop with
many possible hoists, some cheap and some expensive, we would prefer to
hoist the expensive ones rather than wasting all our register slack on
the cheap ones.
Is there another way to accomplish this goal while still performing the
hoist in situations where register pressure is low enough? Say,
considering the instructions in a loop for hoisting in descending order
of cost, rather than in program order?
Note that ARM gets around this by creating a pseudo-instruction for
32-bit immediate loads (MOVi32imm) , rather than putting a pattern
directly in ARMInstrInfo.td. This fused instruction *is*
rematerializable (since it defines the entire register), even though
either of the two half-register instructions by themselves cannot be.
This is one way my target and Mips could hack around the problem, but
for my target at least it has the disadvantage of having to add an
ExpandPseudo pass to my backend and put logic in C++ that seems (IMO) to
belong in TableGen.
-Matt
On 03/07/2012 02:45 PM, Evan Cheng wrote:> Yes machine-licm can and should hoist constant materialization instructions
out of the loop. If it's not doing that, it's probably because the
target is not modeling the instruction correctly. I would walk through
MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can
also try compiling the same bitcode for a target like ARM or X86 as a
comparison.
>
> Evan
>
> On Mar 7, 2012, at 10:38 AM, Matt Johnson wrote:
>
>> Hi All,
>> I work on a backend for a target similar to Mips, where large
>> immediates are loaded into registers with 2 instructions, 1 to load the
>> MSBits and 1 to load the LSBits. I've noticed a recurring pattern
>> where, despite low register pressure, these constants will be
>> rematerialized in every iteration of a loop, rather than being hoisted.
>> Here's an example using the mips-unknown-unknown target and
Clang/LLVM
>> HEAD. From newlib's implementation of strncat:
>>
>> #define DETECTNULL(X) (((X) - 0x01010101)& ~(X)& 0x80808080)
>> while (!DETECTNULL (*aligned_s1))
>> aligned_s1++;
>>
>> This loop gets lowered under -O3 to:
>>
>> $BB0_5:
>> lui $3, 32896
>> lui $7, 65278
>> ori $3, $3, 32896 ###### Materialize 0x80808080
>> lw $8, 4($2)
>> nop
>> and $9, $8, $3
>> ori $7, $7, 65279 ###### Materialize -(0x01010101)
>> addiu $2, $2, 4
>> xor $3, $9, $3
>> addu $7, $8, $7
>> and $3, $3, $7
>> beq $3, $zero, $BB0_5
>>
>>
>> There are a ton of unused caller-saved registers in this small
function,
>> so I expected the constant materialization to be hoisted out of the
>> tight loop. I'm still learning about the new register allocator
and am
>> not immediately able to make sense of its debug output (and the
>> 'problem' may be elsewhere in any case). I'm happy to post
the results
>> of -debug-only regalloc if they're useful.
>>
>> Is my desire to hoist the constants out of the loop reasonable? Is
>> there something I can do (hints or passes in my backend, clang/opt
flag,
>> etc.) to make this happen today? If not, what is the root cause?
Maybe
>> there's no way to hoist things out of a loop once IR is lowered
into a
>> SelectionDAG?
>>
>> Thanks,
>> Matt
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev