On Nov 12, 2010, at 7:23 AM, Renato Golin wrote:
> Hi folks, me again,
>
> So, I want to implement a simple optimization in a NEON case I've seen
> these days, mostly as an exercise, but it also simplifies (just
> a bit) the code generated.
>
> The case is simple:
>
> uint32x2_t x, res;
> res = vceq_u32(x, vcreate_u32(0));
>
> This will generate the following code:
>
> ; zero d16
> vmov.i32 d16, #0x0
> ; load a into d17
> movw r0, :lower16:a
> movt r0, :upper16:a
> vld1.32 {d17}, [r0]
> ; compare two registers
> vceq.i32 d17, d17, d16
>
> But, because the vector is zero, and there is a NEON instruction to
> compare against an immediate zero (VCEQZ), we could combine the two
> instructions:
>
> ; load a into d17
> movw r0, :lower16:a
> movt r0, :upper16:a
> vld1.32 {d17}, [r0]
> ; compare against immediate zero
> vceq.i32 d17, d17, #0
>
> thus, saving the VMOV.
>
> I know, it's not much, but it's a good start for me to get the hang of
> writing such passes.
This would be a nice optimization, and it's not a bad place to get started
down in the depths of LLVM codegen...
>
> So, should I put this as a special case in NEON lowering or make it as
> part of an optimization pass? Which classes should I look first?
I recommend implementing this as a target-specific DAG combine optimization. We
already have target-specific DAG nodes for the relevant NEON comparison
operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov
(ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them
together. Here's what you need to do (all of this code is in
ARMISelLowering.cpp):
0. (You don't actually need to do anything, but I'm just mentioning it
FYI.) For selection DAG nodes that are not target-specific, you need to inform
the DAG combiner that you want to do some target-specific combining. Look for
calls to setTargetDAGCombine() for examples of this. For this case, the
relevant nodes are all target-specific, so the DAG combiner will call the
target-specific combining hook anyway.
1. Add the ARMISD::VCEQ etc. nodes to the switch in
ARMTargetLowering::PerformDAGCombine.
2. Add a function to be called for those comparisons that checks if one operand
is an ARMISD::VMOVIMM node with an immediate of zero. Note for future reference
that the actual operand of VMOVIMM is an encoded value that represents one of
the possible vector immediates for the "one register plus a modified
immediate" format. In this case it doesn't matter because the
canonical encoding of a zero vector is just zero. When you find that case, use
DAG.getNode() to return a new node for the compare against zero operation. The
PerformShiftCombine function is a fairly simple example of what needs to be done
(although it's doing a completely different combination).
3. Write a testcase and make sure it works.
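To make steps 1 and 2 a bit more concrete, here is a rough, self-contained sketch of the shape of the fold. It is a toy model, not LLVM code: `ToyDAG`, `Node`, `Opcode`, `isZeroVMOVIMM` and `performVCEQCombine` are all hypothetical names made up for illustration, and the `VCEQZ` opcode simply borrows the name of the instruction mentioned above. Checking both operands (because equality is commutative) is also my assumption about what you would want, not something the real combiner is known to require.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for selection DAG opcodes; this is NOT the real ARMISD enum.
enum class Opcode { Load, VMOVIMM, VCEQ, VCEQZ };

struct Node {
  Opcode Op;
  std::vector<const Node *> Operands;
  uint64_t Imm; // only meaningful for VMOVIMM: the encoded immediate
};

// Owns every node it creates, loosely mimicking how a DAG owns its nodes.
class ToyDAG {
  std::vector<std::unique_ptr<Node>> Nodes;

public:
  const Node *getNode(Opcode Op, std::vector<const Node *> Ops = {},
                      uint64_t Imm = 0) {
    Nodes.push_back(std::unique_ptr<Node>(new Node{Op, std::move(Ops), Imm}));
    return Nodes.back().get();
  }
};

// A zero vector is a VMOVIMM whose encoded immediate is zero (the canonical
// encoding of a zero vector is just zero, as noted in step 2).
static bool isZeroVMOVIMM(const Node *N) {
  return N->Op == Opcode::VMOVIMM && N->Imm == 0;
}

// Returns a replacement node if the compare can be folded, or nullptr to
// signal "no change".
const Node *performVCEQCombine(const Node *N, ToyDAG &DAG) {
  assert(N->Op == Opcode::VCEQ && N->Operands.size() == 2);
  const Node *LHS = N->Operands[0];
  const Node *RHS = N->Operands[1];
  // Equality is commutative, so a zero vector on either side can be folded.
  if (isZeroVMOVIMM(RHS))
    return DAG.getNode(Opcode::VCEQZ, {LHS});
  if (isZeroVMOVIMM(LHS))
    return DAG.getNode(Opcode::VCEQZ, {RHS});
  return nullptr;
}
```

In the real code the equivalent function would be reached from the switch in ARMTargetLowering::PerformDAGCombine and would build its result with DAG.getNode(), but the exact signatures there should be taken from ARMISelLowering.cpp rather than from this sketch.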
Thanks for offering to work on this!