thr3ads.net - llvm dev - [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits [Sep 2011]

Duncan Sands

2011-Sep-21 20:24 UTC

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating
point additions and subtractions of appropriate vector shuffles.  To do this I
introduced new x86 FHADD and FHSUB opcodes.  These need to be wired up somehow
in the .td file to the appropriate instructions.  Since I have no idea how
tablegen works I just hacked it in horribly.  It works, but breaks support for
the hadd etc intrinsics (if you take a look at how I did it you will see why!).
I'm sending the patch for comments, and in the hope that someone will explain
how I should be doing the tablegen bits.
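
To make the idea concrete, here is a minimal C++ model of why an addition of two
shuffles can be replaced by a single haddps. It is only a sketch: the names
haddps, shuffle and addOfShuffles are hypothetical, and the masks <0,2,4,6> and
<1,3,5,7> are just one pair that fits the pattern. The masks the patch actually
recognizes are in hadd.diff and are not reproduced here.

```cpp
#include <array>

using V4 = std::array<float, 4>;

// Reference semantics of the 128-bit HADDPS: pairwise sums of A, then of B.
V4 haddps(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// Two-input shuffle in the shufflevector style:
// mask indices 0..3 select from A, indices 4..7 select from B.
V4 shuffle(const V4 &A, const V4 &B, const std::array<int, 4> &Mask) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = Mask[I] < 4 ? A[Mask[I]] : B[Mask[I] - 4];
  return R;
}

// The unoptimized form: gather even and odd elements, then add elementwise.
V4 addOfShuffles(const V4 &A, const V4 &B) {
  V4 Even = shuffle(A, B, {0, 2, 4, 6});
  V4 Odd = shuffle(A, B, {1, 3, 5, 7});
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = Even[I] + Odd[I];
  return R;
}
```

For any inputs A and B, addOfShuffles(A, B) and haddps(A, B) produce the same
vector, which is the equivalence the combine relies on.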

Ciao, Duncan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadd.diff
Type: text/x-patch
Size: 17330 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110921/f4b0c8c9/attachment.bin>

Bruno Cardoso Lopes

2011-Sep-21 21:00 UTC

Hi Duncan,

On Wed, Sep 21, 2011 at 1:24 PM, Duncan Sands <baldrick at free.fr> wrote:
> This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from
> floating point additions and subtractions of appropriate vector shuffles.
> To do this I introduced new x86 FHADD and FHSUB opcodes.  These need to be
> wired up somehow in the .td file to the appropriate instructions.  Since I
> have no idea how tablegen works I just hacked it in horribly.  It works,
> but breaks support for the hadd etc intrinsics (if you take a look at how
> I did it you will see why!).
> I'm sending the patch for comments, and in the hope that someone will
> explain how I should be doing the tablegen bits.

This is awesome :D

Some comments:

+  // Try to synthesize horizontal adds from adds of shuffles.
+  if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
+       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
+      isHorizontalBinOp(LHS, RHS, true))

1) You probably want to do something like:

"bool HasHorizontalArith = Subtarget->hasSSE3() ||
Subtarget->hasAVX()" and check it for the first condition, because
when AVX is on, the SSE levels are all turned off (AVX is treated as
a reimplementation of all SSE levels).

For the second condition: Does this logic work for 256-bit vectors?
I'm asking because although the 128-bit HADDPS and the 256-bit
HADDPD have the same number of elements, their horizontal operation
behavior is different (look at the AVX manual for details)! If it doesn't,
just remove the 256-bit handling for now.

2) Rename horizontal.ll to sse3-haddsub.ll
3) Can you duplicate the testcase file to something like
avx-haddsub.ll, and check for the AVX 128-bit versions too?
4) Your tablegen modifications are totally fine, for the intrinsics just do:

let Predicates = [HasSSE3] in {
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
          (HADDPSrr VR128:$src1, VR128:$src2)>;
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
          (HADDPSrm VR128:$src1, addr:$src2)>;
...

and

let Predicates = [HasAVX] in {
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
          (VHADDPSrr VR128:$src1, VR128:$src2)>;
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
          (VHADDPSrm VR128:$src1, addr:$src2)>;
...
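
The predicate from point 1 can be sketched in isolation as follows. This is a
standalone model, not LLVM code: MockSubtarget, VecTy and canUseHorizontalOp are
hypothetical names, and the sketch assumes the behavior described above, where
enabling AVX leaves the SSE3 flag off. It also restricts the check to the
128-bit types, as suggested for now.

```cpp
// Hypothetical stand-in for the subtarget feature queries used in the patch.
struct MockSubtarget {
  bool SSE3;
  bool AVX;
  bool hasSSE3() const { return SSE3; }
  bool hasAVX() const { return AVX; }
};

// Hypothetical stand-in for the vector value types mentioned in the thread.
enum class VecTy { v4f32, v2f64, v8f32, v4f64 };

// True when a 128-bit floating point horizontal op may be formed.
bool canUseHorizontalOp(const MockSubtarget &ST, VecTy VT) {
  // Either feature level provides haddps/haddpd on 128-bit vectors.
  bool HasHorizontalArith = ST.hasSSE3() || ST.hasAVX();
  bool Is128BitFP = VT == VecTy::v4f32 || VT == VecTy::v2f64;
  return HasHorizontalArith && Is128BitFP;
}
```

With this shape, an AVX-only subtarget still gets the 128-bit forms, while
v8f32 and v4f64 are rejected until the 256-bit behavior is confirmed.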

Thanks Duncan,

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Duncan Sands

2011-Sep-22 19:14 UTC

Hi Bruno,
> Some comments:
>
> +  // Try to synthesize horizontal adds from adds of shuffles.
> +  if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> +       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
> +      isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic work for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit
> HADDPD have the same number of elements, their horizontal operation
> behavior is different (look at the AVX manual for details)! If it doesn't,
> just remove the 256-bit handling for now.

It's not clear whether it is correct for 256-bit operations.  The AVX docs
only describe the integer horizontal operations, not the floating point ones;
for the integer ones, the 256-bit versions work differently.  If someone has a
machine with AVX to test on, I've attached avx-hadd.s.  It should be possible
to do:
   gcc -o avx-hadd avx-hadd.s
   ./avx-hadd
and the result should make it clear.

In the meantime I've removed the 256-bit logic.
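
For reference, the concern can be modeled in a few lines of C++. The sketch
below assumes, based on a reading of Intel's instruction set reference rather
than on anything confirmed in this thread, that the 256-bit VHADDPS works
independently on each 128-bit lane. The names vhaddps256 and naiveWideHadd are
hypothetical, and the output of avx-hadd on real hardware remains the
authoritative check.

```cpp
#include <array>

using V8 = std::array<float, 8>;

// Assumed semantics of 256-bit VHADDPS: each 128-bit lane is handled
// separately, so results from A and B interleave lane by lane.
V8 vhaddps256(const V8 &A, const V8 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3],
          A[4] + A[5], A[6] + A[7], B[4] + B[5], B[6] + B[7]};
}

// What a straight widening of the 128-bit rule would compute instead:
// all pairwise sums of A first, then all pairwise sums of B.
V8 naiveWideHadd(const V8 &A, const V8 &B) {
  return {A[0] + A[1], A[2] + A[3], A[4] + A[5], A[6] + A[7],
          B[0] + B[1], B[2] + B[3], B[4] + B[5], B[6] + B[7]};
}
```

If that assumption holds, the two functions disagree in the middle four
elements for general inputs, so a shuffle-matching rule written for the 128-bit
case cannot simply be reused with wider masks.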
> 2) Rename horizontal.ll to sse3-haddsub.ll
Done!
> 3) Can you duplicate the testcase file to something like
> avx-haddsub.ll, and check for the AVX 128-bit versions too?
I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).
> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (HADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (HADDPSrm VR128:$src1, addr:$src2)>;
> ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (VHADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (VHADDPSrm VR128:$src1, addr:$src2)>;
> ...
I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't
know how.

OK to apply?

Ciao, Duncan.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: avx-hadd.s
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadd.diff
Type: text/x-patch
Size: 23340 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>
