thr3ads.net - llvm dev - [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Duncan Sands

2011-Sep-22 19:14 UTC

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Hi Bruno,
> Some comments:
>
> +  // Try to synthesize horizontal adds from adds of shuffles.
> +  if (((Subtarget->hasSSE3()&&  (VT == MVT::v4f32 || VT ==
MVT::v2f64)) ||
> +       (Subtarget->hasAVX()&&  (VT == MVT::v8f32 || VT ==
MVT::v4f64)))&&
> +      isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic works for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit
> HADDPD have the same number of elements, their horizontal operation
> behavior is different (look at AVX manual for details)! If it doesn't,
> just remove the 256-bit handling for now.
it's not clear whether it is correct for 256 bit operations.  The AVX docs
only describe the integer horizontal operations, not the floating point ones;
for the integer ones the 256 bit ones work differently.  If someone has a
machine with AVX to test on, I've attached avx-hadd.s.  It should be
possible
to do:
   gcc -o avx-hadd avx-hadd.s
   ./avx-hadd
and the result should make it clear.

In the meantime I'm removed the 256 bit logic.
> 2) Rename horizontal.ll to sse3-haddsub.ll
Done!
> 3) Can you duplicate the testcase file to something like
> avx-haddsub.ll, and check for the AVX 128-bit versions too?
I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).
> 4) Your tablegen modifications are totally fine, for the intrinsics just
do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (HADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop
addr:$src2)),
>            (HADDPSrm VR128:$src1, addr:$src2)>;
> ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (VHADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop
addr:$src2)),
>            (VHADDPSrm VR128:$src1, addr:$src2)>;
> ...
I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't
know how.

OK to apply?

Ciao, Duncan.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: avx-hadd.s
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadd.diff
Type: text/x-patch
Size: 23340 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>

Gurd, Preston

2011-Sep-22 19:45 UTC

head link

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

The output of the avx-hadd program is

3 11 7 15

Preston

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Duncan Sands
Sent: Thursday, September 22, 2011 3:14 PM
To: Bruno Cardoso Lopes
Cc: LLVMdev
Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with
the tablegen bits

Hi Bruno,
> Some comments:
>
> +  // Try to synthesize horizontal adds from adds of shuffles.
> +  if (((Subtarget->hasSSE3()&&  (VT == MVT::v4f32 || VT ==
MVT::v2f64)) ||
> +       (Subtarget->hasAVX()&&  (VT == MVT::v8f32 || VT ==
MVT::v4f64)))&&
> +      isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX 
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic works for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit 
> HADDPD have the same number of elements, their horizontal operation 
> behavior is different (look at AVX manual for details)! If it doesn't, 
> just remove the 256-bit handling for now.
it's not clear whether it is correct for 256 bit operations.  The AVX docs
only describe the integer horizontal operations, not the floating point ones;
for the integer ones the 256 bit ones work differently.  If someone has a
machine with AVX to test on, I've attached avx-hadd.s.  It should be
possible to do:
   gcc -o avx-hadd avx-hadd.s
   ./avx-hadd
and the result should make it clear.

In the meantime I'm removed the 256 bit logic.
> 2) Rename horizontal.ll to sse3-haddsub.ll
Done!
> 3) Can you duplicate the testcase file to something like 
> avx-haddsub.ll, and check for the AVX 128-bit versions too?
I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).
> 4) Your tablegen modifications are totally fine, for the intrinsics just
do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (HADDPSrr VR128:$src1, VR128:$src2)>; def : 
> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (HADDPSrm VR128:$src1, addr:$src2)>; ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (VHADDPSrr VR128:$src1, VR128:$src2)>; def : 
> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (VHADDPSrm VR128:$src1, addr:$src2)>; ...
I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't
know how.

OK to apply?

Ciao, Duncan.

Duncan Sands

2011-Sep-22 19:48 UTC

head link

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

> The output of the avx-hadd program is
>
> 3 11 7 15
Thanks Preston.  That means that the 256 bit AVX floating point horizontal
add works the same as the integer 256 bits ones, i.e. inconsistently with
the 128 bit ones.

Ciao, Duncan.
>
> Preston
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu] On Behalf Of Duncan Sands
> Sent: Thursday, September 22, 2011 3:14 PM
> To: Bruno Cardoso Lopes
> Cc: LLVMdev
> Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help
with the tablegen bits
>
> Hi Bruno,
>
>> Some comments:
>>
>> +  // Try to synthesize horizontal adds from adds of shuffles.
>> +  if (((Subtarget->hasSSE3()&&   (VT == MVT::v4f32 || VT ==
MVT::v2f64)) ||
>> +       (Subtarget->hasAVX()&&   (VT == MVT::v8f32 || VT ==
MVT::v4f64)))&&
>> +      isHorizontalBinOp(LHS, RHS, true))
>>
>> 1) You probably want to do something like:
>>
>> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
>> Subtarget->hasAVX()" and check it for the first condition,
because
>> when AVX is on, the SSE levels are all turned off (as to consider AVX
>> a reimplementation of all SSE levels).
>>
>> For the second condition: Does this logic works for 256-bit vectors?
>> I'm asking that because although the 128-bit HADDPS and the 256-bit
>> HADDPD have the same number of elements, their horizontal operation
>> behavior is different (look at AVX manual for details)! If it
doesn't,
>> just remove the 256-bit handling for now.
>
> it's not clear whether it is correct for 256 bit operations.  The AVX
docs only describe the integer horizontal operations, not the floating point
ones; for the integer ones the 256 bit ones work differently.  If someone has a
machine with AVX to test on, I've attached avx-hadd.s.  It should be
possible to do:
>     gcc -o avx-hadd avx-hadd.s
>     ./avx-hadd
> and the result should make it clear.
>
> In the meantime I'm removed the 256 bit logic.
>
>> 2) Rename horizontal.ll to sse3-haddsub.ll
>
> Done!
>
>> 3) Can you duplicate the testcase file to something like
>> avx-haddsub.ll, and check for the AVX 128-bit versions too?
>
> I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).
>
>> 4) Your tablegen modifications are totally fine, for the intrinsics
just do:
>>
>> let Predicates = [HasSSE3] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>             (HADDPSrr VR128:$src1, VR128:$src2)>; def :
>> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>>             (HADDPSrm VR128:$src1, addr:$src2)>; ...
>>
>> and
>>
>> let Predicates = [HasAVX] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>             (VHADDPSrr VR128:$src1, VR128:$src2)>; def :
>> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>>             (VHADDPSrm VR128:$src1, addr:$src2)>; ...
>
> I came up with a vim macro that added them for me (see attached patch).
> Probably there is a way to compress this using tablegen magic, but I
don't know how.
>
> OK to apply?
>
> Ciao, Duncan.

Bruno Cardoso Lopes

2011-Sep-22 19:52 UTC

head link

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

> I added the avx checks to the same file (in which case calling it
> sse3-haddsub.ll is not so great).
Oops, I missed that.
>> 4) Your tablegen modifications are totally fine, for the intrinsics
just
>> do:
>>
>> let Predicates = [HasSSE3] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>           (HADDPSrr VR128:$src1, VR128:$src2)>;
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop
addr:$src2)),
>>           (HADDPSrm VR128:$src1, addr:$src2)>;
>> ...
>>
>> and
>>
>> let Predicates = [HasAVX] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>           (VHADDPSrr VR128:$src1, VR128:$src2)>;
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop
addr:$src2)),
>>           (VHADDPSrm VR128:$src1, addr:$src2)>;
>> ...
>
> I came up with a vim macro that added them for me (see attached patch).
> Probably there is a way to compress this using tablegen magic, but I
don't
> know how.
Cool! This is fine! :)
> OK to apply?
LGTM! Yep!

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Sep 2011 - [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Possibly Parallel Threads