Duncan Sands
2011-Sep-22 19:14 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Hi Bruno,> Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > Subtarget->hasAVX()" and check it for the first condition, because > when AVX is on, the SSE levels are all turned off (as to consider AVX > a reimplementation of all SSE levels). > > For the second condition: Does this logic works for 256-bit vectors? > I'm asking that because although the 128-bit HADDPS and the 256-bit > HADDPD have the same number of elements, their horizontal operation > behavior is different (look at AVX manual for details)! If it doesn't, > just remove the 256-bit handling for now.it's not clear whether it is correct for 256 bit operations. The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently. If someone has a machine with AVX to test on, I've attached avx-hadd.s. It should be possible to do: gcc -o avx-hadd avx-hadd.s ./avx-hadd and the result should make it clear. In the meantime I'm removed the 256 bit logic.> 2) Rename horizontal.ll to sse3-haddsub.llDone!> 3) Can you duplicate the testcase file to something like > avx-haddsub.ll, and check for the AVX 128-bit versions too?I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).> 4) Your tablegen modifications are totally fine, for the intrinsics just do: > > let Predicates = [HasSSE3] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (HADDPSrr VR128:$src1, VR128:$src2)>; > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (HADDPSrm VR128:$src1, addr:$src2)>; > ... > > and > > let Predicates = [HasAVX] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (VHADDPSrr VR128:$src1, VR128:$src2)>; > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (VHADDPSrm VR128:$src1, addr:$src2)>; > ...I came up with a vim macro that added them for me (see attached patch). Probably there is a way to compress this using tablegen magic, but I don't know how. OK to apply? Ciao, Duncan. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: avx-hadd.s URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh> -------------- next part -------------- A non-text attachment was scrubbed... Name: hadd.diff Type: text/x-patch Size: 23340 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>
Gurd, Preston
2011-Sep-22 19:45 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
The output of the avx-hadd program is 3 11 7 15 Preston -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands Sent: Thursday, September 22, 2011 3:14 PM To: Bruno Cardoso Lopes Cc: LLVMdev Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits Hi Bruno,> Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > Subtarget->hasAVX()" and check it for the first condition, because > when AVX is on, the SSE levels are all turned off (as to consider AVX > a reimplementation of all SSE levels). > > For the second condition: Does this logic works for 256-bit vectors? > I'm asking that because although the 128-bit HADDPS and the 256-bit > HADDPD have the same number of elements, their horizontal operation > behavior is different (look at AVX manual for details)! If it doesn't, > just remove the 256-bit handling for now.it's not clear whether it is correct for 256 bit operations. The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently. If someone has a machine with AVX to test on, I've attached avx-hadd.s. It should be possible to do: gcc -o avx-hadd avx-hadd.s ./avx-hadd and the result should make it clear. In the meantime I'm removed the 256 bit logic.> 2) Rename horizontal.ll to sse3-haddsub.llDone!> 3) Can you duplicate the testcase file to something like > avx-haddsub.ll, and check for the AVX 128-bit versions too?I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).> 4) Your tablegen modifications are totally fine, for the intrinsics just do: > > let Predicates = [HasSSE3] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (HADDPSrr VR128:$src1, VR128:$src2)>; def : > Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (HADDPSrm VR128:$src1, addr:$src2)>; ... > > and > > let Predicates = [HasAVX] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (VHADDPSrr VR128:$src1, VR128:$src2)>; def : > Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (VHADDPSrm VR128:$src1, addr:$src2)>; ...I came up with a vim macro that added them for me (see attached patch). Probably there is a way to compress this using tablegen magic, but I don't know how. OK to apply? Ciao, Duncan.
Duncan Sands
2011-Sep-22 19:48 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
> The output of the avx-hadd program is > > 3 11 7 15Thanks Preston. That means that the 256 bit AVX floating point horizontal add works the same as the integer 256 bits ones, i.e. inconsistently with the 128 bit ones. Ciao, Duncan.> > Preston > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands > Sent: Thursday, September 22, 2011 3:14 PM > To: Bruno Cardoso Lopes > Cc: LLVMdev > Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits > > Hi Bruno, > >> Some comments: >> >> + // Try to synthesize horizontal adds from adds of shuffles. >> + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || >> + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& >> + isHorizontalBinOp(LHS, RHS, true)) >> >> 1) You probably want to do something like: >> >> "bool HasHorizontalArith = Subtarget->hasSSE3() || >> Subtarget->hasAVX()" and check it for the first condition, because >> when AVX is on, the SSE levels are all turned off (as to consider AVX >> a reimplementation of all SSE levels). >> >> For the second condition: Does this logic works for 256-bit vectors? >> I'm asking that because although the 128-bit HADDPS and the 256-bit >> HADDPD have the same number of elements, their horizontal operation >> behavior is different (look at AVX manual for details)! If it doesn't, >> just remove the 256-bit handling for now. > > it's not clear whether it is correct for 256 bit operations. The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently. If someone has a machine with AVX to test on, I've attached avx-hadd.s. It should be possible to do: > gcc -o avx-hadd avx-hadd.s > ./avx-hadd > and the result should make it clear. > > In the meantime I'm removed the 256 bit logic. > >> 2) Rename horizontal.ll to sse3-haddsub.ll > > Done! > >> 3) Can you duplicate the testcase file to something like >> avx-haddsub.ll, and check for the AVX 128-bit versions too? > > I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great). > >> 4) Your tablegen modifications are totally fine, for the intrinsics just do: >> >> let Predicates = [HasSSE3] in { >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), >> (HADDPSrr VR128:$src1, VR128:$src2)>; def : >> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), >> (HADDPSrm VR128:$src1, addr:$src2)>; ... >> >> and >> >> let Predicates = [HasAVX] in { >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), >> (VHADDPSrr VR128:$src1, VR128:$src2)>; def : >> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), >> (VHADDPSrm VR128:$src1, addr:$src2)>; ... > > I came up with a vim macro that added them for me (see attached patch). > Probably there is a way to compress this using tablegen magic, but I don't know how. > > OK to apply? > > Ciao, Duncan.
Bruno Cardoso Lopes
2011-Sep-22 19:52 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
> I added the avx checks to the same file (in which case calling it > sse3-haddsub.ll is not so great).Oops, I missed that.>> 4) Your tablegen modifications are totally fine, for the intrinsics just >> do: >> >> let Predicates = [HasSSE3] in { >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), >> (HADDPSrr VR128:$src1, VR128:$src2)>; >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), >> (HADDPSrm VR128:$src1, addr:$src2)>; >> ... >> >> and >> >> let Predicates = [HasAVX] in { >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), >> (VHADDPSrr VR128:$src1, VR128:$src2)>; >> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), >> (VHADDPSrm VR128:$src1, addr:$src2)>; >> ... > > I came up with a vim macro that added them for me (see attached patch). > Probably there is a way to compress this using tablegen magic, but I don't > know how.Cool! This is fine! :)> OK to apply?LGTM! Yep! -- Bruno Cardoso Lopes http://www.brunocardoso.cc
Seemingly Similar Threads
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] X86 LowerVECTOR_SHUFFLE Question
- [LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics