Duncan Sands
2011-Sep-21 20:24 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating point additions and subtractions of appropriate vector shuffles. To do this I introduced new x86 FHADD and FHSUB opcodes. These need to be wired up somehow in the .td file to the appropriate instructions. Since I have no idea how tablegen works I just hacked it in horribly. It works, but breaks support for the hadd etc intrinsics (if you take a look at how I did it you will see why!). I'm sending the patch for comments, and in the hope that someone will explain how I should be doing the tablegen bits. Ciao, Duncan. -------------- next part -------------- A non-text attachment was scrubbed... Name: hadd.diff Type: text/x-patch Size: 17330 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110921/f4b0c8c9/attachment.bin>
Bruno Cardoso Lopes
2011-Sep-21 21:00 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Hi Duncan, On Wed, Sep 21, 2011 at 1:24 PM, Duncan Sands <baldrick at free.fr> wrote:> This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from > floating > point additions and subtractions of appropriate vector shuffles. To do this > I > introduced new x86 FHADD and FHSUB opcodes. These need to be wired up > somehow > in the .td file to the appropriate instructions. Since I have no idea how > tablegen works I just hacked it in horribly. It works, but breaks support > for > the hadd etc intrinsics (if you take a look at how I did it you will see > why!). > I'm sending the patch for comments, and in the hope that someone will > explain > how I should be doing the tablegen bits.This is awesome :D Some comments: + // Try to synthesize horizontal adds from adds of shuffles. + if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || + (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && + isHorizontalBinOp(LHS, RHS, true)) 1) You probably want to do something like: "bool HasHorizontalArith = Subtarget->hasSSE3() || Subtarget->hasAVX()" and check it for the first condition, because when AVX is on, the SSE levels are all turned off (as to consider AVX a reimplementation of all SSE levels). For the second condition: Does this logic works for 256-bit vectors? I'm asking that because although the 128-bit HADDPS and the 256-bit HADDPD have the same number of elements, their horizontal operation behavior is different (look at AVX manual for details)! If it doesn't, just remove the 256-bit handling for now. 2) Rename horizontal.ll to sse3-haddsub.ll 3) Can you duplicate the testcase file to something like avx-haddsub.ll, and check for the AVX 128-bit versions too? 4) Your tablegen modifications are totally fine, for the intrinsics just do: let Predicates = [HasSSE3] in { def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), (HADDPSrr VR128:$src1, VR128:$src2)>; def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), (HADDPSrm VR128:$src1, addr:$src2)>; ... and let Predicates = [HasAVX] in { def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), (VHADDPSrr VR128:$src1, VR128:$src2)>; def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), (VHADDPSrm VR128:$src1, addr:$src2)>; ... Thanks Duncan, -- Bruno Cardoso Lopes http://www.brunocardoso.cc
Duncan Sands
2011-Sep-22 19:14 UTC
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Hi Bruno,> Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > Subtarget->hasAVX()" and check it for the first condition, because > when AVX is on, the SSE levels are all turned off (as to consider AVX > a reimplementation of all SSE levels). > > For the second condition: Does this logic works for 256-bit vectors? > I'm asking that because although the 128-bit HADDPS and the 256-bit > HADDPD have the same number of elements, their horizontal operation > behavior is different (look at AVX manual for details)! If it doesn't, > just remove the 256-bit handling for now.it's not clear whether it is correct for 256 bit operations. The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently. If someone has a machine with AVX to test on, I've attached avx-hadd.s. It should be possible to do: gcc -o avx-hadd avx-hadd.s ./avx-hadd and the result should make it clear. In the meantime I'm removed the 256 bit logic.> 2) Rename horizontal.ll to sse3-haddsub.llDone!> 3) Can you duplicate the testcase file to something like > avx-haddsub.ll, and check for the AVX 128-bit versions too?I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).> 4) Your tablegen modifications are totally fine, for the intrinsics just do: > > let Predicates = [HasSSE3] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (HADDPSrr VR128:$src1, VR128:$src2)>; > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (HADDPSrm VR128:$src1, addr:$src2)>; > ... > > and > > let Predicates = [HasAVX] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (VHADDPSrr VR128:$src1, VR128:$src2)>; > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (VHADDPSrm VR128:$src1, addr:$src2)>; > ...I came up with a vim macro that added them for me (see attached patch). Probably there is a way to compress this using tablegen magic, but I don't know how. OK to apply? Ciao, Duncan. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: avx-hadd.s URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh> -------------- next part -------------- A non-text attachment was scrubbed... Name: hadd.diff Type: text/x-patch Size: 23340 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>
Possibly Parallel Threads
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- Question about llvm vectors
- [LLVMdev] X86 LowerVECTOR_SHUFFLE Question