hameeza ahmed via llvm-dev
2017-Aug-06 21:21 UTC
[llvm-dev] VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; Also i wrote this line in isellowering.h setOperationAction(ISD::MGATHER, MVT::v64i32, Legal); But I am getting following error: llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"' failed. What is my mistake? Please help me. On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:> I am trying to implement vector shuffle for v64i32. Is the following > correct? > > > def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), > (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, $src2, > $dst|$dst, $src1, $src2}", > [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 > VR_2048:$src2)))]>, TA; > > Please help. > > > > > On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> i managed to get rid of above error for VT.is2048BitVector()). >> >> this was implemented already. >> >> now will try define other vectors like VT.is4096BitVector()). >> >> >> >> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> >> wrote: >> >>> Thank you. actually i have to implement both i32 and i64. so i >>> implemented two instructions now one broadcastS other broadcastD. Although >>> while doing broadcast from memory to register i was getting no such error >>> with 1 instruction and other patterns i64, i32 etc. but then also i >>> implemented its 2 versions single and double. >>> >>> Actually, i am trying to compile matrix multiplication code for greater >>> size vector. There i need to include many new instructions in my backend >>> like shuffle, gather etc. For now i am getting the following error. >>> >>> >>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>> vector type"' failed. >>> >>> i tried including is2048Bit Vector() and others. also in vectortype.h i >>> included these types for EVT but was unable to compile backend and getting >>> errors. >>> >>> Please help. >>> >>> Thank You >>> >>> >>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> >>> wrote: >>> >>>> You need a new instruction. And your scalar register size needs to >>>> match your vector element size. So GR32 instead of GR64 >>>> >>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> >>>> wrote: >>>> >>>>> Sorry to disturb, >>>>> Now i want to implement instruction to broadcast scalar register >>>>> content to vector. >>>>> >>>>> like this; >>>>> vpbroadcastq zmm0, rsi >>>>> >>>>> >>>>> I tried implementing it as follows; >>>>> >>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins >>>>> GR64:$src), >>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>> GR64:$src)))], >>>>> IIC_MOV_MEM>, TA; >>>>> >>>>> >>>>> >>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>> (BROADCASTR_256B GR64:$src)>; >>>>> >>>>> >>>>> Is it fine? Also do i need to define a new instruction for this like >>>>> BROADCASTR_256B? can i use the previous instruction BROADCAST_256B (the one >>>>> that broadcast memory scalar to vector) and just define new pattern? >>>>> >>>>> Please help. >>>>> >>>>> Thank You >>>>> >>>>> >>>>> >>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com> >>>>> wrote: >>>>> >>>>>> Thank You so much. >>>>>> >>>>>> Wao you are simply genius. >>>>>> initially I didnt include load in both the main instruction and >>>>>> pattern so i included in both as follows: >>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>> i2048mem:$src), >>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast ( >>>>>> loadi32 addr:$src))))], >>>>>> IIC_MOV_MEM>, TA; >>>>>> >>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>> (BROADCAST_256B addr:$src)>; >>>>>> And it worked perfectly. >>>>>> >>>>>> Thank You again. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Your pattern needs to be >>>>>>> >>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>> >>>>>>> ~Craig >>>>>>> >>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>> >>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>> >>>>>>>> i am getting error. >>>>>>>> What is wrong with this pattern? >>>>>>>> >>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> in x86 it is; >>>>>>>>> >>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>> >>>>>>>>> mine is >>>>>>>>> >>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), >>>>>>>>>> sub_xmm))>; >>>>>>>>>> which is similar to mine. >>>>>>>>>> Why its not working then? >>>>>>>>>> >>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> You need a pattern for v64f32 too. >>>>>>>>>>> >>>>>>>>>>> ~Craig >>>>>>>>>>> >>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> as you said; these are instructions that i defined in >>>>>>>>>>>> instrinfo.td >>>>>>>>>>>> >>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), >>>>>>>>>>>> (ins i2048mem:$src), >>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>>>>>>> addr:$src)))], >>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>> >>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62 >>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64 >>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float >>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? >>>>>>>>>>>>>> >>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now >>>>>>>>>>>>>>> getting the following error. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> How to do this task?? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper < >>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is >>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast >>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no >>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR >>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>> t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with constant >>>>>>>>>>>>>>>>>>>>> something like this; >>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of 64 >>>>>>>>>>>>>>>>>>>>> elements. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> i tried the following code; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src), >>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, >>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> -- >>>> ~Craig >>>> >>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170807/6a188f2b/attachment-0001.html>
Craig Topper via llvm-dev
2017-Aug-06 21:57 UTC
[llvm-dev] VBROADCAST Implementation Issues
masked_gather returns two results. The data and the modified mask. Note the $dst and the $mask_wb in the pattern below. multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _, X86MemOperand memop, PatFrag GatherNode> { let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb", ExeDomain = _.ExeDomain in def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb), (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2), !strconcat(OpcodeStr#_.Suffix, "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"), [(set _.RC:$dst, _.KRCWM:$mask_wb, (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask, vectoraddr:$src2))]>, EVEX, EVEX_K, EVEX_CD8<_.EltSize, CD8VT1>; } ~Craig On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:> i want to implement gather for v64i32. i wrote following code. > > def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins > i2048mem:$src), > "GATHER_256B\t{$src, $dst|$dst, $src}", > [(set VR_2048:$dst, (v64i32 (masked_gather > addr:$src)))], > IIC_MOV_MEM>, TA; > def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; > > Also i wrote this line in isellowering.h > > setOperationAction(ISD::MGATHER, MVT::v64i32, > Legal); > > But I am getting following error: > > llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: > llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *, > llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"' > failed. > > What is my mistake? > > Please help me. > > > On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> I am trying to implement vector shuffle for v64i32. Is the following >> correct? >> >> >> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, $src2, >> $dst|$dst, $src1, $src2}", >> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >> VR_2048:$src2)))]>, TA; >> >> Please help. >> >> >> >> >> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> >> wrote: >> >>> i managed to get rid of above error for VT.is2048BitVector()). >>> >>> this was implemented already. >>> >>> now will try define other vectors like VT.is4096BitVector()). >>> >>> >>> >>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> Thank you. actually i have to implement both i32 and i64. so i >>>> implemented two instructions now one broadcastS other broadcastD. Although >>>> while doing broadcast from memory to register i was getting no such error >>>> with 1 instruction and other patterns i64, i32 etc. but then also i >>>> implemented its 2 versions single and double. >>>> >>>> Actually, i am trying to compile matrix multiplication code for greater >>>> size vector. There i need to include many new instructions in my backend >>>> like shuffle, gather etc. For now i am getting the following error. >>>> >>>> >>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>> vector type"' failed. >>>> >>>> i tried including is2048Bit Vector() and others. also in vectortype.h >>>> i included these types for EVT but was unable to compile backend and >>>> getting errors. >>>> >>>> Please help. >>>> >>>> Thank You >>>> >>>> >>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> >>>> wrote: >>>> >>>>> You need a new instruction. And your scalar register size needs to >>>>> match your vector element size. So GR32 instead of GR64 >>>>> >>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> >>>>> wrote: >>>>> >>>>>> Sorry to disturb, >>>>>> Now i want to implement instruction to broadcast scalar register >>>>>> content to vector. >>>>>> >>>>>> like this; >>>>>> vpbroadcastq zmm0, rsi >>>>>> >>>>>> >>>>>> I tried implementing it as follows; >>>>>> >>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins >>>>>> GR64:$src), >>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>> GR64:$src)))], >>>>>> IIC_MOV_MEM>, TA; >>>>>> >>>>>> >>>>>> >>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>> >>>>>> >>>>>> Is it fine? Also do i need to define a new instruction for this like >>>>>> BROADCASTR_256B? can i use the previous instruction BROADCAST_256B (the one >>>>>> that broadcast memory scalar to vector) and just define new pattern? >>>>>> >>>>>> Please help. >>>>>> >>>>>> Thank You >>>>>> >>>>>> >>>>>> >>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Thank You so much. >>>>>>> >>>>>>> Wao you are simply genius. >>>>>>> initially I didnt include load in both the main instruction and >>>>>>> pattern so i included in both as follows: >>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>> i2048mem:$src), >>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast ( >>>>>>> loadi32 addr:$src))))], >>>>>>> IIC_MOV_MEM>, TA; >>>>>>> >>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>> And it worked perfectly. >>>>>>> >>>>>>> Thank You again. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com >>>>>>> > wrote: >>>>>>> >>>>>>>> Your pattern needs to be >>>>>>>> >>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>> >>>>>>>> ~Craig >>>>>>>> >>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>> >>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>> >>>>>>>>> i am getting error. >>>>>>>>> What is wrong with this pattern? >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> in x86 it is; >>>>>>>>>> >>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>> >>>>>>>>>> mine is >>>>>>>>>> >>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), >>>>>>>>>>> sub_xmm))>; >>>>>>>>>>> which is similar to mine. >>>>>>>>>>> Why its not working then? >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> You need a pattern for v64f32 too. >>>>>>>>>>>> >>>>>>>>>>>> ~Craig >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> as you said; these are instructions that i defined in >>>>>>>>>>>>> instrinfo.td >>>>>>>>>>>>> >>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), >>>>>>>>>>>>> (ins i2048mem:$src), >>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>>>>>>>> addr:$src)))], >>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>> >>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST >>>>>>>>>>>>>> t62 >>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64 >>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float >>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> >>>>>>>>>>>>>> 0 >>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now >>>>>>>>>>>>>>>> getting the following error. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed < >>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> How to do this task?? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper < >>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is >>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast >>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no >>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR >>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with constant >>>>>>>>>>>>>>>>>>>>>> something like this; >>>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of >>>>>>>>>>>>>>>>>>>>>> 64 elements. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> i tried the following code; >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src), >>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, >>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>> ~Craig >>>>> >>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170806/912e0a88/attachment-0001.html>
hameeza ahmed via llvm-dev
2017-Aug-07 08:13 UTC
[llvm-dev] VBROADCAST Implementation Issues
Hello, I did as you said, Please tell me whether the following correct now?? def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode (VR_2048:$src1), _.KRCWM:$mask, VR_2048:$src2))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>; Thank You On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:> masked_gather returns two results. The data and the modified mask. Note > the $dst and the $mask_wb in the pattern below. > > multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _, > X86MemOperand memop, PatFrag GatherNode> { > let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb", > ExeDomain = _.ExeDomain in > def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb), > (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2), > !strconcat(OpcodeStr#_.Suffix, > "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"), > [(set _.RC:$dst, _.KRCWM:$mask_wb, > (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask, > vectoraddr:$src2))]>, EVEX, EVEX_K, > EVEX_CD8<_.EltSize, CD8VT1>; > } > > ~Craig > > On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> i want to implement gather for v64i32. i wrote following code. >> >> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >> i2048mem:$src), >> "GATHER_256B\t{$src, $dst|$dst, $src}", >> [(set VR_2048:$dst, (v64i32 (masked_gather >> addr:$src)))], >> IIC_MOV_MEM>, TA; >> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; >> >> Also i wrote this line in isellowering.h >> >> setOperationAction(ISD::MGATHER, MVT::v64i32, >> Legal); >> >> But I am getting following error: >> >> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *, >> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"' >> failed. >> >> What is my mistake? >> >> Please help me. >> >> >> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> >> wrote: >> >>> I am trying to implement vector shuffle for v64i32. Is the following >>> correct? >>> >>> >>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, $src2, >>> $dst|$dst, $src1, $src2}", >>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >>> VR_2048:$src2)))]>, TA; >>> >>> Please help. >>> >>> >>> >>> >>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> i managed to get rid of above error for VT.is2048BitVector()). >>>> >>>> this was implemented already. >>>> >>>> now will try define other vectors like VT.is4096BitVector()). >>>> >>>> >>>> >>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>> wrote: >>>> >>>>> Thank you. actually i have to implement both i32 and i64. so i >>>>> implemented two instructions now one broadcastS other broadcastD. Although >>>>> while doing broadcast from memory to register i was getting no such error >>>>> with 1 instruction and other patterns i64, i32 etc. but then also i >>>>> implemented its 2 versions single and double. >>>>> >>>>> Actually, i am trying to compile matrix multiplication code for >>>>> greater size vector. There i need to include many new instructions in my >>>>> backend like shuffle, gather etc. For now i am getting the following error. >>>>> >>>>> >>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>>> vector type"' failed. >>>>> >>>>> i tried including is2048Bit Vector() and others. also in vectortype.h >>>>> i included these types for EVT but was unable to compile backend and >>>>> getting errors. >>>>> >>>>> Please help. >>>>> >>>>> Thank You >>>>> >>>>> >>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> >>>>> wrote: >>>>> >>>>>> You need a new instruction. And your scalar register size needs to >>>>>> match your vector element size. So GR32 instead of GR64 >>>>>> >>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Sorry to disturb, >>>>>>> Now i want to implement instruction to broadcast scalar register >>>>>>> content to vector. >>>>>>> >>>>>>> like this; >>>>>>> vpbroadcastq zmm0, rsi >>>>>>> >>>>>>> >>>>>>> I tried implementing it as follows; >>>>>>> >>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins >>>>>>> GR64:$src), >>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>> GR64:$src)))], >>>>>>> IIC_MOV_MEM>, TA; >>>>>>> >>>>>>> >>>>>>> >>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>> >>>>>>> >>>>>>> Is it fine? Also do i need to define a new instruction for this like >>>>>>> BROADCASTR_256B? can i use the previous instruction BROADCAST_256B (the one >>>>>>> that broadcast memory scalar to vector) and just define new pattern? >>>>>>> >>>>>>> Please help. >>>>>>> >>>>>>> Thank You >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Thank You so much. >>>>>>>> >>>>>>>> Wao you are simply genius. >>>>>>>> initially I didnt include load in both the main instruction and >>>>>>>> pattern so i included in both as follows: >>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>>> i2048mem:$src), >>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast ( >>>>>>>> loadi32 addr:$src))))], >>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>> >>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>> And it worked perfectly. >>>>>>>> >>>>>>>> Thank You again. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>> >>>>>>>>> Your pattern needs to be >>>>>>>>> >>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>> >>>>>>>>> ~Craig >>>>>>>>> >>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>> >>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>> >>>>>>>>>> i am getting error. >>>>>>>>>> What is wrong with this pattern? >>>>>>>>>> >>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> in x86 it is; >>>>>>>>>>> >>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>> >>>>>>>>>>> mine is >>>>>>>>>>> >>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), >>>>>>>>>>>> sub_xmm))>; >>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>> Why its not working then? >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> You need a pattern for v64f32 too. >>>>>>>>>>>>> >>>>>>>>>>>>> ~Craig >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> as you said; these are instructions that i defined in >>>>>>>>>>>>>> instrinfo.td >>>>>>>>>>>>>> >>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), >>>>>>>>>>>>>> (ins i2048mem:$src), >>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>> (X86VBroadcast addr:$src)))], >>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST >>>>>>>>>>>>>>> t62 >>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64 >>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float >>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float >>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now >>>>>>>>>>>>>>>>> getting the following error. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> How to do this task?? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper < >>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is >>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast >>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no >>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR >>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with >>>>>>>>>>>>>>>>>>>>>>> constant something like this; >>>>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of >>>>>>>>>>>>>>>>>>>>>>> 64 elements. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> i tried the following code; >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src), >>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, >>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> -- >>>>>> ~Craig >>>>>> >>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170807/a5db2fe6/attachment.html>