hameeza ahmed via llvm-dev
2017-Aug-07 08:13 UTC
[llvm-dev] VBROADCAST Implementation Issues
Hello,

I did as you said. Please tell me whether the following is correct now:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
                    (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
                    [(set VR_2048:$dst, _.KRCWM:$mask_wb,
                      (v64i32 (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
                               VR_2048:$src2))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;

Thank You

On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:

> masked_gather returns two results. The data and the modified mask. Note
> the $dst and the $mask_wb in the pattern below.
>
> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>                          X86MemOperand memop, PatFrag GatherNode> {
>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>       ExeDomain = _.ExeDomain in
>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>             !strconcat(OpcodeStr#_.Suffix,
>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>             EVEX_CD8<_.EltSize, CD8VT1>;
> }
>
> ~Craig
>
> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> i want to implement gather for v64i32. i wrote following code.
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>
>> Also i wrote this line in isellowering.h
>>
>> setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>
>> But I am getting following error:
>>
>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
>> failed.
>>
>> What is my mistake?
>>
>> Please help me.
>>
>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> I am trying to implement vector shuffle for v64i32. Is the following
>>> correct?
>>>
>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>     (ins VR_2048:$src1, VRPIM_2048:$src2),
>>>     "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>>>     [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>                          (v64i32 VR_2048:$src2)))]>, TA;
>>>
>>> Please help.
>>>
>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>>
>>>> this was implemented already.
>>>>
>>>> now will try define other vectors like VT.is4096BitVector()).
>>>>
>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>>> implemented two instructions now one broadcastS other broadcastD. Although
>>>>> while doing broadcast from memory to register i was getting no such error
>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>>> implemented its 2 versions single and double.
>>>>>
>>>>> Actually, i am trying to compile matrix multiplication code for
>>>>> greater size vector. There i need to include many new instructions in my
>>>>> backend like shuffle, gather etc. For now i am getting the following error.
>>>>>
>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>> vector type"' failed.
>>>>>
>>>>> i tried including is2048Bit Vector() and others.
>>>>> also in vectortype.h i included these types for EVT but was unable to
>>>>> compile backend and getting errors.
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thank You
>>>>>
>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You need a new instruction. And your scalar register size needs to
>>>>>> match your vector element size. So GR32 instead of GR64
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry to disturb,
>>>>>>> Now i want to implement instruction to broadcast scalar register
>>>>>>> content to vector.
>>>>>>>
>>>>>>> like this;
>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>
>>>>>>> I tried implementing it as follows;
>>>>>>>
>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins GR64:$src),
>>>>>>>                         "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>                         [(set VR_2048:$dst, (v64i32 (X86VBroadcast GR64:$src)))],
>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>          (BROADCASTR_256B GR64:$src)>;
>>>>>>>
>>>>>>> Is it fine? Also do i need to define a new instruction for this like
>>>>>>> BROADCASTR_256B? can i use the previous instruction BROADCAST_256B (the one
>>>>>>> that broadcast memory scalar to vector) and just define new pattern?
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Thank You
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank You so much.
>>>>>>>>
>>>>>>>> Wao you are simply genius.
>>>>>>>> initially I didnt include load in both the main instruction and
>>>>>>>> pattern so i included in both as follows:
>>>>>>>>
>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>                        "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>                        [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>                        IIC_MOV_MEM>, TA;
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>
>>>>>>>> And it worked perfectly.
>>>>>>>>
>>>>>>>> Thank You again.
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Your pattern needs to be
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>> ~Craig
>>>>>>>>>
>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>
>>>>>>>>>> i am getting error.
>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>
>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> mine is
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>       (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src),
>>>>>>>>>>>>        sub_xmm))>;
>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>                        (ins i2048mem:$src),
>>>>>>>>>>>>>>                        "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>                        [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>                          (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>                        IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now
>>>>>>>>>>>>>>>>> getting the following error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>                 a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>         .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of
>>>>>>>>>>>>>>>>>>>>>>> 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>                         "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>                         [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>
>>>>>>> --
>>>>>> ~Craig
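A note on the GATHER_256B attempt at the top of this message: in the quoted avx512_gather multiclass, `_` is not special syntax. It is simply the name of the `X86VectorVTInfo` parameter, so `_.KRCWM`, `_.RC` and `_.Suffix` only resolve inside a multiclass (or class) that declares such a parameter. A standalone `def` has no `_` in scope and has to name concrete classes itself. The following is a minimal, self-contained TableGen sketch of that scoping rule. Every name in it (`ToyVTInfo`, `ToyInst`, `toy_gather`, `ToyMaskRC`, `TOY_GATHER`) is hypothetical, and plain strings stand in for real register classes; it is an illustration, not the X86 backend's code and not a working gather definition.

```tablegen
// Hypothetical stand-in for X86VectorVTInfo: a bundle of per-type facts.
class ToyVTInfo<string suffix, string maskRC> {
  string Suffix = suffix;   // plays the role of _.Suffix
  string KRCWM  = maskRC;   // plays the role of _.KRCWM
}
def toy_v64i32_info : ToyVTInfo<"d", "ToyMaskRC">;

// Hypothetical stand-in for an instruction class.
class ToyInst<string asm, string maskRC> {
  string AsmString = asm;
  string MaskClass = maskRC;
}

// Inside the multiclass, `_` is an ordinary template parameter,
// so `_.Suffix` and `_.KRCWM` are valid here.
multiclass toy_gather<string OpcodeStr, ToyVTInfo _> {
  def rm : ToyInst<!strconcat(OpcodeStr, _.Suffix), _.KRCWM>;
}

// `_` is bound at instantiation time, when defm passes the info record.
defm TOY_GATHER : toy_gather<"gather", toy_v64i32_info>;

// A standalone def has no `_` in scope: writing `_.KRCWM` here is an
// undefined-variable error, so the concrete value is spelled out instead.
def TOY_GATHER_256B : ToyInst<"GATHER_256B", "ToyMaskRC">;
```

Saved as, say, toy.td, this should parse with `llvm-tblgen toy.td`, and the `defm` should yield a record named `TOY_GATHERrm`. For the real instruction there are two routes under this reading: either instantiate a multiclass with an info record for the 2048-bit type, or replace each `_.X` in the standalone def with the concrete mask register class and value type the backend defines for v64i32.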
hameeza ahmed via llvm-dev
2017-Aug-07 08:20 UTC
[llvm-dev] VBROADCAST Implementation Issues
I am getting this error:

error: Variable not defined: '_'

for _.KRCWM. What should I do?

On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:

> Hello,
> I did as you said,
>
> Please tell me whether the following correct now??
>
> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
>                     (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb,
>                       (v64i32 (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>                                VR_2048:$src2))],
>                     IIC_MOV_MEM>, TA;
> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>
> Thank You
>
> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> masked_gather returns two results. The data and the modified mask. Note
>> the $dst and the $mask_wb in the pattern below.
>>
>> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>>                          X86MemOperand memop, PatFrag GatherNode> {
>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>       ExeDomain = _.ExeDomain in
>>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>             !strconcat(OpcodeStr#_.Suffix,
>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>             EVEX_CD8<_.EltSize, CD8VT1>;
>> }
>>
>> ~Craig
>>
>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i want to implement gather for v64i32. i wrote following code.
>>> >>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >>> i2048mem:$src), >>> "GATHER_256B\t{$src, $dst|$dst, $src}", >>> [(set VR_2048:$dst, (v64i32 (masked_gather >>> addr:$src)))], >>> IIC_MOV_MEM>, TA; >>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; >>> >>> Also i wrote this line in isellowering.h >>> >>> setOperationAction(ISD::MGATHER, MVT::v64i32, >>> Legal); >>> >>> But I am getting following error: >>> >>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init >>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: >>> Unhandled"' failed. >>> >>> What is my mistake? >>> >>> Please help me. >>> >>> >>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> I am trying to implement vector shuffle for v64i32. Is the following >>>> correct? >>>> >>>> >>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, $src2, >>>> $dst|$dst, $src1, $src2}", >>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >>>> VR_2048:$src2)))]>, TA; >>>> >>>> Please help. >>>> >>>> >>>> >>>> >>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>> wrote: >>>> >>>>> i managed to get rid of above error for VT.is2048BitVector()). >>>>> >>>>> this was implemented already. >>>>> >>>>> now will try define other vectors like VT.is4096BitVector()). >>>>> >>>>> >>>>> >>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>>> wrote: >>>>> >>>>>> Thank you. actually i have to implement both i32 and i64. so i >>>>>> implemented two instructions now one broadcastS other broadcastD. Although >>>>>> while doing broadcast from memory to register i was getting no such error >>>>>> with 1 instruction and other patterns i64, i32 etc. 
but then also i >>>>>> implemented its 2 versions single and double. >>>>>> >>>>>> Actually, i am trying to compile matrix multiplication code for >>>>>> greater size vector. There i need to include many new instructions in my >>>>>> backend like shuffle, gather etc. For now i am getting the following error. >>>>>> >>>>>> >>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>>>> vector type"' failed. 
>>>>>> >>>>>> i tried including is2048Bit Vector() and others. also in >>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>> backend and getting errors. >>>>>> >>>>>> Please help. >>>>>> >>>>>> Thank You >>>>>> >>>>>> >>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> You need a new instruction. And your scalar register size needs to >>>>>>> match your vector element size. So GR32 instead of GR64 >>>>>>> >>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Sorry to disturb, >>>>>>>> Now i want to implement instruction to broadcast scalar register >>>>>>>> content to vector. >>>>>>>> >>>>>>>> like this; >>>>>>>> vpbroadcastq zmm0, rsi >>>>>>>> >>>>>>>> >>>>>>>> I tried implementing it as follows; >>>>>>>> >>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins >>>>>>>> GR64:$src), >>>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>>> GR64:$src)))], >>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>>> >>>>>>>> >>>>>>>> Is it fine? Also do i need to define a new instruction for this >>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B >>>>>>>> (the one that broadcast memory scalar to vector) and just define new >>>>>>>> pattern? >>>>>>>> >>>>>>>> Please help. >>>>>>>> >>>>>>>> Thank You >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> Thank You so much. >>>>>>>>> >>>>>>>>> Wao you are simply genius. 
>>>>>>>>> initially I didnt include load in both the main instruction and >>>>>>>>> pattern so i included in both as follows: >>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>>>> i2048mem:$src), >>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast ( >>>>>>>>> loadi32 addr:$src))))], >>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>> >>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>> And it worked perfectly. >>>>>>>>> >>>>>>>>> Thank You again. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Your pattern needs to be >>>>>>>>>> >>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>> >>>>>>>>>> ~Craig >>>>>>>>>> >>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>>> >>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>> >>>>>>>>>>> i am getting error. >>>>>>>>>>> What is wrong with this pattern? 
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>
>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> mine is
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src),
>>>>>>>>>>>>> sub_xmm))>;
>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>> $src}",
>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST
>>>>>>>>>>>>>>>> t62
>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float
>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>> now getting the following error.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float
>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of
>>>>>>>>>>>>>>>>>>>>>>>> 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>> Regards
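The three spellings of the constant in the original post (`0.2` in the C source, `0x3FC99999A0000000` in the IR, and `.long 1045220557` in the assembly) all denote the same value. LLVM IR prints a `float` constant as the bit pattern of that value widened to `double`, while the assembly emits the raw 32-bit pattern. The following is a minimal C++ sketch to check this; the helper names `floatBits`, `widenedDoubleBits` and `checkThreadConstants` are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 single-precision bits of f (what ".long" emits in the asm).
inline std::uint32_t floatBits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Bits of the same value widened to double (how the IR listing spells it).
inline std::uint64_t widenedDoubleBits(float f) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
  double d = f;  // exact: every float value is representable as a double
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Checks the two spellings quoted in the thread against each other.
inline void checkThreadConstants() {
  assert(floatBits(0.2f) == 1045220557u);
  assert(widenedDoubleBits(0.2f) == 0x3FC99999A0000000ull);
}
```

Both assertions hold for `0.2f`, so the 64 identical operands in the `fmul` are one scalar repeated, which is the shape a broadcast instruction is meant to cover.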
hameeza ahmed via llvm-dev
2017-Aug-07 08:54 UTC
[llvm-dev] VBROADCAST Implementation Issues
I changed it to:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask), (ins i2048mem:$src),
                    "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
                    [(set VR_2048:$dst, VK64:$mask, (v64i32 (masked_gather addr:$src)))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;

Now I am getting the following error:

Unhandled memory encoding VK64
Unhandled memory encoding
UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!

What should I do?

On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> i am getting this error
> error: Variable not defined: '_'
> for _.KRCWM
> what to do?
>
> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Hello,
>> I did as you said,
>>
>> Please tell me whether the following correct now??
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>> "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>> {${mask}}, $src2}"),
>> [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>> VR_2048:$src2))],
>> IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>
>> Thank You
>>
>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> masked_gather returns two results. The data and the modified mask. Note
>>> the $dst and the $mask_wb in the pattern below.
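Craig's remark that `masked_gather` returns two results, the data and the modified mask, can be illustrated with a scalar model. The sketch below is plain C++ rather than LLVM code. `GatherResult` and `maskedGather` are hypothetical names, and clearing the written-back mask bit for each completed lane is an assumption taken from how AVX-512 gather instructions behave, not something stated in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two results, mirroring the $dst and $mask_wb outputs in the quoted pattern.
template <std::size_t N>
struct GatherResult {
  std::array<std::int32_t, N> data;   // gathered elements ($dst)
  std::array<bool, N> maskWriteback;  // mask after the operation ($mask_wb)
};

// Scalar model of a masked gather: a lane whose mask bit is set loads
// base[index[i]]; every other lane keeps its pass-through value.
// Assumption: the mask bit is cleared once its lane has been loaded.
template <std::size_t N>
GatherResult<N> maskedGather(const std::int32_t* base,
                             const std::array<std::int32_t, N>& index,
                             const std::array<bool, N>& mask,
                             const std::array<std::int32_t, N>& passThru) {
  assert(base != nullptr);
  GatherResult<N> result{passThru, mask};
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      result.data[i] = base[index[i]];
      result.maskWriteback[i] = false;
    }
  }
  return result;
}
```

Because the operation has two outputs, an instruction definition with only `$dst` in its `outs` list cannot describe it; the pattern needs a second output operand for the mask, tied to the mask input.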