hameeza ahmed via llvm-dev
2017-Aug-07 17:19 UTC
[llvm-dev] VBROADCAST Implementation Issues
Thank You. Still getting errors. I have modified my instructions as you said as follows:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
    "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
    [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
     (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
    IIC_MOV_MEM>, TA;

def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask),(addr:$src2))),
    (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;

Now getting this error:

llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
Assertion `numPhysicalOperands >= 2 + additionalOperands &&
numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
operands for MRMSrcMemFrm"' failed.

On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> wrote:

> masked_gather takes 3 inputs. not just an address. See the AVX512 pattern
> is pasted earlier
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Changed it to;
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>     (ins i2048mem:$src),
>>     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>      (masked_gather addr:$src)))],
>>     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>
>> Now getting following error:
>>
>> Unhandled memory encoding VK64
>> Unhandled memory encoding
>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>
>> What to do?
>>
>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i am getting this error
>>> error: Variable not defined: '_'
>>> for _.KRCWM
>>> what to do?
>>>
>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> I did as you said,
>>>>
>>>> Please tell me whether the following correct now??
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>     _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>>>     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>>      (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>>>>      VR_2048:$src2))],
>>>>     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>
>>>> Thank You
>>>>
>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> masked_gather returns two results. The data and the modified mask.
>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>
>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>                          X86VectorVTInfo _,
>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>       ExeDomain = _.ExeDomain in
>>>>>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>                vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>             EVEX_CD8<_.EltSize, CD8VT1>;
>>>>> }
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>     i2048mem:$src),
>>>>>>     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>>      addr:$src)))],
>>>>>>     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>>>>
>>>>>> Also i wrote this line in isellowering.h
>>>>>>
>>>>>> setOperationAction(ISD::MGATHER,
>>>>>>     MVT::v64i32, Legal);
>>>>>>
>>>>>> But I am getting following error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>> Unhandled"' failed.
>>>>>>
>>>>>> What is my mistake?
>>>>>>
>>>>>> Please help me.
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am trying to implement vector shuffle for v64i32. Is the following
>>>>>>> correct?
>>>>>>>
>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>>     (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>     $src2, $dst|$dst, $src1, $src2}",
>>>>>>>     [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>>>>>>     VR_2048:$src2)))]>, TA;
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>>>>>>
>>>>>>>> this was implemented already.
>>>>>>>>
>>>>>>>> now will try define other vectors like VT.is4096BitVector()).
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>>>>>>> implemented two instructions now one broadcastS other broadcastD.
>>>>>>>>> Although while doing broadcast from memory to register i was getting no such error
>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>>>>>>> implemented its 2 versions single and double.
>>>>>>>>>
>>>>>>>>> Actually, i am trying to compile matrix multiplication code for
>>>>>>>>> greater size vector. There i need to include many new instructions in my
>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error.
>>>>>>>>>
>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &,
>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>> vector type"' failed.
>>>>>>>>>
>>>>>>>>> i tried including is2048Bit Vector() and others. also in
>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile
>>>>>>>>> backend and getting errors.
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>> to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>> Now i want to implement instruction to broadcast scalar register
>>>>>>>>>>> content to vector.
>>>>>>>>>>>
>>>>>>>>>>> like this;
>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>
>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>
>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>     (ins GR64:$src),
>>>>>>>>>>>     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>      GR64:$src)))],
>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>     (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for this
>>>>>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B
>>>>>>>>>>> (the one that broadcast memory scalar to vector) and just define new
>>>>>>>>>>> pattern?
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>
>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>> initially I didnt include load in both the main instruction and
>>>>>>>>>>>> pattern so i included in both as follows:
>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>     (ins i2048mem:$src),
>>>>>>>>>>>>     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>     (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>     (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>     (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>     (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>     (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>     (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>>     VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>     VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>     [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>      (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>     (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>>>>> now getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but
>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 =
>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>     .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector
>>>>>>>>>>>>>>>>>>>>>>>>>>> of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>     VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>     [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>      (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~Craig
Craig Topper via llvm-dev
2017-Aug-07 17:37 UTC
[llvm-dev] VBROADCAST Implementation Issues
You need this line from the AVX512 code to tell the register allocator that $src1/$dst and $mask/$mask_wb must use the same register. The early clobber tells it that $dst and $src2 cannot use the same register.

let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"

~Craig
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> in x86 it is; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> mine is >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 >>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>; >>>>>>>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>>>>>>> Why its not working then? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in >>>>>>>>>>>>>>>>>>> instrinfo.td >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 >>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62 >>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, >>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in >>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, >>>>>>>>>>>>>>>>>>>>>>> Custom); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> How to do this task?? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR >>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included >>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but >>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 
0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with >>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this; >>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (i = 
1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, 
float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003 >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for >>>>>>>>>>>>>>>>>>>>>>>>>>>> vector of 64 elements. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
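Editorial sketch for the `Unexpected number of operands for MRMSrcMemFrm` assertion reported at the top of this message. In the follow-up reply, Craig suggests taking the `Constraints` line from the AVX512 gather multiclass. In TableGen, a `let ... in` prefix placed directly in front of a `def` applies that field to the record. Below, the def body is copied unchanged from the message above and only the `let` wrapper is added. `VR_2048`, `VK64WM` and `i2048mem` are the poster's own out-of-tree definitions. The thread does not yet establish whether the selection pattern itself is accepted by TableGen. The assumption here is that tying `$src1` to `$dst` and `$mask` to `$mask_wb` means the disassembler table emitter no longer counts them as separate physical operands.

```tablegen
// Sketch only: the def body is copied as posted; the `let ... in`
// wrapper is the single new piece. VR_2048, VK64WM and i2048mem are
// the poster's own (out-of-tree) register classes and memory operand.
let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
def GATHER_256B : I<0x68, MRMSrcMem,
    (outs VR_2048:$dst, VK64WM:$mask_wb),
    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
    "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
    [(set VR_2048:$dst, VK64WM:$mask_wb,
          (v64i32 (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
    IIC_MOV_MEM>, TA;
```

The same constraint string can also cover several defs at once with the block form `let Constraints = "..." in { ... }`, which is how the quoted `avx512_gather` multiclass applies it together with `ExeDomain`.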
hameeza ahmed via llvm-dev
2017-Aug-07 17:39 UTC
[llvm-dev] VBROADCAST Implementation Issues
Where to add this line? Sorry, I didn't understand it.

On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com> wrote:
> You need this line from the AVX512 code to tell the register allocator
> that $src1/$dst and $mask/$mask_wb must use the same register. And the
> early clobber tells it that $dst and $src2 cannot use the same register.
>
> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Thank You. Still getting errors. I have modified my instructions as you
>> said as follows:
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
>>     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>       (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
>>     IIC_MOV_MEM>, TA;
>>
>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask),(addr:$src2))),
>>     (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;
>>
>> Now getting this error:
>>
>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>> operands for MRMSrcMemFrm"' failed.
>>
>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> masked_gather takes 3 inputs. not just an address.
See the AVX512 >>> pattern is pasted earlier >>> >>> ~Craig >>> >>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> Changed it to; >>>> >>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask), >>>> (ins i2048mem:$src), >>>> "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} >>>> {${mask}}, $src}", >>>> [(set VR_2048:$dst, VK64:$mask, (v64i32 >>>> (masked_gather addr:$src)))], >>>> IIC_MOV_MEM>, TA; >>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; >>>> Now getting following error: >>>> >>>> Unhandled memory encoding VK64 >>>> Unhandled memory encoding >>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347! >>>> >>>> What to do? >>>> >>>> >>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>> wrote: >>>> >>>>> i am getting this error >>>>> error: Variable not defined: '_' >>>>> for _.KRCWM >>>>> what to do? >>>>> >>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> I did as you said, >>>>>> >>>>>> Please tell me whether the following correct now?? >>>>>> >>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, >>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), >>>>>> "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} >>>>>> {${mask}}, $src2}"), >>>>>> [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 >>>>>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask, >>>>>> VR_2048:$src2))], >>>>>> IIC_MOV_MEM>, TA; >>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>; >>>>>> >>>>>> Thank You >>>>>> >>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> masked_gather returns two results. The data and the modified mask. >>>>>>> Note the $dst and the $mask_wb in the pattern below. 
>>>>>>> >>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr, >>>>>>> X86VectorVTInfo _, >>>>>>> X86MemOperand memop, PatFrag GatherNode> { >>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask >>>>>>> $mask_wb", >>>>>>> ExeDomain = _.ExeDomain in >>>>>>> def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, >>>>>>> _.KRCWM:$mask_wb), >>>>>>> (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2), >>>>>>> !strconcat(OpcodeStr#_.Suffix, >>>>>>> "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"), >>>>>>> [(set _.RC:$dst, _.KRCWM:$mask_wb, >>>>>>> (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask, >>>>>>> vectoraddr:$src2))]>, EVEX, EVEX_K, >>>>>>> EVEX_CD8<_.EltSize, CD8VT1>; >>>>>>> } >>>>>>> >>>>>>> ~Craig >>>>>>> >>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> i want to implement gather for v64i32. i wrote following code. >>>>>>>> >>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>>> i2048mem:$src), >>>>>>>> "GATHER_256B\t{$src, $dst|$dst, $src}", >>>>>>>> [(set VR_2048:$dst, (v64i32 (masked_gather >>>>>>>> addr:$src)))], >>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), >>>>>>>> (GATHER_256B addr:$src)>; >>>>>>>> >>>>>>>> Also i wrote this line in isellowering.h >>>>>>>> >>>>>>>> setOperationAction(ISD::MGATHER, >>>>>>>> MVT::v64i32, Legal); >>>>>>>> >>>>>>>> But I am getting following error: >>>>>>>> >>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init >>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: >>>>>>>> Unhandled"' failed. >>>>>>>> >>>>>>>> What is my mistake? >>>>>>>> >>>>>>>> Please help me. >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed < >>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>> >>>>>>>>> I am trying to implement vector shuffle for v64i32. 
Is the >>>>>>>>> following correct? >>>>>>>>> >>>>>>>>> >>>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, >>>>>>>>> $src2, $dst|$dst, $src1, $src2}", >>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >>>>>>>>> VR_2048:$src2)))]>, TA; >>>>>>>>> >>>>>>>>> Please help. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()). >>>>>>>>>> >>>>>>>>>> this was implemented already. >>>>>>>>>> >>>>>>>>>> now will try define other vectors like VT.is4096BitVector()). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed < >>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i >>>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although >>>>>>>>>>> while doing broadcast from memory to register i was getting no such error >>>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i >>>>>>>>>>> implemented its 2 versions single and double. >>>>>>>>>>> >>>>>>>>>>> Actually, i am trying to compile matrix multiplication code for >>>>>>>>>>> greater size vector. There i need to include many new instructions in my >>>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>>>>>>>>> vector type"' failed. >>>>>>>>>>> >>>>>>>>>>> i tried including is2048Bit Vector() and others. also in >>>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>>>>>>> backend and getting errors. 
>>>>>>>>>>> >>>>>>>>>>> Please help. >>>>>>>>>>> >>>>>>>>>>> Thank You >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper < >>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> You need a new instruction. And your scalar register size needs >>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64 >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Sorry to disturb, >>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar >>>>>>>>>>>>> register content to vector. >>>>>>>>>>>>> >>>>>>>>>>>>> like this; >>>>>>>>>>>>> vpbroadcastq zmm0, rsi >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I tried implementing it as follows; >>>>>>>>>>>>> >>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), >>>>>>>>>>>>> (ins GR64:$src), >>>>>>>>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>>>>>>>> GR64:$src)))], >>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for >>>>>>>>>>>>> this like BROADCASTR_256B? can i use the previous instruction >>>>>>>>>>>>> BROADCAST_256B (the one that broadcast memory scalar to vector) and just >>>>>>>>>>>>> define new pattern? >>>>>>>>>>>>> >>>>>>>>>>>>> Please help. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank You >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank You so much. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Wao you are simply genius. 
>>>>>>>>>>>>>> initially I didnt include load in both the main instruction >>>>>>>>>>>>>> and pattern so i included in both as follows: >>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), >>>>>>>>>>>>>> (ins i2048mem:$src), >>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))], >>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>> And it worked perfectly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank You again. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Your pattern needs to be >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> i am getting error. >>>>>>>>>>>>>>>> What is wrong with this pattern? 
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>>       (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td.
>>>>>>>>>>>>>>>>>>>>>>>>>>> but i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32
>>>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                 + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> --
>>>>>>>>>>>> ~Craig
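
For readers following the gather part of this thread: the operand shape discussed above (three inputs, two results, with $src1 tied to $dst and $mask tied to $mask_wb) can be modeled in plain C++. This is a minimal sketch, not LLVM code. The 64-lane width mirrors v64i32, and the names `maskedGather`, `GatherResult` and `kLanes`, as well as the base-pointer-plus-index-array form of the address, are assumptions made for illustration. It follows the AVX-512 gather convention in which a mask bit is cleared once its lane has been loaded.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count, chosen to mirror the v64i32 type in the thread.
constexpr std::size_t kLanes = 64;
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::bitset<kLanes>;

// Two results, like (outs $dst, $mask_wb) in the instruction definition.
struct GatherResult {
  Vec data;      // models $dst
  Mask mask_wb;  // models $mask_wb
};

// Three inputs, like (ins $src1, $mask, $src2).
// src1 is the pass-through value for lanes whose mask bit is clear.
GatherResult maskedGather(const Vec& src1, Mask mask,
                          const std::int32_t* base,
                          const std::array<std::size_t, kLanes>& index) {
  // Starting from src1 and mask models the tied operands:
  // $src1 = $dst and $mask = $mask_wb share storage.
  GatherResult r{src1, mask};
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (r.mask_wb[i]) {
      r.data[i] = base[index[i]];  // load only the enabled lanes
      r.mask_wb.reset(i);          // mark this lane as completed
    }
  }
  return r;
}
```

Because the function returns both the data and the written-back mask, a pattern that names only one result cannot describe it, which is consistent with the two-result `set` in the AVX-512 multiclass quoted earlier.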
hameeza ahmed via llvm-dev
2017-Aug-07 17:57 UTC
[llvm-dev] VBROADCAST Implementation Issues
Now getting this error:

/lib/Target/X86/X86InstrInfo.td:3318:1: error: In GATHER_256B: Unrecognized node 'VR_2048'!

On Mon, Aug 7, 2017 at 10:53 PM, Craig Topper <craig.topper at gmail.com> wrote:
> You need to add EVEX_K and EVEX_4V to the end of your instruction after TA.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:47 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>
>> Thank You. Now getting this error:
>>
>> Unhandled memory encoding VK64WM
>> Unhandled memory encoding
>>
>> On Mon, Aug 7, 2017 at 10:43 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>
>>> Right before your "def GATHER_256B" add the 'let' line like so
>>>
>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>       (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
>>>     IIC_MOV_MEM>, TA;
>>>
>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>     (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>     addr:$src2)>;
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 10:39 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>
>>>> Where to add this line?
>>>> Sorry I didnt understand it.
>>>>
>>>> On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>
>>>>> You need this line from AVX512 code to tell the register allocation
>>>>> system that $src1/$dst and $mask/$mask_wb to use the same register. And the
>>>>> early clobber tells it that $dst and $src2 cannot use the same register.
>>>>>
>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>
>>>>>> Thank You. Still getting errors.I have modified my instructions as
>>>>>> you said as follows:
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>>>>     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>>     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>>>>     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>       (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>
>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>>     (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>>     addr:$src2)>;
>>>>>>
>>>>>> Now getting this error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>>>>>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>>>>>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>>>>>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>>>>>> operands for MRMSrcMemFrm"' failed.
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>
>>>>>>> masked_gather takes 3 inputs. not just an address.
See the AVX512 >>>>>>> pattern is pasted earlier >>>>>>> >>>>>>> ~Craig >>>>>>> >>>>>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Changed it to; >>>>>>>> >>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, >>>>>>>> VK64:$mask), (ins i2048mem:$src), >>>>>>>> "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} >>>>>>>> {${mask}}, $src}", >>>>>>>> [(set VR_2048:$dst, VK64:$mask, (v64i32 >>>>>>>> (masked_gather addr:$src)))], >>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), >>>>>>>> (GATHER_256B addr:$src)>; >>>>>>>> Now getting following error: >>>>>>>> >>>>>>>> Unhandled memory encoding VK64 >>>>>>>> Unhandled memory encoding >>>>>>>> UNREACHABLE executed at /utils/TableGen/X86Recognizabl >>>>>>>> eInstr.cpp:1347! >>>>>>>> >>>>>>>> What to do? >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> i am getting this error >>>>>>>>> error: Variable not defined: '_' >>>>>>>>> for _.KRCWM >>>>>>>>> what to do? >>>>>>>>> >>>>>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> I did as you said, >>>>>>>>>> >>>>>>>>>> Please tell me whether the following correct now?? 
>>>>>>>>>> >>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, >>>>>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), >>>>>>>>>> "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} >>>>>>>>>> {${mask}}, $src2}"), >>>>>>>>>> [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 >>>>>>>>>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask, >>>>>>>>>> VR_2048:$src2))], >>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), >>>>>>>>>> (GATHER_256B addr:$src2)>; >>>>>>>>>> >>>>>>>>>> Thank You >>>>>>>>>> >>>>>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper < >>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> masked_gather returns two results. The data and the modified >>>>>>>>>>> mask. Note the $dst and the $mask_wb in the pattern below. >>>>>>>>>>> >>>>>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr, >>>>>>>>>>> X86VectorVTInfo _, >>>>>>>>>>> X86MemOperand memop, PatFrag >>>>>>>>>>> GatherNode> { >>>>>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask >>>>>>>>>>> $mask_wb", >>>>>>>>>>> ExeDomain = _.ExeDomain in >>>>>>>>>>> def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, >>>>>>>>>>> _.KRCWM:$mask_wb), >>>>>>>>>>> (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2), >>>>>>>>>>> !strconcat(OpcodeStr#_.Suffix, >>>>>>>>>>> "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, >>>>>>>>>>> $src2}"), >>>>>>>>>>> [(set _.RC:$dst, _.KRCWM:$mask_wb, >>>>>>>>>>> (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask, >>>>>>>>>>> vectoraddr:$src2))]>, EVEX, EVEX_K, >>>>>>>>>>> EVEX_CD8<_.EltSize, CD8VT1>; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ~Craig >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed < >>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> i want to implement gather for v64i32. i wrote following code. 
>>>>>>>>>>>> >>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>>>>>>> i2048mem:$src), >>>>>>>>>>>> "GATHER_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (masked_gather >>>>>>>>>>>> addr:$src)))], >>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), >>>>>>>>>>>> (GATHER_256B addr:$src)>; >>>>>>>>>>>> >>>>>>>>>>>> Also i wrote this line in isellowering.h >>>>>>>>>>>> >>>>>>>>>>>> setOperationAction(ISD::MGATHER, >>>>>>>>>>>> MVT::v64i32, Legal); >>>>>>>>>>>> >>>>>>>>>>>> But I am getting following error: >>>>>>>>>>>> >>>>>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >>>>>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init >>>>>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: >>>>>>>>>>>> Unhandled"' failed. >>>>>>>>>>>> >>>>>>>>>>>> What is my mistake? >>>>>>>>>>>> >>>>>>>>>>>> Please help me. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the >>>>>>>>>>>>> following correct? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>>>>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, >>>>>>>>>>>>> $src2, $dst|$dst, $src1, $src2}", >>>>>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), >>>>>>>>>>>>> (v64i32 VR_2048:$src2)))]>, TA; >>>>>>>>>>>>> >>>>>>>>>>>>> Please help. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> i managed to get rid of above error for >>>>>>>>>>>>>> VT.is2048BitVector()). >>>>>>>>>>>>>> >>>>>>>>>>>>>> this was implemented already. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> now will try define other vectors like VT.is4096BitVector()). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed < >>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so >>>>>>>>>>>>>>> i implemented two instructions now one broadcastS other broadcastD. >>>>>>>>>>>>>>> Although while doing broadcast from memory to register i was getting no >>>>>>>>>>>>>>> such error with 1 instruction and other patterns i64, i32 etc. but then >>>>>>>>>>>>>>> also i implemented its 2 versions single and double. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Actually, i am trying to compile matrix multiplication code >>>>>>>>>>>>>>> for greater size vector. There i need to include many new instructions in >>>>>>>>>>>>>>> my backend like shuffle, gather etc. For now i am getting the following >>>>>>>>>>>>>>> error. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, 
Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: >>>>>>>>>>>>>>> llvm::SDValue getOnesVector(llvm::EVT, const llvm::X86Subtarget &, >>>>>>>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion `(VT.is128BitVector() >>>>>>>>>>>>>>> || VT.is256BitVector() || VT.is512BitVector()) && "Expected a >>>>>>>>>>>>>>> 128/256/512-bit vector type"' failed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> i tried including is2048Bit Vector() and others. also in >>>>>>>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>>>>>>>>>>> backend and getting errors. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please help. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper < >>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You need a new instruction. And your scalar register size >>>>>>>>>>>>>>>> needs to match your vector element size. So GR32 instead of GR64 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sorry to disturb, >>>>>>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar >>>>>>>>>>>>>>>>> register content to vector. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> like this; >>>>>>>>>>>>>>>>> vpbroadcastq zmm0, rsi >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I tried implementing it as follows; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs >>>>>>>>>>>>>>>>> VR_2048:$dst), (ins GR64:$src), >>>>>>>>>>>>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>> (X86VBroadcast GR64:$src)))], >>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for >>>>>>>>>>>>>>>>> this like BROADCASTR_256B? can i use the previous instruction >>>>>>>>>>>>>>>>> BROADCAST_256B (the one that broadcast memory scalar to vector) and just >>>>>>>>>>>>>>>>> define new pattern? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please help. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed < >>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank You so much. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Wao you are simply genius. 
>>>>>>>>>>>>>>>>>> initially I didnt include load in both the main >>>>>>>>>>>>>>>>>> instruction and pattern so i included in both as follows: >>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))], >>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>> And it worked perfectly. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank You again. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Your pattern needs to be >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i am getting error. >>>>>>>>>>>>>>>>>>>> What is wrong with this pattern? 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> in x86 it is; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 >>>>>>>>>>>>>>>>>>>>> addr:$src), >>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> mine is >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 >>>>>>>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>; >>>>>>>>>>>>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>>>>>>>>>>>> Why its not working then? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined >>>>>>>>>>>>>>>>>>>>>>>> in instrinfo.td >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, >>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62 >>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, >>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in >>>>>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Custom); >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed >>>>>>>>>>>>>>>>>>>>>>>>>>>> <hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task?? 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper >>>>>>>>>>>>>>>>>>>>>>>>>>>>> <craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ahmed <hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but i made no changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64, undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64, undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64, undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ................. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do I resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast, not "vbroadcast".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a vector by a constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con = 0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   <float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but its x86 assembly gives:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     .long 1045220557    # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How was the above IR lowered to vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                         (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                         "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                         [(set VREGG:$dst,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                               (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards