hameeza ahmed via llvm-dev
2017-Aug-07 18:12 UTC
[llvm-dev] VBROADCAST Implementation Issues
Where to create it? In IR? How to achieve this?

On Mon, Aug 7, 2017 at 11:10 PM, Craig Topper <craig.topper at gmail.com> wrote:
> I don't think a standalone pattern outside of an instruction can support
> multiple return values.
>
> So you'll need to create a separate FP gather instruction.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 11:08 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>
>> OK, I removed the parentheses; now my code looks like this:
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>> "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>> {${mask}}, $src2}",
>> [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>> (masked_gather VR_2048:$src1, VK64WM:$mask,
>> addr:$src2)))],
>> IIC_MOV_MEM>, EVEX, EVEX_K, TA;
>>
>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>> addr:$src2)>;
>>
>> Now getting this error:
>>
>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
>> failed.
>>
>> What to do?
>>
>> On Mon, Aug 7, 2017 at 11:01 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>
>>> Remove the parentheses around "(VR_2048:$src1)"
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 10:57 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>
>>>> Now getting this error:
>>>> /lib/Target/X86/X86InstrInfo.td:3318:1: error: In GATHER_256B:
>>>> Unrecognized node 'VR_2048'!
>>>>
>>>> On Mon, Aug 7, 2017 at 10:53 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>
>>>>> You need to add EVEX_K and EVEX_4V to the end of your instruction
>>>>> after TA.
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Aug 7, 2017 at 10:47 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>
>>>>>> Thank You.
>>>>>> Now getting this error:
>>>>>>
>>>>>> Unhandled memory encoding VK64WM
>>>>>> Unhandled memory encoding
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 10:43 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>
>>>>>>> Right before your "def GATHER_256B" add the 'let' line like so
>>>>>>>
>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>>> "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>>>>>>> {${mask}}, $src2}",
>>>>>>> [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>> (masked_gather (VR_2048:$src1), VK64WM:$mask,
>>>>>>> addr:$src2)))],
>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>>> addr:$src2)>;
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 10:39 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Where to add this line?
>>>>>>>> Sorry, I didn't understand it.
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> You need this line from the AVX512 code to tell the register
>>>>>>>>> allocation system that $src1/$dst and $mask/$mask_wb must use the same
>>>>>>>>> register. And the early clobber tells it that $dst and $src2 cannot use the
>>>>>>>>> same register.
>>>>>>>>>
>>>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>>>>>>>>>
>>>>>>>>> ~Craig
>>>>>>>>>
>>>>>>>>> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank You.
>>>>>>>>>> Still getting errors. I have modified my instructions
>>>>>>>>>> as you said, as follows:
>>>>>>>>>>
>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>>>>>> "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>>>>>>>>>> {${mask}}, $src2}",
>>>>>>>>>> [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>>>>> (masked_gather (VR_2048:$src1), VK64WM:$mask,
>>>>>>>>>> addr:$src2)))],
>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>>>>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>>>>>> addr:$src2)>;
>>>>>>>>>>
>>>>>>>>>> Now getting this error:
>>>>>>>>>>
>>>>>>>>>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>>>>>>>>>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>>>>>>>>>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>>>>>>>>>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>>>>>>>>>> operands for MRMSrcMemFrm"' failed.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> masked_gather takes 3 inputs, not just an address.
>>>>>>>>>>> See the AVX512 pattern I pasted earlier.
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Changed it to:
>>>>>>>>>>>>
>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>>>> VK64:$mask), (ins i2048mem:$src),
>>>>>>>>>>>> "GATHER_256B\t{$src, {$dst}{${mask}}|${dst}
>>>>>>>>>>>> {${mask}}, $src}",
>>>>>>>>>>>> [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>>>>>>>>>> (masked_gather addr:$src)))],
>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> Now getting the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> Unhandled memory encoding VK64
>>>>>>>>>>>> Unhandled memory encoding
>>>>>>>>>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>>>>>>>>>
>>>>>>>>>>>> What to do?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am getting this error
>>>>>>>>>>>>> error: Variable not defined: '_'
>>>>>>>>>>>>> for _.KRCWM
>>>>>>>>>>>>> What to do?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I did as you said.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please tell me whether the following is correct now?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>>>>>>>>>> "GATHER_256B\t{$src2,
>>>>>>>>>>>>>> {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>>>>>>>> [(set VR_2048:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>>>>> (v64i32 (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>>>>>>>>> VR_2048:$src2))],
>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)),
>>>>>>>>>>>>>> (GATHER_256B addr:$src2)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> masked_gather returns two results. The data and the modified
>>>>>>>>>>>>>>> mask. Note the $dst and the $mask_wb in the pattern below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>>>>>>>>> X86VectorVTInfo _,
>>>>>>>>>>>>>>> X86MemOperand memop, PatFrag GatherNode> {
>>>>>>>>>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>>>>>>>>> ExeDomain = _.ExeDomain in
>>>>>>>>>>>>>>> def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst,
>>>>>>>>>>>>>>> _.KRCWM:$mask_wb),
>>>>>>>>>>>>>>> (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>>>>>>>>> !strconcat(OpcodeStr#_.Suffix,
>>>>>>>>>>>>>>> "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>>>>>>>>> [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>>>>>> (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>>>>>>>>> vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>>>>>>>>> EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> i want to implement gather for v64i32.
>>>>>>>>>>>>>>>> i wrote following code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>>>> "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>> (masked_gather addr:$src)))],
>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also i wrote this line in isellowering.h
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But I am getting following error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>>>>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>>>>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>>>>>>>>>>> Unhandled"' failed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What is my mistake?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please help me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the
>>>>>>>>>>>>>>>>> following correct?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs
>>>>>>>>>>>>>>>>> VR_2048:$dst),
>>>>>>>>>>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>>>>>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>>>>>>>>>>>>>>> (v64i32 VR_2048:$src2)))]>, TA;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I managed to get rid of the above error for
>>>>>>>>>>>>>>>>>> VT.is2048BitVector().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This was implemented already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now I will try to define other vectors like
>>>>>>>>>>>>>>>>>> VT.is4096BitVector().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you. Actually I have to implement both i32 and
>>>>>>>>>>>>>>>>>>> i64, so I have now implemented two instructions, one broadcastS and the
>>>>>>>>>>>>>>>>>>> other broadcastD. When broadcasting from memory to a register I was getting
>>>>>>>>>>>>>>>>>>> no such error with one instruction and separate patterns for i64, i32 etc.,
>>>>>>>>>>>>>>>>>>> but I implemented its two versions, single and double, there as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Actually, I am trying to compile matrix multiplication
>>>>>>>>>>>>>>>>>>> code for a larger vector size. For that I need to add many new instructions
>>>>>>>>>>>>>>>>>>> to my backend, like shuffle, gather etc. For now I am getting the following
>>>>>>>>>>>>>>>>>>> error.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525:
>>>>>>>>>>>>>>>>>>> llvm::SDValue getOnesVector(llvm::EVT, const llvm::X86Subtarget &,
>>>>>>>>>>>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion `(VT.is128BitVector()
>>>>>>>>>>>>>>>>>>> || VT.is256BitVector() || VT.is512BitVector()) && "Expected a
>>>>>>>>>>>>>>>>>>> 128/256/512-bit vector type"' failed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I tried including is2048BitVector() and others. Also,
>>>>>>>>>>>>>>>>>>> in vectortype.h I included these types for EVT, but I was unable to compile
>>>>>>>>>>>>>>>>>>> the backend and am getting errors.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You need a new instruction. And your scalar register
>>>>>>>>>>>>>>>>>>>> size needs to match your vector element size. So GR32 instead of GR64.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>>>>>>>>>>> Now I want to implement an instruction to broadcast
>>>>>>>>>>>>>>>>>>>>> scalar register content to a vector.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> like this:
>>>>>>>>>>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I tried implementing it as follows:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs
>>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins GR64:$src),
>>>>>>>>>>>>>>>>>>>>> "BROADCASTR_256B\t{$src,
>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>> (X86VBroadcast GR64:$src)))],
>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is it fine? Also, do I need to define a new instruction
>>>>>>>>>>>>>>>>>>>>> for this like BROADCASTR_256B?
>>>>>>>>>>>>>>>>>>>>> Can I use the previous instruction
>>>>>>>>>>>>>>>>>>>>> BROADCAST_256B (the one that broadcasts a memory scalar to a vector) and just
>>>>>>>>>>>>>>>>>>>>> define a new pattern?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Wow, you are simply a genius.
>>>>>>>>>>>>>>>>>>>>>> Initially I didn't include the load in either the main
>>>>>>>>>>>>>>>>>>>>>> instruction or the pattern, so I included it in both as follows:
>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src,
>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32
>>>>>>>>>>>>>>>>>>>>>>> addr:$src))), (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following
>>>>>>>>>>>>>>>>>>>>>>>> pattern
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512
>>>>>>>>>>>>>>>>>>>>>>>>> addr:$src),
>>>>>>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32
>>>>>>>>>>>>>>>>>>>>>>>>>> VR512:$src))),
>>>>>>>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> As you said; these are the instructions that I
>>>>>>>>>>>>>>>>>>>>>>>>>>>> defined in instrinfo.td
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included broadcast instruction in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vector by a constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But its assembly on x86 gives:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How was the above IR code lowered
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> ~Craig
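
For readers following the gather part of this thread: Craig's two points (masked_gather takes three inputs, and it produces two results, the data and the written-back mask) can be pictured with a small scalar model. This is only a sketch in plain C++, not LLVM code. The names maskedGatherModel, Vec, Index, Mask and kLanes are made up for illustration, and clearing each mask bit once its lane has been loaded is an assumption modelled on how AVX-512 gathers treat their mask register.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical lane count matching the v64i32 type discussed in the thread.
constexpr std::size_t kLanes = 64;
using Vec = std::array<std::int32_t, kLanes>;
using Index = std::array<std::uint32_t, kLanes>;
using Mask = std::bitset<kLanes>;

// Scalar model of a masked gather (illustrative only, not an LLVM API).
// Three inputs: the pass-through value (src1), the mask, and the address,
// split here into a base pointer plus per-lane indices.
// Two results: the gathered data (dst) and the written-back mask (mask_wb).
std::pair<Vec, Mask> maskedGatherModel(const Vec &src1, Mask mask,
                                       const std::int32_t *base,
                                       const Index &index) {
  assert(base != nullptr && "gather needs a base address");
  Vec dst = src1;  // lanes that are masked off keep the pass-through value
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask.test(lane)) {
      dst[lane] = base[index[lane]];  // load only the enabled lanes
      mask.reset(lane);               // assumed: bit cleared once lane is done
    }
  }
  return {dst, mask};  // first = data, second = written-back mask
}
```

Called with a mask that has only lanes 0 and 1 set, the model overwrites just those two lanes of the pass-through value and hands back an all-clear mask. That pairing is why the instruction definition lists both $dst and $mask_wb in its outs, and why the Constraints string ties $src1 to $dst and $mask to $mask_wb.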