Craig Topper via llvm-dev
2017-Aug-05 19:24 UTC
[llvm-dev] VBROADCAST Implementation Issues
It looks like X86TargetLowering::LowerBUILD_VECTOR is not creating a broadcast node for your wider vector type.

~Craig

On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:

> Thank you.
>
> I made the changes you mentioned and included the broadcast instruction in
> instructioninfo.td, but I made no changes in the isellowering.cpp file.
>
> I am still getting the following error:
>
> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62,
>     t62, t62, t62, t62, t62, t62, t62, t62
>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>     t8: i64 = undef
>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>     t8: i64 = undef
>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
> .................
> In function: stencil
>
> How do I resolve this?
>
> Please help.
>
> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>
>> You need to use X86VBroadcast, not "vbroadcast".
>>
>> ~Craig
>>
>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have C code that multiplies a vector by a constant, something like this:
>>>
>>> float con = 0.2;
>>> for (k = 0; k < N; k++) {
>>>   for (i = 1; i <= N-2; i++)
>>>     for (j = 1; j <= N-2; j++)
>>>       b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
>>>                        a[i][j+1]);
>>> }
>>>
>>> Now in LLVM IR I am getting:
>>>
>>> %22 = fmul <64 x float> %21, <
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>   float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>
>>> but its x86 assembly gives:
>>>
>>> .LCPI0_0:
>>>         .long 1045220557 # float 0.200000003
>>>
>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>
>>> vmulps zmm2, zmm2, zmm1
>>>
>>> How did it lower the above IR code into vbroadcastss?
>>>
>>> What would be the pattern to match here?
>>>
>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>
>>> I tried the following code:
>>>
>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>                         "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>                         [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>                         IIC_MOV_MEM>, TA;
>>>
>>> Please help me. I am stuck at this point.
>>>
>>> Thank you.
>>> Regards
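
For readers following the thread: the node that fails to select is a BUILD_VECTOR whose 64 operands are all the same value (t62), i.e. a splat. The sketch below is a standalone toy model of the check a lowering routine would need before choosing a broadcast. NodeId, Lowering, isSplat and chooseLowering are hypothetical names made up for illustration; they are not LLVM APIs. Per the reply above, the real decision is made in X86TargetLowering::LowerBUILD_VECTOR, which operates on SelectionDAG nodes rather than plain integers.

```cpp
#include <vector>

// Hypothetical stand-in for a DAG operand: just the node number,
// e.g. 62 for "t62". This is NOT LLVM's SDValue.
using NodeId = int;

// Hypothetical lowering outcomes for this toy model.
enum class Lowering { Broadcast, Generic };

// A vector is a splat when every operand is the same node.
// An empty operand list is treated as "not a splat".
bool isSplat(const std::vector<NodeId> &Ops) {
  if (Ops.empty())
    return false;
  for (NodeId Op : Ops)
    if (Op != Ops.front())
      return false;
  return true;
}

// Pick the broadcast path only for splats; everything else falls
// through to whatever generic handling the target provides.
Lowering chooseLowering(const std::vector<NodeId> &Ops) {
  return isSplat(Ops) ? Lowering::Broadcast : Lowering::Generic;
}
```

With 64 copies of node 62, as in the error dump, chooseLowering returns Lowering::Broadcast. If a target's BUILD_VECTOR lowering has no such path for the v64f32 type, the BUILD_VECTOR node survives to instruction selection, which would be consistent with the "Cannot select" error quoted above.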