hameeza ahmed via llvm-dev
2017-Aug-06 21:21 UTC
[llvm-dev] VBROADCAST Implementation Issues
I want to implement gather for v64i32. I wrote the following code:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
                    "GATHER_256B\t{$src, $dst|$dst, $src}",
                    [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
                    IIC_MOV_MEM>, TA;

def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;

I also added this line in isellowering.h:

setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);

But I am getting the following error:

llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
failed.

What is my mistake?
Please help me.

On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> I am trying to implement vector shuffle for v64i32. Is the following
> correct?
>
>
> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>     (ins VR_2048:$src1, VRPIM_2048:$src2),
>     "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>     [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>                                        (v64i32 VR_2048:$src2)))]>, TA;
>
> Please help.
>
>
>
>
> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> I managed to get rid of the above error for VT.is2048BitVector().
>>
>> This was implemented already.
>>
>> Now I will try to define other vectors, like VT.is4096BitVector().
>>
>>
>>
>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> Thank you. Actually, I have to implement both i32 and i64, so I have now
>>> implemented two instructions: one broadcastS, the other broadcastD.
>>> When broadcasting from memory to register I was getting no such error
>>> with one instruction plus additional patterns for i64, i32, etc., but I
>>> implemented single and double versions of that as well.
>>>
>>> Actually, I am trying to compile matrix multiplication code for larger
>>> vector sizes. For that I need to add many new instructions to my backend,
>>> such as shuffle and gather. For now I am getting the following error.
>>>
>>>
>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>> vector type"' failed.
>>>
>>> I tried adding is2048BitVector() and others. I also added these types
>>> for EVT in vectortype.h, but I was unable to compile the backend and
>>> got errors.
>>>
>>> Please help.
>>>
>>> Thank You
>>>
>>>
>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>> wrote:
>>>
>>>> You need a new instruction. And your scalar register size needs to
>>>> match your vector element size. So GR32 instead of GR64.
>>>>
>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry to disturb you.
>>>>> Now I want to implement an instruction that broadcasts the contents
>>>>> of a scalar register to a vector,
>>>>>
>>>>> like this:
>>>>> vpbroadcastq zmm0, rsi
>>>>>
>>>>>
>>>>> I tried implementing it as follows:
>>>>>
>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins GR64:$src),
>>>>>     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast GR64:$src)))],
>>>>>     IIC_MOV_MEM>, TA;
>>>>>
>>>>>
>>>>>
>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>          (BROADCASTR_256B GR64:$src)>;
>>>>>
>>>>>
>>>>> Is this fine? Also, do I need to define a new instruction such as
>>>>> BROADCASTR_256B for this? Or can I reuse the previous instruction,
>>>>> BROADCAST_256B (the one that broadcasts a scalar from memory to a
>>>>> vector), and just define a new pattern?
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thank You
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you so much.
>>>>>>
>>>>>> Wow, you are simply a genius.
>>>>>> Initially I had not included the load in either the main instruction
>>>>>> or the pattern, so I added it to both as follows:
>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>
>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>> And it worked perfectly.
>>>>>>
>>>>>> Thank You again.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Your pattern needs to be
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>
>>>>>>>> I am getting an error.
>>>>>>>> What is wrong with this pattern?
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> In x86 it is:
>>>>>>>>>
>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>
>>>>>>>>> Mine is:
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For v16f32 it is defined as:
>>>>>>>>>> def : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>> which is similar to mine.
>>>>>>>>>> Why is it not working, then?
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> As you said; these are the instructions that I defined in
>>>>>>>>>>>> instrinfo.td:
>>>>>>>>>>>>
>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>     (ins i2048mem:$src),
>>>>>>>>>>>>     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>     [(set VR_2048:$dst, (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added the setOperationAction line in isellowering.cpp. Now I am
>>>>>>>>>>>>>>> getting the following error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode *,
>>>>>>>>>>>>>>> const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast
>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td, but I made no
>>>>>>>>>>>>>>>>>>> changes in the isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have C code that multiplies a vector by a constant,
>>>>>>>>>>>>>>>>>>>>> something like this:
>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>                 a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But its assembly on x86 gives:
>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How is the above IR lowered into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64
>>>>>>>>>>>>>>>>>>>>> elements.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>     [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>> ~Craig
>>>>
>>>
>>>
>>
>
Craig Topper via llvm-dev
2017-Aug-06 21:57 UTC
[llvm-dev] VBROADCAST Implementation Issues
masked_gather returns two results: the data and the modified mask. Note the
$dst and the $mask_wb in the pattern below.

multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
                         X86MemOperand memop, PatFrag GatherNode> {
  let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
      ExeDomain = _.ExeDomain in
  def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
            (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
            !strconcat(OpcodeStr#_.Suffix,
            "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
            [(set _.RC:$dst, _.KRCWM:$mask_wb,
              (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
                     vectoraddr:$src2))]>, EVEX, EVEX_K,
            EVEX_CD8<_.EltSize, CD8VT1>;
}

~Craig
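
For readers less familiar with gather semantics, here is a minimal C++ sketch of why the node has two results. It is only a software model, not LLVM or TableGen code: kLanes, Vec64i32, Mask64, GatherResult and maskedGather are hypothetical names chosen for illustration, and clearing a lane's mask bit once that lane has been loaded is modeled on AVX-512 gather behavior, which is an assumption about what a 64-lane target would do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-lane model; every name here is illustrative only.
constexpr std::size_t kLanes = 64;
using Vec64i32 = std::array<std::int32_t, kLanes>;
using Mask64 = std::uint64_t;  // one bit per lane

// The two results of a masked gather: loaded data and written-back mask.
struct GatherResult {
  Vec64i32 data;         // plays the role of $dst
  Mask64 maskWriteback;  // plays the role of $mask_wb
};

GatherResult maskedGather(const std::vector<std::int32_t> &memory,
                          const Vec64i32 &passThru, Mask64 mask,
                          const std::array<std::uint32_t, kLanes> &indices) {
  // Start from the pass-through vector ($src1 tied to $dst) and the input mask.
  GatherResult result{passThru, mask};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const Mask64 bit = Mask64{1} << lane;
    if ((mask & bit) == 0)
      continue;  // inactive lane keeps the pass-through value
    assert(indices[lane] < memory.size() && "gather index out of range");
    result.data[lane] = memory[indices[lane]];
    result.maskWriteback &= ~bit;  // lane completed: clear its mask bit
  }
  return result;
}
```

In this model, data corresponds to $dst and maskWriteback to $mask_wb. A pattern that wraps masked_gather in a single type such as v64i32 accounts for only one of the two results, which appears to be what the TableGen assertion is complaining about.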
On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> I want to implement gather for v64i32. I wrote the following code:
>
> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>                     IIC_MOV_MEM>, TA;
>
> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>
> I also added this line in isellowering.h:
>
> setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>
> But I am getting the following error:
>
> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
> failed.
>
> What is my mistake?
>
> Please help me.
>
>
> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> I am trying to implement vector shuffle for v64i32. Is the following
>> correct?
>>
>>
>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
$src2,
>> $dst|$dst, $src1, $src2}",
>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>> VR_2048:$src2)))]>, TA;
>>
>> Please help.
>>
>>
>>
>>
>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>
>>> this was implemented already.
>>>
>>> now will try define other vectors like VT.is4096BitVector()).
>>>
>>>
>>>
>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>> implemented two instructions now one broadcastS other
broadcastD. Although
>>>> while doing broadcast from memory to register i was getting no
such error
>>>> with 1 instruction and other patterns i64, i32 etc. but then
also i
>>>> implemented its 2 versions single and double.
>>>>
>>>> Actually, i am trying to compile matrix multiplication code for
greater
>>>> size vector. There i need to include many new instructions in
my backend
>>>> like shuffle, gather etc. For now i am getting the following
error.
>>>>
>>>>
>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>
>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &,
llvm::SelectionDAG &,
>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>> VT.is256BitVector() || VT.is512BitVector()) &&
"Expected a 128/256/512-bit
>>>> vector type"' failed.
>>>>
>>>> i tried including is2048Bit Vector() and others. also in
vectortype.h
>>>> i included these types for EVT but was unable to compile
backend and
>>>> getting errors.
>>>>
>>>> Please help.
>>>>
>>>> Thank You
>>>>
>>>>
>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper
at gmail.com>
>>>> wrote:
>>>>
>>>>> You need a new instruction. And your scalar register size
needs to
>>>>> match your vector element size. So GR32 instead of GR64
>>>>>
>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305
at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry to disturb,
>>>>>> Now i want to implement instruction to broadcast scalar
register
>>>>>> content to vector.
>>>>>>
>>>>>> like this;
>>>>>> vpbroadcastq zmm0, rsi
>>>>>>
>>>>>>
>>>>>> I tried implementing it as follows;
>>>>>>
>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs
VR_2048:$dst), (ins
>>>>>> GR64:$src),
>>>>>> "BROADCASTR_256B\t{$src,
$dst|$dst, $src}",
>>>>>> [(set VR_2048:$dst, (v64i32
(X86VBroadcast
>>>>>> GR64:$src)))],
>>>>>> IIC_MOV_MEM>, TA;
>>>>>>
>>>>>>
>>>>>>
>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>
>>>>>>
>>>>>> Is it fine? Also do i need to define a new instruction
for this like
>>>>>> BROADCASTR_256B? can i use the previous instruction
BROADCAST_256B (the one
>>>>>> that broadcast memory scalar to vector) and just define
new pattern?
>>>>>>
>>>>>> Please help.
>>>>>>
>>>>>> Thank You
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed
<hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank You so much.
>>>>>>>
>>>>>>> Wao you are simply genius.
>>>>>>> initially I didnt include load in both the main
instruction and
>>>>>>> pattern so i included in both as follows:
>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
VR_2048:$dst), (ins
>>>>>>> i2048mem:$src),
>>>>>>> "BROADCAST_256B\t{$src,
$dst|$dst, $src}",
>>>>>>> [(set VR_2048:$dst, (v64i32
(X86VBroadcast (
>>>>>>> loadi32 addr:$src))))],
>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32
addr:$src))),
>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>> And it worked perfectly.
>>>>>>>
>>>>>>> Thank You again.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper
<craig.topper at gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Your pattern needs to be
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32
addr:$src))),
>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>
>>>>>>>> ~Craig
>>>>>>>>
>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed
<hahmed2305 at gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> it runs fine with v64i32. but with the
following pattern
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast
addr:$src)),
>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>> i am getting error.
>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza
ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> in x86 it is;
>>>>>>>>>>
>>>>>>>>>> def :
Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>> (VBROADCASTSSZm
addr:$src)>;
>>>>>>>>>>
>>>>>>>>>> mine is
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast
addr:$src)),
>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza
ahmed <
>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast
(v16f32 VR512:$src))),
>>>>>>>>>>> (VBROADCASTSSZr
(EXTRACT_SUBREG (v16f32 VR512:$src),
>>>>>>>>>>> sub_xmm))>;
>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM,
Craig Topper <
>>>>>>>>>>> craig.topper at gmail.com>
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You need a pattern for v64f32
too.
>>>>>>>>>>>>
>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM,
hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> as you said; these are
instructions that i defined in
>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>
>>>>>>>>>>>>> def BROADCAST_256B :
I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>
"BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>> [(set
VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>> addr:$src)))],
>>>>>>>>>>>>>
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast addr:$src)),
>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28
AM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> LLVM ERROR: Cannot
select: t63: v64f32 = X86ISD::VBROADCAST
>>>>>>>>>>>>>> t62
>>>>>>>>>>>>>> t62: f32,ch =
load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>> t65: i64 =
X86ISD::Wrapper TargetConstantPool:i64<float
>>>>>>>>>>>>>> 0x3FC99999A0000000>
0
>>>>>>>>>>>>>> t64: i64 =
TargetConstantPool<float 0x3FC99999A0000000>
>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at
1:14 AM, Craig Topper <
>>>>>>>>>>>>>> craig.topper at
gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Add
VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Aug 5, 2017
at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> added the
setoperationaction line in isellowering.cpp. now
>>>>>>>>>>>>>>>> getting the
following error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> llc:
/lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>> llvm::SDValue
LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>> *, const
llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>
`(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>
"Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What should I
do?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6,
2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>> craig.topper at
gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well first
have you done this for your type
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug
5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305
at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How to
do this task??
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun,
Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It
looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>> not
creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast
>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td, but I made no changes in the
>>>>>>>>>>>>>>>>>>>> isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am still getting the following error.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have C code that multiplies a vector by a constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> float con = 0.2;
>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> But its x86 assembly gives:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> How did it lower the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>     [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>> ~Craig
>>>>>
>>>>
>>>>
>>>
>>
>
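The BUILD_VECTOR in the quoted "Cannot select" error has 64 identical operands (t62), which means it is a splat of a single scalar load. The sketch below is a standalone C++ toy, not LLVM code: `NodeId`, `Lowering`, `LoweredBuildVector` and `lowerBuildVector` are hypothetical names introduced only to illustrate the idea that a lowering step can recognise a splat and hand instruction selection one broadcast of the scalar instead of a general build-vector. It assumes operands can be compared by a simple node id.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a DAG operand: an id such as 62 for "t62".
using NodeId = int;

// The two outcomes this toy distinguishes.
enum class Lowering { Broadcast, Generic };

struct LoweredBuildVector {
  Lowering kind;
  NodeId scalar;  // meaningful only when kind == Lowering::Broadcast
};

// If every operand of the build-vector is the same node, report a broadcast
// of that node; otherwise report a generic build-vector.
LoweredBuildVector lowerBuildVector(const std::vector<NodeId> &ops) {
  assert(!ops.empty() && "a build-vector needs at least one operand");
  const NodeId first = ops.front();
  const bool isSplat = std::all_of(ops.begin(), ops.end(),
                                   [first](NodeId n) { return n == first; });
  if (isSplat)
    return {Lowering::Broadcast, first};
  return {Lowering::Generic, -1};
}
```

Calling `lowerBuildVector` on a vector of 64 copies of `62` reports `Lowering::Broadcast` with scalar `62`; changing any one element makes it report `Lowering::Generic`. In a real backend the equivalent check would have to accept the wider vector type before any broadcast pattern could match.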
hameeza ahmed via llvm-dev
2017-Aug-07 08:13 UTC
[llvm-dev] VBROADCAST Implementation Issues
Hello,
I did as you said. Could you please tell me whether the following is correct now?
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
    (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
    "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
    [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
        (GatherNode (VR_2048:$src1), _.KRCWM:$mask, VR_2048:$src2))],
    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
Thank You
On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:
> masked_gather returns two results. The data and the modified mask. Note
> the $dst and the $mask_wb in the pattern below.
>
> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>                          X86MemOperand memop, PatFrag GatherNode> {
>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>       ExeDomain = _.ExeDomain in
>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>             !strconcat(OpcodeStr#_.Suffix,
>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>             EVEX_CD8<_.EltSize, CD8VT1>;
> }
>
> ~Craig
>
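To make the "two results" point concrete, here is a small standalone C++ model of a masked gather. It is only a sketch under assumptions: `GatherResult` and `maskedGather` are hypothetical names, not LLVM or target APIs, and the behaviour is modelled on AVX-512 gathers, where masked-off lanes keep the pass-through value ($src1 tied to $dst) and each mask bit is cleared once its element has been loaded ($mask tied to $mask_wb). A custom target may define this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the two results of a masked gather, mirroring the
// (outs $dst, $mask_wb) pair in the quoted multiclass.
struct GatherResult {
  std::vector<int32_t> data;  // models $dst
  std::vector<bool> maskOut;  // models $mask_wb
};

GatherResult maskedGather(const std::vector<int32_t> &passThru,  // $src1
                          const std::vector<bool> &mask,         // $mask
                          const int32_t *base,
                          const std::vector<std::size_t> &indices) {
  assert(passThru.size() == mask.size() && mask.size() == indices.size());
  // Start from the pass-through data and the incoming mask.
  GatherResult r{passThru, mask};
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) {
      r.data[i] = base[indices[i]];  // load only the enabled lanes
      r.maskOut[i] = false;          // lane completed: clear its mask bit
    }
  }
  return r;
}
```

With memory `{10, 20, 30, 40}`, pass-through `{1, 2, 3}`, mask `{true, false, true}` and indices `{3, 0, 1}`, the data result is `{40, 2, 20}` and the written-back mask is all false. Because the operation produces both values, a selection pattern for it has to name two outputs, which is what the `$dst` and `$mask_wb` in the quoted pattern do.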