hameeza ahmed via llvm-dev
2017-Aug-07 08:13 UTC
[llvm-dev] VBROADCAST Implementation Issues
Hello,
I did as you said.
Please tell me whether the following is correct now.
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
                    (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
                    [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
                      (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
                                  VR_2048:$src2))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
Thank You
On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> masked_gather returns two results: the data and the modified mask. Note
> the $dst and the $mask_wb in the pattern below.
>
> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>                          X86MemOperand memop, PatFrag GatherNode> {
> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>     ExeDomain = _.ExeDomain in
>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>             !strconcat(OpcodeStr#_.Suffix,
>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>                           vectoraddr:$src2))]>, EVEX, EVEX_K,
>             EVEX_CD8<_.EltSize, CD8VT1>;
> }
>
> ~Craig
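Editor's sketch: the pattern needs both $dst and $mask_wb because a masked gather updates two registers. The C++ below is a behavioral model only (not LLVM code and not the VR_2048 instruction), written under the assumption of Intel's documented AVX-512 gather semantics: each active lane loads base[index] and clears its mask bit, and inactive lanes keep the pass-through value. That is why the multiclass ties $src1 to $dst and $mask to $mask_wb. The name maskedGather and the 8-lane width are illustrative choices.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Illustrative width: 8 lanes with an 8-bit mask. A v64i32 gather would
// need 64 lanes and a 64-bit mask register.
constexpr std::size_t kLanes = 8;
using Vec = std::array<int32_t, kLanes>;

// Behavioral model of an AVX-512-style masked gather (not LLVM code).
// Returns {data, written-back mask}:
//  - a lane whose mask bit is set loads base[idx[lane]] and clears its bit;
//  - a lane whose mask bit is clear keeps the pass-through value ($src1).
std::pair<Vec, uint8_t> maskedGather(const Vec& passthru, uint8_t mask,
                                     const int32_t* base, const Vec& idx) {
  Vec dst = passthru;
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const uint8_t bit = static_cast<uint8_t>(1u << lane);
    if (mask & bit) {
      dst[lane] = base[idx[lane]];
      mask = static_cast<uint8_t>(mask & ~bit);
    }
  }
  return {dst, mask};
}
```

With every requested lane gathered, the written-back mask comes out as zero, which is the second result a pattern has to account for.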
>
> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> I want to implement gather for v64i32. I wrote the following code.
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>
>> Also I wrote this line in isellowering.h:
>>
>> setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>
>> But I am getting the following error:
>>
>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
>> failed.
>>
>> What is my mistake?
>>
>> Please help me.
>>
>>
>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> I am trying to implement vector shuffle for v64i32. Is the following
>>> correct?
>>>
>>>
>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>                       (ins VR_2048:$src1, VRPIM_2048:$src2),
>>>                       "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>>>                       [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>                                            (v64i32 VR_2048:$src2)))]>, TA;
>>>
>>> Please help.
>>>
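For reference, LLVM IR's shufflevector takes a third operand, a constant mask, that selects each result lane: an index i below N picks lane i of the first vector, and an index from N to 2N-1 picks lane i - N of the second. The shuffle pattern above names only the two vector operands, so it does not say which permutation the instruction performs. A minimal C++ model of the IR semantics follows; using -1 for an undef lane and filling that lane with 0 are assumptions of this sketch.

```cpp
#include <stdexcept>
#include <vector>

// Models IR `shufflevector <N x T> a, <N x T> b, <M x i32> mask`.
// mask[i] in [0, N) picks a[mask[i]]; in [N, 2N) picks b[mask[i] - N];
// -1 stands in for an undef lane here and is filled with 0.
std::vector<int> shuffleVector(const std::vector<int>& a,
                               const std::vector<int>& b,
                               const std::vector<int>& mask) {
  if (a.size() != b.size()) throw std::invalid_argument("operand sizes differ");
  const int n = static_cast<int>(a.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m < 0) out.push_back(0);
    else if (m < n) out.push_back(a[m]);
    else if (m < 2 * n) out.push_back(b[m - n]);
    else throw std::out_of_range("mask index out of range");
  }
  return out;
}
```

For example, the mask {0, 4, 1, 5} interleaves the low halves of two 4-lane vectors; a different mask gives a different instruction, which is why the mask has to be part of whatever the pattern matches.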
>>>
>>>
>>>
>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> I managed to get rid of the above error for VT.is2048BitVector().
>>>>
>>>> This was implemented already.
>>>>
>>>> Now I will try to define other vectors like VT.is4096BitVector().
>>>>
>>>>
>>>>
>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you. Actually, I have to implement both i32 and i64, so I
>>>>> implemented two instructions now: one broadcastS, the other broadcastD.
>>>>> Although I was not getting such an error when broadcasting from memory
>>>>> to register with one instruction plus separate i64, i32, etc. patterns,
>>>>> I still implemented two versions of it, single and double.
>>>>>
>>>>> Actually, I am trying to compile matrix multiplication code for
>>>>> larger vectors. There I need to add many new instructions to my
>>>>> backend, like shuffle, gather, etc. For now I am getting the
>>>>> following error.
>>>>>
>>>>>
>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>> vector type"' failed.
>>>>>
>>>>> I tried including is2048BitVector() and others. Also, in
>>>>> vectortype.h I included these types for EVT, but I was unable to
>>>>> compile the backend and got errors.
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thank You
>>>>>
>>>>>
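The getOnesVector assertion above (and the similar LowerVectorBroadcast assertion further down the thread) comes down to arithmetic: 64 lanes × 32 bits = 2048 bits, while the stock X86 checks accept only 128-, 256- and 512-bit vectors. The helpers below are hypothetical and only restate that check in plain C++.

```cpp
#include <cstdint>

// Total width in bits of a vector with numElts elements of eltBits each.
constexpr uint32_t vectorBits(uint32_t numElts, uint32_t eltBits) {
  return numElts * eltBits;
}

// Restates the (VT.is128BitVector() || VT.is256BitVector() ||
// VT.is512BitVector()) condition from the assertions quoted in this thread.
constexpr bool isStockX86Width(uint32_t bits) {
  return bits == 128 || bits == 256 || bits == 512;
}

static_assert(vectorBits(16, 32) == 512, "v16f32 is a 512-bit (zmm) vector");
static_assert(vectorBits(64, 32) == 2048, "v64i32/v64f32 are 2048-bit vectors");
static_assert(!isStockX86Width(vectorBits(64, 32)),
              "so the stock checks reject them unless a 2048-bit case is added");
```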
>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You need a new instruction. And your scalar register size needs to
>>>>>> match your vector element size, so GR32 instead of GR64.
>>>>>>
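To make the GR32-versus-GR64 point concrete: a broadcast copies one scalar into every lane, so the scalar type fixes the element type. Sixty-four 32-bit lanes fill a 2048-bit register; a 64-bit scalar would fill it with 32 lanes, i.e. a v32i64 value, not v64i32. (In stock AVX-512, vpbroadcastq takes a 64-bit GPR and vpbroadcastd a 32-bit one.) The `broadcast` template and the kRegBits constant are illustrative assumptions; the static_assert rejects a mismatched lane count at compile time.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative register width used in this thread (VR_2048).
constexpr std::size_t kRegBits = 2048;

// Copies one scalar into every lane. The lane type is the scalar's own type,
// so the scalar width fixes the element width of the result.
template <typename Elt, std::size_t N>
std::array<Elt, N> broadcast(Elt scalar) {
  static_assert(sizeof(Elt) * 8 * N == kRegBits,
                "lanes x element bits must fill the register exactly");
  std::array<Elt, N> v{};
  v.fill(scalar);
  return v;
}
```

`broadcast<uint32_t, 64>` compiles, while `broadcast<uint64_t, 64>` fails the static_assert, mirroring why a GR64 source does not match a v64i32 result.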
>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry to disturb.
>>>>>>> Now I want to implement an instruction to broadcast scalar register
>>>>>>> content to a vector,
>>>>>>>
>>>>>>> like this:
>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>
>>>>>>>
>>>>>>> I tried implementing it as follows:
>>>>>>>
>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins GR64:$src),
>>>>>>>                         "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>                         [(set VR_2048:$dst, (v64i32 (X86VBroadcast GR64:$src)))],
>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>          (BROADCASTR_256B GR64:$src)>;
>>>>>>>
>>>>>>>
>>>>>>> Is it fine? Also, do I need to define a new instruction for this, like
>>>>>>> BROADCASTR_256B? Can I use the previous instruction BROADCAST_256B (the one
>>>>>>> that broadcasts a memory scalar to a vector) and just define a new pattern?
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Thank You
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank You so much.
>>>>>>>>
>>>>>>>> Wow, you are simply a genius.
>>>>>>>> Initially I didn't include the load in either the main instruction or the
>>>>>>>> pattern, so I included it in both as follows:
>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>                        "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>                        [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>                        IIC_MOV_MEM>, TA;
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>> And it worked perfectly.
>>>>>>>>
>>>>>>>> Thank You again.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Your pattern needs to be
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>> ~Craig
>>>>>>>>>
>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>
>>>>>>>>>> I am getting an error.
>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> In X86 it is:
>>>>>>>>>>>
>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> Mine is:
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For v16f32 it is defined as:
>>>>>>>>>>>> def : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>> Why is it not working then?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you said, these are the instructions that I defined in
>>>>>>>>>>>>>> instrinfo.td:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>                        (ins i2048mem:$src),
>>>>>>>>>>>>>>                        "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>                        [(set VR_2048:$dst, (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>                        IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>> now I am getting this error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I added the setOperationAction line in isellowering.cpp. Now I am
>>>>>>>>>>>>>>>>> getting the following error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode *,
>>>>>>>>>>>>>>>>> const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, first, have you done this for your type?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, MVT::v64i32, Custom);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
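A rough mental model of what setOperationAction(..., Custom) does: the legalizer looks up the (opcode, value type) pair and, for Custom, hands the node to the target's LowerOperation hook, which is how BUILD_VECTOR reaches X86TargetLowering::LowerBUILD_VECTOR. Entries are per type, so v64f32 presumably needs its own entry as well as v64i32. The ActionTable class below is a hypothetical toy, far simpler than LLVM's real tables, and treating unregistered pairs as Legal is a simplification of this sketch only.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for a target's operation-action table.
enum class Action { Legal, Expand, Custom };

class ActionTable {
 public:
  void set(const std::string& op, const std::string& vt, Action a) {
    actions_[{op, vt}] = a;
  }
  // Pairs that were never registered are treated as Legal in this sketch only.
  Action get(const std::string& op, const std::string& vt) const {
    auto it = actions_.find({op, vt});
    return it == actions_.end() ? Action::Legal : it->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, Action> actions_;
};

// Reports which path a node would take: the target hook for Custom,
// generic expansion for Expand, or direct instruction selection for Legal.
std::string dispatch(const ActionTable& t, const std::string& op,
                     const std::string& vt) {
  switch (t.get(op, vt)) {
    case Action::Custom: return "target LowerOperation hook";
    case Action::Expand: return "generic expansion";
    case Action::Legal: return "instruction selection";
  }
  return "unreachable";
}
```

In this toy, registering only v64i32 leaves a v64f32 BUILD_VECTOR going straight to instruction selection, where it would need a matching pattern.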
>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How do I do this task?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
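LowerBUILD_VECTOR can only produce a broadcast after recognizing a splat, i.e. every defined operand is the same value, like the 64 copies of t62 in the dump quoted below. The C++ sketch models that test on plain integers, with std::optional standing in for undef operands; splatValue is a hypothetical helper that is only loosely analogous to BuildVectorSDNode::getSplatValue on real SDValues. It needs C++17.

```cpp
#include <optional>
#include <vector>

// Each operand is either a value or undef (std::nullopt).
// Returns the common value if all defined operands agree, otherwise nullopt.
// An all-undef vector also yields nullopt in this sketch.
std::optional<int> splatValue(const std::vector<std::optional<int>>& ops) {
  std::optional<int> splat;
  for (const auto& op : ops) {
    if (!op) continue;                            // undef lanes do not break a splat
    if (!splat) splat = *op;                      // first defined operand
    else if (*splat != *op) return std::nullopt;  // two different values
  }
  return splat;
}
```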
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast
>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td, but I made no
>>>>>>>>>>>>>>>>>>>>> changes in the isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am still getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How do I resolve this?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast, not "vbroadcast".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a vector by a
>>>>>>>>>>>>>>>>>>>>>>> constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>                        a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> but the x86 assembly gives:
>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How does it lower the above IR code into
>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of
>>>>>>>>>>>>>>>>>>>>>>> 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>                         "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>                         [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
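Why one 4-byte constant plus vbroadcastss is enough here: the IR and the assembly print the same number in two encodings. LLVM IR writes a float constant in hex as the bits of the value widened to double (0x3FC99999A0000000), while the assembler emits the raw 32-bit pattern (0x3E4CCCCD, i.e. 1045220557). All 64 lanes hold that value, so the vector constant is a splat. A short C++ check of both encodings:

```cpp
#include <cstdint>
#include <cstring>

// Raw IEEE-754 single-precision bits, as emitted by `.long` in the asm.
uint32_t floatBits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Bits of the same value widened to double, as printed in LLVM IR hex form.
uint64_t widenedDoubleBits(float f) {
  double d = f;  // exact: every float value is representable as a double
  uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}
```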
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>> ~Craig
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
hameeza ahmed via llvm-dev
2017-Aug-07 08:20 UTC
[llvm-dev] VBROADCAST Implementation Issues
I am getting this error for _.KRCWM:

error: Variable not defined: '_'

What should I do?
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with >>>>>>>>>>>>>>>>>>>>>>>> constant something like this; >>>>>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> 
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float 
0.200000003 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of >>>>>>>>>>>>>>>>>>>>>>>> 64 elements. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> i tried the following code; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src), >>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, >>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>> ~Craig >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170807/d282f9ed/attachment-0001.html>
hameeza ahmed via llvm-dev
2017-Aug-07 08:54 UTC
[llvm-dev] VBROADCAST Implementation Issues
I changed it to:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask), (ins i2048mem:$src),
                    "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
                    [(set VR_2048:$dst, VK64:$mask, (v64i32 (masked_gather addr:$src)))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
Now I am getting the following error:
Unhandled memory encoding VK64
Unhandled memory encoding
UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
What should I do?
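Earlier in this thread, Craig's reply pointed out that masked_gather produces two results: the gathered data and the updated mask. That is why his AVX-512 multiclass lists both `$dst` and `$mask_wb` as outputs and also takes `$src1` and `$mask` as inputs. It then ties each pair together with `$src1 = $dst, $mask = $mask_wb`.

The standalone C++ sketch below models those semantics for 64 lanes, following the documented behaviour of AVX-512 gathers:

- Each active lane is loaded and its mask bit is cleared.
- Each inactive lane keeps the pass-through (`$src1`) value.

It is an illustration only, not LLVM code. `GatherResult` and `gather64` are hypothetical names. Faults, where the hardware can stop with some mask bits still set, are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: the two values a gather produces.
struct GatherResult {
  std::array<std::int32_t, 64> data; // gathered elements ($dst)
  std::uint64_t maskOut;             // write-back mask ($mask_wb)
};

// Scalar model of a 64-lane masked gather with a pass-through operand.
// Lane i is active when bit i of `mask` is set.
GatherResult gather64(const std::int32_t *base,
                      const std::array<std::int32_t, 64> &indices,
                      std::uint64_t mask,
                      const std::array<std::int32_t, 64> &passthru) {
  GatherResult r{passthru, mask}; // inactive lanes keep the pass-through value
  for (std::size_t i = 0; i < 64; ++i) {
    const std::uint64_t bit = std::uint64_t{1} << i;
    if (mask & bit) {
      r.data[i] = base[indices[i]]; // load the element for an active lane
      r.maskOut &= ~bit;            // the bit is cleared once the lane is loaded
    }
  }
  return r;
}
```

Take `mask = 0x5555555555555555` (even lanes active) with a pass-through of -1. The odd lanes come back as -1, and `maskOut` is 0 once every active lane has been loaded. Because the instruction overwrites its mask input like this, the pattern needs the mask as both an input and a tied output.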
On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> I am getting this error:
> error: Variable not defined: '_'
> for _.KRCWM.
> What should I do?