thr3ads.net - llvm dev - [llvm-dev] VBROADCAST Implementation Issues [Aug 2017]

hameeza ahmed via llvm-dev

2017-Aug-07 08:13 UTC

[llvm-dev] VBROADCAST Implementation Issues

Hello,
I did as you said.

Please tell me whether the following is correct now?

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
(VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}},
$src2}"),
                    [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
(GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
                     VR_2048:$src2))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;

Thank You

On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> masked_gather returns two results. The data and the modified mask. Note
> the $dst and the $mask_wb in the pattern below.
>
> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>                          X86MemOperand memop, PatFrag GatherNode> {
>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>       ExeDomain = _.ExeDomain in
>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>             !strconcat(OpcodeStr#_.Suffix,
>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>              EVEX_CD8<_.EltSize, CD8VT1>;
> }
>
> ~Craig
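
The "two results" point can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions: `toyMaskedGather`, `GatherResult`, and `Vec` are hypothetical names, not LLVM or hardware APIs, and the mask-clearing behaviour is modelled on how AVX-512 gathers behave; the 2048-bit target discussed in this thread may define it differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 64;  // v64i32: 64 lanes of 32 bits
using Vec = std::array<std::int32_t, kLanes>;

// The two results of the operation: the gathered data ($dst) and the
// written-back mask ($mask_wb).
struct GatherResult {
  Vec data;
  std::uint64_t mask_wb;
};

// Hypothetical model: `passthru` plays the role of $src1 (tied to $dst),
// `mask` the role of $mask, and `indices` selects elements of `memory`.
GatherResult toyMaskedGather(const Vec &passthru, std::uint64_t mask,
                             const std::vector<std::int32_t> &memory,
                             const std::array<std::uint32_t, kLanes> &indices) {
  GatherResult result{passthru, mask};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const std::uint64_t bit = std::uint64_t{1} << lane;
    if (result.mask_wb & bit) {
      result.data[lane] = memory.at(indices[lane]);  // load the active lane
      result.mask_wb &= ~bit;  // assumption: bit is cleared once the lane is done
    }
  }
  return result;
}
```

Because the instruction writes both the data and the mask, a pattern that sets only `$dst` describes one result where the node produces two, which is consistent with the `getNumTypes() == 1` assertion quoted below.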
>
> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> I want to implement gather for v64i32. I wrote the following code.
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>
>> I also wrote this line in isellowering.h:
>>
>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>
>> But I am getting the following error:
>>
>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
>> failed.
>>
>> What is my mistake?
>>
>> Please help me.
>>
>>
>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> I am trying to implement vector shuffle for v64i32. Is the following
>>> correct?
>>>
>>>
>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>> (ins VR_2048:$src1, VRPIM_2048:$src2),
>>> "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>> VR_2048:$src2)))]>, TA;
>>>
>>> Please help.
>>>
>>>
>>>
>>>
>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> I managed to get rid of the above error for VT.is2048BitVector().
>>>>
>>>> This was already implemented.
>>>>
>>>> Now I will try to define other vectors like VT.is4096BitVector().
>>>>
>>>>
>>>>
>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you. Actually I have to implement both i32 and i64, so I have now
>>>>> implemented two instructions, one broadcastS and the other broadcastD.
>>>>> While doing the broadcast from memory to register I was getting no such
>>>>> error with one instruction and other patterns (i64, i32, etc.), but I
>>>>> implemented its two versions, single and double, there as well.
>>>>>
>>>>> Actually, I am trying to compile matrix multiplication code for a
>>>>> greater vector size. For that I need to include many new instructions in
>>>>> my backend, like shuffle, gather, etc. For now I am getting the following
>>>>> error.
>>>>>
>>>>>
>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>> vector type"' failed.
>>>>>
>>>>> I tried including is2048BitVector() and others. I also included these
>>>>> types for EVT in vectortype.h, but I was unable to compile the backend
>>>>> and got errors.
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thank You
>>>>>
>>>>>
>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You need a new instruction. And your scalar register size needs to
>>>>>> match your vector element size. So GR32 instead of GR64.
>>>>>>
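
The element-size rule above is plain arithmetic, sketched below in C++. `ToyVectorType`, `totalBits`, and `scalarMatchesElement` are hypothetical helpers written for illustration only; they are not part of LLVM.

```cpp
#include <cassert>

// Hypothetical description of a vector type: lane count and bits per lane.
struct ToyVectorType {
  unsigned numElements;
  unsigned elementBits;
};

constexpr unsigned totalBits(ToyVectorType vt) {
  return vt.numElements * vt.elementBits;
}

// A scalar register can feed a broadcast only if it is as wide as one lane.
constexpr bool scalarMatchesElement(ToyVectorType vt, unsigned scalarRegBits) {
  return vt.elementBits == scalarRegBits;
}

constexpr ToyVectorType v64i32{64, 32};

static_assert(totalBits(v64i32) == 2048, "v64i32 fills a 2048-bit register");
static_assert(scalarMatchesElement(v64i32, 32), "a 32-bit GPR matches i32 lanes");
static_assert(!scalarMatchesElement(v64i32, 64), "a 64-bit GPR does not");
```

Under the same arithmetic, a 64-bit scalar register would pair with a 64-bit element type, for example 32 lanes of i64 in a 2048-bit register.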
>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry to disturb,
>>>>>>> Now I want to implement an instruction to broadcast scalar register
>>>>>>> content to a vector,
>>>>>>>
>>>>>>> like this:
>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>
>>>>>>>
>>>>>>> I tried implementing it as follows:
>>>>>>>
>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins
>>>>>>> GR64:$src),
>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>  GR64:$src)))],
>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>
>>>>>>>
>>>>>>> Is it fine? Also, do I need to define a new instruction for this, like
>>>>>>> BROADCASTR_256B? Can I use the previous instruction BROADCAST_256B (the
>>>>>>> one that broadcasts a memory scalar to a vector) and just define a new
>>>>>>> pattern?
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Thank You
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you so much.
>>>>>>>>
>>>>>>>> Wow, you are simply a genius.
>>>>>>>> Initially I didn't include the load in either the main instruction or
>>>>>>>> the pattern, so I included it in both as follows:
>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>>> i2048mem:$src),
>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (
>>>>>>>> loadi32 addr:$src))))],
>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>> And it worked perfectly.
>>>>>>>>
>>>>>>>> Thank you again.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <
>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Your pattern needs to be
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>> ~Craig
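
One way to see why the `loadf32` wrapper matters is a toy tree matcher. The sketch below is a deliberately simplified C++ model, not LLVM's instruction selector: `Node`, `matches`, `exampleDag`, `patternWithoutLoad`, and `patternWithLoad` are hypothetical names, the load is given a single operand (the real node also carries a chain and an offset), and `addr` is modelled as "any pointer-typed subtree".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy DAG / pattern node: an opcode, a result type, and operand subtrees.
struct Node {
  std::string op;
  std::string type;
  std::vector<Node> kids;
};

// Structural match. A pattern leaf named "addr" accepts any subtree whose
// result is a pointer; every other pattern node must match opcode and arity.
bool matches(const Node &pattern, const Node &dag) {
  if (pattern.op == "addr")
    return dag.type == "ptr";
  if (pattern.op != dag.op || pattern.kids.size() != dag.kids.size())
    return false;
  for (std::size_t i = 0; i < pattern.kids.size(); ++i)
    if (!matches(pattern.kids[i], dag.kids[i]))
      return false;
  return true;
}

// Simplified shape of the failing DAG: VBROADCAST of an f32 load from memory.
Node exampleDag() {
  return Node{"X86VBroadcast", "v64f32",
              {Node{"load", "f32", {Node{"Wrapper", "ptr", {}}}}}};
}

// (X86VBroadcast addr:$src): the operand is an f32 load, not a pointer.
Node patternWithoutLoad() {
  return Node{"X86VBroadcast", "", {Node{"addr", "", {}}}};
}

// (X86VBroadcast (loadf32 addr:$src)): the load node is spelled out.
Node patternWithLoad() {
  return Node{"X86VBroadcast", "",
              {Node{"load", "", {Node{"addr", "", {}}}}}};
}
```

In this model `matches(patternWithoutLoad(), exampleDag())` is false and `matches(patternWithLoad(), exampleDag())` is true, mirroring the "Cannot select" failure and its fix in the thread.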
>>>>>>>>>
>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>
>>>>>>>>>> I am getting an error.
>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>
>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> mine is
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src),
>>>>>>>>>>>> sub_xmm))>;
>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>> Why is it not working then?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~Craig
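
The suggested change amounts to widening the set of vector sizes the assertion accepts. A minimal C++ sketch of that idea follows; `ToyVT`, `acceptedBeforeChange`, and `acceptedAfterChange` are hypothetical stand-ins and do not reproduce the real `EVT` class or the body of `LowerVectorBroadcast`.

```cpp
#include <cassert>

// Hypothetical stand-in for a value type that only knows its total width.
struct ToyVT {
  unsigned sizeInBits;
  constexpr bool is128BitVector() const { return sizeInBits == 128; }
  constexpr bool is256BitVector() const { return sizeInBits == 256; }
  constexpr bool is512BitVector() const { return sizeInBits == 512; }
  constexpr bool is2048BitVector() const { return sizeInBits == 2048; }
};

// The condition as quoted in the failing assertion.
constexpr bool acceptedBeforeChange(ToyVT VT) {
  return VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector();
}

// The same condition with the extra 2048-bit case added.
constexpr bool acceptedAfterChange(ToyVT VT) {
  return acceptedBeforeChange(VT) || VT.is2048BitVector();
}

static_assert(!acceptedBeforeChange(ToyVT{2048}), "v64i32 tripped the old check");
static_assert(acceptedAfterChange(ToyVT{2048}), "v64i32 passes the widened check");
static_assert(acceptedAfterChange(ToyVT{512}), "existing sizes are unaffected");
```

Widening the assertion only removes the early abort. Whether the code after it handles a 2048-bit type correctly is a separate question, as the "Cannot select" error reported next in the thread shows.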
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now
>>>>>>>>>>>>>>>>> getting the following error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How do I do this task?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
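
The BUILD_VECTOR in the error quoted below has the same operand (t62) in all 64 positions, which is what makes it a candidate for a broadcast. The following C++ sketch shows that splat check in isolation. `toySplatValue` is a hypothetical helper written for this note; it is not the routine `LowerBUILD_VECTOR` uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Returns the repeated operand if every operand is identical (a "splat"),
// otherwise an empty optional. A splat can be lowered to one broadcast.
template <typename T, std::size_t N>
std::optional<T> toySplatValue(const std::array<T, N> &operands) {
  static_assert(N > 0, "a vector needs at least one element");
  for (const T &operand : operands)
    if (!(operand == operands[0]))
      return std::nullopt;
  return operands[0];
}
```

For a 64-element array filled with `0.2f` the helper returns `0.2f`; changing any single element makes it return an empty optional, in which case a single broadcast would no longer describe the vector.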
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a vector by a
>>>>>>>>>>>>>>>>>>>>>>> constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>                 a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives:
>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How did it lower the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of
>>>>>>>>>>>>>>>>>>>>>>> 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>
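
The constant in the original question appears in two spellings: `float 0x3FC99999A0000000` in the IR and `.long 1045220557` in the assembly. Both denote the same single-precision value, because LLVM IR prints float constants as the hexadecimal bits of the equivalent double. The C++ sketch below checks this, assuming IEEE-754 `float` and `double`; `floatBits` and `doubleBits` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw bit pattern of a 32-bit float (assumes IEEE-754 single precision).
std::uint32_t floatBits(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expected 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}

// Raw bit pattern of a 64-bit double (assumes IEEE-754 double precision).
std::uint64_t doubleBits(double value) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expected 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

With these helpers, `floatBits(0.2f)` is 1045220557 (0x3E4CCCCD), the `.long` in the constant pool, and `doubleBits(static_cast<double>(0.2f))` is 0x3FC99999A0000000, the spelling used in the IR.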
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>> ~Craig
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

hameeza ahmed via llvm-dev

2017-Aug-07 08:20 UTC

[llvm-dev] VBROADCAST Implementation Issues

I am getting this error:
error: Variable not defined: '_'
for _.KRCWM.
What should I do?

On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What
should I do?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun,
Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
How to do this task??
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>>>>
not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Thank You.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>>>>
instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>>>
changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>
.................
>>>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Please help..
>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>
constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>
float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>
for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>
for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>
b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>
a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>
.LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>
.long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>
vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
I want to implement similar broadcast for vector of
>>>>>>>>>>>>>>>>>>>>>>>>
64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>
VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>
"BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>
$dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>
[(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>
(vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
Thank You
>>>>>>>>>>>>>>>>>>>>>>>>
Regards
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> --
>>>>>>> ~Craig

hameeza ahmed via llvm-dev

2017-Aug-07 08:54 UTC


[llvm-dev] VBROADCAST Implementation Issues

I changed it to:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
                    (ins i2048mem:$src),
                    "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
                    [(set VR_2048:$dst, VK64:$mask, (v64i32 (masked_gather addr:$src)))],
                    IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;

Now I am getting the following error:

Unhandled memory encoding VK64
Unhandled memory encoding
UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!

What to do?
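
For reference, Craig's remark quoted below, that masked_gather returns two results (the data and the modified mask), can be pictured with a small scalar model. This is only a sketch in plain C++, not LLVM code: the names gatherModel, GatherResult, kLanes, Vec and Mask are made up for illustration, and it assumes the usual AVX-512 gather behaviour, where a lane's mask bit is cleared once that lane has been loaded and inactive lanes keep the pass-through value ($src1 tied to $dst).

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 64-lane masked gather (illustrative names only).
constexpr std::size_t kLanes = 64;
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::bitset<kLanes>;

// The two results of the node: gathered data ($dst) and the
// written-back mask ($mask_wb).
struct GatherResult {
  Vec data;
  Mask mask_wb;
};

// passthru plays the role of $src1 (tied to $dst), mask the role of $mask.
// Indices are assumed to be non-negative and in range for `base`.
GatherResult gatherModel(const Vec &passthru, const Mask &mask,
                         const std::int32_t *base, const Vec &index) {
  GatherResult r{passthru, mask};
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (r.mask_wb.test(i)) {
      r.data[i] = base[index[i]]; // load the active lane
      r.mask_wb.reset(i);         // clear its bit once the lane is done
    }
  }
  return r;
}
```

In this model the instruction needs two outputs because the mask is both read and rewritten, which is what the "$mask = $mask_wb" constraint in the quoted multiclass expresses.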


On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> I am getting this error for _.KRCWM:
> error: Variable not defined: '_'
> What should I do?
>
> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>
>> Hello,
>> I did as you said,
>>
>> Please tell me whether the following is correct now.
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
>>                     (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>                      (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>                       VR_2048:$src2))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>
>> Thank You
>>
>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>
>>> masked_gather returns two results: the data and the modified mask. Note
>>> the $dst and the $mask_wb in the pattern below.
>>>
>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>       ExeDomain = _.ExeDomain in
>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>             !strconcat(OpcodeStr#_.Suffix,
>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>> }
>>>
>>> ~Craig
>>>
>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>
>>>> I want to implement gather for v64i32. I wrote the following code:
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>>
>>>> I also added this line in isellowering.h:
>>>>
>>>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>
>>>> But I am getting the following error:
>>>>
>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>>>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>> Unhandled"' failed.
>>>>
>>>> What is my mistake?
>>>>
>>>> Please help me.
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>
>>>>> I am trying to implement vector shuffle for v64i32. Is the following
>>>>> correct?
>>>>>
>>>>>
>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>     (ins VR_2048:$src1, VRPIM_2048:$src2),
>>>>>     "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>>>>>     [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>>>                                        (v64i32 VR_2048:$src2)))]>, TA;
>>>>>
>>>>> Please help.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>
>>>>>> I managed to get rid of the above error for VT.is2048BitVector().
>>>>>>
>>>>>> This was already implemented.
>>>>>>
>>>>>> Next I will try to define other vector widths, such as VT.is4096BitVector().
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>
>>>>>>> Thank you. Actually, I have to implement both i32 and i64, so I have now
>>>>>>> implemented two instructions: one broadcastS and the other broadcastD.
>>>>>>> When broadcasting from memory to a register I got no such error with a
>>>>>>> single instruction plus extra patterns for i64, i32, etc., but I
>>>>>>> implemented single and double versions there as well.
>>>>>>>
>>>>>>> I am trying to compile matrix multiplication code for larger vector
>>>>>>> sizes. For that I need to add many new instructions to my backend, such
>>>>>>> as shuffle and gather. For now I am getting the following error.
>>>>>>>
>>>>>>>
>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR
Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>,
>>>>>>> Constant:i32<-1>, Constant:i32<-1>,
Constant:i32<-1>
>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>> vector type"' failed.
>>>>>>>
>>>>>>> I tried adding is2048BitVector() and others. I also added these types
>>>>>>> for EVT in vectortype.h, but I was unable to compile the backend and am
>>>>>>> getting errors.
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Thank You
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>
>>>>>>>> You need a new instruction, and your scalar register size needs to
>>>>>>>> match your vector element size, so use GR32 instead of GR64.
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sorry to disturb you.
>>>>>>>>> Now I want to implement an instruction that broadcasts the contents
>>>>>>>>> of a scalar register to a vector,
>>>>>>>>>
>>>>>>>>> like this:
>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I tried implementing it as follows:
>>>>>>>>>
>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins GR64:$src),
>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast GR64:$src)))],
>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is this fine? Also, do I need to define a new instruction such as
>>>>>>>>> BROADCASTR_256B for this, or can I reuse the previous instruction
>>>>>>>>> BROADCAST_256B (the one that broadcasts a scalar from memory to a
>>>>>>>>> vector) and just define a new pattern?
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you so much.
>>>>>>>>>>
>>>>>>>>>> Wow, you are simply a genius.
>>>>>>>>>> Initially I had not included the load in either the main instruction
>>>>>>>>>> or the pattern, so I added it to both as follows:
>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>
>>>>>>>>>> Thank you again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig
Topper <
>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast
(loadf32 addr:$src))),
>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM,
hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> it runs fine with v64i32. but
with the following pattern
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast addr:$src)),
>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>> What is wrong with this
pattern?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM,
hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>
>>>>>>>>>>>>> def :
Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>           (VBROADCASTSSZm
addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast addr:$src)),
>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59
AM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> for v16f32 it is
defined as;
>>>>>>>>>>>>>> : Pat<(v16f32
(X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>          
(VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>> VR512:$src),
sub_xmm))>;
>>>>>>>>>>>>>> which is similar to
mine.
>>>>>>>>>>>>>> Why its not working
then?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at
1:45 AM, Craig Topper <
>>>>>>>>>>>>>> craig.topper at
gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You need a pattern
for v64f32 too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Aug 5, 2017
at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> as you said;
these are instructions that i defined in
>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def
BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>> VR_2048:$dst),
(ins i2048mem:$src),
>>>>>>>>>>>>>>>>                
"BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>> $src}",
>>>>>>>>>>>>>>>>                
[(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>> (X86VBroadcast
addr:$src)))],
>>>>>>>>>>>>>>>>                
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def:
Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6,
2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I did as
you said;
>>>>>>>>>>>>>>>>> now getting
this error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> LLVM ERROR:
Cannot select: t63: v64f32
>>>>>>>>>>>>>>>>>
X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>   t62:
f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>     t65:
i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>       t64:
i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>     t8: i64
= undef
>>>>>>>>>>>>>>>>> In
function: stencil
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug
6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Add
VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat,
Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
added the setoperationaction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>>> now
getting the following error.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>
llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>> *,
const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>
`(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>
"Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
What should I do?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
How to do this task??
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>
is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
Thank You.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>>>>>
instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>>>>>
changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t29: v64f32
>>>>>>>>>>>>>>>>>>>>>>>
BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>
.................
>>>>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
Please help..
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>
constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>
float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>
for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>
for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>
b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>
a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> but its x86 assembly gives:
>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How was the above IR lowered into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>                         "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>                         [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>                         IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>> --
>>>>>>>> ~Craig
>>
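
A note on the numbers in the quoted question, as a minimal sketch rather than a description of what the backend does internally. LLVM IR prints a float constant such as 0x3FC99999A0000000 as the hex encoding of a 64-bit double, while the assembly's `.long 1045220557` is the 32-bit single-precision pattern of the same value. The lanes visible in the quoted IR all hold that one value, which is consistent with the compiler emitting a single constant-pool scalar and a vbroadcastss. The helper names below (`ir_float_hex_to_bits`, `splat_value`) are hypothetical, and the assumption that all 64 lanes are identical is taken from the quoted text only.

```python
import struct


def ir_float_hex_to_bits(ir_hex: str) -> int:
    """Convert an LLVM IR float literal (printed as a 64-bit double in hex)
    to the 32-bit pattern that would appear in a `.long` directive."""
    # Decode the 16 hex digits as a big-endian IEEE-754 double.
    as_double = struct.unpack(">d", bytes.fromhex(ir_hex[2:]))[0]
    # Re-encode as a single-precision float and read back the raw bits.
    return struct.unpack("<I", struct.pack("<f", as_double))[0]


def splat_value(lanes):
    """Return the common element if every lane is identical, else None."""
    first = lanes[0]
    return first if all(lane == first for lane in lanes) else None


# Assumption: all 64 lanes carry the same literal, as in the quoted IR.
lanes = ["0x3FC99999A0000000"] * 64
scalar = splat_value(lanes)
if scalar is not None:
    # One scalar is enough to describe the whole vector.
    print(ir_float_hex_to_bits(scalar))  # 1045220557
```

The printed value matches the `.long` in the quoted assembly, so the constant pool entry is simply the scalar 0.2f that the broadcast replicates across the register.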
