thr3ads.net - llvm dev - [llvm-dev] VBROADCAST Implementation Issues [Aug 2017]


hameeza ahmed via llvm-dev

2017-Aug-07 17:19 UTC

[llvm-dev] VBROADCAST Implementation Issues

Thank you. I am still getting errors. I have modified my instructions as you
said, as follows:


def GATHER_256B : I<0x68, MRMSrcMem,
                    (outs VR_2048:$dst, VK64WM:$mask_wb),
                    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
                    [(set VR_2048:$dst, VK64WM:$mask_wb,
                      (v64i32 (masked_gather (VR_2048:$src1), VK64WM:$mask,
                                             addr:$src2)))],
                    IIC_MOV_MEM>, TA;

def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask), (addr:$src2))),
         (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;


Now getting this error:

llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
Assertion `numPhysicalOperands >= 2 + additionalOperands &&
numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
operands for MRMSrcMemFrm"' failed.
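
For reference, the definition above gives the gather three inputs (the
pass-through vector $src1, the mask $mask and the address $src2) and two
results ($dst and the written-back mask $mask_wb). The C++ below is a minimal
sketch of those semantics, assuming AVX-512-style behaviour in which each
completed lane clears its mask bit. The names maskedGather, kLanes, Vec and
Mask are hypothetical, and the address operand is simplified to a base pointer
plus one index per lane; this is an illustration, not backend code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical lane count matching the v64i32 type discussed in the thread.
constexpr std::size_t kLanes = 64;
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::uint64_t;  // one bit per lane, like a 64-bit mask register

// Model of a masked gather: three logical inputs (pass-through, mask,
// address) and two results (data, written-back mask). The address is
// simplified here to a base pointer plus one index per lane (assumption).
std::pair<Vec, Mask> maskedGather(const Vec &passThrough, Mask mask,
                                  const std::int32_t *base,
                                  const std::array<std::uint32_t, kLanes> &index) {
  assert(base != nullptr);
  Vec dst = passThrough;  // disabled lanes keep the pass-through value
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const Mask bit = Mask{1} << lane;
    if (mask & bit) {
      dst[lane] = base[index[lane]];  // load only the enabled lanes
      mask &= ~bit;                   // clear the bit once the lane is done
    }
  }
  return {dst, mask};  // the written-back mask is all zeros on completion
}
```

Because the result starts as a copy of the pass-through input and the mask is
both read and written, the data pair and the mask pair naturally want to share
a register each.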








On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> masked_gather takes 3 inputs, not just an address. See the AVX512 pattern
> I pasted earlier.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Changed it to:
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>                     (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>                     [(set VR_2048:$dst, VK64:$mask,
>>                       (v64i32 (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>> Now I am getting the following error:
>>
>> Unhandled memory encoding VK64
>> Unhandled memory encoding
>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>
>> What to do?
>>
>>
>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> I am getting this error:
>>> error: Variable not defined: '_'
>>> for _.KRCWM
>>> What should I do?
>>>
>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> I did as you said.
>>>>
>>>> Please tell me whether the following is correct now.
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>                      VR_2048:$src2))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>
>>>> Thank You
>>>>
>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> masked_gather returns two results. The data and the modified mask.
>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>
>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>                          X86VectorVTInfo _,
>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>       ExeDomain = _.ExeDomain in
>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>> }
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I want to implement gather for v64i32. I wrote the following code.
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>                     (ins i2048mem:$src),
>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>                     [(set VR_2048:$dst,
>>>>>>                       (v64i32 (masked_gather addr:$src)))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>>>>
>>>>>> Also, I wrote this line in isellowering.h:
>>>>>>
>>>>>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>>>
>>>>>> But I am getting the following error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>> Unhandled"' failed.
>>>>>>
>>>>>> What is my mistake?
>>>>>>
>>>>>> Please help me.
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am trying to implement vector shuffle for v64i32. Is the following
>>>>>>> correct?
>>>>>>>
>>>>>>>
>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I managed to get rid of the above error for VT.is2048BitVector().
>>>>>>>>
>>>>>>>> This was implemented already.
>>>>>>>>
>>>>>>>> Now I will try to define other vectors, like VT.is4096BitVector().
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you. Actually, I have to implement both i32 and i64, so I have
>>>>>>>>> now implemented two instructions, one broadcastS and the other
>>>>>>>>> broadcastD. When broadcasting from memory to register I was getting
>>>>>>>>> no such error with one instruction plus extra patterns (i64, i32,
>>>>>>>>> etc.), but I implemented two versions there as well, single and double.
>>>>>>>>>
>>>>>>>>> Actually, I am trying to compile matrix multiplication code for a
>>>>>>>>> larger vector size. For that I need to include many new instructions
>>>>>>>>> in my backend, such as shuffle and gather. For now I am getting the
>>>>>>>>> following error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>> vector type"' failed.
>>>>>>>>>
>>>>>>>>> I tried including is2048BitVector() and others. I also included these
>>>>>>>>> types for EVT in vectortype.h, but I was unable to compile the backend
>>>>>>>>> and am getting errors.
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>> to match your vector element size, so use GR32 instead of GR64.
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>> Now I want to implement an instruction to broadcast scalar register
>>>>>>>>>>> content to a vector,
>>>>>>>>>>>
>>>>>>>>>>> like this:
>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I tried implementing it as follows:
>>>>>>>>>>>
>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>                     (ins GR64:$src),
>>>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>                     [(set VR_2048:$dst,
>>>>>>>>>>>                       (v64i32 (X86VBroadcast GR64:$src)))],
>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>          (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>
>>>>>>>>>>> Is it fine? Also, do I need to define a new instruction for this,
>>>>>>>>>>> like BROADCASTR_256B? Or can I use the previous instruction
>>>>>>>>>>> BROADCAST_256B (the one that broadcasts a memory scalar to a vector)
>>>>>>>>>>> and just define a new pattern?
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you so much.
>>>>>>>>>>>>
>>>>>>>>>>>> Wow, you are simply a genius.
>>>>>>>>>>>> Initially I didn't include the load in either the main instruction
>>>>>>>>>>>> or the pattern, so I included it in both as follows:
>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>                     (ins i2048mem:$src),
>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst,
>>>>>>>>>>>>                       (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am getting an error.
>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In x86 it is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mine is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For v16f32 it is defined as:
>>>>>>>>>>>>>>>> def : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>> Why is it not working then?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As you said; these are the instructions that I defined in instrinfo.td:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>>>>>                     (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>                     [(set VR_2048:$dst,
>>>>>>>>>>>>>>>>>>                       (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>> now I am getting this error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I added the setOperationAction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>>>>> Now I am getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: llvm::SDValue
>>>>>>>>>>>>>>>>>>>>> LowerVectorBroadcast(llvm::BuildVectorSDNode *, const
>>>>>>>>>>>>>>>>>>>>> llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() ||
>>>>>>>>>>>>>>>>>>>>> VT.is512BitVector()) && "Unsupported vector type for
>>>>>>>>>>>>>>>>>>>>> broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Well, first, have you done this for your type?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is not
>>>>>>>>>>>>>>>>>>>>>>>> creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast
>>>>>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td, but I made no changes in
>>>>>>>>>>>>>>>>>>>>>>>>> the isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I am still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 =
>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast, not "vbroadcast".
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a vector by a constant, something
>>>>>>>>>>>>>>>>>>>>>>>>>>> like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives:
>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> How did it lower the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst),
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> ~Craig
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
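
The original question at the bottom of the quoted thread asks how a constant
vector with 64 identical elements turns into a single vbroadcastss. The idea
is splat detection: when every operand of the BUILD_VECTOR is the same value,
one broadcast of that scalar is enough. A minimal C++ sketch of that check
follows. The helper isSplat is hypothetical and only models the idea on plain
floats; it is not LLVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true and stores the repeated element in `value` when every entry
// of a BUILD_VECTOR-like operand list is identical. An empty list is not
// treated as a splat.
bool isSplat(const std::vector<float> &elements, float &value) {
  if (elements.empty())
    return false;
  for (std::size_t i = 1; i < elements.size(); ++i)
    if (elements[i] != elements[0])
      return false;  // at least two different operands: not a broadcast
  value = elements[0];
  return true;
}
```

With 64 copies of the same constant the check succeeds and a single scalar
load plus broadcast can stand in for the whole vector; one differing element
is enough to make it fail.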

Craig Topper via llvm-dev

2017-Aug-07 17:37 UTC


[llvm-dev] VBROADCAST Implementation Issues

You need this line from the AVX512 code to tell the register allocation system
that $src1/$dst and $mask/$mask_wb must each use the same register. The early
clobber tells it that $dst and $src2 cannot use the same register.

let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"

~Craig
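
To see why tying the operands also relates to the MRMSrcMemFrm assertion quoted
below, here is a rough C++ model of the operand count. It assumes that a tied
operand shares its register with the operand it is tied to and is therefore
counted once, that the memory operand counts as a single operand, and that
additionalOperands is 0 for this instruction. The Operand struct and both
helper functions are hypothetical and only mirror the range stated in the
assertion text; they are not TableGen's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified operand record: a name plus the name of the operand it is tied
// to (empty when untied). Illustrative only.
struct Operand {
  std::string name;
  std::string tiedTo;
};

// Count operands under the stated assumption: a tied pair shares one
// register, so only the untied operands add to the physical count.
int countPhysicalOperands(const std::vector<Operand> &operands) {
  int count = 0;
  for (const Operand &op : operands) {
    assert(!op.name.empty());
    if (op.tiedTo.empty())
      ++count;
  }
  return count;
}

// Range quoted in the assertion for MRMSrcMem, with additionalOperands = 0.
bool fitsMRMSrcMem(int physicalOperands) {
  return physicalOperands >= 2 && physicalOperands <= 4;
}
```

Under these assumptions the untied definition has five operands ($dst,
$mask_wb, $src1, $mask, $src2), which is outside the 2 to 4 range, while tying
$src1 to $dst and $mask to $mask_wb leaves three.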

On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> Thank you. I am still getting errors. I have modified my instructions as you
> said, as follows:
>
>
> def GATHER_256B : I<0x68, MRMSrcMem,
>                     (outs VR_2048:$dst, VK64WM:$mask_wb),
>                     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>                     [(set VR_2048:$dst, VK64WM:$mask_wb,
>                       (v64i32 (masked_gather (VR_2048:$src1), VK64WM:$mask,
>                                              addr:$src2)))],
>                     IIC_MOV_MEM>, TA;
>
> def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask), (addr:$src2))),
>          (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;
>
>
> Now getting this error:
>
> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
> operands for MRMSrcMemFrm"' failed.
>
>
>
>
>
>
>
>
> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> masked_gather takes 3 inputs, not just an address. See the AVX512 pattern
>> I pasted earlier.
>>
>> ~Craig
>>
>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at
gmail.com>
>> wrote:
>>
>>> Changed it to;
>>>
>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
VK64:$mask),
>>> (ins i2048mem:$src),
>>>                     "GATHER_256B\t{$src,
{$dst}{${mask}}|${dst}
>>> {${mask}}, $src}",
>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>> (masked_gather addr:$src)))],
>>>                     IIC_MOV_MEM>, TA;
>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
addr:$src)>;
>>> Now getting following error:
>>>
>>> Unhandled memory encoding VK64
>>> Unhandled memory encoding
>>> UNREACHABLE executed at
/utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>
>>> What to do?
>>>
>>>
>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at
gmail.com>
>>> wrote:
>>>
>>>> i am getting this error
>>>> error: Variable not defined: '_'
>>>> for _.KRCWM
>>>> what to do?
>>>>
>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at
gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> I did as you said,
>>>>>
>>>>> Please tell me whether the following correct now??
>>>>>
>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins
i2048mem:$src2),
>>>>>                     "GATHER_256B\t{$src2,
{$dst}{${mask}}|${dst}
>>>>> {${mask}}, $src2}"),
>>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb,
(v64i32
>>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>>                      VR_2048:$src2))],
>>>>>                     IIC_MOV_MEM>, TA;
>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B
addr:$src2)>;
>>>>>
>>>>> Thank You
>>>>>
>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper
<craig.topper at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> masked_gather returns two results. The data and the
modified mask.
>>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>>
>>>>>> multiclass avx512_gather<bits<8> opc, string
OpcodeStr,
>>>>>> X86VectorVTInfo _,
>>>>>>                          X86MemOperand memop, PatFrag
GatherNode> {
>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs
_.RC:$dst,
>>>>>> _.KRCWM:$mask_wb),
>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask,
memop:$src2),
>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst}
{${mask}}, $src2}"),
>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>               (GatherNode  (_.VT _.RC:$src1),
_.KRCWM:$mask,
>>>>>>                      vectoraddr:$src2))]>, EVEX,
EVEX_K,
>>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>> }
>>>>>>
>>>>>> ~Craig
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed
<hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i want to implement gather for v64i32. i wrote
following code.
>>>>>>>
>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs
VR_2048:$dst), (ins
>>>>>>> i2048mem:$src),
>>>>>>>                     "GATHER_256B\t{$src,
$dst|$dst, $src}",
>>>>>>>                     [(set VR_2048:$dst, (v64i32
(masked_gather
>>>>>>> addr:$src)))],
>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>
>>>>>>> Also i wrote this line in isellowering.h
>>>>>>>
>>>>>>>               setOperationAction(ISD::MGATHER,
>>>>>>> MVT::v64i32, Legal);
>>>>>>>
>>>>>>> But I am getting following error:
>>>>>>>
>>>>>>> llvm-tblgen:
/utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>> llvm::TreePatternNode
*llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>> *, llvm::StringRef): Assertion
`New->getNumTypes() == 1 && "FIXME:
>>>>>>> Unhandled"' failed.
>>>>>>>
>>>>>>> What is my mistake?
>>>>>>>
>>>>>>> Please help me.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed
<hahmed2305 at gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> I am trying to implement vector shuffle for
v64i32. Is the
>>>>>>>> following correct?
>>>>>>>>
>>>>>>>>
>>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg,
(outs VR_2048:$dst),
>>>>>>>> (ins VR_2048:$src1,
VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32
VR_2048:$src1), (v64i32
>>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>>
>>>>>>>> Please help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed
<
>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> i managed to get rid of above error for
VT.is2048BitVector()).
>>>>>>>>>
>>>>>>>>> this was implemented already.
>>>>>>>>>
>>>>>>>>> now will try define other vectors like
VT.is4096BitVector()).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza
ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you. actually i have to implement
both i32 and i64. so i
>>>>>>>>>> implemented two instructions now one
broadcastS other broadcastD. Although
>>>>>>>>>> while doing broadcast from memory to
register i was getting no such error
>>>>>>>>>> with 1 instruction and other patterns
i64, i32 etc. but then also i
>>>>>>>>>> implemented its 2 versions single and
double.
>>>>>>>>>>
>>>>>>>>>> Actually, i am trying to compile matrix
multiplication code for
>>>>>>>>>> greater size vector. There i need to
include many new instructions in my
>>>>>>>>>> backend like shuffle, gather etc. For
now i am getting the following error.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR
Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>> llc:
/lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>>> getOnesVector(llvm::EVT, const
llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>>> const llvm::SDLoc &): Assertion
`(VT.is128BitVector() ||
>>>>>>>>>> VT.is256BitVector() ||
VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>>> vector type"' failed.
>>>>>>>>>>
>>>>>>>>>>  i tried including is2048Bit Vector()
and others. also in
>>>>>>>>>> vectortype.h i included these types for
EVT but was unable to compile
>>>>>>>>>> backend and getting errors.
>>>>>>>>>>
>>>>>>>>>> Please help.
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig
Topper <
>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM
hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>> Now i want to implement
instruction to broadcast scalar
>>>>>>>>>>>> register content to vector.
>>>>>>>>>>>>
>>>>>>>>>>>> like this;
>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I tried implementing it as
follows;
>>>>>>>>>>>>
>>>>>>>>>>>> def BROADCASTR_256B :
I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>>                    
"BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set
VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>  GR64:$src)))],
>>>>>>>>>>>>                    
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast GR64:$src)),
>>>>>>>>>>>> (BROADCASTR_256B
GR64:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is it fine? Also do i need to
define a new instruction for this
>>>>>>>>>>>> like BROADCASTR_256B? can i use
the previous instruction BROADCAST_256B
>>>>>>>>>>>> (the one that broadcast memory
scalar to vector) and just define new
>>>>>>>>>>>> pattern?
>>>>>>>>>>>>
>>>>>>>>>>>> Please help.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM,
hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>>> initially I didnt include
load in both the main instruction
>>>>>>>>>>>>> and pattern so i included
in both as follows:
>>>>>>>>>>>>> def BROADCAST_256B :
I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>                    
"BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>                     [(set
VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>> (loadi32 addr:$src))))],
>>>>>>>>>>>>>                    
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28
AM, Craig Topper <
>>>>>>>>>>>>> craig.topper at
gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your pattern needs to
be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at
2:47 PM, hameeza ahmed <
>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> it runs fine with
v64i32. but with the following pattern
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>>> What is wrong with
this pattern?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017
at 2:01 AM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def :
Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>>          
(VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def:
Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>> (BROADCAST_256B
addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6,
2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>>>>>> hahmed2305 at
gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for v16f32
it is defined as;
>>>>>>>>>>>>>>>>> :
Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>          
(VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>>>
VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>> which is
similar to mine.
>>>>>>>>>>>>>>>>> Why its not
working then?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug
6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You
need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat,
Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> as
you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>>>
instrinfo.td
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> def
BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>
VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>    
"BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>>>
$src}",
>>>>>>>>>>>>>>>>>>>    
[(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>
(X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>    
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I did as you said;
>>>>>>>>>>>>>>>>>>>>
now getting this error:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t63: v64f32
>>>>>>>>>>>>>>>>>>>>
X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t65,
>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>
t65: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>
t64: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>>
isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>
llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>
*, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>
`(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>
"Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
What should I do?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
setOperationAction(ISD::BUILD_VECTOR, v64i32,
>>>>>>>>>>>>>>>>>>>>>>>
Custom);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>
is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>>>
broadcast instruction in instructioninfo.td. but
>>>>>>>>>>>>>>>>>>>>>>>>>>
i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t29: v64f32
>>>>>>>>>>>>>>>>>>>>>>>>>>
BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>
.................
>>>>>>>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
.LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
.long 1045220557              # float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
How did it lower the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I want to implement similar broadcast for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
$dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Regards
>>>>>>>>>>>> --
>>>>>>>>>>> ~Craig

hameeza ahmed via llvm-dev

2017-Aug-07 17:39 UTC


[llvm-dev] VBROADCAST Implementation Issues

Where should I add this line?
Sorry, I didn't understand it.
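
A note for readers following the thread: in the AVX512 multiclass quoted further down, the `let Constraints = "..." in` statement sits directly in front of the `def`, so the corresponding place here would be immediately before `def GATHER_256B`. The sketch below is a simplified C++ model of why that line is likely to matter for the MRMSrcMem assertion quoted below. It is not the actual llvm-tblgen code: the `Operand` struct and the helper names are hypothetical, and it assumes that tied operands are not counted as separate physical operands and that `additionalOperands` is 0 for an instruction without VEX/EVEX modifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one instruction operand.
struct Operand {
    std::string name;
    int tiedTo;  // index of the operand this one is tied to, or -1 if untied
};

// Count operands that need their own encoding slot. A tied operand shares
// the register of the operand it is tied to, so it is not counted again.
std::size_t countPhysicalOperands(const std::vector<Operand>& ops) {
    std::size_t n = 0;
    for (const Operand& op : ops)
        if (op.tiedTo < 0) ++n;
    return n;
}

// The range check spelled out in the quoted assertion message.
bool fitsMRMSrcMem(std::size_t numPhysicalOperands,
                   std::size_t additionalOperands) {
    return numPhysicalOperands >= 2 + additionalOperands &&
           numPhysicalOperands <= 4 + additionalOperands;
}

// Operand list of GATHER_256B as posted:
// (outs $dst, $mask_wb), (ins $src1, $mask, $src2).
std::vector<Operand> gatherOperands(bool withConstraints) {
    std::vector<Operand> ops = {
        {"dst", -1}, {"mask_wb", -1}, {"src1", -1}, {"mask", -1}, {"src2", -1}};
    if (withConstraints) {
        ops[2].tiedTo = 0;  // "$src1 = $dst"
        ops[3].tiedTo = 1;  // "$mask = $mask_wb"
    }
    return ops;
}

// Small self-check showing both cases under the assumptions above.
void selfCheck() {
    // Without the Constraints line all five operands count: out of range.
    assert(!fitsMRMSrcMem(countPhysicalOperands(gatherOperands(false)), 0));
    // With the two ties only three operands count: within range.
    assert(fitsMRMSrcMem(countPhysicalOperands(gatherOperands(true)), 0));
}
```

Under these assumptions the definition as posted has five physical operands, which falls outside the 2..4 range, while the tied version has three.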

On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> You need this line from the AVX512 code to tell the register allocator that
> $src1/$dst and $mask/$mask_wb must each use the same register. The early
> clobber tells it that $dst and $src2 cannot use the same register.
>
> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Thank you. I am still getting errors. I have modified my instruction as you
>> said, as follows:
>>
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask,  i2048mem:$src2),
>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>> {${mask}}, $src2}",
>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>> (masked_gather  (VR_2048:$src1), VK64WM:$mask,
>>                      addr:$src2)))],
>>                     IIC_MOV_MEM>, TA;
>>
>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>> addr:$src2)>;
>>
>>
>> Now getting this error:
>>
>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>> operands for MRMSrcMemFrm"' failed.
>>
>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> masked_gather takes 3 inputs, not just an address. See the AVX512
>>> pattern I pasted earlier.
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Changed it to;
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>>>                     (ins i2048mem:$src),
>>>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>>                       (masked_gather addr:$src)))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>> Now getting following error:
>>>>
>>>> Unhandled memory encoding VK64
>>>> Unhandled memory encoding
>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>
>>>> What to do?
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at
gmail.com>
>>>> wrote:
>>>>
>>>>> I am getting this error:
>>>>> error: Variable not defined: '_'
>>>>> for _.KRCWM
>>>>> What should I do?
>>>>>
>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed
<hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I did as you said.
>>>>>>
>>>>>> Please tell me whether the following is correct now.
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs
VR_2048:$dst,
>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins
i2048mem:$src2),
>>>>>>                     "GATHER_256B\t{$src2,
{$dst}{${mask}}|${dst}
>>>>>> {${mask}}, $src2}"),
>>>>>>                     [(set VR_2048:$dst,
_.KRCWM:$mask_wb, (v64i32
>>>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>                      VR_2048:$src2))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)),
(GATHER_256B addr:$src2)>;
>>>>>>
>>>>>> Thank You
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper
<craig.topper at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> masked_gather returns two results: the data and the modified mask.
>>>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>>>
>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>                          X86VectorVTInfo _,
>>>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>> }
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed
<hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I want to implement gather for v64i32. I wrote the following code.
>>>>>>>>
>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>                     (ins i2048mem:$src),
>>>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>
>>>>>>>> Also i wrote this line in isellowering.h
>>>>>>>>
>>>>>>>>               setOperationAction(ISD::MGATHER,
>>>>>>>> MVT::v64i32, Legal);
>>>>>>>>
>>>>>>>> But I am getting following error:
>>>>>>>>
>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>>> Unhandled"' failed.
>>>>>>>>
>>>>>>>> What is my mistake?
>>>>>>>>
>>>>>>>> Please help me.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed
<
>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am trying to implement vector shuffle for
v64i32. Is the
>>>>>>>>> following correct?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg,
(outs VR_2048:$dst),
>>>>>>>>> (ins VR_2048:$src1,
VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32
VR_2048:$src1), (v64i32
>>>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza
ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> i managed to get rid of above error for
VT.is2048BitVector()).
>>>>>>>>>>
>>>>>>>>>> this was implemented already.
>>>>>>>>>>
>>>>>>>>>> now will try define other vectors like
VT.is4096BitVector()).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM,
hameeza ahmed <
>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you. actually i have to
implement both i32 and i64. so i
>>>>>>>>>>> implemented two instructions now
one broadcastS other broadcastD. Although
>>>>>>>>>>> while doing broadcast from memory
to register i was getting no such error
>>>>>>>>>>> with 1 instruction and other
patterns i64, i32 etc. but then also i
>>>>>>>>>>> implemented its 2 versions single
and double.
>>>>>>>>>>>
>>>>>>>>>>> Actually, i am trying to compile
matrix multiplication code for
>>>>>>>>>>> greater size vector. There i need
to include many new instructions in my
>>>>>>>>>>> backend like shuffle, gather etc.
For now i am getting the following error.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Legalizing: t208: v64i32 =
BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>>> llc:
/lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>>>> getOnesVector(llvm::EVT, const
llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>>>> const llvm::SDLoc &): Assertion
`(VT.is128BitVector() ||
>>>>>>>>>>> VT.is256BitVector() ||
VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>>>> vector type"' failed.
>>>>>>>>>>>
>>>>>>>>>>>  i tried including is2048Bit
Vector() and others. also in
>>>>>>>>>>> vectortype.h i included these types
for EVT but was unable to compile
>>>>>>>>>>> backend and getting errors.
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM,
Craig Topper <
>>>>>>>>>>> craig.topper at gmail.com>
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM
hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>>> Now i want to implement
instruction to broadcast scalar
>>>>>>>>>>>>> register content to vector.
>>>>>>>>>>>>>
>>>>>>>>>>>>> like this;
>>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried implementing it as
follows;
>>>>>>>>>>>>>
>>>>>>>>>>>>> def BROADCASTR_256B :
I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>>>                    
"BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>                     [(set
VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>>  GR64:$src)))],
>>>>>>>>>>>>>                    
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32
(X86VBroadcast GR64:$src)),
>>>>>>>>>>>>> (BROADCASTR_256B
GR64:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it fine? Also do i need
to define a new instruction for
>>>>>>>>>>>>> this like BROADCASTR_256B?
can i use the previous instruction
>>>>>>>>>>>>> BROADCAST_256B (the one
that broadcast memory scalar to vector) and just
>>>>>>>>>>>>> define new pattern?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10
AM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Wow, you are simply a genius.
>>>>>>>>>>>>>> Initially I didn't include the load in either the main instruction
>>>>>>>>>>>>>> or the pattern, so I included it in both as follows:
>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am getting an error.
>>>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in x86 it
is;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def :
Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>>>          
(VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def:
Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug
6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305
at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> for
v16f32 it is defined as;
>>>>>>>>>>>>>>>>>> :
Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>>        
(VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>>>>
VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>>> which
is similar to mine.
>>>>>>>>>>>>>>>>>> Why its
not working then?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun,
Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You
need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>>>>
instrinfo.td
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>
VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>
"BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>>>>
$src}",
>>>>>>>>>>>>>>>>>>>>
[(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>
(X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
I did as you said;
>>>>>>>>>>>>>>>>>>>>>
now getting this error:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t63: v64f32
>>>>>>>>>>>>>>>>>>>>>
X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t65,
>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>
t65: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>
t64: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>>>
isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>>
llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>>
*, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>>
`(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>>
"Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
What should I do?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
setOperationAction(ISD::BUILD_VECTOR, v64i32,
>>>>>>>>>>>>>>>>>>>>>>>>
Custom);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>>
is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>>>>
broadcast instruction in instructioninfo.td.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
but i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
LLVM ERROR: Cannot select: t29: v64f32
>>>>>>>>>>>>>>>>>>>>>>>>>>>
BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>>>
TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
.................
>>>>>>>>>>>>>>>>>>>>>>>>>>>
In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
b[i][j] = con * (a[i][j] + a[i-1][j]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
+ a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
.LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
.long 1045220557              # float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I want to implement similar broadcast for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
$dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

hameeza ahmed via llvm-dev

2017-Aug-07 17:57 UTC

[llvm-dev] VBROADCAST Implementation Issues

Now getting this error:
/lib/Target/X86/X86InstrInfo.td:3318:1: error: In GATHER_256B: Unrecognized node 'VR_2048'!
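
Note on a likely cause, offered as an untested sketch: in a TableGen DAG pattern the first name inside a pair of parentheses is read as the operator, so writing the operand as (VR_2048:$src1) makes TableGen look up VR_2048 as a node, which would produce this message. The AVX512 gather quoted further down writes the same operand as (_.VT _.RC:$src1), that is, a value type wrapping the register class. The fragment below applies that shape to GATHER_256B using only names that appear in this thread. VR_2048, VK64WM and i2048mem are the poster's own definitions, EVEX_4V and EVEX_K are placed as suggested in the reply quoted below, and the choice of v64i32 for $src1 is an assumption.

```tablegen
// Sketch only, not verified against a build: $src1 is wrapped in a value
// type, (v64i32 VR_2048:$src1), instead of bare parentheses.
let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
def GATHER_256B : I<0x68, MRMSrcMem,
                    (outs VR_2048:$dst, VK64WM:$mask_wb),
                    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
                    "GATHER_256B\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}",
                    [(set VR_2048:$dst, VK64WM:$mask_wb,
                      (masked_gather (v64i32 VR_2048:$src1), VK64WM:$mask,
                                     addr:$src2))],
                    IIC_MOV_MEM>, TA, EVEX_4V, EVEX_K;
```

The separate v64f32 Pat has the same parenthesised operands, (VR_2048:$src1), (VK64WM:$mask) and (addr:$src2), so the same change would presumably be needed there.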




On Mon, Aug 7, 2017 at 10:53 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> You need to add EVEX_K and EVEX_4V to the end of your instruction after TA.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:47 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Thank You. Now getting this error:
>>
>> Unhandled memory encoding VK64WM
>> Unhandled memory encoding
>>
>>
>> On Mon, Aug 7, 2017 at 10:43 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> Right before your "def GATHER_256B" add the 'let' line like so
>>>
>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>                     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>                      (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
>>>                     IIC_MOV_MEM>, TA;
>>>
>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask), (addr:$src2))),
>>>          (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 10:39 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Where should I add this line?
>>>> Sorry, I didn't understand it.
>>>>
>>>> On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> You need this line from the AVX512 code to tell the register allocator
>>>>> that $src1/$dst and $mask/$mask_wb must use the same register. The
>>>>> early clobber tells it that $dst and $src2 cannot use the same register.
>>>>>
>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank You. Still getting errors. I have modified my instructions as
>>>>>> you said, as follows:
>>>>>>
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>>>>                     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>>>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>                      (masked_gather (VR_2048:$src1), VK64WM:$mask, addr:$src2)))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>
>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask), (addr:$src2))),
>>>>>>          (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;
>>>>>>
>>>>>>
>>>>>> Now getting this error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>>>>>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>>>>>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>>>>>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>>>>>> operands for MRMSrcMemFrm"' failed.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> masked_gather takes 3 inputs, not just an address. See the AVX512
>>>>>>> pattern I pasted earlier.
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Changed it to:
>>>>>>>>
>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>>>>>>>                     (ins i2048mem:$src),
>>>>>>>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>>>>>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>>>>>>                      (masked_gather addr:$src)))],
>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>          (GATHER_256B addr:$src)>;
>>>>>>>> Now getting the following error:
>>>>>>>>
>>>>>>>> Unhandled memory encoding VK64
>>>>>>>> Unhandled memory encoding
>>>>>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>>>>>
>>>>>>>> What to do?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am getting this error:
>>>>>>>>> error: Variable not defined: '_'
>>>>>>>>> for _.KRCWM
>>>>>>>>> What to do?
>>>>>>>>>
>>>>>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>> I did as you said.
>>>>>>>>>>
>>>>>>>>>> Please tell me whether the following is correct now?
>>>>>>>>>>
>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>>>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>>>>>>>>>> {${mask}}, $src2}"),
>>>>>>>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>>>>>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>>>>>                      VR_2048:$src2))],
>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)),
>>>>>>>>>> (GATHER_256B addr:$src2)>;
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> masked_gather returns two results: the data and the modified
>>>>>>>>>>> mask. Note the $dst and the $mask_wb in the pattern below.
>>>>>>>>>>>
>>>>>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>>>>>                          X86VectorVTInfo _,
>>>>>>>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I want to implement gather for v64i32. I wrote the following code.
>>>>>>>>>>>>
>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>>>>>>> i2048mem:$src),
>>>>>>>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>>>>>>>> addr:$src)))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>> Also I wrote this line in isellowering.h
>>>>>>>>>>>>
>>>>>>>>>>>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>>>>>>>>>
>>>>>>>>>>>> But I am getting the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>>>>>>> Unhandled"' failed.
>>>>>>>>>>>>
>>>>>>>>>>>> What is my mistake?
>>>>>>>>>>>>
>>>>>>>>>>>> Please help me.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the
>>>>>>>>>>>>> following correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>>>>>>>>>>> (v64i32 VR_2048:$src2)))]>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I managed to get rid of the above error for
>>>>>>>>>>>>>> VT.is2048BitVector().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was implemented already.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I will try to define other vectors like VT.is4096BitVector().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you. Actually, I have to implement both i32 and i64, so
>>>>>>>>>>>>>>> I have now implemented two instructions, one broadcastS and the
>>>>>>>>>>>>>>> other broadcastD. When doing the broadcast from memory to register
>>>>>>>>>>>>>>> I was getting no such error with one instruction and other
>>>>>>>>>>>>>>> patterns (i64, i32, etc.), but there too I have now implemented
>>>>>>>>>>>>>>> two versions, single and double.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Actually, I am trying to compile matrix multiplication code
>>>>>>>>>>>>>>> for a larger vector size. For that I need to add many new
>>>>>>>>>>>>>>> instructions to my backend, such as shuffle, gather, etc. For now
>>>>>>>>>>>>>>> I am getting the following error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525:
>>>>>>>>>>>>>>> llvm::SDValue getOnesVector(llvm::EVT, const llvm::X86Subtarget &,
>>>>>>>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion `(VT.is128BitVector()
>>>>>>>>>>>>>>> || VT.is256BitVector() || VT.is512BitVector()) && "Expected a
>>>>>>>>>>>>>>> 128/256/512-bit vector type"' failed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried including is2048BitVector() and others. Also, in
>>>>>>>>>>>>>>> vectortype.h I included these types for EVT, but I was unable to
>>>>>>>>>>>>>>> compile the backend and am getting errors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You need a new instruction. And your scalar register size
>>>>>>>>>>>>>>>> needs to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sorry to
disturb,
>>>>>>>>>>>>>>>>> Now i want
to implement instruction to broadcast scalar
>>>>>>>>>>>>>>>>> register
content to vector.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> like this;
>>>>>>>>>>>>>>>>>
vpbroadcastq zmm0, rsi
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I tried
implementing it as follows;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def
BROADCASTR_256B : I<0x21, MRMSrcReg, (outs
>>>>>>>>>>>>>>>>>
VR_2048:$dst), (ins GR64:$src),
>>>>>>>>>>>>>>>>>            
"BROADCASTR_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>
$src}",
>>>>>>>>>>>>>>>>>            
[(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>
(X86VBroadcast  GR64:$src)))],
>>>>>>>>>>>>>>>>>            
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def:
Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>>>>>>>
(BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is it fine?
Also do i need to define a new instruction for
>>>>>>>>>>>>>>>>> this like
BROADCASTR_256B? can i use the previous instruction
>>>>>>>>>>>>>>>>>
BROADCAST_256B (the one that broadcast memory scalar to vector) and just
>>>>>>>>>>>>>>>>> define new
pattern?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please
help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug
6, 2017 at 5:10 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305
at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank
You so much.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Wao you
are simply genius.
>>>>>>>>>>>>>>>>>>
initially I didnt include load in both the main
>>>>>>>>>>>>>>>>>>
instruction and pattern so i included in both as follows:
>>>>>>>>>>>>>>>>>> def
BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>
VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>        
"BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>>
$src}",
>>>>>>>>>>>>>>>>>>        
[(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>
(X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>>>>>>>        
IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def:
Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>> And it
worked perfectly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank
You again.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun,
Aug 6, 2017 at 4:28 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>
craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Your pattern needs to be
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
i am getting error.
>>>>>>>>>>>>>>>>>>>>
What is wrong with this pattern?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
in x86 it is;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
def : Pat<(int_x86_avx512_vbroadcast_ss_512
>>>>>>>>>>>>>>>>>>>>>
addr:$src),
>>>>>>>>>>>>>>>>>>>>>
(VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
mine is
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>
(BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>
hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For v16f32 it is defined as:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> def : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> which is similar to mine. Why is it not working then?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> As you said; these are the instructions that I defined in instrinfo.td:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>                      "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>                      [(set VR_2048:$dst, (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>                      IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I did as you said; now getting this error:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
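What widening the assertion amounts to can be sketched in isolation. This is only an illustration under stated assumptions: `VecType`, `originalCheck` and `widenedCheck` are hypothetical stand-ins written for this example, not LLVM's own value-type classes, and only the shape of the condition is taken from the assertion quoted in the thread.

```cpp
// Hypothetical stand-in for a vector value type (NOT LLVM's MVT/EVT).
struct VecType {
    unsigned numElements;
    unsigned elementBits;

    unsigned sizeInBits() const { return numElements * elementBits; }
    bool is128BitVector() const { return sizeInBits() == 128; }
    bool is256BitVector() const { return sizeInBits() == 256; }
    bool is512BitVector() const { return sizeInBits() == 512; }
    bool is2048BitVector() const { return sizeInBits() == 2048; }
};

// The condition as it appears in the failing assertion.
bool originalCheck(const VecType &VT) {
    return VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector();
}

// The same condition with the 2048-bit case added, as suggested in the reply.
bool widenedCheck(const VecType &VT) {
    return originalCheck(VT) || VT.is2048BitVector();
}
```

A v64f32 value is 64 x 32 = 2048 bits wide, so it fails `originalCheck` and passes `widenedCheck`, while a v16f32 value (512 bits) passes both.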
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the setOperationAction line in isellowering.cpp. Now I get the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, first, have you done this for your type?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do I do this task?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is not creating a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast instruction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in instructioninfo.td, but I made no changes in the isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast, not "vbroadcast".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have C code which multiplies a vector by a constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con = 0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now in the LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly on x86 gives:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How did it lower the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern to match here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
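The constant in the IR and the `.long` in the assembly quoted in this message are two spellings of the same value: LLVM IR prints a `float` constant as the hexadecimal bit pattern of that value widened to `double`, while the constant pool stores the 32-bit pattern as a decimal integer. A minimal sketch that checks this for 0.2f follows; the helper names `floatBits` and `floatAsDoubleBits` are made up for this example, and it assumes IEEE-754 `float` and `double`.

```cpp
#include <cstdint>
#include <cstring>

// Bit pattern of a 32-bit float, as stored in a `.long` constant-pool entry.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Bit pattern of the same value widened to double, which is the form LLVM IR
// uses when it prints a float constant in hexadecimal.
std::uint64_t floatAsDoubleBits(float f) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    double d = static_cast<double>(f);  // exact: every float fits in a double
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With these helpers, `floatBits(0.2f)` is 1045220557 (0x3E4CCCCD) and `floatAsDoubleBits(0.2f)` is 0x3FC99999A0000000, matching the listing and the IR.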
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> ~Craig