hameeza ahmed via llvm-dev
2017-Aug-07  17:19 UTC
[llvm-dev] VBROADCAST Implementation Issues
Thank you. I am still getting errors. I have modified my instruction as you
said, as follows:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
                    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
                    [(set VR_2048:$dst, VK64WM:$mask_wb,
                      (v64i32 (masked_gather (VR_2048:$src1), VK64WM:$mask,
                                             addr:$src2)))],
                    IIC_MOV_MEM>, TA;

def: Pat<(v64f32 (masked_gather (VR_2048:$src1), (VK64WM:$mask), (addr:$src2))),
         (GATHER_256B VR_2048:$src1, VK64WM:$mask, addr:$src2)>;
Now getting this error:
llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
Assertion `numPhysicalOperands >= 2 + additionalOperands &&
numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
operands for MRMSrcMemFrm"' failed.
On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> masked_gather takes 3 inputs, not just an address. See the AVX512 pattern
> I pasted earlier.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Changed it to;
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>> (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>> (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>> Now getting following error:
>>
>> Unhandled memory encoding VK64
>> Unhandled memory encoding
>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>
>> What to do?
>>
>>
>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i am getting this error
>>> error: Variable not defined: '_'
>>> for _.KRCWM
>>> what to do?
>>>
>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> I did as you said,
>>>>
>>>> Please tell me whether the following correct now??
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>                      VR_2048:$src2))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>
>>>> Thank You
>>>>
>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> masked_gather returns two results. The data and the modified mask.
>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>
>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>                          X86VectorVTInfo _,
>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>       ExeDomain = _.ExeDomain in
>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>> }
>>>>>
>>>>> ~Craig
>>>>>
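The pattern above takes a pass-through vector, a mask, and an address, and yields the data plus the modified mask. A scalar C++ reference model may make that shape easier to see. This is only a sketch under assumptions: the names `GatherResult`, `maskedGather`, and `kLanes` are invented for illustration, the address operand is modelled as a base pointer plus one index per lane, and the mask write-back follows AVX-512 behaviour (a lane's mask bit is cleared once that lane has been loaded), which a custom 2048-bit target may or may not share.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 64;  // v64i32: 64 lanes of 32 bits (assumed)

// The two results of the gather: the data ($dst) and the modified mask
// ($mask_wb). Hypothetical type, not part of LLVM.
struct GatherResult {
  std::array<int32_t, kLanes> data;
  uint64_t maskWriteback;
};

// Scalar model of a masked gather. Inputs correspond to $src1 (pass-through),
// $mask, and $src2 (modelled here as base pointer + per-lane index).
inline GatherResult maskedGather(const std::array<int32_t, kLanes> &passThru,
                                 uint64_t mask, const int32_t *base,
                                 const std::array<int32_t, kLanes> &index) {
  GatherResult r{passThru, mask};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const uint64_t bit = uint64_t{1} << lane;
    if (mask & bit) {
      r.data[lane] = base[index[lane]];  // active lane: load from memory
      r.maskWriteback &= ~bit;           // lane finished: clear its mask bit
    }
    // Inactive lanes keep the pass-through value.
  }
  return r;
}
```

Because the data result starts out as a copy of the pass-through input and the mask result starts out as a copy of the mask input, it is natural for each pair to share one register, which is what the "$src1 = $dst, $mask = $mask_wb" constraints express.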
>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>> i2048mem:$src),
>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>> addr:$src)))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>>>>
>>>>>> Also i wrote this line in isellowering.h
>>>>>>
>>>>>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>>>
>>>>>> But I am getting following error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>> Unhandled"' failed.
>>>>>>
>>>>>> What is my mistake?
>>>>>>
>>>>>> Please help me.
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am trying to implement vector shuffle for v64i32. Is the following
>>>>>>> correct?
>>>>>>>
>>>>>>>
>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>>>>>>
>>>>>>>> this was implemented already.
>>>>>>>>
>>>>>>>> now will try define other vectors like VT.is4096BitVector()).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although
>>>>>>>>> while doing broadcast from memory to register i was getting no such error
>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>>>>>>> implemented its 2 versions single and double.
>>>>>>>>>
>>>>>>>>> Actually, i am trying to compile matrix multiplication code for
>>>>>>>>> greater size vector. There i need to include many new instructions in my
>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR
Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>,
Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>> vector type"' failed.
>>>>>>>>>
>>>>>>>>>  i tried including is2048Bit Vector() and others. also in
>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile
>>>>>>>>> backend and getting errors.
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>> to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>
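The size rule above comes down to simple arithmetic, sketched here in C++. The helper names `elementBits` and `scalarMatchesElement` are hypothetical, and the sketch assumes v64i32 means 64 lanes packed into a 2048-bit register, as the VR_2048 register class name suggests.

```cpp
#include <cassert>

// Width in bits of one element of a vector type.
constexpr unsigned elementBits(unsigned vectorBits, unsigned numElements) {
  return vectorBits / numElements;
}

// A scalar register can feed a broadcast only if it is exactly as wide as
// one vector element.
constexpr bool scalarMatchesElement(unsigned scalarBits, unsigned vectorBits,
                                    unsigned numElements) {
  return scalarBits == elementBits(vectorBits, numElements);
}
```

For a 2048-bit vector of 64 elements, `elementBits(2048, 64)` is 32, so a 32-bit register class such as GR32 matches and a 64-bit one such as GR64 does not. A 64-bit scalar would match a 32-element layout of the same register instead.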
>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>> Now i want to implement instruction to broadcast scalar register
>>>>>>>>>>> content to vector.
>>>>>>>>>>>
>>>>>>>>>>> like this;
>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>
>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>  GR64:$src)))],
>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for this
>>>>>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B
>>>>>>>>>>> (the one that broadcast memory scalar to vector) and just define new
>>>>>>>>>>> pattern?
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>
>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>> initially I didnt include load in both the main instruction and
>>>>>>>>>>>> pattern so i included in both as follows:
>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (
>>>>>>>>>>>> loadi32 addr:$src))))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
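Semantically, the pattern with `loadf32` describes a scalar load followed by a splat of the loaded value into every lane. A scalar C++ model of that, under assumptions: 64 float lanes as in v64f32, and the names `Vec64f`, `broadcastLoad`, and `mul` are made up for this sketch and are not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 64;        // v64f32: 64 float lanes (assumed)
using Vec64f = std::array<float, kLanes>;

// Load one scalar from memory and replicate it into every lane. The explicit
// dereference is the "load" that the pattern has to spell out.
inline Vec64f broadcastLoad(const float *src) {
  Vec64f v;
  v.fill(*src);
  return v;
}

// Lane-wise multiply, standing in for the vector fmul that consumes the
// broadcast constant.
inline Vec64f mul(const Vec64f &a, const Vec64f &b) {
  Vec64f out;
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = a[i] * b[i];
  return out;
}
```

In this model the stencil's multiplication by `con` is `mul(sum, broadcastLoad(&con))`: the constant is read from memory once and every lane sees the same value.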
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>         "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>         [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>         IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>     t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>>>>> now getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
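A BUILD_VECTOR such as the one in the error quoted below (64 copies of t62) can only be turned into a broadcast when every operand is the same value, that is, when it is a splat. The check itself is small. The following is a standalone sketch of the idea, not LLVM's actual lowering code, and `isSplat` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// True when the operand list is non-empty and every operand equals the first
// one. A build-vector that passes this test is a candidate for a single
// broadcast of that one value.
template <typename T>
bool isSplat(const std::vector<T> &operands) {
  if (operands.empty())
    return false;
  for (const T &op : operands)
    if (!(op == operands.front()))
      return false;
  return true;
}
```

With 64 identical operands the function returns true; changing any single operand makes it return false, and the vector would then need a different lowering.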
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but
>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32
>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>                        a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector
>>>>>>>>>>>>>>>>>>>>>>>>>>> of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>> --
>>>>>>>>>> ~Craig
Craig Topper via llvm-dev
2017-Aug-07  17:37 UTC
[llvm-dev] VBROADCAST Implementation Issues
You need this line from AVX512 code to tell the register allocation system that $src1/$dst and $mask/$mask_wb to use the same register. And the early clobber tells it that $dst and $src2 cannot use the same register. let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" ~Craig On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:> Thank You. Still getting errors.I have modified my instructions as you > said as follows: > > > def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), > (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), > "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} > {${mask}}, $src2}", > [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 > (masked_gather (VR_2048:$src1), VK64WM:$mask, > addr:$src2)))], > IIC_MOV_MEM>, TA; > > def: Pat<(v64f32 (masked_gather (VR_2048:$src1), > (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask, > addr:$src2)>; > > > Now getting this error: > > llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void > llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier(): > Assertion `numPhysicalOperands >= 2 + additionalOperands && > numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of > operands for MRMSrcMemFrm"' failed. > > > > > > > > > On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> > wrote: > >> masked_gather takes 3 inputs. not just an address. 
See the AVX512 pattern >> is pasted earlier >> >> ~Craig >> >> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >> wrote: >> >>> Changed it to; >>> >>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask), >>> (ins i2048mem:$src), >>> "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} >>> {${mask}}, $src}", >>> [(set VR_2048:$dst, VK64:$mask, (v64i32 >>> (masked_gather addr:$src)))], >>> IIC_MOV_MEM>, TA; >>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>; >>> Now getting following error: >>> >>> Unhandled memory encoding VK64 >>> Unhandled memory encoding >>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347! >>> >>> What to do? >>> >>> >>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> i am getting this error >>>> error: Variable not defined: '_' >>>> for _.KRCWM >>>> what to do? >>>> >>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>> wrote: >>>> >>>>> Hello, >>>>> I did as you said, >>>>> >>>>> Please tell me whether the following correct now?? >>>>> >>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, >>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), >>>>> "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} >>>>> {${mask}}, $src2}"), >>>>> [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 >>>>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask, >>>>> VR_2048:$src2))], >>>>> IIC_MOV_MEM>, TA; >>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>; >>>>> >>>>> Thank You >>>>> >>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> >>>>> wrote: >>>>> >>>>>> masked_gather returns two results. The data and the modified mask. >>>>>> Note the $dst and the $mask_wb in the pattern below. 
>>>>>> >>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr, >>>>>> X86VectorVTInfo _, >>>>>> X86MemOperand memop, PatFrag GatherNode> { >>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask >>>>>> $mask_wb", >>>>>> ExeDomain = _.ExeDomain in >>>>>> def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, >>>>>> _.KRCWM:$mask_wb), >>>>>> (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2), >>>>>> !strconcat(OpcodeStr#_.Suffix, >>>>>> "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"), >>>>>> [(set _.RC:$dst, _.KRCWM:$mask_wb, >>>>>> (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask, >>>>>> vectoraddr:$src2))]>, EVEX, EVEX_K, >>>>>> EVEX_CD8<_.EltSize, CD8VT1>; >>>>>> } >>>>>> >>>>>> ~Craig >>>>>> >>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> i want to implement gather for v64i32. i wrote following code. >>>>>>> >>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>> i2048mem:$src), >>>>>>> "GATHER_256B\t{$src, $dst|$dst, $src}", >>>>>>> [(set VR_2048:$dst, (v64i32 (masked_gather >>>>>>> addr:$src)))], >>>>>>> IIC_MOV_MEM>, TA; >>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), >>>>>>> (GATHER_256B addr:$src)>; >>>>>>> >>>>>>> Also i wrote this line in isellowering.h >>>>>>> >>>>>>> setOperationAction(ISD::MGATHER, >>>>>>> MVT::v64i32, Legal); >>>>>>> >>>>>>> But I am getting following error: >>>>>>> >>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init >>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: >>>>>>> Unhandled"' failed. >>>>>>> >>>>>>> What is my mistake? >>>>>>> >>>>>>> Please help me. >>>>>>> >>>>>>> >>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com >>>>>>> > wrote: >>>>>>> >>>>>>>> I am trying to implement vector shuffle for v64i32. Is the >>>>>>>> following correct? 
>>>>>>>> >>>>>>>> >>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, >>>>>>>> $src2, $dst|$dst, $src1, $src2}", >>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >>>>>>>> VR_2048:$src2)))]>, TA; >>>>>>>> >>>>>>>> Please help. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed < >>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>> >>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()). >>>>>>>>> >>>>>>>>> this was implemented already. >>>>>>>>> >>>>>>>>> now will try define other vectors like VT.is4096BitVector()). >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i >>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although >>>>>>>>>> while doing broadcast from memory to register i was getting no such error >>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i >>>>>>>>>> implemented its 2 versions single and double. >>>>>>>>>> >>>>>>>>>> Actually, i am trying to compile matrix multiplication code for >>>>>>>>>> greater size vector. There i need to include many new instructions in my >>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>>>>>>>> vector type"' failed. >>>>>>>>>> >>>>>>>>>> i tried including is2048Bit Vector() and others. also in >>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>>>>>> backend and getting errors. >>>>>>>>>> >>>>>>>>>> Please help. 
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar
>>>>>>>>>>>> register content to vector.
>>>>>>>>>>>>
>>>>>>>>>>>> like this;
>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>>
>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>                      GR64:$src)))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>>          (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for this
>>>>>>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B
>>>>>>>>>>>> (the one that broadcast memory scalar to vector) and just define new
>>>>>>>>>>>> pattern?
>>>>>>>>>>>>
>>>>>>>>>>>> Please help.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>>> initially I didnt include load in both the main instruction
>>>>>>>>>>>>> and pattern so i included in both as follows:
>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>>                      (loadi32 addr:$src))))],
>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> in x86 it is; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> mine is >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 >>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>; >>>>>>>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>>>>>>> Why its not working then? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in >>>>>>>>>>>>>>>>>>> instrinfo.td >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 >>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62 >>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, >>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in >>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, >>>>>>>>>>>>>>>>>>>>>>> Custom); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> How to do this task?? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR >>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included >>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but >>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 
0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with >>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this; >>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2; >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (i = 
1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] + >>>>>>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, 
float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003 >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for >>>>>>>>>>>>>>>>>>>>>>>>>>>> vector of 64 elements. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
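For the `GATHER_256B` definition in the message above, the placement of the constraint string can be sketched by mirroring the `avx512_gather` multiclass quoted in this thread, where `let Constraints = ... in` sits directly in front of the `def`. This is only a sketch under assumptions: `VR_2048`, `VK64WM` and `i2048mem` are the poster's own definitions (not upstream LLVM names), the asm string and pattern shape are copied from the upstream multiclass rather than from the poster's code, and it has not been run through `llvm-tblgen`.

```tablegen
// Sketch only, untested. VR_2048, VK64WM and i2048mem are the poster's
// own register classes / memory operand from this thread.
// The `let ... in` prefix applies the constraint string to the def that
// follows it, the same way the quoted avx512_gather multiclass does.
let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
def GATHER_256B : I<0x68, MRMSrcMem,
                    // Two results: the gathered data and the written-back mask.
                    (outs VR_2048:$dst, VK64WM:$mask_wb),
                    // Three inputs: pass-through value, mask, address.
                    (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
                    "GATHER_256B\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}",
                    [(set VR_2048:$dst, VK64WM:$mask_wb,
                      (masked_gather (v64i32 VR_2048:$src1), VK64WM:$mask,
                                     addr:$src2))],
                    IIC_MOV_MEM>, TA;
```

With the two tied pairs declared, `$src1`/`$dst` and `$mask`/`$mask_wb` each share one register, which is what the upstream AVX512 gather definition relies on. Whether a mask operand can be encoded on a plain `I<>` instruction without the `EVEX` and `EVEX_K` modifiers is a separate question that this sketch does not settle.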
hameeza ahmed via llvm-dev
2017-Aug-07  17:39 UTC
[llvm-dev] VBROADCAST Implementation Issues
Where to add this line? Sorry, I didn't understand it.

On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com> wrote:
> You need this line from AVX512 code to tell the register allocation system
> that $src1/$dst and $mask/$mask_wb to use the same register. And the early
> clobber tells it that $dst and $src2 cannot use the same register.
>
> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Thank You. Still getting errors. I have modified my instructions as you
>> said as follows:
>>
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>> {${mask}}, $src2}",
>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>> (masked_gather (VR_2048:$src1), VK64WM:$mask,
>>                      addr:$src2)))],
>>                     IIC_MOV_MEM>, TA;
>>
>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>> addr:$src2)>;
>>
>>
>> Now getting this error:
>>
>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>> operands for MRMSrcMemFrm"' failed.
>>
>>
>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> masked_gather takes 3 inputs. not just an address. See the AVX512
>>> pattern is pasted earlier
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Changed it to;
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>>> (ins i2048mem:$src),
>>>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst}
>>>> {${mask}}, $src}",
>>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>> (masked_gather addr:$src)))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>> Now getting following error:
>>>>
>>>> Unhandled memory encoding VK64
>>>> Unhandled memory encoding
>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>
>>>> What to do?
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> i am getting this error
>>>>> error: Variable not defined: '_'
>>>>> for _.KRCWM
>>>>> what to do?
>>>>>
>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I did as you said,
>>>>>>
>>>>>> Please tell me whether the following correct now??
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>>>>>> {${mask}}, $src2}"),
>>>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>>>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>                      VR_2048:$src2))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>>>
>>>>>> Thank You
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> masked_gather returns two results. The data and the modified mask.
>>>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>>>
>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>                          X86VectorVTInfo _,
>>>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>>   def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>               (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>                vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>             EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>> }
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>>>
>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>>> i2048mem:$src),
>>>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>>>>                      addr:$src)))],
>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>          (GATHER_256B addr:$src)>;
>>>>>>>>
>>>>>>>> Also i wrote this line in isellowering.h
>>>>>>>>
>>>>>>>> setOperationAction(ISD::MGATHER,
>>>>>>>>                    MVT::v64i32, Legal);
>>>>>>>>
>>>>>>>> But I am getting following error:
>>>>>>>>
>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>>> Unhandled"' failed.
>>>>>>>>
>>>>>>>> What is my mistake?
>>>>>>>>
>>>>>>>> Please help me.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am trying to implement vector shuffle for v64i32.
Is the >>>>>>>>> following correct? >>>>>>>>> >>>>>>>>> >>>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, >>>>>>>>> $src2, $dst|$dst, $src1, $src2}", >>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32 >>>>>>>>> VR_2048:$src2)))]>, TA; >>>>>>>>> >>>>>>>>> Please help. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed < >>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()). >>>>>>>>>> >>>>>>>>>> this was implemented already. >>>>>>>>>> >>>>>>>>>> now will try define other vectors like VT.is4096BitVector()). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed < >>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i >>>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although >>>>>>>>>>> while doing broadcast from memory to register i was getting no such error >>>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i >>>>>>>>>>> implemented its 2 versions single and double. >>>>>>>>>>> >>>>>>>>>>> Actually, i am trying to compile matrix multiplication code for >>>>>>>>>>> greater size vector. There i need to include many new instructions in my >>>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue >>>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &, >>>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() || >>>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit >>>>>>>>>>> vector type"' failed. >>>>>>>>>>> >>>>>>>>>>> i tried including is2048Bit Vector() and others. also in >>>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>>>>>>> backend and getting errors. 
>>>>>>>>>>> >>>>>>>>>>> Please help. >>>>>>>>>>> >>>>>>>>>>> Thank You >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper < >>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> You need a new instruction. And your scalar register size needs >>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64 >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Sorry to disturb, >>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar >>>>>>>>>>>>> register content to vector. >>>>>>>>>>>>> >>>>>>>>>>>>> like this; >>>>>>>>>>>>> vpbroadcastq zmm0, rsi >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I tried implementing it as follows; >>>>>>>>>>>>> >>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), >>>>>>>>>>>>> (ins GR64:$src), >>>>>>>>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (X86VBroadcast >>>>>>>>>>>>> GR64:$src)))], >>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for >>>>>>>>>>>>> this like BROADCASTR_256B? can i use the previous instruction >>>>>>>>>>>>> BROADCAST_256B (the one that broadcast memory scalar to vector) and just >>>>>>>>>>>>> define new pattern? >>>>>>>>>>>>> >>>>>>>>>>>>> Please help. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank You >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank You so much. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Wao you are simply genius. 
>>>>>>>>>>>>>> initially I didnt include load in both the main instruction >>>>>>>>>>>>>> and pattern so i included in both as follows: >>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), >>>>>>>>>>>>>> (ins i2048mem:$src), >>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))], >>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>> And it worked perfectly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank You again. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Your pattern needs to be >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> i am getting error. >>>>>>>>>>>>>>>> What is wrong with this pattern? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> in x86 it is; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), >>>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> mine is >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 >>>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>; >>>>>>>>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>>>>>>>> Why its not working then? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too. 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in instrinfo.td
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>>>> $src}",
>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32,
>>>>>>>>>>>>>>>>>>>>>>>> Custom);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included >>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. >>>>>>>>>>>>>>>>>>>>>>>>>>> but i made no changes in isellowering.cpp file. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, >>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62 >>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, >>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>>>> ................. >>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Please help.. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast" >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed >>>>>>>>>>>>>>>>>>>>>>>>>>>> <hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with >>>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this; >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2; 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) { >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (i = 1; i <= N-2; i++) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (j = 1; j <= N-2; j++) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> b[i][j] = con * (a[i][j] + a[i-1][j] >>>>>>>>>>>>>>>>>>>>>>>>>>>>> + a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting; >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21, <float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 
0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives; >>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557 # float >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0] >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into >>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for >>>>>>>>>>>>>>>>>>>>>>>>>>>>> vector of 64 elements. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
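A note on the constant in the original post quoted above: the IR prints the multiplier as `float 0x3FC99999A0000000`, while the x86 assembly stores `.long 1045220557` in the constant pool. Both describe the same value, 0.2 rounded to single precision. LLVM IR writes hexadecimal float constants in the 64-bit double encoding, whereas the constant pool holds the raw 32-bit pattern that the broadcast load reads. The C++ sketch below checks that correspondence. The helper names `floatBits`, `doubleBits`, and `checkThreadConstant` are hypothetical and are not LLVM APIs, and the sketch assumes IEEE-754 `float` and `double`, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes IEEE-754 binary32/binary64");

// Raw 32-bit pattern of a float, i.e. what ends up in the constant pool
// entry that the broadcast instruction loads from.
inline std::uint32_t floatBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Raw 64-bit pattern of the same value widened to double, which is the
// form LLVM IR uses when it prints a float constant in hexadecimal.
inline std::uint64_t doubleBits(float f) {
    const double widened = static_cast<double>(f);
    std::uint64_t bits;
    std::memcpy(&bits, &widened, sizeof bits);
    return bits;
}

// Checks the two encodings quoted in the thread for `float con = 0.2`.
inline void checkThreadConstant() {
    assert(floatBits(0.2f) == 1045220557u);             // .long in the assembly
    assert(doubleBits(0.2f) == 0x3FC99999A0000000ULL);  // constant in the IR
}
```

Calling `checkThreadConstant()` from any test driver exercises both assertions. The point is only that the scalar the pattern has to match is a single 32-bit load, which is why the working pattern in this thread wraps the address in `loadf32`.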
hameeza ahmed via llvm-dev
2017-Aug-07  17:57 UTC
[llvm-dev] VBROADCAST Implementation Issues
Now getting this error:

/lib/Target/X86/X86InstrInfo.td:3318:1: error: In GATHER_256B: Unrecognized node 'VR_2048'!

On Mon, Aug 7, 2017 at 10:53 PM, Craig Topper <craig.topper at gmail.com> wrote:
> You need to add EVEX_K and EVEX_4V to the end of your instruction after TA.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 10:47 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>
>> Thank You. Now getting this error:
>>
>> Unhandled memory encoding VK64WM
>> Unhandled memory encoding
>>
>> On Mon, Aug 7, 2017 at 10:43 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>
>>> Right before your "def GATHER_256B" add the 'let' line like so
>>>
>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>> "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>>> {${mask}}, $src2}",
>>> [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>> (masked_gather (VR_2048:$src1), VK64WM:$mask,
>>> addr:$src2)))],
>>> IIC_MOV_MEM>, TA;
>>>
>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>> addr:$src2)>;
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 10:39 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>
>>>> Where to add this line?
>>>> Sorry I didnt understand it.
>>>>
>>>> On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>
>>>>> You need this line from AVX512 code to tell the register allocation
>>>>> system that $src1/$dst and $mask/$mask_wb to use the same register. And the
>>>>> early clobber tells it that $dst and $src2 cannot use the same register.
>>>>>
>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>
>>>>>> Thank You. Still getting errors.I have modified my instructions as
>>>>>> you said as follows:
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>> VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>> "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
>>>>>> {${mask}}, $src2}",
>>>>>> [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>> (masked_gather (VR_2048:$src1), VK64WM:$mask,
>>>>>> addr:$src2)))],
>>>>>> IIC_MOV_MEM>, TA;
>>>>>>
>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>> addr:$src2)>;
>>>>>>
>>>>>> Now getting this error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>>>>>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>>>>>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>>>>>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>>>>>> operands for MRMSrcMemFrm"' failed.
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>
>>>>>>> masked_gather takes 3 inputs. not just an address. See the AVX512
>>>>>>> pattern is pasted earlier
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Changed it to;
>>>>>>>>
>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>> VK64:$mask), (ins i2048mem:$src),
>>>>>>>> "GATHER_256B\t{$src, {$dst}{${mask}}|${dst}
>>>>>>>> {${mask}}, $src}",
>>>>>>>> [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>>>>>> (masked_gather addr:$src)))],
>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>> Now getting following error:
>>>>>>>>
>>>>>>>> Unhandled memory encoding VK64
>>>>>>>> Unhandled memory encoding
>>>>>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>>>>>
>>>>>>>> What to do?
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> i am getting this error
>>>>>>>>> error: Variable not defined: '_'
>>>>>>>>> for _.KRCWM
>>>>>>>>> what to do?
>>>>>>>>>
>>>>>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>> I did as you said,
>>>>>>>>>>
>>>>>>>>>> Please tell me whether the following correct now??
>>>>>>>>>>
>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>>>>>> "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>>>>>>>>>> {${mask}}, $src2}"),
>>>>>>>>>> [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>>>>>>>> (GatherNode (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>>>>> VR_2048:$src2))],
>>>>>>>>>> IIC_MOV_MEM>, TA;
>>>>>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)),
>>>>>>>>>> (GATHER_256B addr:$src2)>;
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> masked_gather returns two results. The data and the modified
>>>>>>>>>>> mask. Note the $dst and the $mask_wb in the pattern below.
>>>>>>>>>>>
>>>>>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>>>>> X86VectorVTInfo _,
>>>>>>>>>>> X86MemOperand memop, PatFrag
>>>>>>>>>>> GatherNode> {
>>>>>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>>>>> ExeDomain = _.ExeDomain in
>>>>>>>>>>> def rm : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst,
>>>>>>>>>>> _.KRCWM:$mask_wb),
>>>>>>>>>>> (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>>>>> !strconcat(OpcodeStr#_.Suffix,
>>>>>>>>>>> "\t{$src2, ${dst} {${mask}}|${dst} {${mask}},
>>>>>>>>>>> $src2}"),
>>>>>>>>>>> [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>> (GatherNode (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>>>>> vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>>>>> EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>>>>>>> >>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins >>>>>>>>>>>> i2048mem:$src), >>>>>>>>>>>> "GATHER_256B\t{$src, $dst|$dst, $src}", >>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 (masked_gather >>>>>>>>>>>> addr:$src)))], >>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), >>>>>>>>>>>> (GATHER_256B addr:$src)>; >>>>>>>>>>>> >>>>>>>>>>>> Also i wrote this line in isellowering.h >>>>>>>>>>>> >>>>>>>>>>>> setOperationAction(ISD::MGATHER, >>>>>>>>>>>> MVT::v64i32, Legal); >>>>>>>>>>>> >>>>>>>>>>>> But I am getting following error: >>>>>>>>>>>> >>>>>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134: >>>>>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init >>>>>>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: >>>>>>>>>>>> Unhandled"' failed. >>>>>>>>>>>> >>>>>>>>>>>> What is my mistake? >>>>>>>>>>>> >>>>>>>>>>>> Please help me. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed < >>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the >>>>>>>>>>>>> following correct? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> def VSHUFFLE_256B : I<0xE8, MRMDestReg, (outs VR_2048:$dst), >>>>>>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, >>>>>>>>>>>>> $src2, $dst|$dst, $src1, $src2}", >>>>>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), >>>>>>>>>>>>> (v64i32 VR_2048:$src2)))]>, TA; >>>>>>>>>>>>> >>>>>>>>>>>>> Please help. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed < >>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> i managed to get rid of above error for >>>>>>>>>>>>>> VT.is2048BitVector()). >>>>>>>>>>>>>> >>>>>>>>>>>>>> this was implemented already. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> now will try define other vectors like VT.is4096BitVector()). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed < >>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so >>>>>>>>>>>>>>> i implemented two instructions now one broadcastS other broadcastD. >>>>>>>>>>>>>>> Although while doing broadcast from memory to register i was getting no >>>>>>>>>>>>>>> such error with 1 instruction and other patterns i64, i32 etc. but then >>>>>>>>>>>>>>> also i implemented its 2 versions single and double. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Actually, i am trying to compile matrix multiplication code >>>>>>>>>>>>>>> for greater size vector. There i need to include many new instructions in >>>>>>>>>>>>>>> my backend like shuffle, gather etc. For now i am getting the following >>>>>>>>>>>>>>> error. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, 
Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, >>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1> >>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: >>>>>>>>>>>>>>> llvm::SDValue getOnesVector(llvm::EVT, const llvm::X86Subtarget &, >>>>>>>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion `(VT.is128BitVector() >>>>>>>>>>>>>>> || VT.is256BitVector() || VT.is512BitVector()) && "Expected a >>>>>>>>>>>>>>> 128/256/512-bit vector type"' failed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> i tried including is2048Bit Vector() and others. also in >>>>>>>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile >>>>>>>>>>>>>>> backend and getting errors. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please help. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper < >>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You need a new instruction. And your scalar register size >>>>>>>>>>>>>>>> needs to match your vector element size. So GR32 instead of GR64 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed < >>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sorry to disturb, >>>>>>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar >>>>>>>>>>>>>>>>> register content to vector. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> like this; >>>>>>>>>>>>>>>>> vpbroadcastq zmm0, rsi >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I tried implementing it as follows; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs >>>>>>>>>>>>>>>>> VR_2048:$dst), (ins GR64:$src), >>>>>>>>>>>>>>>>> "BROADCASTR_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>> (X86VBroadcast GR64:$src)))], >>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)), >>>>>>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for >>>>>>>>>>>>>>>>> this like BROADCASTR_256B? can i use the previous instruction >>>>>>>>>>>>>>>>> BROADCAST_256B (the one that broadcast memory scalar to vector) and just >>>>>>>>>>>>>>>>> define new pattern? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please help. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank You >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed < >>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank You so much. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Wao you are simply genius. 
>>>>>>>>>>>>>>>>>> initially I didnt include load in both the main >>>>>>>>>>>>>>>>>> instruction and pattern so i included in both as follows: >>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, $dst|$dst, >>>>>>>>>>>>>>>>>> $src}", >>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>> (X86VBroadcast (loadi32 addr:$src))))], >>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>> And it worked perfectly. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank You again. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper < >>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Your pattern needs to be >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))), >>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i am getting error. >>>>>>>>>>>>>>>>>>>> What is wrong with this pattern? 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> in x86 it is; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 >>>>>>>>>>>>>>>>>>>>> addr:$src), >>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZm addr:$src)>; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> mine is >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> for v16f32 it is defined as; >>>>>>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), >>>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 >>>>>>>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>; >>>>>>>>>>>>>>>>>>>>>> which is similar to mine. >>>>>>>>>>>>>>>>>>>>>> Why its not working then? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined >>>>>>>>>>>>>>>>>>>>>>>> in instrinfo.td >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs >>>>>>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src), >>>>>>>>>>>>>>>>>>>>>>>> "BROADCAST_256B\t{$src, >>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}", >>>>>>>>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (v64i32 >>>>>>>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))], >>>>>>>>>>>>>>>>>>>>>>>> IIC_MOV_MEM>, TA; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)), >>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>; >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I did as you said; >>>>>>>>>>>>>>>>>>>>>>>>> now getting this error: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 >>>>>>>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62 >>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, >>>>>>>>>>>>>>>>>>>>>>>>> undef:i64 >>>>>>>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper >>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float >>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0 >>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef >>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed < >>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in >>>>>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801: >>>>>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode >>>>>>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion >>>>>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) && >>>>>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> What should I do? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper < >>>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Custom); >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed >>>>>>>>>>>>>>>>>>>>>>>>>>>> <hahmed2305 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task?? 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62, t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have C code that multiplies a vector by a constant, something like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con = 0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now in LLVM IR I am getting:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> %22 = fmul <64 x float> %21,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   <float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But the x86 assembly generated for it is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         .long 1045220557        # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How was the above IR lowered into vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried the following code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                       IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> ~Craig
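The quoted exchange turns on two conditions: the BUILD_VECTOR must be recognized as a splat (all 64 operands are the same t62), and the lowering code must accept the vector width (the assertion that only allowed 128, 256 and 512 bits). As a rough illustration only, here is a small standalone C++ sketch of those two checks. It is a toy model and not LLVM code: the names isSplat, isSupportedBroadcastWidth and canLowerToBroadcast are hypothetical, and the 2048-bit case is assumed to stand in for the VT.is2048BitVector() check Craig suggests adding. The value 1045220557 is the bit pattern of 0.2f from the .long directive in the original message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for the operand list of a BUILD_VECTOR node, stored as raw
// 32-bit patterns. This is NOT LLVM's data structure (assumption for
// illustration only).
using ToyBuildVector = std::vector<std::uint32_t>;

// True when every operand is identical, i.e. the vector is a "splat".
// Only a splat can be replaced by a single broadcast of one scalar.
bool isSplat(const ToyBuildVector &ops) {
  if (ops.empty())
    return false;
  for (std::size_t i = 1; i < ops.size(); ++i)
    if (ops[i] != ops[0])
      return false;
  return true;
}

// Hypothetical width check modelled on the assertion quoted in the thread,
// extended with the 2048-bit case for a 64 x 32-bit vector.
bool isSupportedBroadcastWidth(std::size_t numElts, std::size_t eltBits) {
  const std::size_t bits = numElts * eltBits;
  return bits == 128 || bits == 256 || bits == 512 || bits == 2048;
}

// In this toy model a broadcast is possible when the operands form a splat
// and the total width is one of the accepted sizes.
bool canLowerToBroadcast(const ToyBuildVector &ops, std::size_t eltBits) {
  return isSplat(ops) && isSupportedBroadcastWidth(ops.size(), eltBits);
}
```

With 64 identical 32-bit operands the total width is 2048 bits, so canLowerToBroadcast(ops, 32) is true only because the 2048-bit case is accepted; dropping that case from isSupportedBroadcastWidth reproduces, in miniature, the situation where the wider type is rejected.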