thr3ads.net - llvm dev - [llvm-dev] VBROADCAST Implementation Issues [Aug 2017]

hameeza ahmed via llvm-dev
2017-Aug-07 18:12 UTC
[llvm-dev] VBROADCAST Implementation Issues

Where to create it? In IR? How to achieve this?

On Mon, Aug 7, 2017 at 11:10 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> I don't think a standalone pattern outside of an instruction can support
> multiple return values.
>
> So you'll need to create a separate FP gather instruction.
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 11:08 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> OK, I removed the parentheses; now my code looks like this:
>>
>> def GATHER_256B : I<0x68, MRMSrcMem,
>>                     (outs VR_2048:$dst, VK64WM:$mask_wb),
>>                     (ins VR_2048:$src1, VK64WM:$mask,  i2048mem:$src2),
>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>                      (masked_gather  VR_2048:$src1, VK64WM:$mask,
>>                       addr:$src2)))],
>>                     IIC_MOV_MEM>, EVEX, EVEX_K, TA;
>>
>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>> addr:$src2)>;
>>
>> Now getting this error:
>>
>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
>> failed.
>>
>>
>> What to do?
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Aug 7, 2017 at 11:01 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> Remove the parentheses around "(VR_2048:$src1)"
>>>
>>> ~Craig
>>>
>>> On Mon, Aug 7, 2017 at 10:57 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Now getting this error:
>>>> /lib/Target/X86/X86InstrInfo.td:3318:1: error: In GATHER_256B:
>>>> Unrecognized node 'VR_2048'!
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 10:53 PM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> You need to add EVEX_K and EVEX_4V to the end of your instruction
>>>>> after TA.
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Mon, Aug 7, 2017 at 10:47 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank You. Now getting this error:
>>>>>>
>>>>>> Unhandled memory encoding VK64WM
>>>>>> Unhandled memory encoding
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 10:43 PM, Craig Topper <craig.topper at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Right before your "def GATHER_256B" add the 'let' line like so
>>>>>>>
>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb" in
>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem,
>>>>>>>                     (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>>>>>                     (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
>>>>>>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>>>>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>>                      (masked_gather  (VR_2048:$src1), VK64WM:$mask,
>>>>>>>                       addr:$src2)))],
>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>>> addr:$src2)>;
>>>>>>>
>>>>>>> ~Craig
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 10:39 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Where to add this line?
>>>>>>>> Sorry, I didn't understand it.
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 10:37 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You need this line from the AVX512 code to tell the register
>>>>>>>>> allocation system that $src1/$dst and $mask/$mask_wb must use the same
>>>>>>>>> register. And the early clobber tells it that $dst and $src2 cannot
>>>>>>>>> use the same register.
>>>>>>>>>
>>>>>>>>> let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"
>>>>>>>>> ~Craig
>>>>>>>>>
>>>>>>>>> On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you. Still getting errors. I have modified my instructions
>>>>>>>>>> as you said, as follows:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem,
>>>>>>>>>>                     (outs VR_2048:$dst, VK64WM:$mask_wb),
>>>>>>>>>>                     (ins VR_2048:$src1, VK64WM:$mask,  i2048mem:$src2),
>>>>>>>>>>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
>>>>>>>>>>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
>>>>>>>>>>                      (masked_gather  (VR_2048:$src1), VK64WM:$mask,
>>>>>>>>>>                       addr:$src2)))],
>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
>>>>>>>>>> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
>>>>>>>>>> addr:$src2)>;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Now getting this error:
>>>>>>>>>>
>>>>>>>>>> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
>>>>>>>>>> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
>>>>>>>>>> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
>>>>>>>>>> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
>>>>>>>>>> operands for MRMSrcMemFrm"' failed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> masked_gather takes 3 inputs, not just an address. See the
>>>>>>>>>>> AVX512 pattern I pasted earlier.
>>>>>>>>>>>
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Changed it to;
>>>>>>>>>>>>
>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>>>>>>>>>>>                     (ins i2048mem:$src),
>>>>>>>>>>>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst} {${mask}}, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>>>>>>>>>>>                      (masked_gather addr:$src)))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>>>>> Now getting the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> Unhandled memory encoding VK64
>>>>>>>>>>>> Unhandled memory encoding
>>>>>>>>>>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>>>>>>>>>>
>>>>>>>>>>>> What to do?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am getting this error:
>>>>>>>>>>>>> error: Variable not defined: '_'
>>>>>>>>>>>>> for _.KRCWM
>>>>>>>>>>>>> What to do?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I did as you said.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please tell me whether the following is correct now?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>>>>>>>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>>>>>>>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>>>>>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>>>>> (v64i32 (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>>>>>>>>>>>                      VR_2048:$src2))],
>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)),
>>>>>>>>>>>>>> (GATHER_256B addr:$src2)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> masked_gather returns two results. The data and the modified
>>>>>>>>>>>>>>> mask. Note the $dst and the $mask_wb in the pattern below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>>>>>>>>>>>                          X86VectorVTInfo _, X86MemOperand memop,
>>>>>>>>>>>>>>>                          PatFrag GatherNode> {
>>>>>>>>>>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
>>>>>>>>>>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>>>>>>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
>>>>>>>>>>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>>>>>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>>>>>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>>>>>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>>>>>>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>>>>>>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>>>>>>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
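To make the "two results" point concrete, here is a minimal C++ reference model of a masked gather. It is only a sketch of the semantics under stated assumptions, not LLVM or backend code: `GatherResult`, `masked_gather_model`, the 64-lane width, and the base-plus-index addressing are hypothetical choices made for illustration. It assumes AVX-512-style behavior, where a lane is loaded only if its mask bit is set, other lanes keep the pass-through value, and each mask bit is cleared once its lane has been loaded.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model, not LLVM code: 64 lanes of i32, one mask bit per lane.
constexpr std::size_t kLanes = 64;

struct GatherResult {
    std::array<std::int32_t, kLanes> dst;  // first result: the gathered data
    std::uint64_t mask_wb;                 // second result: the written-back mask
};

// src1        : pass-through values (plays the role of "$src1 = $dst")
// mask        : bit i set means lane i must be loaded ("$mask = $mask_wb")
// base, index : assumed addressing scheme (base pointer plus per-lane element index)
inline GatherResult masked_gather_model(const std::array<std::int32_t, kLanes>& src1,
                                        std::uint64_t mask,
                                        const std::int32_t* base,
                                        const std::array<std::uint32_t, kLanes>& index) {
    GatherResult r{src1, mask};  // start from the pass-through data and the input mask
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::uint64_t bit = std::uint64_t{1} << i;
        if (r.mask_wb & bit) {
            r.dst[i] = base[index[i]];  // load the selected lane
            r.mask_wb &= ~bit;          // clear the bit once the lane is done
        }
    }
    return r;
}
```

In this model the data result starts out as a copy of the pass-through input and the mask result starts out as a copy of the input mask, which is the relationship the `$src1 = $dst, $mask = $mask_wb` constraints describe for registers.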
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I want to implement gather for v64i32. I wrote the following
>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>>>>>                 (ins i2048mem:$src),
>>>>>>>>>>>>>>>>                 "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>                 [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>                  (masked_gather addr:$src)))],
>>>>>>>>>>>>>>>>                 IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>>>>>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also I wrote this line in isellowering.h
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>               setOperationAction(ISD::MGATHER, MVT::v64i32, Legal);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But I am getting the following error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>>>>>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
>>>>>>>>>>>>>>>> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>>>>>>>>>>> Unhandled"' failed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What is my mistake?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please help me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the
>>>>>>>>>>>>>>>>> following correct?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>>>>>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),
>>>>>>>>>>>>>>>>> "VSHUFFLE_256B\t{$src1, $src2, $dst|$dst, $src1, $src2}",
>>>>>>>>>>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1),
>>>>>>>>>>>>>>>>> (v64i32 VR_2048:$src2)))]>, TA;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I managed to get rid of the above error for VT.is2048BitVector().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This was implemented already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now I will try to define other vectors like VT.is4096BitVector().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you. Actually I have to implement both i32 and i64, so I
>>>>>>>>>>>>>>>>>>> implemented two instructions now, one broadcastS and the other
>>>>>>>>>>>>>>>>>>> broadcastD. When broadcasting from memory to register I was getting
>>>>>>>>>>>>>>>>>>> no such error with one instruction plus other patterns for i64, i32,
>>>>>>>>>>>>>>>>>>> etc., but I implemented its two versions, single and double, there
>>>>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Actually, I am trying to compile matrix multiplication code for a
>>>>>>>>>>>>>>>>>>> larger vector size. For that I need to include many new instructions
>>>>>>>>>>>>>>>>>>> in my backend, like shuffle, gather, etc. For now I am getting the
>>>>>>>>>>>>>>>>>>> following error.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525:
>>>>>>>>>>>>>>>>>>> llvm::SDValue getOnesVector(llvm::EVT, const llvm::X86Subtarget &,
>>>>>>>>>>>>>>>>>>> llvm::SelectionDAG &, const llvm::SDLoc &): Assertion
>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>> "Expected a 128/256/512-bit vector type"' failed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I tried including is2048BitVector() and others. Also, in
>>>>>>>>>>>>>>>>>>> vectortype.h I included these types for EVT, but I was unable to
>>>>>>>>>>>>>>>>>>> compile the backend and am getting errors.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You need a new instruction. And your scalar register size needs to
>>>>>>>>>>>>>>>>>>>> match your vector element size. So GR32 instead of GR64
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>>>>>>>>>>> Now I want to implement an instruction to broadcast scalar register
>>>>>>>>>>>>>>>>>>>>> content to a vector.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> like this;
>>>>>>>>>>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins GR64:$src),
>>>>>>>>>>>>>>>>>>>>>                       "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>                       [(set VR_2048:$dst, (v64i32 (X86VBroadcast  GR64:$src)))],
>>>>>>>>>>>>>>>>>>>>>                       IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is it fine? Also, do I need to define a new instruction for this, like
>>>>>>>>>>>>>>>>>>>>> BROADCASTR_256B? Can I use the previous instruction BROADCAST_256B (the
>>>>>>>>>>>>>>>>>>>>> one that broadcasts a memory scalar to a vector) and just define a new
>>>>>>>>>>>>>>>>>>>>> pattern?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please help.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you so much.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Wow, you are simply a genius.
>>>>>>>>>>>>>>>>>>>>>> Initially I didn't include the load in either the main instruction or
>>>>>>>>>>>>>>>>>>>>>> the pattern, so I included it in both as follows:
>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>>>                      "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>                      [(set VR_2048:$dst, (v64i32 (X86VBroadcast (loadi32 addr:$src))))],
>>>>>>>>>>>>>>>>>>>>>>                      IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you again.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>>>>>>>>>>>          (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It runs fine with v64i32, but with the following pattern
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I am getting an error.
>>>>>>>>>>>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> For v16f32 it is defined as;
>>>>>>>>>>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>>>>>>>>>> (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>>>>>>>>>>>> Why is it not working then?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> As you said; these are the instructions that I defined in instrinfo.td
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                      "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                      [(set VR_2048:$dst, (v64i32 (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                      IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
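As a quick check of the arithmetic behind that suggestion: v64f32 is 64 lanes of 32 bits, which is 2048 bits (256 bytes), so a check that only accepts 128/256/512-bit vectors rejects it. The sketch below models this with a hypothetical `VecType` struct. It is not LLVM's `EVT`, and `original_check` and `widened_check` are illustrative names standing in for the condition inside the assertion.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector value type; not LLVM's EVT or MVT.
struct VecType {
    std::uint32_t elem_bits;
    std::uint32_t num_elems;

    constexpr std::uint32_t size_in_bits() const { return elem_bits * num_elems; }
    constexpr bool is128BitVector() const { return size_in_bits() == 128; }
    constexpr bool is256BitVector() const { return size_in_bits() == 256; }
    constexpr bool is512BitVector() const { return size_in_bits() == 512; }
    constexpr bool is2048BitVector() const { return size_in_bits() == 2048; }
};

constexpr VecType v16f32{32, 16};  // 512 bits
constexpr VecType v64f32{32, 64};  // 2048 bits

// Same shape as the condition in the failing assertion.
constexpr bool original_check(VecType vt) {
    return vt.is128BitVector() || vt.is256BitVector() || vt.is512BitVector();
}

// The condition with the extra width allowed.
constexpr bool widened_check(VecType vt) {
    return original_check(vt) || vt.is2048BitVector();
}
```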
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the setOperationAction line in isellowering.cpp. Now getting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode *,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is not creating a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
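The idea behind turning a BUILD_VECTOR into a broadcast can be sketched as a splat check: if every operand is the same node, the whole vector can be produced from that single scalar. The following C++ is a hypothetical standalone model, not the code in X86ISelLowering.cpp. Operands are represented as plain integer ids (62 standing for t62 in the dump), and `find_splat_operand` is a name invented for this illustration.

```cpp
#include <vector>

// Hypothetical model of splat detection over BUILD_VECTOR operands.
// Returns true and stores the repeated operand id in `splat` when every
// lane uses the same operand; returns false otherwise (including when the
// operand list is empty), leaving `splat` untouched.
inline bool find_splat_operand(const std::vector<int>& operands, int& splat) {
    if (operands.empty())
        return false;
    for (int op : operands) {
        if (op != operands.front())
            return false;  // at least two different operands: not a splat
    }
    splat = operands.front();
    return true;
}
```

For the 64-operand vector in the error above, where every lane is t62, such a check would succeed and report t62 as the scalar to broadcast.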
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made the changes you mentioned and included the broadcast
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td, but I made no changes in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .................
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
On Sat, Aug 5, 2017 at 10:50 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
hameeza ahmed <hahmed2305 at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
i have a c code which multiplies
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
vector with constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
b[i][j] = con * (a[i][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
but its x86 assembly gives:

.LCPI0_0:
        .long 1045220557              # float 0.200000003

        vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
        vmulps zmm2, zmm2, zmm1

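For reference, the IR constant and the constant-pool entry above are the same value written two ways. LLVM IR prints a float constant as the hexadecimal bit pattern of the equivalent double, while the assembly emits the 32-bit single-precision bits as a decimal .long. The following is a minimal sketch in C that checks this, assuming IEEE 754 float and double; the function name float_bits_from_ir_hex is made up for illustration and is not part of LLVM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: take the 64-bit pattern that LLVM IR prints for a
   float constant (the bits of a double) and return the 32-bit
   single-precision pattern of the same value. Assumes IEEE 754. */
uint32_t float_bits_from_ir_hex(uint64_t ir_hex)
{
    double d;
    float f;
    uint32_t bits;

    memcpy(&d, &ir_hex, sizeof d);   /* reinterpret the 64 bits as a double */
    f = (float)d;                    /* exact here: the value already fits a float */
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float as 32 bits */
    return bits;
}
```

Calling float_bits_from_ir_hex(0x3FC99999A0000000ULL) returns 1045220557 (0x3E4CCCCD), which is the .long stored at .LCPI0_0.
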
How was the above IR lowered to vbroadcastss?

What pattern would match here?

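The property that makes a broadcast possible here is that the fmul operand is a splat: all 64 lanes hold the same value, so a single scalar can be loaded and replicated instead of materializing the whole vector. The following is only a sketch of that condition in C, not LLVM's actual code; is_splat and broadcast are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if all n lanes hold the same 32-bit pattern (a "splat")
   and stores that pattern in *scalar. Returns 0 otherwise. */
int is_splat(const uint32_t *lanes, size_t n, uint32_t *scalar)
{
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; ++i)
        if (lanes[i] != lanes[0])
            return 0;
    *scalar = lanes[0];
    return 1;
}

/* What a broadcast does: copy one scalar into every lane. */
void broadcast(uint32_t scalar, uint32_t *lanes, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        lanes[i] = scalar;
}
```

For the constant vector in the IR above, every lane is the same float, so a check like is_splat would succeed and the 64-lane constant reduces to one 32-bit load plus a broadcast.
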
I want to implement a similar broadcast for a vector of 64 elements.

I tried the following code:

def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins immem:$src),
                        "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
                        [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
                        IIC_MOV_MEM>, TA;

Please help me. I am stuck at this point.

Thank You
Regards

--
~Craig

[llvm-dev] VBROADCAST Implementation Issues