thr3ads.net - llvm dev - [llvm-dev] KNL Vectorization with larger vector width [Jul 2018]


hameeza ahmed via llvm-dev

2018-Jul-24 17:34 UTC

[llvm-dev] KNL Vectorization with larger vector width

Hello,
I need help here. I am able to adjust the vector width through the
WidestRegister value. When the number of iterations is 31 and I set the
vector width to 32, it gives <16 x i32> and <8 x i32> instructions.

However, if I try the same thing with 63 iterations and a vector width of
64, no vector instructions are emitted. I expected it to behave as before
and give <32 x i32> and <16 x i32> vector instructions.

How can I do this?
What adjustments are needed?

Please help.

I'm trying to solve this but have not managed to.

Thank You

On Tue, Jul 24, 2018 at 4:44 PM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> Hello,
> Do I need to change the following function?
>
> unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
>   if (Vector && !ST->hasSSE1())
>     return 0;
>
>   if (ST->is64Bit()) {
>     if (Vector && ST->hasAVX512())
>       return 32;
>     return 16;
>   }
>   return 8;
> }
>
> to
>
> if (ST->is2048Bit()) {
>   if (Vector && ST->hasAVX512())
>     return 1024;
>   return 512;
> }
> return 256;
>
>
> Please help...
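The two hooks discussed in this thread answer different questions: getNumberOfRegisters reports how many registers of a class exist (the stock X86 code above returns 32 vector registers for 64-bit AVX-512), while getRegisterBitWidth reports how wide each register is. As far as I can tell, the loop vectorizer derives its maximum vectorization factor from the bit width and consults the register count mainly for register-pressure and interleave decisions, so returning 1024/512/256 from getNumberOfRegisters would change the register budget rather than the vector width. The sketch below is a standalone mock, not LLVM code: MockSubtarget, numberOfRegisters and registerBitWidth are hypothetical names, and the 65536-bit width is the experimental value used later in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget queries used in the thread
// (ST->hasSSE1(), ST->hasAVX512(), ST->is64Bit()). Not an LLVM type.
struct MockSubtarget {
  bool HasSSE1;
  bool HasAVX512;
  bool Is64Bit;
  unsigned VectorBits; // hypothetical: vector width the experiment models
};

// Same decision structure as the quoted X86TTIImpl::getNumberOfRegisters.
// The result is a register COUNT, not a width in bits.
unsigned numberOfRegisters(const MockSubtarget &ST, bool Vector) {
  if (Vector && !ST.HasSSE1)
    return 0;
  if (ST.Is64Bit) {
    if (Vector && ST.HasAVX512)
      return 32;
    return 16;
  }
  return 8;
}

// Hypothetical width hook: the size of ONE register in bits.
// Simplified: the AVX/SSE fallbacks of the real hook are omitted.
unsigned registerBitWidth(const MockSubtarget &ST, bool Vector) {
  if (!Vector)
    return ST.Is64Bit ? 64 : 32;
  if (!ST.HasAVX512)
    return 0;
  // This sketch assumes power-of-two widths, like 512 and 65536 in the thread.
  assert((ST.VectorBits & (ST.VectorBits - 1)) == 0 &&
         "width must be a power of two");
  return ST.VectorBits;
}
```

With MockSubtarget{true, true, true, 65536}, the mock reports 32 vector registers of 65536 bits each; editing only the count function would leave the width, and therefore the vectorization factor, unchanged.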
>
> On Tue, Jul 24, 2018 at 5:05 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Thank You.
>> Right now, to see the effect, I made the following changes:
>>
>> unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) {
>>   if (Vector) {
>>     if (ST->hasAVX512())
>>       return 65536;
>>
>> Here I changed 512 to 65536. Then in LoopVectorize.cpp I did the following:
>>
>>   assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"
>>                                   " into one vector!");
>>
>> I changed the 64 to 2048.
>>
>> It runs fine. I can see <2048 x i32> or <1024 x i64> emitted in the IR.
>>
>> But I cannot see the vector mix that default KNL produces: with 15
>> iterations we see one <8 x i32> and the rest scalar. Here, when I use 2047
>> iterations, I get all scalar code. Why is that? Similarly, in Polly I
>> cannot see the vector mixes that happen for KNL, where it emits
>> <16 x i32>, <8 x i32>, <4 x i32>, ... Here it should recursively emit
>> <2048 x i32>, <1024 x i32>, <512 x i32>, ..., <32 x i32>.
>>
>> How can I do this?
>>
>> What am I missing here?
>> What further changes do I need to make?
>>
>> Please help...
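The arithmetic behind the assert edit above: in LoopVectorize.cpp the maximum vectorization factor is, roughly, the widest vector register width divided by the widest element type in the loop (MaxVectorSize = WidestRegister / WidestType), and the stock assert caps it at 64. With the stock 512-bit AVX-512 width and i32 elements this gives 16 lanes; with the experimental 65536 bits it gives 2048 lanes for i32 and 1024 for i64, matching the <2048 x i32> and <1024 x i64> seen in the IR. The function name maxVectorSize and its MaxAllowed parameter are illustrative only, not LLVM API.

```cpp
#include <cassert>

// Illustrative version of the computation described above:
// lanes = register width in bits / widest element type in bits.
// MaxAllowed plays the role of the constant in LoopVectorize's assert
// (64 upstream, raised to 2048 in the experiment above).
unsigned maxVectorSize(unsigned WidestRegisterBits, unsigned WidestTypeBits,
                       unsigned MaxAllowed) {
  assert(WidestTypeBits != 0 && "loop must contain a typed value");
  unsigned Lanes = WidestRegisterBits / WidestTypeBits;
  assert(Lanes <= MaxAllowed && "Did not expect to pack so many elements"
                                " into one vector!");
  return Lanes;
}
```

For example, maxVectorSize(65536, 32, 64) would fire the assert in a debug build, which is why the constant had to be raised along with the register width.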
>>
>> On Tue, Jul 24, 2018 at 1:52 AM, Friedman, Eli <efriedma at codeaurora.org>
>> wrote:
>>
>>> On 7/23/2018 12:40 PM, hameeza ahmed wrote:
>>>
>>>> Thank You. I got it. Version issue.
>>>>
>>>> TTI.getRegisterBitWidth(true)
>>>>
>>>> How do I put my target machine's information into TTI?
>>>>
>>>
>>> Each target has an implementation, e.g. X86TTIImpl::getRegisterBitWidth.
>>>
>>>
>>> -Eli
>>>
>>> --
>>> Employee of Qualcomm Innovation Center, Inc.
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>>> Linux Foundation Collaborative Project
>>>
>>>
>>

Friedman, Eli via llvm-dev

2018-Jul-24 17:54 UTC


[llvm-dev] KNL Vectorization with larger vector width

There currently isn't any implementation of epilog loop vectorization
(see https://reviews.llvm.org/D30247, which never got merged).

In some cases you might get lucky with loop unrolling plus SLP 
vectorization.

-Eli
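To make the missing transformation concrete, here is a hand-written C++ sketch of the loop shape being asked for: a main loop in chunks of 16 i32 lanes (the stock KNL width), followed by an epilogue that covers the remainder with chunks of 8, 4, 2 and 1 instead of a purely scalar loop. Each fixed-width chunk stands in for one vector instruction. The names addChunk and addArrays are hypothetical, and this shows only the intended structure, not the design of D30247 or of any LLVM pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One fixed-width "vector" operation: W lanes of C[i] = A[i] + B[i].
// Ideally a compiler would lower each call to a single <W x i32> add.
template <std::size_t W>
void addChunk(const std::int32_t *A, const std::int32_t *B, std::int32_t *C) {
  for (std::size_t L = 0; L < W; ++L)
    C[L] = A[L] + B[L];
}

// Main loop at 16 lanes, then an epilogue that halves the width
// (8, 4, 2, 1); since the remainder is below 16, at most one chunk
// of each size runs.
void addArrays(const std::int32_t *A, const std::int32_t *B, std::int32_t *C,
               std::size_t N) {
  std::size_t I = 0;
  for (; I + 16 <= N; I += 16)
    addChunk<16>(A + I, B + I, C + I);
  if (N - I >= 8) { addChunk<8>(A + I, B + I, C + I); I += 8; }
  if (N - I >= 4) { addChunk<4>(A + I, B + I, C + I); I += 4; }
  if (N - I >= 2) { addChunk<2>(A + I, B + I, C + I); I += 2; }
  if (N - I >= 1) { addChunk<1>(A + I, B + I, C + I); I += 1; }
  assert(I == N && "every element must be covered exactly once");
}
```

For 63 elements this runs three 16-wide chunks followed by one chunk each of 8, 4, 2 and 1.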


hameeza ahmed via llvm-dev

2018-Jul-28 11:12 UTC


[llvm-dev] KNL Vectorization with larger vector width

Thank You.

I am currently looking at how LLVM treats remainder loops. For example, with
63 loop iterations I get three <16 x i32> iterations and 15 scalar ones. I
want to use <8 x i32> and <4 x i32> vectors for the 15 remainder iterations.
How can I do this?
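The counts in the paragraph above follow from simple arithmetic: with a vectorization factor of 16, 63 iterations give 63 / 16 = 3 vector iterations and 63 % 16 = 15 scalar ones. The hypothetical helper chunkWidths below lists the chunk widths for that default shape and for the requested one, in which the remainder is covered by narrower vectors down to a minimum width before falling back to scalar code. It only describes the desired split; it does not reflect anything LoopVectorize currently does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: lists how TripCount iterations would be covered by
// a main vector loop of MainVF lanes, an epilogue whose width halves down
// to MinVF, and a scalar tail (returned as width-1 entries).
// MainVF and MinVF are assumed to be powers of two with 1 <= MinVF <= MainVF.
std::vector<unsigned> chunkWidths(unsigned TripCount, unsigned MainVF,
                                  unsigned MinVF) {
  assert(MinVF >= 1 && MinVF <= MainVF && "need 1 <= MinVF <= MainVF");
  std::vector<unsigned> Widths;
  unsigned Left = TripCount;
  for (; Left >= MainVF; Left -= MainVF)
    Widths.push_back(MainVF); // main vector loop iterations
  // Width 1 is treated as scalar code, so the epilogue stops above it.
  for (unsigned W = MainVF / 2; W >= MinVF && W > 1; W /= 2)
    if (Left >= W) {
      Widths.push_back(W); // one vector epilogue step
      Left -= W;
    }
  for (; Left > 0; --Left)
    Widths.push_back(1); // scalar remainder iterations
  return Widths;
}
```

chunkWidths(63, 16, 16) reproduces the current shape (three 16-wide chunks, then 15 scalar iterations), while chunkWidths(63, 16, 4) gives 16, 16, 16, 8, 4 and three scalar iterations. The same helper describes the wide experiment: chunkWidths(2047, 2048, 32) starts at 1024 and halves down to 32 before the scalar tail.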

I am reading LoopVectorize.cpp but cannot find the code that deals with the
remaining scalar loop iterations.

Please help...



