hameeza ahmed via llvm-dev
2018-Jul-24 17:34 UTC
[llvm-dev] KNL Vectorization with larger vector width
Hello,

I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16xi32> and <8xi32> instructions.

However, if I repeat this with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and give <32xi32> and <16xi32> vector instructions.

How can I do this? What adjustments are needed? I have been trying but am unable to solve it.

Thank you.

On Tue, Jul 24, 2018 at 4:44 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> Hello,
> Do I need to change the following function:
>
> unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
>   if (Vector && !ST->hasSSE1())
>     return 0;
>
>   if (ST->is64Bit()) {
>     if (Vector && ST->hasAVX512())
>       return 32;
>     return 16;
>   }
>   return 8;
> }
>
> to
>
>   if (ST->is2048Bit()) {
>     if (Vector && ST->hasAVX512())
>       return 1024;
>     return 512;
>   }
>   return 256;
>
> Please help...
>
> On Tue, Jul 24, 2018 at 5:05 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
>
>> Thank you.
>> Right now, to see the effect, I made the following changes:
>>
>> unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) {
>>   if (Vector) {
>>     if (ST->hasAVX512())
>>       return 65536;
>>
>> Here I changed 512 to 65536. Then in LoopVectorize.cpp I did the following:
>>
>>   assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"
>>          " into one vector!");
>>
>> I changed 64 to 2048.
>>
>> It runs fine. I can see <2048xi32> or <1024xi64> emitted in the IR.
>>
>> But I cannot see the mix of vector widths. With the default KNL settings,
>> if iterations=15 we see one <8xi32> and the rest scalar. Here, when I set
>> iterations=2047, I get all scalar code. Why is that? Similarly, in Polly
>> I can't see the mix of vector widths either. For KNL it emits <v16i32>,
>> <v8i32>, <v4i32>... so here it should emit recursively, like
>> <v2048i32> <v1024i32> <v512i32>.....<v32i32>
>>
>> How can I do this?
>>
>> What am I missing here?
>> What further changes do I need to make?
>>
>> Please help...
>>
>> On Tue, Jul 24, 2018 at 1:52 AM, Friedman, Eli <efriedma at codeaurora.org> wrote:
>>
>>> On 7/23/2018 12:40 PM, hameeza ahmed wrote:
>>>
>>>> Thank You. I got it. Version issue.
>>>>
>>>> TTI.getRegisterBitWidth(true)
>>>>
>>>> How to put my target machine info in TTI?
>>>
>>> Each target has an implementation, e.g. X86TTIImpl::getRegisterBitWidth.
>>>
>>> -Eli
>>>
>>> --
>>> Employee of Qualcomm Innovation Center, Inc.
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>>> Linux Foundation Collaborative Project
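The all-scalar result for iterations=2047 is at least consistent with simple arithmetic. The sketch below assumes the vectorizer commits to a single vectorization factor VF and leaves whatever does not fill a whole vector to a scalar remainder loop. `LoopSplit` and `splitTripCount` are hypothetical names for illustration, not LLVM APIs, and the real vectorizer also consults its cost model and interleave count when choosing VF.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of LLVM.
struct LoopSplit {
  std::uint64_t VectorIterations; // full trips through the vector body
  std::uint64_t ScalarRemainder;  // iterations left for the scalar loop
};

// Split a known trip count for a loop vectorized with one factor VF.
// Assumes VF > 0.
inline LoopSplit splitTripCount(std::uint64_t TripCount, std::uint64_t VF) {
  return {TripCount / VF, TripCount % VF};
}
```

Under this model, 15 iterations at VF=8 give one vector iteration and 7 scalar ones, while 2047 iterations at VF=2048 (or 63 at VF=64) give zero vector iterations, so everything runs in the scalar loop.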
Friedman, Eli via llvm-dev
2018-Jul-24 17:54 UTC
[llvm-dev] KNL Vectorization with larger vector width
There currently isn't any implementation of epilog loop vectorization (see https://reviews.llvm.org/D30247, but it never got merged).

In some cases you might get lucky with loop unrolling plus SLP vectorization.

-Eli
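To make the idea of a vectorized epilog concrete, here is a source-level sketch written by hand. It is not what D30247 implements and not something the vectorizer generates. The names `addChunks` and `addArrays` and the widths 16, 8 and 4 are assumptions chosen for illustration. Each inner loop has a constant trip count, which makes it a candidate for unrolling plus SLP vectorization, but whether a given compiler turns it into a single vector operation is not guaranteed.

```cpp
#include <cstddef>

// Hypothetical illustration; names and widths are assumptions.
// Adds B into A in chunks of Width elements starting at index I and
// returns the index of the first element that was not processed.
template <std::size_t Width>
std::size_t addChunks(int *A, const int *B, std::size_t I, std::size_t N) {
  for (; I + Width <= N; I += Width)
    // Constant trip count: the body may become one <Width x i32> add.
    for (std::size_t J = 0; J < Width; ++J)
      A[I + J] += B[I + J];
  return I;
}

void addArrays(int *A, const int *B, std::size_t N) {
  std::size_t I = 0;
  I = addChunks<16>(A, B, I, N); // main loop at full width
  I = addChunks<8>(A, B, I, N);  // epilog at half width
  I = addChunks<4>(A, B, I, N);  // epilog at quarter width
  for (; I < N; ++I)             // scalar remainder
    A[I] += B[I];
}
```

For N = 63 this runs three chunks of 16, one chunk of 8, one chunk of 4, and three scalar iterations.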
hameeza ahmed via llvm-dev
2018-Jul-28 11:12 UTC
[llvm-dev] KNL Vectorization with larger vector width
Thank you.

I am currently looking at how LLVM treats remainder loops. For example, with 63 loop iterations I get 3 v16i32 and 15 scalars. I want to use v8 and v4 for the 15 remainder iterations. How can I do this?

I am looking at LoopVectorize.cpp but am unable to find the code that deals with the remainder scalar loop iterations.

Please help.
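The split being asked for (v16 for the bulk, then v8 and v4 for the leftover 15) can be written down as a greedy decomposition over halving widths. This is only the arithmetic of the request, assuming widths are tried from `MaxVF` down to `MinVF`. `planWidths` is a hypothetical function, not something found in LoopVectorize.cpp.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical planner for illustration; not part of LLVM.
// Returns (width, count) pairs: how many operations of each vector width
// cover TripCount when widths are tried from MaxVF down to MinVF by
// halving. A final pair with width 1 holds the scalar iterations.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
planWidths(std::uint64_t TripCount, std::uint64_t MaxVF, std::uint64_t MinVF) {
  std::vector<std::pair<std::uint64_t, std::uint64_t>> Plan;
  std::uint64_t Left = TripCount;
  for (std::uint64_t VF = MaxVF; VF >= MinVF && VF > 1; VF /= 2) {
    if (Left / VF > 0)
      Plan.emplace_back(VF, Left / VF); // this width is used Left / VF times
    Left %= VF;                         // carry the rest to the next width
  }
  if (Left > 0)
    Plan.emplace_back(1, Left);         // whatever remains stays scalar
  return Plan;
}
```

Calling `planWidths(63, 16, 4)` yields three operations of width 16, one of width 8, one of width 4, and three scalar iterations.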