Eric Christopher via llvm-dev
2017-Nov-03 02:18 UTC
[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hello all,
>>
>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
>> command line flags supported by the latest GCC to clang. These flags will be
>> used to limit the vector register size presented by TTI to the vectorizers.
>> The backend will still be able to use wider registers for code written
>> using the intrinsics in x86intrin.h, and it will still be able to use
>> AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers.
>>
>> Motivation:
>>
>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
>> frequency that may offset the gains from using the wider register size. See
>> section 15.26 of the Intel® 64 and IA-32 Architectures Optimization
>> Reference Manual published October 2017.
>
> I note the doc mentions that 256-bit AVX operations also have the same
> issue with reducing the CPU frequency, which is nice to see documented!
>
> There are also the issues discussed here
> <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere)
> related to warm-up time for the 256-bit execution pipeline, which is
> another issue with using wide-vector ops.
>
>> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
>> are only 256 bits wide. 512-bit instructions using these ALUs must use both
>> ports. See section 2.1 of the Intel® 64 and IA-32 Architectures Optimization
>> Reference Manual published October 2017.
>>
>> Implementation Plan:
>>
>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td, not
>> mapped to any CPU.
>>
>> -Add -mprefer-avx256 and -mprefer-avx128 and the corresponding
>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe
>> this will allow clang to pass these straight through to the -target-feature
>> attribute in IR.
>>
>> -Modify X86TTIImpl::getRegisterBitWidth to return 512 only if AVX512 is
>> enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
>> return 256 only if AVX is enabled and prefer-avx128 is not set.
>
> Instead of multiple flags with difficult-to-understand intersecting
> behavior, one flag with a value would be better. E.g., what should
> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the
> answer, it's confusing (similarly with other such combinations). Just a
> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier
> to understand to me (keeping the same behavior as you mention: asking to
> prefer a larger width than is supported by your architecture should be fine
> but ignored).

I agree with this. It's a little more plumbing as far as subtarget features
etc. (represent via an optional value or just various "set the avx width"
features - the latter being easier, but uglier); however, it's probably the
right thing to do.

I was looking at this myself just a couple weeks ago and think this is the
right direction (when and how to turn things off), and it probably makes
sense as a default for these architectures? We might end up needing to
check a couple of additional TTI places, but it sounds like you're on top
of it. :)

Thanks very much for doing this work.

-eric

>> There may be some other backend changes needed, but I plan to address
>> those as we find them.
>>
>> At a later point, consider making -mprefer-avx256 the default for Skylake
>> Server due to the above-mentioned performance considerations.
>>
>> Does this sound reasonable?
>>
>> *Latest Intel Optimization manual available here:
>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>>
>> -Craig Topper
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
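The getRegisterBitWidth rule in Craig's implementation plan and the single-valued alternative James suggests can be compared with a small standalone model. This is a sketch under stated assumptions, not LLVM code: `SubtargetModel`, `vectorBitWidthTwoFlags`, `hardwareMaxBitWidth` and `vectorBitWidthSingleValue` are hypothetical names, and the 128-bit (SSE) and 0 (no vector unit) fallbacks are assumed rather than taken from the RFC. The single-valued version caps the hardware width with `std::min`, which is one simple way to get the "wider than supported is fine but ignored" behavior described above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the subtarget queries the TTI hook relies on.
// These are NOT LLVM's X86Subtarget accessors; the model assumes that
// hasAVX512 implies hasAVX, and hasAVX implies hasSSE1.
struct SubtargetModel {
  bool hasSSE1 = false;
  bool hasAVX = false;
  bool hasAVX512 = false;
  bool preferAVX256 = false; // proposed prefer-avx256 subtarget feature
  bool preferAVX128 = false; // proposed prefer-avx128 subtarget feature
};

// Two-flag rule from the implementation plan: 512 only if AVX-512 is
// enabled and neither preference is set; 256 only if AVX is enabled and
// prefer-avx128 is not set. The 128 (SSE) / 0 fallback is an assumption.
unsigned vectorBitWidthTwoFlags(const SubtargetModel &ST) {
  if (ST.hasAVX512 && !ST.preferAVX256 && !ST.preferAVX128)
    return 512;
  if (ST.hasAVX && !ST.preferAVX128)
    return 256;
  return ST.hasSSE1 ? 128 : 0;
}

// Widest vector register the hardware offers, ignoring preferences.
unsigned hardwareMaxBitWidth(const SubtargetModel &ST) {
  if (ST.hasAVX512)
    return 512;
  if (ST.hasAVX)
    return 256;
  return ST.hasSSE1 ? 128 : 0;
}

// Single-valued alternative (-mprefer-avx={128/256/512}): cap the hardware
// width at the preferred width, so a preference wider than the hardware
// supports is accepted but has no effect.
unsigned vectorBitWidthSingleValue(const SubtargetModel &ST,
                                   unsigned Preferred) {
  assert(Preferred == 128 || Preferred == 256 || Preferred == 512);
  return std::min(hardwareMaxBitWidth(ST), Preferred);
}
```

On a model Skylake Server (SSE, AVX and AVX-512 all set), both rules give 256 once prefer-avx256 (or a preferred width of 256) is requested, while requesting 512 on an AVX2-only model yields 256.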
Craig Topper via llvm-dev
2017-Nov-03 04:47 UTC
That's a very good point about the ordering of the command line options.
gcc's current implementation treats -mprefer-avx256 as "prefer 256 over
512" and -mprefer-avx128 as "prefer 128 over 256". That feels weird for
other reasons, but has less of an ordering ambiguity.

-mprefer-avx128 has been in gcc for many years and predates the creation of
AVX-512. -mprefer-avx256 was added a couple of months ago.

We've had an internal conversation with the implementor of -mprefer-avx256
in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll
bring up the ambiguity issue with them.

Do we want to be compatible with gcc here?

~Craig
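The ordering problem discussed in this thread can be made concrete with a small model of argument parsing. This is a hypothetical sketch, not clang's driver code: it assumes each -mX/-mno-X pair is resolved independently with the last occurrence of that pair winning, and it models the proposed single-valued -mprefer-avx=N flag with last-value-wins semantics. `PreferFlags`, `parseBooleanFlags` and `parseSingleValueFlag` are illustrative names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Final on/off state of the two boolean preferences after parsing.
struct PreferFlags {
  bool preferAVX256 = false;
  bool preferAVX128 = false;
};

// Assumption: each -mX / -mno-X pair is resolved independently and the last
// occurrence of that pair wins. Unrelated arguments are ignored.
PreferFlags parseBooleanFlags(const std::vector<std::string> &Args) {
  PreferFlags F;
  for (const std::string &A : Args) {
    if (A == "-mprefer-avx256")
      F.preferAVX256 = true;
    else if (A == "-mno-prefer-avx256")
      F.preferAVX256 = false;
    else if (A == "-mprefer-avx128")
      F.preferAVX128 = true;
    else if (A == "-mno-prefer-avx128")
      F.preferAVX128 = false;
  }
  return F;
}

// Proposed single-valued flag: the last -mprefer-avx=N wins outright, so
// argument order never has to be reconciled across several flags.
// Returns std::nullopt when no preference is given.
std::optional<unsigned>
parseSingleValueFlag(const std::vector<std::string> &Args) {
  const std::string Prefix = "-mprefer-avx=";
  std::optional<unsigned> Width;
  for (const std::string &A : Args) {
    if (A.compare(0, Prefix.size(), Prefix) == 0)
      Width = static_cast<unsigned>(std::stoul(A.substr(Prefix.size())));
  }
  // The proposal only admits 128, 256 or 512; a real driver would emit a
  // diagnostic here instead of asserting.
  assert(!Width || *Width == 128 || *Width == 256 || *Width == 512);
  return Width;
}
```

With the boolean flags, "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" leaves prefer-avx128 set and prefer-avx256 cleared, so the resulting width still depends on how those surviving states are interpreted (the RFC's rule and gcc's "prefer X over Y" reading can differ). With the single-valued flag, "-mprefer-avx=256 -mprefer-avx=128" simply yields 128.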
Tobias Grosser via llvm-dev
2017-Nov-07 09:02 UTC
On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
> Do we want to be compatible with gcc here?

I certainly believe we would want to be compatible with gcc (if we use the
same names).

Best,
Tobias