Craig Topper via llvm-dev
2017-Nov-09 23:21 UTC
[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
I agree that a less x86-specific command line makes sense. I've been having internal discussions with gcc folks and they're evaluating switching to something like -mprefer-vector-width=128/256/512/none

Based on the current performance data we're seeing, we think we need to ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go with a target-independent option/implementation, is there some way we could still affect the default behavior in a target-specific way?

~Craig

On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> It's clear from the Intel docs how this has evolved, but from a compiler perspective, this isn't a Skylake "feature" :) ... nor an Intel feature, nor an x86 feature.
>
> It's a generic programmer hint for any target with multiple potential vector lengths.
>
> On x86, there's already a potential use case for this hint with a different starting motivation: re-vectorization. That's where we take C code that uses 128-bit vector intrinsics and selectively widen it to 256- or 512-bit vector ops based on a newer CPU target than the code was originally written for.
>
> I think it's just a matter of time before a customer requests the same ability for another target (maybe they already have and I don't know about it). So we should have a solution that recognizes that possibility.
>
> Note that having a target-independent implementation in the optimizer doesn't preclude a flag alias in clang to maintain compatibility with gcc.
>
> On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
>>> That's a very good point about the ordering of the command line options. gcc's current implementation treats -mprefer-avx256 as "prefer 256 over 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for other reasons, but has less of an ordering ambiguity.
>>>
>>> -mprefer-avx128 has been in gcc for many years and predates the creation of avx512. -mprefer-avx256 was added a couple months ago.
>>>
>>> We've had an internal conversation with the implementor of -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll bring up the ambiguity issue with them.
>>>
>>> Do we want to be compatible with gcc here?
>>
>> I certainly believe we would want to be compatible with gcc (if we use the same names).
>>
>> Best,
>> Tobias
>>
>>> ~Craig
>>>
>>> On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>> On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line flags supported by latest GCC to clang. These flags will be used to limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers.
>>>>>>
>>>>>> Motivation:
>>>>>>
>>>>>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider register size. See section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017.
>>>>>
>>>>> I note the doc mentions that 256-bit AVX operations also have the same issue with reducing the CPU frequency, which is nice to see documented!
>>>>>
>>>>> There's also the issues discussed here <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time for the 256-bit execution pipeline, which is another issue with using wide-vector ops.
>>>>>
>>>>>> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only 256-bits wide. 512-bit instructions using these ALUs must use both ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017.
>>>>>>
>>>>>> Implementation Plan:
>>>>>>
>>>>>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not mapped to any CPU.
>>>>>>
>>>>>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe this will allow clang to pass these straight through to the -target-feature attribute in IR.
>>>>>>
>>>>>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return 256 if AVX is enabled and prefer-avx128 is not set.
>>>>>
>>>>> Instead of multiple flags that have difficult to understand intersecting behavior, one flag with a value would be better. E.g., what should "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the answer, it's confusing. (Similarly with other such combinations). Just a single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier to understand to me (keeping the same behavior as you mention: asking to prefer a larger width than is supported by your architecture should be fine but ignored).
>>>>
>>>> I agree with this. It's a little more plumbing as far as subtarget features etc (represent via an optional value or just various "set the avx width" features - the latter being easier, but uglier), however, it's probably the right thing to do.
>>>>
>>>> I was looking at this myself just a couple weeks ago and think this is the right direction (when and how to turn things off) - and probably makes sense to be a default for these architectures? We might end up needing to check a couple of additional TTI places, but it sounds like you're on top of it. :)
>>>>
>>>> Thanks very much for doing this work.
>>>>
>>>> -eric
>>>>
>>>>>> There may be some other backend changes needed, but I plan to address those as we find them.
>>>>>>
>>>>>> At a later point, consider making -mprefer-avx256 the default for Skylake Server due to the above mentioned performance considerations.
>>>>>>
>>>>>> Does this sound reasonable?
>>>>>>
>>>>>> *Latest Intel Optimization manual available here: https://software.intel.com/en-us/articles/intel-sdm#optimization
>>>>>>
>>>>>> -Craig Topper
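The register-width rule from the implementation plan quoted above can be pictured with a small standalone C++ sketch. This is only an illustration of the stated conditions, not the actual X86TTIImpl code: the `SubtargetSketch` struct, its field names, and the function name `vectorRegisterBitWidth` are hypothetical, and the 128-bit fallback is an assumption (the plan only spells out the 512 and 256 cases).

```cpp
#include <cassert>

// Hypothetical stand-in for the x86 subtarget state. The field names are
// illustrative only and do not correspond to LLVM's real subtarget API.
struct SubtargetSketch {
  bool HasAVX = false;        // 256-bit vectors available
  bool HasAVX512 = false;     // 512-bit vectors available
  bool PreferAVX256 = false;  // proposed "prefer-avx256" feature
  bool PreferAVX128 = false;  // proposed "prefer-avx128" feature
};

// Width (in bits) that would be reported to the vectorizers.
unsigned vectorRegisterBitWidth(const SubtargetSketch &ST) {
  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumption: fall back to 128 bits in every other case.
  return 128;
}
```

Note that in this sketch the preference only changes the width reported to the vectorizers; nothing here would stop hand-written 512-bit intrinsics from being used, which matches the intent stated in the proposal.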
Sanjay Patel via llvm-dev
2017-Nov-10 15:39 UTC
[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
We can tie a user preference / override to a CPU model. We do something like that for square root estimates already (although it does use a SubtargetFeature currently for x86; ideally, we'd key that off of something in the CPU scheduler model).
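A CPU-model default that a user preference can override might be sketched as follows in standalone C++. Everything here is a hypothetical illustration: the `WidthPref` struct and both function names are invented for the example, the option spelling `-mprefer-vector-width=` is the one floated in this thread rather than a confirmed flag, the 256-bit default for skylake-avx512 is the proposal under discussion rather than established behavior, and "last occurrence wins" is an assumed parsing rule.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical user preference: Specified says whether the option appeared,
// Bits == 0 stands for "none" (no limit requested).
struct WidthPref {
  bool Specified = false;
  unsigned Bits = 0;
};

// Scan arguments for the single-valued option; the last occurrence wins,
// so there is no ordering ambiguity between competing on/off flags.
WidthPref parsePreferVectorWidth(const std::vector<std::string> &Args) {
  const std::string Prefix = "-mprefer-vector-width=";
  WidthPref Pref;
  for (const std::string &Arg : Args) {
    if (Arg.rfind(Prefix, 0) != 0)
      continue; // not our option
    const std::string Value = Arg.substr(Prefix.size());
    if (Value == "none")
      Pref = {true, 0};
    else if (Value == "128" || Value == "256" || Value == "512")
      Pref = {true, static_cast<unsigned>(std::stoul(Value))};
    // Other values are ignored here; a real driver would diagnose them.
  }
  return Pref;
}

// Combine the CPU default with the user preference. A request wider than
// the hardware supports is accepted but has no effect.
unsigned effectiveVectorWidth(const std::string &CPU, WidthPref User,
                              unsigned MaxHWBits) {
  unsigned Pref = 0; // 0 means no limit
  if (User.Specified)
    Pref = User.Bits;            // explicit user choice always wins
  else if (CPU == "skylake-avx512")
    Pref = 256;                  // assumed per-CPU default from the thread
  if (Pref == 0 || Pref > MaxHWBits)
    return MaxHWBits;
  return Pref;
}
```

With these assumptions, compiling for skylake-avx512 with no option yields 256, while an explicit `-mprefer-vector-width=512` or `=none` restores 512. Keeping "not specified" distinct from "none" is what lets the user turn the per-CPU default off again.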
Craig Topper via llvm-dev
2017-Nov-11 01:04 UTC
[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Are you referring to the X86TargetLowering::isFsqrtCheap hook?

~Craig