thr3ads.net - llvm dev - [llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available [Nov 2017]

Craig Topper via llvm-dev

2017-Nov-13 22:15 UTC

[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
>
> If skylake is that bad at AVX2
>
>
> I don't think this says anything negative about AVX2, but AVX-512.
>
> it belongs in -mcpu / -march IMO.
>
>
> No. We'd still want to enable the architectural features for vector
> intrinsics and the like.
>
I took this to mean that the feature should be enabled by default for
-march=skylake-avx512.


>
>
> Based on the current performance data we're seeing, we think we need to
> ultimately default skylake-avx512 to -mprefer-vector-width=256.
>
>
> Craig, is this for both integer and floating-point code?
>
I believe so, but I'll try to get confirmation from the people with more
data.

>
>
>  -Hal
>
>    Most people will build for the standard x86_64-pc-linux or whatever
> anyway,  and completely ignore the change. This will mainly affect those
> who build their own software and optimize for their system, and lots there
> have probably caught on to this already.  I always thought that's what
> -march was made for, really.
>
> GNOMETOYS
>
> On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Yes - I was thinking of FeatureFastScalarFSQRT / FeatureFastVectorFSQRT
>> which are used by isFsqrtCheap(). These were added to override the
>> default x86 sqrt estimate codegen with:
>> https://reviews.llvm.org/D21379
>>
>> But I'm not sure we really need that kind of hack. Can we adjust the
>> attribute in clang based on the target cpu? Ie, if you have something
>> like:
>> $ clang -O2 -march=skylake-avx512 foo.c
>>
>> Then you can detect that in the clang driver and pass
>> -mprefer-vector-width=256 to clang codegen as an option? Clang codegen
>> then adds that function attribute to everything it outputs. Then, the
>> vectorizers and/or backend detect that attribute and adjust their
>> behavior based on it.
>>
Do we have a precedent for setting a target independent flag from a target
specific cpu string in the clang driver? Want to make sure I understand
what the processing on such a thing would look like. Particularly to get
the order right so the user can override it.

>
>> So I don't think we should be messing with any kind of type legality
>> checking because that stuff should all be correct already. We're just
>> choosing a vector size based on a pref. I think we should even allow the
>> pref to go bigger than a legal type. This came up somewhere on llvm-dev
>> or in a bug recently in the context of vector reductions.
>>
>>
>>
>> On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> Are you referring to the X86TargetLowering::isFsqrtCheap hook?
>>>
>>> ~Craig
>>>
>>> On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel at rotateright.com>
>>> wrote:
>>>
>>>> We can tie a user preference / override to a CPU model. We do something
>>>> like that for square root estimates already (although it does use a
>>>> SubtargetFeature currently for x86; ideally, we'd key that off of
>>>> something in the CPU scheduler model).
>>>>
>>>>
>>>> On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> I agree that a less x86 specific command line makes sense. I've been
>>>>> having internal discussions with gcc folks and they're evaluating
>>>>> switching to something like -mprefer-vector-width=128/256/512/none
>>>>>
>>>>> Based on the current performance data we're seeing, we think we need
>>>>> to ultimately default skylake-avx512 to -mprefer-vector-width=256. If
>>>>> we go with a target independent option/implementation is there some
>>>>> way we could still affect the default behavior in a target specific
>>>>> way?
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <spatel at rotateright.com>
>>>>> wrote:
>>>>>
>>>>>> It's clear from the Intel docs how this has evolved, but from a
>>>>>> compiler perspective, this isn't a Skylake "feature" :) ... nor an
>>>>>> Intel feature, nor an x86 feature.
>>>>>>
>>>>>> It's a generic programmer hint for any target with multiple potential
>>>>>> vector lengths.
>>>>>>
>>>>>> On x86, there's already a potential use case for this hint with a
>>>>>> different starting motivation: re-vectorization. That's where we take
>>>>>> C code that uses 128-bit vector intrinsics and selectively widen it to
>>>>>> 256- or 512-bit vector ops based on a newer CPU target than the code
>>>>>> was originally written for.
>>>>>>
>>>>>> I think it's just a matter of time before a customer requests the
>>>>>> same ability for another target (maybe they already have and I don't
>>>>>> know about it). So we should have a solution that recognizes that
>>>>>> possibility.
>>>>>>
>>>>>> Note that having a target-independent implementation in the optimizer
>>>>>> doesn't preclude a flag alias in clang to maintain compatibility with
>>>>>> gcc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
>>>>>>> > That's a very good point about the ordering of the command line
>>>>>>> > options. gcc's current implementation treats -mprefer-avx256 as
>>>>>>> > "prefer 256 over 512" and -mprefer-avx128 as "prefer 128 over 256".
>>>>>>> > Which feels weird for other reasons, but has less of an ordering
>>>>>>> > ambiguity.
>>>>>>> >
>>>>>>> > -mprefer-avx128 has been in gcc for many years and predates the
>>>>>>> > creation of avx512. -mprefer-avx256 was added a couple months ago.
>>>>>>> >
>>>>>>> > We've had an internal conversation with the implementor of
>>>>>>> > -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit
>>>>>>> > vectors as well. I'll bring up the ambiguity issue with them.
>>>>>>> >
>>>>>>> > Do we want to be compatible with gcc here?
>>>>>>>
>>>>>>> I certainly believe we would want to be compatible with gcc (if we
>>>>>>> use the same names).
>>>>>>>
>>>>>>> Best,
>>>>>>> Tobias
>>>>>>>
>>>>>>> >
>>>>>>> > ~Craig
>>>>>>> >
>>>>>>> > On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo at gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <
>>>>>>> > > llvm-dev at lists.llvm.org> wrote:
>>>>>>> > >
>>>>>>> > >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
>>>>>>> > >> llvm-dev at lists.llvm.org> wrote:
>>>>>>> > >>
>>>>>>> > >>> Hello all,
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> I would like to propose adding the -mprefer-avx256 and
>>>>>>> > >>> -mprefer-avx128 command line flags supported by latest GCC to
>>>>>>> > >>> clang. These flags will be used to limit the vector register size
>>>>>>> > >>> presented by TTI to the vectorizers. The backend will still be
>>>>>>> > >>> able to use wider registers for code written using the intrinsics
>>>>>>> > >>> in x86intrin.h. And the backend will still be able to use
>>>>>>> > >>> AVX512VL instructions and the additional XMM16-31 and YMM16-31
>>>>>>> > >>> registers.
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> Motivation:
>>>>>>> > >>>
>>>>>>> > >>> -Using 512-bit operations on some Intel CPUs may cause a decrease
>>>>>>> > >>> in CPU frequency that may offset the gains from using the wider
>>>>>>> > >>> register size. See section 15.26 of Intel® 64 and IA-32
>>>>>>> > >>> Architectures Optimization Reference Manual published October
>>>>>>> > >>> 2017.
>>>>>>> > >>>
>>>>>>> > >>
>>>>>>> > >> I note the doc mentions that 256-bit AVX operations also have the
>>>>>>> > >> same issue with reducing the CPU frequency, which is nice to see
>>>>>>> > >> documented!
>>>>>>> > >>
>>>>>>> > >> There's also the issues discussed here
>>>>>>> > >> <http://www.agner.org/optimize/blog/read.php?i=165> (and
>>>>>>> > >> elsewhere) related to warm-up time for the 256-bit execution
>>>>>>> > >> pipeline, which is another issue with using wide-vector ops.
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>> -The vector ALUs on ports 0 and 1 of the Skylake Server
>>>>>>> > >>> microarchitecture are only 256-bits wide. 512-bit instructions
>>>>>>> > >>> using these ALUs must use both ports. See section 2.1 of Intel®
>>>>>>> > >>> 64 and IA-32 Architectures Optimization Reference Manual
>>>>>>> > >>> published October 2017.
>>>>>>> > >>>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>>  Implementation Plan:
>>>>>>> > >>>
>>>>>>> > >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in
>>>>>>> > >>> X86.td not mapped to any CPU.
>>>>>>> > >>>
>>>>>>> > >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
>>>>>>> > >>> -mno-prefer-avx128/256 options to clang's driver Options.td file.
>>>>>>> > >>> I believe this will allow clang to pass these straight through to
>>>>>>> > >>> the -target-feature attribute in IR.
>>>>>>> > >>>
>>>>>>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if
>>>>>>> > >>> AVX512 is enabled and prefer-avx256 and prefer-avx128 is not set.
>>>>>>> > >>> Similarly return 256 if AVX is enabled and prefer-avx128 is not
>>>>>>> > >>> set.
>>>>>>> > >>>
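
The getRegisterBitWidth rule in the plan above can be sketched as a small standalone function. This is only an illustration under stated assumptions: SubtargetSketch and registerBitWidth are hypothetical names, not the real X86Subtarget or X86TTIImpl code, and the 128-bit fallback is an assumption that the plan itself does not spell out.

```cpp
// Hypothetical stand-in for the subtarget queries. These are NOT the real
// X86Subtarget members; they only model the bits the plan mentions.
struct SubtargetSketch {
  bool HasAVX512;
  bool HasAVX;
  bool PreferAVX256; // models the proposed prefer-avx256 feature
  bool PreferAVX128; // models the proposed prefer-avx128 feature
};

// Register width in bits that would be presented to the vectorizers.
constexpr unsigned registerBitWidth(SubtargetSketch ST) {
  return (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128) ? 512u
         : (ST.HasAVX && !ST.PreferAVX128)                      ? 256u
                                                                : 128u; // assumed fallback
}

// AVX-512 hardware with prefer-avx256 set is reported as 256 bits wide.
static_assert(registerBitWidth(SubtargetSketch{true, true, true, false}) == 256,
              "prefer-avx256 caps the reported width");
```

Only the width reported to the vectorizers changes in this sketch; as the proposal states, intrinsics code can still use the wider registers.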
>>>>>>> > >>
>>>>>>> > >> Instead of multiple flags that have difficult to understand
>>>>>>> > >> intersecting behavior, one flag with a value would be better.
>>>>>>> > >> E.g., what should "-mprefer-avx256 -mprefer-avx128
>>>>>>> > >> -mno-prefer-avx256" do? No matter the answer, it's confusing.
>>>>>>> > >> (Similarly with other such combinations). Just a single arg
>>>>>>> > >> "-mprefer-avx={128/256/512}" (with no "no" version) seems easier
>>>>>>> > >> to understand to me (keeping the same behavior as you mention:
>>>>>>> > >> asking to prefer a larger width than is supported by your
>>>>>>> > >> architecture should be fine but ignored).
>>>>>>> > >>
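
The single-valued flag suggested above might behave roughly like the following sketch. The helper names maxHardwareWidth and effectiveWidth are hypothetical, the value 0 is an assumed encoding for "no preference given", and the 128-bit floor is likewise an assumption. A preference wider than the hardware supports is accepted but has no effect.

```cpp
// Widest vector width the enabled features allow (hypothetical helper;
// the 128-bit floor is an assumption for illustration).
constexpr unsigned maxHardwareWidth(bool HasAVX512, bool HasAVX) {
  return HasAVX512 ? 512u : HasAVX ? 256u : 128u;
}

// Preferred == 0 models "no preference given". A preference larger than
// the hardware maximum is ignored rather than rejected.
constexpr unsigned effectiveWidth(unsigned Preferred, unsigned HardwareMax) {
  return (Preferred == 0 || Preferred > HardwareMax) ? HardwareMax : Preferred;
}

// Asking for 512 on AVX2-only hardware is fine but ignored.
static_assert(effectiveWidth(512, maxHardwareWidth(false, true)) == 256,
              "over-wide preference is ignored");
```

With one value there is no combination of flags whose meaning depends on ordering between separate 128 and 256 switches, which is the ambiguity raised above.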
>>>>>>> > >>
>>>>>>> > > I agree with this. It's a little more plumbing as far as subtarget
>>>>>>> > > features etc (represent via an optional value or just various "set
>>>>>>> > > the avx width" features - the latter being easier, but uglier),
>>>>>>> > > however, it's probably the right thing to do.
>>>>>>> > >
>>>>>>> > > I was looking at this myself just a couple weeks ago and think this
>>>>>>> > > is the right direction (when and how to turn things off) - and
>>>>>>> > > probably makes sense to be a default for these architectures? We
>>>>>>> > > might end up needing to check a couple of additional TTI places,
>>>>>>> > > but it sounds like you're on top of it. :)
>>>>>>> > >
>>>>>>> > > Thanks very much for doing this work.
>>>>>>> > >
>>>>>>> > > -eric
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>> There may be some other backend changes needed, but I plan to
>>>>>>> > >>> address those as we find them.
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> At a later point, consider making -mprefer-avx256 the default for
>>>>>>> > >>> Skylake Server due to the above mentioned performance
>>>>>>> > >>> considerations.
>>>>>>> > >>>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>>
>>>>>>> > >> Does this sound reasonable?
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> *Latest Intel Optimization manual available here:
>>>>>>> > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> -Craig Topper
>>>>>>> > >>>
>>>>>>> > >>>
>>>>>>> > >>> _______________________________________________
>>>>>>> > >>> LLVM Developers mailing list
>>>>>>> > >>> llvm-dev at lists.llvm.org
>>>>>>> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>

Sanjay Patel via llvm-dev

2017-Nov-13 23:45 UTC


[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Mon, Nov 13, 2017 at 3:15 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> Do we have a precedent for setting a target independent flag from a target
> specific cpu string in the clang driver? Want to make sure I understand
> what the processing on such a thing would look like. Particularly to get
> the order right so the user can override it.
>
I think Clang::AddX86TargetArgs() has a target CPU in its arg list, so you
could do some checking/adding in there, but I'm just guessing at what's the
right way to do this - ask on cfe-dev?
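
One way to picture the ordering concern is to treat the CPU-derived value as a default that any explicit user flag replaces. The sketch below is not clang driver code: CpuSketch, NotGiven, cpuDefaultWidth and resolvePreferWidth are hypothetical names, and the 256-bit default for skylake-avx512 is the proposal from this thread rather than confirmed behavior.

```cpp
// Hypothetical CPU identifiers; a real driver works from the -march string.
enum class CpuSketch { Generic, SkylakeAVX512 };

// Marker for "the user passed no -mprefer-vector-width flag" (assumption).
constexpr int NotGiven = -1;

// Default preference implied by the CPU. 0 models "none", i.e. no
// restriction. The 256 value is the default proposed in this thread.
constexpr unsigned cpuDefaultWidth(CpuSketch Cpu) {
  return Cpu == CpuSketch::SkylakeAVX512 ? 256u : 0u;
}

// The CPU default applies only when the user said nothing; an explicit
// value, including 0 for "none", always wins.
constexpr unsigned resolvePreferWidth(CpuSketch Cpu, int UserWidth) {
  return UserWidth == NotGiven ? cpuDefaultWidth(Cpu)
                               : static_cast<unsigned>(UserWidth);
}

// -march=skylake-avx512 -mprefer-vector-width=512 keeps the user's 512.
static_assert(resolvePreferWidth(CpuSketch::SkylakeAVX512, 512) == 512,
              "explicit user value overrides the CPU default");
```

Keeping "not given" distinct from "none" is what lets a user opt back out of a CPU-specific default.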

Eric Christopher via llvm-dev

2017-Nov-13 23:49 UTC


[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Mon, Nov 13, 2017 at 2:15 PM Craig Topper via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
>>
>> If skylake is that bad at AVX2
>>
>>
>> I don't think this says anything negative about AVX2, but AVX-512.
>>
Right. I think we're at AVX/AVX2 is "bad" on Haswell/Broadwell and AVX512
is "bad" on Skylake. At least in the "random autovectorization spread out"
aspect.

>
>>
>> it belongs in -mcpu / -march IMO.
>>
>>
>> No. We'd still want to enable the architectural features for vector
>> intrinsics and the like.
>>
>
> I took this to mean that the feature should be enabled by default for
> -march=skylake-avx512.
>

Agreed.

-eric


Hal Finkel via llvm-dev

2017-Nov-13 23:54 UTC


[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On 11/13/2017 05:49 PM, Eric Christopher wrote:
>
> On Mon, Nov 13, 2017 at 2:15 PM Craig Topper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>
>
>         On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
>>         If skylake is that bad at AVX2
>
>         I don't think this says anything negative about AVX2, but
>         AVX-512.
>
>
> Right. I think we're at AVX/AVX2 is "bad" on Haswell/Broadwell and
> AVX512 is "bad" on Skylake. At least in the "random autovectorization
> spread out" aspect.
>
>
>
>>         it belongs in -mcpu / -march IMO.
>
>         No. We'd still want to enable the architectural features for
>         vector intrinsics and the like.
>
>
>     I took this to mean that the feature should be enabled by default
>     for -march=skylake-avx512.
>
>
>
> Agreed.
Yes. Also, GNOMETOYS clarified to me (off list) that is what he meant.

  -Hal
>
> -eric
>
>
>
>
>>         Based on the current performance data we're seeing, we think
>>         we need to ultimately default skylake-avx512 to
>>         -mprefer-vector-width=256.
>
>         Craig, is this for both integer and floating-point code?
>
>
>     I believe so, but I'll try to get confirmation from the people
>     with more data.
>
>
>
>          -Hal
>
>>            Most people will build for the standard x86_64-pc-linux or
>>         whatever anyway,  and completely ignore the change. This will
>>         mainly affect those who build their own software and optimize
>>         for their system, and lots there have probably caught on to
>>         this already.  I always thought that's what -march was made
>>         for, really.
>>
>>         GNOMETOYS
>>
>>         On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev
>>         <llvm-dev at lists.llvm.org> wrote:
>>
>>             Yes - I was thinking of FeatureFastScalarFSQRT /
>>             FeatureFastVectorFSQRT which are used by isFsqrtCheap().
>>             These were added to override the default x86 sqrt
>>             estimate codegen with:
>>             https://reviews.llvm.org/D21379
>>
>>             But I'm not sure we really need that kind of hack. Can we
>>             adjust the attribute in clang based on the target cpu?
>>             Ie, if you have something like:
>>             $ clang -O2 -march=skylake-avx512 foo.c
>>
>>             Then you can detect that in the clang driver and pass
>>             -mprefer-vector-width=256 to clang codegen as an option?
>>             Clang codegen then adds that function attribute to
>>             everything it outputs. Then, the vectorizers and/or
>>             backend detect that attribute and adjust their behavior
>>             based on it.
>>
>
>     Do we have a precedent for setting a target independent flag from
>     a target specific cpu string in the clang driver? Want to make
>     sure I understand what the processing on such a thing would look
>     like. Particularly to get the order right so the user can override it.
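
One way to picture the ordering in question, as a sketch only: the driver derives a default from the CPU name, and an explicit user option, if present, replaces it. The Cpu enum, the function names, and the use of 0 for "no preference" are hypothetical and do not correspond to existing clang driver code.

```cpp
// Hypothetical CPU identifiers; a real driver would work from the -march string.
enum class Cpu { Generic, SkylakeAVX512 };

// Default preferred vector width implied by the CPU (0 = no preference).
constexpr unsigned defaultPreferWidth(Cpu C) {
  return C == Cpu::SkylakeAVX512 ? 256u : 0u;
}

// An explicit -mprefer-vector-width=N from the user always wins over the
// CPU-derived default. UserSet says whether the option was given at all.
constexpr unsigned resolvePreferWidth(Cpu C, bool UserSet, unsigned UserWidth) {
  return UserSet ? UserWidth : defaultPreferWidth(C);
}
```

With this shape, -march=skylake-avx512 alone resolves to 256, while adding -mprefer-vector-width=512 on the same command line resolves to 512.
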
>
>>
>>             So I don't think we should be messing with any kind of
>>             type legality checking because that stuff should all be
>>             correct already. We're just choosing a vector size based
>>             on a pref. I think we should even allow the pref to go
>>             bigger than a legal type. This came up somewhere on
>>             llvm-dev or in a bug recently in the context of vector
>>             reductions.
>>
>>
>>
>>             On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper
>>             <craig.topper at gmail.com> wrote:
>>
>>                 Are you referring to
>>                 the X86TargetLowering::isFsqrtCheap hook?
>>
>>                 ~Craig
>>
>>                 On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel
>>                 <spatel at rotateright.com> wrote:
>>
>>                     We can tie a user preference / override to a CPU
>>                     model. We do something like that for square root
>>                     estimates already (although it does use a
>>                     SubtargetFeature currently for x86; ideally, we'd
>>                     key that off of something in the CPU scheduler
>>                     model).
>>
>>
>>                     On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper
>>                     <craig.topper at gmail.com> wrote:
>>
>>                         I agree that a less x86 specific command line
>>                         makes sense. I've been having internal
>>                         discussions with gcc folks and they're
>>                         evaluating switching to something like
>>                         -mprefer-vector-width=128/256/512/none
>>
>>                         Based on the current performance data we're
>>                         seeing, we think we need to ultimately
>>                         default skylake-avx512 to
>>                         -mprefer-vector-width=256. If we go with a
>>                         target independent option/implementation, is
>>                         there some way we could still affect the
>>                         default behavior in a target specific way?
>>
>>                         ~Craig
>>
>>                         On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel
>>                         <spatel at rotateright.com> wrote:
>>
>>                             It's clear from the Intel docs how this
>>                             has evolved, but from a compiler
>>                             perspective, this isn't a Skylake
>>                             "feature" :) ... nor an Intel feature,
>>                             nor an x86 feature.
>>
>>                             It's a generic programmer hint for any
>>                             target with multiple potential vector
>>                             lengths.
>>
>>                             On x86, there's already a potential use
>>                             case for this hint with a different
>>                             starting motivation: re-vectorization.
>>                             That's where we take C code that uses
>>                             128-bit vector intrinsics and selectively
>>                             widen it to 256- or 512-bit vector ops
>>                             based on a newer CPU target than the code
>>                             was originally written for.
>>
>>                             I think it's just a matter of time before
>>                             a customer requests the same ability for
>>                             another target (maybe they already have
>>                             and I don't know about it). So we should
>>                             have a solution that recognizes that
>>                             possibility.
>>
>>                             Note that having a target-independent
>>                             implementation in the optimizer doesn't
>>                             preclude a flag alias in clang to
>>                             maintain compatibility with gcc.
>>
>>
>>
>>                             On Tue, Nov 7, 2017 at 2:02 AM, Tobias
>>                             Grosser via llvm-dev
>>                             <llvm-dev at lists.llvm.org> wrote:
>>
>>                                 On Fri, Nov 3, 2017, at 05:47, Craig
>>                                 Topper via llvm-dev wrote:
>>                                 > That's a very good point about the
>>                                 > ordering of the command line options.
>>                                 > gcc's current implementation treats
>>                                 > -mprefer-avx256 as "prefer 256 over
>>                                 > 512" and -mprefer-avx128 as "prefer
>>                                 > 128 over 256". Which feels weird for
>>                                 > other reasons, but has less of an
>>                                 > ordering ambiguity.
>>                                 >
>>                                 > -mprefer-avx128 has been in gcc for
>>                                 > many years and predates the creation
>>                                 > of avx512. -mprefer-avx256 was added a
>>                                 > couple months ago.
>>                                 >
>>                                 > We've had an internal conversation
>>                                 > with the implementor of -mprefer-avx256
>>                                 > in gcc about making -mprefer-avx128
>>                                 > affect 512-bit vectors as well. I'll
>>                                 > bring up the ambiguity issue with them.
>>                                 >
>>                                 > Do we want to be compatible with
>>                                 > gcc here?
>>
>>                                 I certainly believe we would want to
>>                                 be compatible with gcc (if we use
>>                                 the same names).
>>
>>                                 Best,
>>                                 Tobias
>>
>>                                 >
>>                                 > ~Craig
>>                                 >
>>                                 > On Thu, Nov 2, 2017 at 7:18 PM,
>>                                 > Eric Christopher <echristo at gmail.com>
>>                                 > wrote:
>>                                 >
>>                                 > >
>>                                 > >
>>                                 > > On Thu, Nov 2, 2017 at 7:05 PM
>>                                 > > James Y Knight via llvm-dev <
>>                                 > > llvm-dev at lists.llvm.org> wrote:
>>                                 > >
>>                                 > >> On Wed, Nov 1, 2017 at 7:35 PM,
>>                                 > >> Craig Topper via llvm-dev <
>>                                 > >> llvm-dev at lists.llvm.org> wrote:
>>                                 > >>
>>                                 > >>> Hello all,
>>                                 > >>>
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> I would like to propose adding
>>                                 > >>> the -mprefer-avx256 and -mprefer-avx128
>>                                 > >>> command line flags supported by
>>                                 > >>> latest GCC to clang. These flags will be
>>                                 > >>> used to limit the vector register size
>>                                 > >>> presented by TTI to the vectorizers.
>>                                 > >>> The backend will still be able
>>                                 > >>> to use wider registers for code written
>>                                 > >>> using the intrinsics in x86intrin.h.
>>                                 > >>> And the backend will still be able to
>>                                 > >>> use AVX512VL instructions and
>>                                 > >>> the additional XMM16-31 and YMM16-31
>>                                 > >>> registers.
>>                                 > >>>
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> Motivation:
>>                                 > >>>
>>                                 > >>> -Using 512-bit operations on
>>                                 > >>> some Intel CPUs may cause a decrease
>>                                 > >>> in CPU frequency that may offset the
>>                                 > >>> gains from using the wider register
>>                                 > >>> size. See section 15.26 of
>>                                 > >>> Intel® 64 and IA-32 Architectures
>>                                 > >>> Optimization Reference Manual
>>                                 > >>> published October 2017.
>>                                 > >>>
>>                                 > >>
>>                                 > >> I note the doc mentions that
>>                                 > >> 256-bit AVX operations also have the same
>>                                 > >> issue with reducing the CPU
>>                                 > >> frequency, which is nice to see
>>                                 > >> documented!
>>                                 > >>
>>                                 > >> There's also the issues discussed here
>>                                 > >> <http://www.agner.org/optimize/blog/read.php?i=165>
>>                                 > >> (and elsewhere) related to warm-up time
>>                                 > >> for the 256-bit execution
>>                                 > >> pipeline, which is another issue with
>>                                 > >> using wide-vector ops.
>>                                 > >>
>>                                 > >>
>>                                 > >>> -The vector ALUs on ports 0 and
>>                                 > >>> 1 of the Skylake Server microarchitecture
>>                                 > >>> are only 256-bits wide. 512-bit
>>                                 > >>> instructions using these ALUs must
>>                                 > >>> use both ports. See section 2.1 of
>>                                 > >>> Intel® 64 and IA-32 Architectures
>>                                 > >>> Optimization Reference Manual
>>                                 > >>> published October 2017.
>>                                 > >>>
>>                                 > >>
>>                                 > >>
>>                                 > >>> Implementation Plan:
>>                                 > >>>
>>                                 > >>> -Add prefer-avx256 and
>>                                 > >>> prefer-avx128 as SubtargetFeatures in
>>                                 > >>> X86.td not mapped to any CPU.
>>                                 > >>>
>>                                 > >>> -Add mprefer-avx256 and
>>                                 > >>> mprefer-avx128 and the corresponding
>>                                 > >>> -mno-prefer-avx128/256 options
>>                                 > >>> to clang's driver Options.td file. I
>>                                 > >>> believe this will allow clang to pass
>>                                 > >>> these straight through to the
>>                                 > >>> -target-feature attribute in IR.
>>                                 > >>>
>>                                 > >>> -Modify
>>                                 > >>> X86TTIImpl::getRegisterBitWidth to
>>                                 > >>> only return 512 if AVX512 is
>>                                 > >>> enabled and prefer-avx256 and
>>                                 > >>> prefer-avx128 are not set. Similarly
>>                                 > >>> return 256 if AVX is enabled and
>>                                 > >>> prefer-avx128 is not set.
>>                                 > >>>
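
The width selection described in the last bullet can be sketched as a standalone function. This is a minimal illustration, not the actual X86TTIImpl code: the SubtargetInfo struct, its field names, and the 128-bit fallback are assumptions made for the example.

```cpp
// Hypothetical stand-in for the subtarget state; these field names are
// illustrative and are not the real X86Subtarget interface.
struct SubtargetInfo {
  bool HasAVX512;
  bool HasAVX;
  bool PreferAVX256;
  bool PreferAVX128;
};

// Vector register width (in bits) that would be reported to the vectorizers.
constexpr unsigned vectorRegisterBitWidth(const SubtargetInfo &ST) {
  // 512 only when AVX-512 is available and neither narrowing preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 when AVX is available and the 128-bit preference is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Otherwise fall back to 128 bits (assumed baseline for this sketch).
  return 128;
}
```

Note that only the value reported to the vectorizers changes here; nothing in the sketch restricts which register widths the backend may use for intrinsics.
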
>>                                 > >>
>>                                 > >> Instead of multiple flags that
>>                                 > >> have difficult to understand intersecting
>>                                 > >> behavior, one flag with a value
>>                                 > >> would be better. E.g., what should
>>                                 > >> "-mprefer-avx256 -mprefer-avx128
>>                                 > >> -mno-prefer-avx256" do? No matter the
>>                                 > >> answer, it's confusing.
>>                                 > >> (Similarly with other such
>>                                 > >> combinations). Just a single arg
>>                                 > >> "-mprefer-avx={128/256/512}" (with no
>>                                 > >> "no" version) seems easier
>>                                 > >> to understand to me (keeping the
>>                                 > >> same behavior as you mention: asking to
>>                                 > >> prefer a larger width than is
>>                                 > >> supported by your architecture should
>>                                 > >> be fine but ignored).
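
The single-valued flag could behave as sketched here. The sketch assumes that when the flag is repeated the last occurrence wins (a common command-line convention, not something decided in this thread) and that a preference wider than the hardware supports is ignored. The function names and the 0-means-unset encoding are hypothetical.

```cpp
#include <cstddef>

// Values holds the widths given by each occurrence of the flag, in
// command-line order. Returns the last one, or 0 if the flag never appeared.
constexpr unsigned lastPreference(const unsigned *Values, std::size_t N) {
  unsigned Result = 0;
  for (std::size_t I = 0; I < N; ++I)
    Result = Values[I];
  return Result;
}

// Combine the preference with what the target actually supports. A missing
// preference, or one wider than MaxSupported, leaves MaxSupported in effect.
constexpr unsigned effectiveWidth(unsigned Preferred, unsigned MaxSupported) {
  if (Preferred == 0 || Preferred > MaxSupported)
    return MaxSupported;
  return Preferred;
}
```

Because there is one value rather than several independent booleans, a sequence such as 256, 128, 256 has an unambiguous result under the last-wins assumption.
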
>>                                 > >>
>>                                 > >>
>>                                 > > I agree with this. It's a little
>>                                 > > more plumbing as far as subtarget
>>                                 > > features etc (represent via an
>>                                 > > optional value or just various "set
>>                                 > > the avx width" features - the latter
>>                                 > > being easier, but uglier), however, it's
>>                                 > > probably the right thing to do.
>>                                 > >
>>                                 > > I was looking at this myself just
>>                                 > > a couple weeks ago and think this is the
>>                                 > > right direction (when and how to
>>                                 > > turn things off) - and probably makes
>>                                 > > sense to be a default for these
>>                                 > > architectures? We might end up needing to
>>                                 > > check a couple of additional TTI
>>                                 > > places, but it sounds like you're on top
>>                                 > > of it. :)
>>                                 > >
>>                                 > > Thanks very much for doing this work.
>>                                 > >
>>                                 > > -eric
>>                                 > >
>>                                 > >
>>                                 > >>
>>                                 > >>
>>                                 > >>> There may be some other backend
>>                                 > >>> changes needed, but I plan to address
>>                                 > >>> those as we find them.
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> At a later point, consider
>>                                 > >>> making -mprefer-avx256 the default for
>>                                 > >>> Skylake Server due to the above
>>                                 > >>> mentioned performance considerations.
>>                                 > >>>
>>                                 > >>
>>                                 > >>
>>                                 > >>
>>                                 > >>
>>                                 > >>
>>                                 > >>>
>>                                 > >>> Does this sound reasonable?
>>                                 > >>>
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> *Latest Intel Optimization
>>                                 > >>> manual available here:
>>                                 > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> -Craig Topper
>>                                 > >>>
>>                                 > >>>
>>                                 > >>> _______________________________________________
>>                                 > >>> LLVM Developers mailing list
>>                                 > >>> llvm-dev at lists.llvm.org
>>                                 > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>         -- 
>         Hal Finkel
>         Lead, Compiler Technology and Programming Languages
>         Leadership Computing Facility
>         Argonne National Laboratory
>
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
