thr3ads.net - llvm dev - [llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available [Nov 2017]

Craig Topper via llvm-dev

2017-Nov-09 23:21 UTC

[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

I agree that a less x86-specific command-line option makes sense. I've been
having internal discussions with gcc folks, and they're evaluating switching
to something like -mprefer-vector-width=128/256/512/none.

Based on the current performance data we're seeing, we think we need to
ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go
with a target-independent option/implementation, is there some way we could
still affect the default behavior in a target-specific way?

~Craig

On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> It's clear from the Intel docs how this has evolved, but from a compiler
> perspective, this isn't a Skylake "feature" :) ... nor an Intel feature,
> nor an x86 feature.
>
> It's a generic programmer hint for any target with multiple potential
> vector lengths.
>
> On x86, there's already a potential use case for this hint with a
> different starting motivation: re-vectorization. That's where we take C
> code that uses 128-bit vector intrinsics and selectively widen it to 256-
> or 512-bit vector ops based on a newer CPU target than the code was
> originally written for.
>
> I think it's just a matter of time before a customer requests the same
> ability for another target (maybe they already have and I don't know about
> it). So we should have a solution that recognizes that possibility.
>
> Note that having a target-independent implementation in the optimizer
> doesn't preclude a flag alias in clang to maintain compatibility with gcc.
>
>
>
> On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
>> > That's a very good point about the ordering of the command line options.
>> > gcc's current implementation treats -mprefer-avx256 as "prefer 256 over
>> > 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for
>> > other reasons, but has less of an ordering ambiguity.
>> >
>> > -mprefer-avx128 has been in gcc for many years and predates the creation
>> > of avx512. -mprefer-avx256 was added a couple months ago.
>> >
>> > We've had an internal conversation with the implementor of
>> > -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit
>> > vectors as well. I'll bring up the ambiguity issue with them.
>> >
>> > Do we want to be compatible with gcc here?
>>
>> I certainly believe we would want to be compatible with gcc (if we use
>> the same names).
>>
>> Best,
>> Tobias
>>
>> >
>> > ~Craig
>> >
>> > On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo at gmail.com>
>> > wrote:
>> >
>> > >
>> > >
>> > > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <
>> > > llvm-dev at lists.llvm.org> wrote:
>> > >
>> > >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
>> > >> llvm-dev at lists.llvm.org> wrote:
>> > >>
>> > >>> Hello all,
>> > >>>
>> > >>>
>> > >>>
>> > >>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
>> > >>> command line flags supported by latest GCC to clang. These flags will be
>> > >>> used to limit the vector register size presented by TTI to the vectorizers.
>> > >>> The backend will still be able to use wider registers for code written
>> > >>> using the intrinsics in x86intrin.h. And the backend will still be able to
>> > >>> use AVX512VL instructions and the additional XMM16-31 and YMM16-31
>> > >>> registers.
>> > >>>
>> > >>>
>> > >>>
>> > >>> Motivation:
>> > >>>
>> > >>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
>> > >>> frequency that may offset the gains from using the wider register size. See
>> > >>> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference
>> > >>> Manual published October 2017.
>> > >>>
>> > >>
>> > >> I note the doc mentions that 256-bit AVX operations also have the same
>> > >> issue with reducing the CPU frequency, which is nice to see documented!
>> > >>
>> > >> There's also the issues discussed here
>> > >> <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related
>> > >> to warm-up time for the 256-bit execution pipeline, which is another issue
>> > >> with using wide-vector ops.
>> > >>
>> > >>
>> > >>> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
>> > >>> are only 256-bits wide. 512-bit instructions using these ALUs must use both
>> > >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization
>> > >>> Reference Manual published October 2017.
>> > >>>
>> > >>
>> > >>
>> > >>>  Implementation Plan:
>> > >>>
>> > >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not
>> > >>> mapped to any CPU.
>> > >>>
>> > >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
>> > >>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe
>> > >>> this will allow clang to pass these straight through to the -target-feature
>> > >>> attribute in IR.
>> > >>>
>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is
>> > >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return
>> > >>> 256 if AVX is enabled and prefer-avx128 is not set.
>> > >>>
>> > >>
>> > >> Instead of multiple flags that have difficult to understand intersecting
>> > >> behavior, one flag with a value would be better. E.g., what should
>> > >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the
>> > >> answer, it's confusing. (Similarly with other such combinations). Just a
>> > >> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier
>> > >> to understand to me (keeping the same behavior as you mention: asking to
>> > >> prefer a larger width than is supported by your architecture should be fine
>> > >> but ignored).
>> > >>
>> > >>
>> > > I agree with this. It's a little more plumbing as far as subtarget
>> > > features etc (represent via an optional value or just various "set the avx
>> > > width" features - the latter being easier, but uglier), however, it's
>> > > probably the right thing to do.
>> > >
>> > > I was looking at this myself just a couple weeks ago and think this is the
>> > > right direction (when and how to turn things off) - and probably makes
>> > > sense to be a default for these architectures? We might end up needing to
>> > > check a couple of additional TTI places, but it sounds like you're on top
>> > > of it. :)
>> > >
>> > > Thanks very much for doing this work.
>> > >
>> > > -eric
>> > >
>> > >
>> > >>
>> > >>
>> > >>> There may be some other backend changes needed, but I plan to address
>> > >>> those as we find them.
>> > >>>
>> > >>>
>> > >>> At a later point, consider making -mprefer-avx256 the default for
>> > >>> Skylake Server due to the above mentioned performance considerations.
>> > >>>
>> > >>>
>> > >>> Does this sound reasonable?
>> > >>>
>> > >>>
>> > >>>
>> > >>> *Latest Intel Optimization manual available here:
>> > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>> > >>>
>> > >>>
>> > >>> -Craig Topper
>> > >>>
>> > >>> _______________________________________________
>> > >>> LLVM Developers mailing list
>> > >>> llvm-dev at lists.llvm.org
>> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> > >>>
>>
>
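
The width selection described in the RFC's implementation plan quoted above
(return 512 only if AVX512 is enabled and neither prefer flag is set, return
256 if AVX is enabled and prefer-avx128 is not set) can be sketched roughly as
follows. This is a standalone illustration, not LLVM code: SubtargetFlags and
vectorRegisterBitWidth are hypothetical names, and the 128-bit fallback is an
assumption the RFC does not spell out.

```cpp
// Hypothetical stand-in for the subtarget queries; not an LLVM type.
struct SubtargetFlags {
  bool hasAVX;
  bool hasAVX512;
  bool preferAVX256; // models the proposed prefer-avx256 feature
  bool preferAVX128; // models the proposed prefer-avx128 feature
};

// Vector register width, in bits, that would be reported to the vectorizers.
// Falling back to 128 when neither branch applies is an assumption.
constexpr unsigned vectorRegisterBitWidth(SubtargetFlags st) {
  return (st.hasAVX512 && !st.preferAVX256 && !st.preferAVX128) ? 512
         : (st.hasAVX && !st.preferAVX128)                      ? 256
                                                                : 128;
}

// AVX512 target with prefer-avx256 set: the vectorizers see 256 bits.
static_assert(vectorRegisterBitWidth({true, true, true, false}) == 256,
              "prefer-avx256 caps an AVX512 target at 256");
```

Under this reading, prefer-avx128 wins whenever it is set, so the combination
"-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" raised in the thread
would yield 128; that is one possible interpretation, not a settled behavior.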

Sanjay Patel via llvm-dev

2017-Nov-10 15:39 UTC

We can tie a user preference / override to a CPU model. We do something
like that for square root estimates already (although it does use a
SubtargetFeature currently for x86; ideally, we'd key that off of something
in the CPU scheduler model).
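
One way to picture a per-CPU default that a user option can override is the
minimal sketch below. Everything in it is hypothetical (CpuModel,
effectiveVectorBits, and the example CPU entries are illustrative names, not
LLVM or clang APIs), and it assumes a value of 0 means that no preference was
given.

```cpp
// Hypothetical sketch; none of these names exist in LLVM or clang.

// 0 stands for "no width preference given".
constexpr unsigned NoPreference = 0;

// Assumed per-CPU description, standing in for data that could live in a
// CPU or scheduler model.
struct CpuModel {
  unsigned maxVectorBits;        // widest vector registers the CPU supports
  unsigned defaultPreferredBits; // CPU-specific default, or NoPreference
};

constexpr unsigned minBits(unsigned a, unsigned b) { return a < b ? a : b; }

// A user-supplied width wins if present; otherwise the CPU default applies.
// The result is capped at the hardware maximum, so asking for more than the
// CPU supports is accepted but has no effect.
constexpr unsigned effectiveVectorBits(CpuModel cpu, unsigned userBits) {
  return userBits != NoPreference
             ? minBits(userBits, cpu.maxVectorBits)
         : cpu.defaultPreferredBits != NoPreference
             ? minBits(cpu.defaultPreferredBits, cpu.maxVectorBits)
             : cpu.maxVectorBits;
}

// Example entries (assumed values): a 512-bit CPU that defaults to 256,
// and a 256-bit CPU with no default preference.
constexpr CpuModel SkylakeServerLike{512, 256};
constexpr CpuModel AvxOnly{256, NoPreference};

static_assert(effectiveVectorBits(SkylakeServerLike, NoPreference) == 256,
              "CPU default applies when the user gives no preference");
```

The sketch does not distinguish an explicit "none" from an absent option;
a real design would have to decide whether "none" overrides the CPU default.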

On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> I agree that a less x86-specific command-line option makes sense. I've been
> having internal discussions with gcc folks, and they're evaluating
> switching to something like -mprefer-vector-width=128/256/512/none.
>
> Based on the current performance data we're seeing, we think we need to
> ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go
> with a target-independent option/implementation, is there some way we could
> still affect the default behavior in a target-specific way?
>
> ~Craig
>

Craig Topper via llvm-dev

2017-Nov-11 01:04 UTC

Are you referring to the X86TargetLowering::isFsqrtCheap hook?

~Craig

On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> We can tie a user preference / override to a CPU model. We do something
> like that for square root estimates already (although it does use a
> SubtargetFeature currently for x86; ideally, we'd key that off of something
> in the CPU scheduler model).
>
