thr3ads.net - llvm dev - [llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available [Nov 2017]

Eric Christopher via llvm-dev

2017-Nov-03 02:18 UTC

[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello all,
>>
>>
>>
>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
>> command line flags supported by the latest GCC to clang. These flags will
>> be used to limit the vector register size presented by TTI to the
>> vectorizers. The backend will still be able to use wider registers for
>> code written using the intrinsics in x86intrin.h. And the backend will
>> still be able to use AVX512VL instructions and the additional XMM16-31
>> and YMM16-31 registers.
>>
>>
>>
>> Motivation:
>>
>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
>> frequency that may offset the gains from using the wider register size.
>> See section 15.26 of Intel® 64 and IA-32 Architectures Optimization
>> Reference Manual published October 2017.
>>
>
> I note the doc mentions that 256-bit AVX operations also have the same
> issue with reducing the CPU frequency, which is nice to see documented!
>
> There's also the issues discussed here <
> http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere)
> related to warm-up time for the 256-bit execution pipeline, which is
> another issue with using wide-vector ops.
>
>
>> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
>> are only 256 bits wide. 512-bit instructions using these ALUs must use
>> both ports. See section 2.1 of Intel® 64 and IA-32 Architectures
>> Optimization Reference Manual published October 2017.
>>
>
>
>>  Implementation Plan:
>>
>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not
>> mapped to any CPU.
>>
>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I
>> believe this will allow clang to pass these straight through to the
>> -target-feature attribute in IR.
>>
>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is
>> enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
>> return 256 if AVX is enabled and prefer-avx128 is not set.
>>
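For readers following along, the width selection described in the last bullet of the plan above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: SubtargetSketch and registerBitWidthSketch are hypothetical names, not LLVM's actual X86Subtarget or X86TTIImpl code, and the 128-bit fallback is assumed rather than stated in the proposal.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget state; not LLVM's real class.
struct SubtargetSketch {
  bool HasAVX512 = false;
  bool HasAVX = false;
  bool PreferAVX256 = false; // proposed prefer-avx256 feature
  bool PreferAVX128 = false; // proposed prefer-avx128 feature
};

// Vector register width in bits that would be reported to the vectorizers.
unsigned registerBitWidthSketch(const SubtargetSketch &ST) {
  // 512 only if AVX512 is enabled and neither "prefer" feature is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumed fallback (not spelled out in the proposal): 128-bit registers.
  return 128;
}

// Usage: an AVX512 target with prefer-avx256 set reports 256 bits.
void exampleUsage() {
  SubtargetSketch ST;
  ST.HasAVX512 = true;
  ST.HasAVX = true;
  ST.PreferAVX256 = true;
  assert(registerBitWidthSketch(ST) == 256);
}
```

Note that only the width reported to the vectorizers changes in this sketch; per the proposal, code written with intrinsics could still use the wider registers.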
>
> Instead of multiple flags that have difficult-to-understand intersecting
> behavior, one flag with a value would be better. E.g., what should
> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the
> answer, it's confusing. (Similarly with other such combinations.) Just a
> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems
> easier to understand to me (keeping the same behavior as you mention:
> asking to prefer a larger width than is supported by your architecture
> should be fine but ignored).
>
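To make the single-flag suggestion above concrete, here is a minimal sketch of how one valued option could be resolved. It assumes the usual "last occurrence wins" convention for repeated options, which the thread does not spell out. The flag spelling "-mprefer-avx=" is taken from the message above; parsePreferredWidthSketch and effectiveWidthSketch are hypothetical helpers, not clang driver code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the width named by the last recognized "-mprefer-avx=N" argument,
// or 0 if there is none. Unrecognized values are skipped in this sketch.
unsigned parsePreferredWidthSketch(const std::vector<std::string> &Args) {
  const std::string Prefix = "-mprefer-avx=";
  unsigned Preferred = 0;
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) != 0)
      continue; // not the option we are looking for
    const std::string Value = Arg.substr(Prefix.size());
    if (Value == "128")
      Preferred = 128;
    else if (Value == "256")
      Preferred = 256;
    else if (Value == "512")
      Preferred = 512;
  }
  return Preferred;
}

// Asking for more than the architecture supports is accepted but ignored.
unsigned effectiveWidthSketch(unsigned Preferred, unsigned MaxSupported) {
  if (Preferred == 0 || Preferred > MaxSupported)
    return MaxSupported;
  return Preferred;
}

// Usage: the later of two occurrences decides the preference.
void exampleUsage() {
  unsigned P = parsePreferredWidthSketch(
      {"-mprefer-avx=256", "-mprefer-avx=128"});
  assert(effectiveWidthSketch(P, 512) == 128);
}
```

With a single value there is no "-mno-" form to interleave, so the combination questioned above cannot be written at all.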

I agree with this. It's a little more plumbing as far as subtarget features
etc. (represent via an optional value or just various "set the avx width"
features - the latter being easier, but uglier), however, it's probably the
right thing to do.

I was looking at this myself just a couple weeks ago and think this is the
right direction (when and how to turn things off) - and probably makes
sense to be a default for these architectures? We might end up needing to
check a couple of additional TTI places, but it sounds like you're on top
of it. :)

Thanks very much for doing this work.

-eric

>> There may be some other backend changes needed, but I plan to address
>> those as we find them.
>>
>> At a later point, consider making -mprefer-avx256 the default for Skylake
>> Server due to the above mentioned performance considerations.
>>
>> Does this sound reasonable?
>>
>> *Latest Intel Optimization manual available here:
>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>>
>>
>> -Craig Topper
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>

Craig Topper via llvm-dev

2017-Nov-03 04:47 UTC

That's a very good point about the ordering of the command line options.
gcc's current implementation treats -mprefer-avx256 as "prefer 256 over
512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for
other reasons, but has less of an ordering ambiguity.
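
One literal reading of the gcc behavior described above can be sketched as two independent step-downs. This is an interpretation of the description in this message only, not gcc's actual code. gccStyleWidthSketch is a hypothetical name, and the assumption that the two steps chain when both flags are given is mine.

```cpp
#include <cassert>

// MaxWidth is the widest vector width the target supports (128, 256 or 512).
unsigned gccStyleWidthSketch(unsigned MaxWidth, bool Prefer256,
                             bool Prefer128) {
  unsigned Width = MaxWidth;
  // "-mprefer-avx256": prefer 256 over 512.
  if (Width == 512 && Prefer256)
    Width = 256;
  // "-mprefer-avx128": prefer 128 over 256; it has no effect on 512 by itself.
  if (Width == 256 && Prefer128)
    Width = 128;
  return Width;
}

// Usage: on a 512-bit target, -mprefer-avx128 alone changes nothing.
void exampleUsage() {
  assert(gccStyleWidthSketch(512, false, true) == 512);
  assert(gccStyleWidthSketch(512, true, false) == 256);
}
```

Under this reading the order of the flags on the command line does not matter, since each flag only controls its own step.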

-mprefer-avx128 has been in gcc for many years and predates the creation of
avx512. -mprefer-avx256 was added a couple months ago.

We've had an internal conversation with the implementor of -mprefer-avx256
in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll
bring up the ambiguity issue with them.

Do we want to be compatible with gcc here?

~Craig


Tobias Grosser via llvm-dev

2017-Nov-07 09:02 UTC

On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
> That's a very good point about the ordering of the command line options.
> gcc's current implementation treats -mprefer-avx256 as "prefer 256 over
> 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for
> other reasons, but has less of an ordering ambiguity.
>
> -mprefer-avx128 has been in gcc for many years and predates the creation
> of avx512. -mprefer-avx256 was added a couple months ago.
>
> We've had an internal conversation with the implementor of
> -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit
> vectors as well. I'll bring up the ambiguity issue with them.
>
> Do we want to be compatible with gcc here?

I certainly believe we would want to be compatible with gcc (if we use
the same names).

Best,
Tobias
