thr3ads.net - llvm dev - [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default [Nov 2015]

Das, Dibyendu via llvm-dev

2015-Nov-10 10:39 UTC

[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

I will try to get some SPEC CPU2006 rate runs done under -O3 -flto, with and
without -slp-vectorize-hor, and let you know.

-Thx

-----Original Message-----
From: nrotem at apple.com [mailto:nrotem at apple.com] 
Sent: Tuesday, November 10, 2015 3:33 AM
To: Charlie Turner
Cc: Das, Dibyendu; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

> On Nov 9, 2015, at 9:55 AM, Charlie Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I have not. I could feasibly do this, but I'm not set up to perform 
> good experiments on X86-64 hardware. Furthermore, if I do it for 
> X86-64, it only seems fair I should do it for the other backends as 
> well, which is much less feasible for me. I'm reaching out to the 
> community to see if there's any objection, based on their own 
> measurements of this feature, to defaulting it to on.
> 
> Please let me know if you think I've got the wrong end of the 
> etiquette stick here, and if so I'll try and acquire sensible numbers 
> for other backends.
> 
> Kind regards,
> Charlie.
> 
> On 9 November 2015 at 17:50, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
>> Have you run cpu2006 for x86-64 for perf progression/regression?

I think it would be great if you could help Charlie with this.

>> 
>> Sent from my Windows Phone
>> ________________________________
>> From: Charlie Turner via llvm-dev
>> Sent: ‎11/‎9/‎2015 11:15 PM
>> To: llvm-dev at lists.llvm.org
>> Subject: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by 
>> default
>> 
>> I've done compile-time experiments for AArch64 over SPEC{2000,2006}
>> and of course the test-suite. I measure no significant compile-time 
>> impact of enabling this feature by default.
>> 
>> I also ran the test-suite on an X86-64 machine. I can't imagine any
>> other targets being uniquely affected in terms of compile-time by 
>> turning this on after testing both AArch64 and X86-64. I also timed 
>> running the regression tests with -slp-vectorize-hor enabled and 
>> disabled; there was no significant difference here either.
>> 
>> There are no significant performance regressions (or many
>> improvements) on AArch64 in the nightly test-suite. I do see wins in 
>> third-party benchmarks when using this flag, which is why I'm asking 
>> if there would be any objection from the community to making 
>> -slp-vectorize-hor default on.
>> 
>> I have run the regression tests and looked through the bug tracker / 
>> VC logs, and I can't see any reason for not enabling it.

+1

If there are no compile-time or runtime regressions, and if we are seeing
wins in some benchmarks, then we should enable this by default. At some
point we should demote this flag from a command-line flag into a static
variable in the code. Out of curiosity, how much of the compile time are
we spending in the SLP vectorizer nowadays?



>> 
>> Thanks,
>> Charlie.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Charlie Turner via llvm-dev

2015-Nov-10 14:49 UTC

[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

> Out of curiosity, how much of the compile time are we spending in the SLP
> vectorizer nowadays ?

My measurements were originally based off the "real time" reports from
/usr/bin/time (not the bash built-in), so I didn't have per-pass
statistics to hand. I did a quick experiment in which I compiled each
of the SPEC files with opt's -time-passes feature.
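
A minimal sketch of how a per-pass share like the percentages in this message could be computed once per-pass wall-clock times have been collected. The pass names and timings are hypothetical placeholders, not values taken from -time-passes output or from these experiments, and `pass_share` is an illustrative helper rather than anything in LLVM.

```python
def pass_share(wall_times, pass_name):
    """Return pass_name's share of the summed wall-clock time, in percent."""
    total = sum(wall_times.values())
    if total == 0:
        # Avoid dividing by zero for empty or all-zero reports.
        return 0.0
    return 100.0 * wall_times.get(pass_name, 0.0) / total


# Hypothetical per-pass wall-clock seconds (assumed, not measured).
example = {"SLP Vectorizer": 0.5, "Loop Vectorizer": 1.5, "Everything else": 8.0}
print(f"{pass_share(example, 'SLP Vectorizer'):.1f}%")  # prints "5.0%"
```

When the total is tiny, as with small bitcode files, a few milliseconds of jitter can move this percentage a lot, which is one reason to read the high-end figures with caution.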

The "raw" numbers show that SLP can take anywhere from 0 to 30% of the
total optimization time. The high end of that scale is not very
reliable, though: some of the biggest offenders are rather small
bitcode files, where the total compile time is itself very small.

The largest bitcode file[*] I had in SPEC2006 was about 1MiB. For that
particular example, SLP took less than 1% of the opt time.

For all bitcode files in SPEC2006 between 100KiB and 1MiB, SLP takes
less than 5% of compile time.

In tensor.bc (~ 80KiB) from SPEC2006, SLP took around 9.5% (+- 1%).
This was a borderline case of a compile-time impact with horizontal
reductions (about a 0.8% regression, so within stddev). There were
actually swings the other way as well (i.e., SLP slower without
horizontal reduction detection), so it's hard to make any judgment
here.
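
One way to make the "within stddev" comparison concrete is sketched below. It assumes several repeated timings per configuration; the sample values are hypothetical, and `within_noise` is an illustrative helper, not the procedure actually used for these measurements.

```python
import statistics


def within_noise(baseline, candidate):
    """Compare the relative change in mean time against the baseline's
    relative standard deviation.

    Returns (is_within, relative_change, relative_stddev).
    """
    base_mean = statistics.mean(baseline)
    change = (statistics.mean(candidate) - base_mean) / base_mean
    noise = statistics.stdev(baseline) / base_mean
    return abs(change) <= noise, change, noise


# Hypothetical compile times in seconds (assumed, not measured):
# flag disabled vs. flag enabled.
flag_off = [1.00, 1.02, 0.98]
flag_on = [1.01, 1.01, 1.00]
ok, change, noise = within_noise(flag_off, flag_on)
print(ok, f"change={change:+.2%}", f"noise={noise:.2%}")
```

A single standard deviation is a loose threshold; with more repetitions a proper significance test would be the more careful choice.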

Another pretty interesting one is fnpovfpu.bc (~ 40KiB), where SLP
took 17% of compile time.

Anyway, I hope that gives a rough impression of what's going on. I was
taking the wall clock time measurement from -time-passes.

[*] I screwed up initially not reporting the overall compile time in
my haste, so as a proxy metric, I went back and collected bitcode file
sizes, which saved me from having to rerun everything :/

Nadav Rotem via llvm-dev

2015-Nov-10 15:27 UTC

[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

> On Nov 10, 2015, at 6:49 AM, Charlie Turner <charlesturner7c5 at gmail.com> wrote:
>
> [...]
>
> Another pretty interesting one is fnpovfpu.bc (~ 40KiB), where SLP
> took 17% of compile time.

Thanks for the detailed analysis, Charlie. We should probably look into
fnpovfpu.bc and figure out what's going on there. Overall I think that the
compile time numbers are reasonable.
Charlie Turner via llvm-dev

2015-Nov-11 13:04 UTC

[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

> I will try to get some spec cpu 2006 rate runs done under -O3 -flto with
> and without -slp-vectorize-hor and let you know.

Do you have a time estimate on when you'll be able to get these
numbers? Another option would be to default the flag on and revert if
this does cause regressions on the targets you're interested in.

TIA,
Charlie.

Das, Dibyendu via llvm-dev

2015-Nov-11 14:18 UTC

[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

We have started this. Since there are some holidays, expect a small delay. Will
let you know by Friday.

Thx
