Das, Dibyendu via llvm-dev
2015-Nov-10 10:39 UTC
[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
I will try to get some SPEC CPU2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know.

-Thx

-----Original Message-----
From: nrotem at apple.com [mailto:nrotem at apple.com]
Sent: Tuesday, November 10, 2015 3:33 AM
To: Charlie Turner
Cc: Das, Dibyendu; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

> On Nov 9, 2015, at 9:55 AM, Charlie Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I have not. I could feasibly do this, but I'm not set up to perform
> good experiments on X86-64 hardware. Furthermore, if I do it for
> X86-64, it only seems fair I should do it for the other backends as
> well, which is much less feasible for me. I'm reaching out to the
> community to see if there's any objection based on their own
> measurements of this feature about defaulting it to on.
>
> Please let me know if you think I've got the wrong end of the
> etiquette stick here, and if so I'll try and acquire sensible numbers
> for other backends.
>
> Kind regards,
> Charlie.
>
> On 9 November 2015 at 17:50, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
>> Have you run cpu2006 for x86-64 for perf progression/regression ?

I think it would be great if you could help Charlie with this.

>>
>> Sent from my Windows Phone
>> ________________________________
>> From: Charlie Turner via llvm-dev
>> Sent: 11/9/2015 11:15 PM
>> To: llvm-dev at lists.llvm.org
>> Subject: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by
>> default
>>
>> I've done compile-time experiments for AArch64 over SPEC{2000,2006}
>> and of course the test-suite. I measure no significant compile-time
>> impact of enabling this feature by default.
>>
>> I also ran the test-suite on an X86-64 machine. I can't imagine any
>> other targets being uniquely affected in terms of compile-time by
>> turning this on after testing both AArch64 and X86-64. I also timed
>> running the regression tests with -slp-vectorize-hor enabled and
>> disabled, no significant difference here either.
>>
>> There are no significant performance regressions (or much
>> improvements) on AArch64 in night-test suite. I do see wins in third
>> party benchmarks when using this flag, which is why I'm asking if
>> there would be any objection from the community to making
>> -slp-vectorize-hor default on.
>>
>> I have run the regression tests and looked through the bug tracker /
>> VC logs, I can't see any reason for not enabling it.

+1

If there are no compile time and runtime regressions and if we are seeing wins in some benchmarks then we should enable this by default. At some point we should demote this flag from a command-line flag into a static variable in the code.

Out of curiosity, how much of the compile time are we spending in the SLP vectorizer nowadays ?

>>
>> Thanks,
>> Charlie.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Charlie Turner via llvm-dev
2015-Nov-10 14:49 UTC
[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
> Out of curiosity, how much of the compile time are we spending in the SLP vectorizer nowadays ?

My measurements were originally based off the "real time" reports from /usr/bin/time (not the bash built-in), so I didn't have per-pass statistics to hand. I did a quick experiment in which I compiled each of the SPEC files with opt's -time-passes feature.

The "raw" numbers show that SLP can take anywhere from 0 to 30% of the total optimization time. At the high end of that scale, things are a bit fast and loose. Some of the biggest offenders are in rather small bitcode files (where the total compile time is getting very small as well).

The largest bitcode file[*] I had in SPEC2006 was about 1MiB. For that particular example, SLP took less than 1% of the opt time.

For all bitcode files in SPEC2006 between 100KiB and 1MiB, SLP takes less than 5% of compile time.

In tensor.bc (~ 80KiB) from SPEC2006, SLP took around 9.5% (+- 1%). This was a borderline case of a compile-time impact with horizontal reductions (about a 0.8% regression, so within stddev). There were actually swings the other way as well (i.e., SLP slower without horizontal reduction detection, so it's hard to make any judgment here).

Another pretty interesting one is fnpovfpu.bc (~ 40KiB), where SLP took 17% of compile time.

Anyway, I hope that gives a rough impression of what's going on. I was taking the wall clock time measurement from -time-passes.

[*] I screwed up initially not reporting the overall compile time in my haste, so as a proxy metric, I went back and collected bitcode file sizes, which saved me from having to rerun everything :/
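For anyone wanting to repeat the "within stddev" check on their own timings, here is a rough Python sketch of the arithmetic. It is not the script used for the numbers above. The function names and the sample timings are made up for illustration, and treating "difference in means no larger than the baseline's standard deviation" as noise is only one simple convention, assumed here.

```python
from statistics import mean, stdev


def relative_change(baseline, candidate):
    """Change in mean time of candidate relative to baseline, in percent."""
    base = mean(baseline)
    return (mean(candidate) - base) / base * 100.0


def within_noise(baseline, candidate):
    """True if the means differ by no more than the baseline's sample stddev.

    This threshold is an assumption made for this sketch, not a rule
    taken from the thread.
    """
    return abs(mean(candidate) - mean(baseline)) <= stdev(baseline)


# Hypothetical wall-clock times in seconds for repeated runs on one file.
# These are illustrative values, NOT measurements from this thread.
without_hor = [1.00, 1.02, 0.98, 1.01, 0.99]
with_hor = [1.01, 1.03, 0.99, 1.02, 1.00]

print(f"change: {relative_change(without_hor, with_hor):+.2f}%")
print("within noise:", within_noise(without_hor, with_hor))
```

With only a handful of runs per configuration the standard deviation is itself a rough estimate, so a result close to the threshold probably deserves more runs before drawing any conclusion.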
Nadav Rotem via llvm-dev
2015-Nov-10 15:27 UTC
[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
> On Nov 10, 2015, at 6:49 AM, Charlie Turner <charlesturner7c5 at gmail.com> wrote:
>
> [...]
>
> Another pretty interesting one is fnpovfpu.bc (~ 40KiB), where SLP
> took 17% of compile time.

Thanks for the detailed analysis Charlie. We should probably look into fnpovfpu.bc and figure out what’s going on there. Overall I think that the compile time numbers are reasonable.
Charlie Turner via llvm-dev
2015-Nov-11 13:04 UTC
[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
On 10 November 2015 at 10:39, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
> I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know.

Do you have a time estimate on when you'll be able to get these numbers? Another option would be to default the flag on and revert if this does cause regressions on the targets you're interested in.

TIA,
Charlie.
Das, Dibyendu via llvm-dev
2015-Nov-11 14:18 UTC
[llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
We have started this. Since there are some holidays, expect a small delay. Will let you know by Friday.

Thx