thr3ads.net - llvm dev - [LLVMdev] Enabling the vectorizer for -Os -- ping [Jun 2013]

Nadav Rotem

2013-Jun-14 04:37 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

Hi,  

Last week I wrote to llvm-dev and presented data showing that enabling the
vectorizer at -Os can improve the performance of many workloads while having
a negligible effect on code size. I also added a command-line switch to make
it easier for people to benchmark the vectorizer at -Os directly from clang,
without changing LLVM. Has anyone run any benchmarks on -Os + vectorization?

Thanks,
Nadav

Renato Golin

2013-Jun-14 08:29 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote:
> Last week I wrote llvm-dev and presented data that shows how enabling the
> vectorizer on -Os can improve the performance of many workloads and that it
> has negligible effects on code size.  I also added a command line switch to
> make it easier for people to benchmark the vectorizer using -Os directly
> from clang without changing LLVM.  Has anyone done any benchmarks on -Os +
> vectorization ?
>
Hi Nadav,

I haven't, sorry. I'll run some tests on my Chromebook and will let you
know.

cheers,
--renato

Renato Golin

2013-Jun-14 14:00 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

Hi Nadav,

No noticeable difference between "-Os" and "-Os -fvectorize" in code size
or compilation times in my tests, and only minimal performance improvements
(small enough to be ignored).

cheers,
--renato
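
For readers who want to repeat this kind of comparison, here is a minimal sketch of a test input. The loop, the function name `saxpy`, and the file name in the comments are illustrative assumptions, not taken from anyone's benchmarks in this thread. Only the "-Os" and "-Os -fvectorize" flag combinations come from the message above. Whether the loop is actually vectorized depends on the compiler version, the target, and the cost model.

```cpp
// Build sketch (the file name is hypothetical):
//   clang++ -Os -c saxpy.cpp -o saxpy_os.o
//   clang++ -Os -fvectorize -c saxpy.cpp -o saxpy_os_vec.o
// Then compare the sizes of the two object files and time the callers.
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] += a * x[i]: a counted loop with unit-stride accesses and no
// dependence between iterations, the kind of loop a loop vectorizer targets.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

Comparing the two object files (for example by size on disk) gives a rough per-file view of the code-size effect being discussed.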



Chandler Carruth

2013-Jun-14 18:53 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

Sorry for the delays here. I am running our benchmark suite and will have
data in a day or so.

Chandler Carruth

2013-Jun-16 04:10 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

All I have to say is *wow*. The vectorizer performs *remarkably* better now
than it did the last time I benchmarked it. I'm stunned.

I measured -O2 and -Os, as well as -march=x86-64 and -march=corei7-avx. My
hope with the latter two was to cover both worst-case and best-case in
terms of the quality of the vector ISA available.

First, binary size growth. This is measured on average across a reasonably
wide selection of binaries including large servers, video codecs, image
processing, etc.

O2, x86-64: 1% larger w/ vectorizer
O2, corei7-avx: 1.2% larger
Os, x86-64: 0.1% larger
Os, corei7-avx: < 0.1% larger

This is incredibly impressive IMO. =]

The performance numbers are also pretty good. There are a couple of minor
regressions and only one significant one. That one happens to be open source:
https://code.google.com/p/snappy/source/browse/trunk/snappy.cc
It slows down because the vectorizer vectorizes a cold loop, which then gets
inlined and blocks subsequent inlining. (Many thanks to Ben Kramer for
pointing out the cause so quickly for me.) But there are a lot of potential
solutions to this problem:

1) vectorize after inlining -- this has some problems (code growth mostly)
but we might be able to solve them.
2) mark the cold path as cold so the optimizer is aware of it (tested this,
it seems to work, but I'm still experimenting)
3) rewrite this part of snappy to be fundamentally better (the code as it
is doesn't make a lot of sense to me, but I'm not an expert on it and will
need time to figure out the best way to solve the issue)

I'm actually happy with any of the 3, although #2 isn't terribly
satisfying. But even if that's the result, I can live with it.

So essentially, I think you should turn the vectorizer on completely.
What's left seems very much like small, isolated issues.

Thanks for driving this whole thing and giving me time to do some
evaluation. I'm really thrilled by the result.
-Chandler
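
As an illustration of option 2 in the message above (marking the cold path as cold), here is a minimal sketch using the GCC/Clang `cold` and `noinline` function attributes and `__builtin_expect`. It is not Snappy's code and not necessarily what was tested. The function names `slow_copy` and `copy_bytes` and the 8-byte threshold are hypothetical, chosen only to show the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rarely executed fallback (hypothetical). The `cold` attribute tells the
// optimizer that calls to this function are unlikely, and `noinline` keeps
// its loop out of the callers, so whatever code is generated for the loop
// cannot inflate the hot function's size.
__attribute__((cold, noinline))
void slow_copy(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Hot path (hypothetical): short copies are handled inline, and the rare
// long copy is routed to the cold fallback above.
inline void copy_bytes(const std::uint8_t* src, std::uint8_t* dst,
                       std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    if (__builtin_expect(n <= 8, 1)) {
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = src[i];
        }
        return;
    }
    slow_copy(src, dst, n);
}
```

The intent is that the hot function stays small enough to remain an inlining candidate, whatever the vectorizer does to the cold loop.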


Xinliang David Li

2013-Jun-16 16:15 UTC

[LLVMdev] Enabling the vectorizer for -Os -- ping

One more data point for you: Intel's ICC turns on the loop vectorizer at -O2
and -Os too.

Cheers,

David
