Displaying 20 results from an estimated 500 matches similar to: "Enable vectorizer-maximize-bandwidth by default?"
2016 Oct 07
7
Debug info interacting with optimization and code generation
In theory, the compiler should generate bit-identical code with and without
debug info, i.e.
# clang -c -O2 -g a.cc -o a.g.o
# clang -c -O2 -g0 a.cc -o a.g0.o
# strip a.g.o a.g0.o
# diff a.g.o a.g0.o
The diff should find the two binaries identical. For brevity, in the rest of
the mail, I'll refer to this requirement as "codegen consistency" (any
better name?)
Unfortunately, LLVM does not
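The "codegen consistency" check described in this message could be scripted roughly as follows. This is only a sketch: the clang flags and the strip step are taken from the commands quoted above, while the helper names (build_commands, files_identical, codegen_consistent) and the output file naming are assumptions made for illustration. It also assumes clang and strip are on PATH.

```python
import filecmp
import os
import subprocess


def build_commands(src, clang="clang", strip="strip"):
    """Return the compile/strip commands and the two object file names."""
    base = os.path.splitext(src)[0]
    with_debug = base + ".g.o"      # built with -g
    without_debug = base + ".g0.o"  # built with -g0
    commands = [
        [clang, "-c", "-O2", "-g", src, "-o", with_debug],
        [clang, "-c", "-O2", "-g0", src, "-o", without_debug],
        [strip, with_debug, without_debug],
    ]
    return commands, (with_debug, without_debug)


def files_identical(path_a, path_b):
    """Compare two files byte for byte (shallow=False forces a content check)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def codegen_consistent(src):
    """Build src with and without debug info, strip both, and compare them."""
    commands, (with_debug, without_debug) = build_commands(src)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises if clang or strip fails
    return files_identical(with_debug, without_debug)
```

Calling codegen_consistent("a.cc") would return True only if the stripped objects match exactly, which is the property the message asks for.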
2017 Jan 30
4
(RFC) Adjusting default loop fully unroll threshold
Currently, loop fully unroller shares the same default threshold as loop
dynamic unroller and partial unroller. This seems conservative because
unlike dynamic/partial unrolling, fully unrolling will not affect
LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to
double the threshold for loop fully unroller. This will change the codegen
of several SPECCPU benchmarks:
Code
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact on debug_line is actually not small. I only implemented part 1
(encoding the duplication factor) for loop unrolling and loop vectorization.
The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++
benchmarks:
433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc
2017 Jan 30
2
(RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Currently, loop fully unroller shares the same default threshold as loop
> dynamic unroller and partial unroller. This seems conservative because
> unlike dynamic/partial
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
The large percentages are from those tiny benchmarks. If you look at
omnetpp (0.52%), and xalanc (1.46%), the increase is small. To get a better
average increase, you can sum up total debug_line size before and after and
compute percentage accordingly.
David
On Thu, Oct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote:
> The impact to debug_line is actually not small. I only
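The suggestion in this reply (sum the debug_line sizes before and after, then compute one percentage) differs from averaging the per-benchmark percentages, because tiny benchmarks can dominate a plain average. A minimal sketch of the difference follows. The benchmark names and byte sizes are hypothetical values invented for illustration, not measurements from the thread.

```python
def mean_of_percentages(sizes):
    """Average the per-benchmark percentage increases (tiny benchmarks weigh as much as large ones)."""
    increases = [(after - before) / before * 100 for before, after in sizes.values()]
    return sum(increases) / len(increases)


def aggregate_percentage(sizes):
    """Sum the sizes before and after, then compute a single percentage increase."""
    total_before = sum(before for before, _ in sizes.values())
    total_after = sum(after for _, after in sizes.values())
    return (total_after - total_before) / total_before * 100


# Hypothetical debug_line sizes in bytes: (before, after).
sizes = {
    "tiny_benchmark": (1_000, 1_200),       # +20% on a very small section
    "large_benchmark": (100_000, 101_000),  # +1% on a large section
}

print(f"mean of percentages: {mean_of_percentages(sizes):.2f}%")
print(f"aggregate increase:  {aggregate_percentage(sizes):.2f}%")
```

With these made-up numbers the plain average is driven by the tiny benchmark, while the aggregate figure reflects the total bytes added.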
2017 Jan 31
0
(RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com>
wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Currently, loop fully unroller shares the same default
2017 Jan 30
0
(RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Currently, loop fully unroller shares the same default threshold as loop dynamic unroller and partial unroller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
Do you have an estimate of the debug_line size increase? I guess it will be
small.
David
On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote:
> Motivation:
> Many optimizations duplicate code. E.g. loop unroller duplicates the loop
> body, GVN duplicates computation, etc. The duplicated code will share the
> same debug info with the original code. For
2017 Jan 31
3
(RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 4:56 PM, Dehao Chen <dehao at google.com> wrote:
>
> On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Jan 30,
2016 Oct 27
8
(RFC) Encoding code duplication factor in discriminator
Motivation:
Many optimizations duplicate code. E.g. loop unroller duplicates the loop
body, GVN duplicates computation, etc. The duplicated code will share the
same debug info with the original code. For SamplePGO, the debug info is
used to present the profile. Code duplication will affect profile accuracy.
Taking loop unrolling for example:
#1 foo();
#2 for (i = 0; i < N; i++) {
#3 bar();
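To illustrate why code duplication can skew a sample-based profile, here is a small sketch that is separate from the truncated example above. It assumes, purely for illustration, that the profile takes the maximum sample count among all instruction copies mapped to the same source line, and that a recorded duplication factor can be multiplied back in. The function name and the numbers are hypothetical.

```python
def line_count_from_copies(copy_samples, duplication_factor=1):
    """Estimate a source line's execution count from the samples of its copies.

    Assumption (not confirmed by the text above): the profile keeps the
    maximum count among copies that share the same debug location.
    """
    return max(copy_samples) * duplication_factor


# Hypothetical case: a loop body that really runs 800 times is unrolled by 4,
# so each of the 4 copies is observed about 200 times.
true_count = 800
unroll_factor = 4
copies = [true_count // unroll_factor] * unroll_factor

without_factor = line_count_from_copies(copies)
with_factor = line_count_from_copies(copies, duplication_factor=unroll_factor)

print(without_factor)  # count attributed to the line when duplication is ignored
print(with_factor)     # count after scaling by the recorded duplication factor
```

Under these assumptions the unscaled estimate undercounts the line by the unroll factor, which is the kind of inaccuracy the RFC sets out to address.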
2011 May 16
2
wireframe advice - with reproducible code
Dear List,
I am trying to produce a 3D plot with wireframe using the code:
wireframe(Residuals_FD ~ Elevation * Temperature, data = data2, scales = list(arrows = FALSE), drape = TRUE, colorkey = TRUE)
As you can see, when the code (using the data below) is run, the plot area is set up correctly but the actual surface is missing.
Any help would be greatly appreciated.
Chris
#data
Elevation
2017 Feb 13
5
(RFC) Adjusting default loop fully unroll threshold
FWIW, I'm good with the updated data, but I'd really like at least someone
from Apple and someone from ARM to chime in here... CC-ing random people in
the hope it helps...
On Mon, Feb 13, 2017 at 8:30 AM Dehao Chen via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks for the comment. The performance experiments were performed on
> Intel Sandybridge. Updated this info to
2017 Feb 10
4
(RFC) Adjusting default loop fully unroll threshold
On 02/10/2017 05:21 PM, Dehao Chen wrote:
> Thanks everyone for the comments.
>
> Do we have a decision here?
You're good to go as far as I'm concerned.
-Hal
>
> Dehao
>
> On Tue, Feb 7, 2017 at 10:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
2017 May 30
5
Enable vectorizer-maximize-bandwidth by default?
On Tue, May 30, 2017 at 1:40 AM Agabaria, Mohammed via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> We’re seeing nice improvements but also significant degradations on IA,
> which we would like to investigate before the patch is committed.
>
> Major degradations we see:
>
> networking
>
> ip_pktcheckb1m -6.80%
>
2017 Jun 12
2
Enable vectorizer-maximize-bandwidth by default?
Guys, just to clarify: with the current fix in SLM there is no need to wait for the other (minor) issues to be fixed.
So you can move on with your patch.
From: Agabaria, Mohammed
Sent: Wednesday, June 07, 2017 15:24
To: Zaks, Ayal <ayal.zaks at intel.com>; Chandler Carruth <chandlerc at gmail.com>; Flamedoge <code.kchoi at gmail.com>; Dehao Chen <dehao at google.com>
2017 Feb 08
2
(RFC) Adjusting default loop fully unroll threshold
On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
> Sorry if I missed it, but what machine/CPU are you using to collect
> the perf numbers?
>
> I am concerned that what may be a win on a CPU that keeps a couple of
> hundred instructions in-flight and has many MB of caches will not hold
> for a small core.
In my experience, unrolling tends to help weaker cores even more
2017 Feb 07
2
(RFC) Adjusting default loop fully unroll threshold
Ping... with the updated code size impact data, any more comments? Any more
data that would be interesting to collect?
Thanks,
Dehao
On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
> Here is the code size impact for clang, chrome and 24 google internal
> benchmarks (names omitted; 14, 15 and 16 are encoding/decoding benchmarks
> similar to h264). There are 2
2017 Feb 02
2
(RFC) Adjusting default loop fully unroll threshold
I had suggested having size metrics from somewhat larger applications such
as Chrome, Webkit, or Firefox; clang itself; and maybe some of our internal
binaries with rough size brackets?
On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
> With the new data points, any comments on whether this can justify setting
> fully inline threshold to 300 (or any other
2016 Nov 04
2
(RFC) Encoding code duplication factor in discriminator
Discussed with Hal, Adrian and Paul offline at the LLVM dev meeting today.
* Trip count is not enough for vectorization: there is a runtime check that
might fail, which can be reflected in the profile and which we may want to
preserve.
* Simply recording these context profiles may cause problems for
iterative-sample-pgo, i.e. when you find a loop's vectorized version not
executed (due to runtime
2017 Feb 02
2
(RFC) Adjusting default loop fully unroll threshold
> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> clang, chrome, and some internal large apps are good candidates for size metrics.
I'd also add the standard LLVM testsuite just because it's the suite everyone in the community can use.
Michael
>
> David
>
> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via