similar to: (RFC) Adjusting default loop fully unroll threshold

2017 Jan 30
2
(RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Currently, loop fully unroller shares the same default threshold as loop > dynamic unroller and partial unroller. This seems conservative because > unlike dynamic/partial
2017 Jan 30
0
(RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Currently, loop fully unroller shares the same default threshold as loop dynamic unroller and partial unroller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368
2017 Jan 31
0
(RFC) Adjusting default loop fully unroll threshold
On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote: > On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >> Currently, loop fully unroller shares the same default
2017 Jan 31
3
(RFC) Adjusting default loop fully unroll threshold
> On Jan 30, 2017, at 4:56 PM, Dehao Chen <dehao at google.com> wrote: > > > > On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote: > On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> On Jan 30,
2017 May 18
6
Enable vectorizer-maximize-bandwidth by default?
Hi, I'm proposing to turn vectorizer-maximize-bandwidth on by default for the loop vectorizer because it should generally help performance. I've tested the performance impact on an Intel Sandybridge machine with the speccpu benchmarks: Benchmark Base:Reference (1) ------------------------------------------------------- spec/2006/fp/C++/444.namd 26.84
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact to debug_line is actually not small. I only implemented the part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++ benchmarks: 433.milc 23.59% 444.namd 6.25% 447.dealII 8.43% 450.soplex 2.41% 453.povray 5.40% 470.lbm 0.00% 482.sphinx3 7.10% 400.perlbench 2.77% 401.bzip2 9.62% 403.gcc
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
The large percentages are from those tiny benchmarks. If you look at omnetpp (0.52%), and xalanc (1.46%), the increase is small. To get a better average increase, you can sum up total debug_line size before and after and compute percentage accordingly. David On Thu, Oct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote: > The impact to debug_line is actually not small. I only
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
Do you have an estimate of the debug_line size increase? I guess it will be small. David On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote: > Motivation: > Many optimizations duplicate code. E.g. loop unroller duplicates the loop > body, GVN duplicates computation, etc. The duplicated code will share the > same debug info with the original code. For
2016 Mar 29
2
[CodeGen] CodeSize - TailMerging and BlockPlacement
Hi everyone, The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on the branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. I did an experiment of adding additional BranchFolding and BlockPlacement after the existing BlockPlacement (i.e., -block-placement -branch-folder -block-placement) targeting
2012 Sep 29
7
[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler
Hi, We are currently working on revising a journal article that describes our work on pre-allocation scheduling using LLVM and have some questions about LLVM's pre-allocation scheduler. The answers to these questions will help us better document and analyze the results of our benchmark tests that compare our algorithm with LLVM's pre-allocation scheduling algorithm. First, here is a
2016 Aug 30
2
Fwd: cfl-aa
Dear LLVMers, I am trying to use some of the LLVM alias analyses, and I would like to check two things with you. First, is scev-aa being maintained in LLVM 3.7? Second, I ran cfl-aa and got a very small number of pointer disambiguations (no-alias responses) with it. My results for SPEC CINT 2006 follow below. Is this low number of no-alias responses to be expected? Below the results that I
2017 May 30
8
Enable vectorizer-maximize-bandwidth by default?
On Fri, May 19, 2017 at 4:01 PM Adam Nemet via llvm-dev < llvm-dev at lists.llvm.org> wrote: > I will run it on Cyclone/AArch64 next week. > FYI, we're still waiting on these Adam...
2016 Oct 27
8
(RFC) Encoding code duplication factor in discriminator
Motivation: Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example: #1 foo(); #2 for (i = 0; i < N; i++) { #3 bar();
2012 Sep 29
0
[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler
On Sep 29, 2012, at 2:43 AM, Ghassan Shobaki <ghassan_shobaki at yahoo.com> wrote: > Hi, > > We are currently working on revising a journal article that describes our work on pre-allocation scheduling using LLVM and have some questions about LLVM's pre-allocation scheduler. The answers to these questions will help us better document and analyze the results of our benchmark
2015 Jun 11
2
[LLVMdev] BasicAA unable to analyze recursive PHI nodes
----- Original Message ----- > From: "Tobias Edler von Koch" <tobias at codeaurora.org> > To: "Daniel Berlin" <dberlin at dberlin.org> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Thursday, June 11, 2015 10:02:37 AM > Subject: Re: [LLVMdev] BasicAA unable to analyze recursive PHI nodes > > Hi Daniel,
2012 Jun 05
2
[LLVMdev] [PATCH] add x32 psABI support
If you are interested in playing around with X32, you may refer to http://sourceware.org/glibc/wiki/x32 to bootstrap a local environment on Linux. Yours - Michael -----Original Message----- From: cfe-commits-bounces at cs.uiuc.edu [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Liao, Michael Sent: Monday, June 04, 2012 5:09 PM To: llvm-commits at cs.uiuc.edu; cfe-commits at cs.uiuc.edu
2010 Jul 22
0
[LLVMdev] fp Question
On Jul 22, 2010, at 4:18 PM PDT, Reza Yazdani wrote: > Hi, > > I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 > out of 17 floating point benchmarks passed. Is this normal, or did I > make a mistake in my build? Hi Reza. Somebody on Linux should answer, but I don't think it's normal. You may have checked out the source at a moment when it had a bug
2010 Jul 22
3
[LLVMdev] fp Question
Hi, I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 out of 17 floating point benchmarks passed. Is this normal, or did I make a mistake in my build? Reza
2010 Jul 23
3
[LLVMdev] fp Question
Following is the list of fp benchmarks that fail. They all pass with -O3, but some fail with -O4. I did the test run. Thanks, Reza Estimated Estimated Base Base Base Peak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -------------- ------ --------- ---------
2017 Feb 13
5
(RFC) Adjusting default loop fully unroll threshold
FWIW, I'm good with the updated data, but I'd really like at least someone from Apple and someone from ARM to chime in here... CC-ing random people in the hope it helps... On Mon, Feb 13, 2017 at 8:30 AM Dehao Chen via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Thanks for the comment. The performance experiments were performed on > Intel Sandybridge. Updated this info to