thr3ads.net - similar to: "[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64"

Displaying 20 results from an estimated 700 matches similar to: "[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64"

[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

2015 Feb 26

Hi all, I've started looking at the GlobalMerge pass, enabled by default on ARM and AArch64. I think we should reconsider that, at least for AArch64. As is, the pass just merges all globals together, in groups of 4KB (AArch64, 128B on ARM). At the time it was enabled, the general thinking was "it's almost free, it doesn't affect performance much, we might as well use it".
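The following C code is a hypothetical, source-level analogy for what merging globals buys. Separate globals each need their own address materialization. On AArch64 that is typically an ADRP plus an offset per global. A merged block lets the code compute one base address and reach every field at a small constant offset. The names `merged_block` and `merged_globals` are illustrative assumptions. The real pass works on LLVM IR and picks its own grouping.

```c
/* Unmerged: three independent globals. Each access typically needs its
   own address computation. */
static int counter_a;
static int counter_b;
static int counter_c;

/* Merged (hypothetical analogy): the same data in one aggregate, so a
   single base address reaches all three at fixed offsets. */
struct merged_block {
    int a;
    int b;
    int c;
};
static struct merged_block merged_globals;

int sum_unmerged(void) {
    counter_a = 1;
    counter_b = 2;
    counter_c = 3;
    return counter_a + counter_b + counter_c;
}

int sum_merged(void) {
    struct merged_block *g = &merged_globals; /* one base address */
    g->a = 1;                                 /* base + offset of a */
    g->b = 2;                                 /* base + offset of b */
    g->c = 3;                                 /* base + offset of c */
    return g->a + g->b + g->c;
}
```

Both functions compute the same result. Only the number of distinct base addresses the generated code must form differs. The thread's question is whether that saving is worth its side effects on AArch64.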

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

2013 Jul 18

Andy and I briefly discussed this the other day; we have not yet had a chance to list a detailed pass order for the pre- and post-IPO scalar optimizations. This is the wish-list in our minds: pre-IPO: based on the ordering he proposed, get rid of the inlining (or just inline tiny funcs), get rid of all loop xforms... post-IPO: get rid of inlining, or maybe we still need it, only

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote: > I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences. > The following tests pass at "-O3" and
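The remark that -ffast-math "can legitimately cause differences" rests on floating-point addition not being associative. -ffast-math permits the compiler to regroup operations, so the result may differ from what the source grouping would produce. Below is a minimal C sketch with illustrative values. The `volatile` temporaries are there only to stop the compiler from folding or regrouping the expressions itself.

```c
/* Same three operands, two groupings, two different rounded results. */
double sum_left(double a, double b, double c) {
    volatile double t = a + b; /* (a + b) first */
    return t + c;
}

double sum_right(double a, double b, double c) {
    volatile double t = b + c; /* (b + c) first */
    return a + t;
}
```

With `a = 1e16`, `b = -1e16` and `c = 1.0`, the left grouping yields 1.0. In the right grouping, `b + c` rounds back to -1e16 because doubles near 1e16 are spaced 2 apart, so the result is 0.0. Without fast-math the compiler must keep the source grouping. That is why comparing -O3 against -O0 is the meaningful check.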

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

----- Original Message ----- > From: "Duncan Sands" <duncan.sands at gmail.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: llvmdev at cs.uiuc.edu > Sent: Friday, October 5, 2012 2:50:06 PM > Subject: Re: TSVC/Equivalencing-dbl > > Hi Hal, > > On 05/10/12 20:32, Hal Finkel wrote: > > ----- Original Message ----- > >>

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

----- Original Message ----- > From: "Duncan Sands" <duncan.sands at gmail.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: llvmdev at cs.uiuc.edu > Sent: Friday, October 5, 2012 12:10:03 PM > Subject: Re: TSVC/Equivalencing-dbl > > Oops, I ran the testsuite wrong: read clang output for dragonegg > output. Okay, can you resummarize? Do

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 07

Hi Hal, To get my understanding right, is this a test-case problem or is there a problem with x86 code generation? I can spend some time looking into the problem. Thanks, Shivaram -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel Sent: Saturday, October 06, 2012 1:57 AM To: Duncan Sands Cc: llvmdev at cs.uiuc.edu

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

2015 May 15

tl;dr: in low-data situations we don’t look at past information, and that increases the false positive regression rate. We should look at the possibly incorrect recent past runs to fix that. Motivation: LNT’s current regression detection system has a false positive rate that is too high to make it useful. With test suites as large as the llvm “test-suite” a single report will show hundreds of
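One reading of "look at the possibly incorrect recent past runs" is this: compare the current sample against a window of recent results rather than a single previous run. A regression is flagged only when the sample clears a noise threshold relative to that window. The C sketch below is a hypothetical illustration of that idea and is not LNT's actual algorithm. The median baseline, the 64-sample window, the helper names and the 5% threshold used in the example are all assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define WINDOW_MAX 64 /* assumed cap on how many past runs we consider */

/* qsort comparator for doubles. */
static int cmp_double(const void *x, const void *y) {
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Median of n samples (0 < n <= WINDOW_MAX); sorts a private copy. */
static double window_median(const double *samples, size_t n) {
    double buf[WINDOW_MAX];
    memcpy(buf, samples, n * sizeof buf[0]);
    qsort(buf, n, sizeof buf[0], cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Report a regression only if `current` exceeds the median of the recent
   history by more than rel_threshold (0.05 means 5%). A single noisy past
   run barely moves the median, so it is less likely to cause a false
   positive than comparing against just the previous run. */
int is_regression(const double *history, size_t n, double current,
                  double rel_threshold) {
    if (n == 0)
        return 0; /* no baseline yet: stay quiet */
    if (n > WINDOW_MAX) {
        history += n - WINDOW_MAX; /* keep only the most recent runs */
        n = WINDOW_MAX;
    }
    double base = window_median(history, n);
    return current > base * (1.0 + rel_threshold);
}
```

Take a history of `{1.00, 1.02, 0.98, 1.50, 1.01}`. The outlier 1.50 does not drag the baseline up, because the median is 1.01. A new time of 1.03 is not reported at 5%, while 1.10 is.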

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 14

Summary ======= I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi nodes into selects until later in the pass pipeline, to give passes which can understand phis but not selects more opportunity to optimize. The thing I'm trying to do which made me think of doing this is described below, but from the benchmarking I've done it looks like this is overall a
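At the source level, the two forms under discussion correspond roughly to the C functions below. An if/else that assigns one of two values typically becomes a two-entry phi at the join block once it is in SSA form. The conditional-expression version is closer to the `select` shape that SimplifyCFG produces when it folds such a phi. What the front end and the pipeline actually emit for each function depends on the optimization level, so treat the mapping as illustrative.

```c
/* Branchy form: in SSA, `r` typically becomes a two-entry phi in the
   block where the two arms rejoin. */
int pick_branch(int cond, int x, int y) {
    int r;
    if (cond)
        r = x + 1;
    else
        r = y * 2;
    return r;
}

/* Select-like form: both candidate values are available and one is
   chosen, with no control flow at the IR level once folded. */
int pick_select(int cond, int x, int y) {
    return cond ? x + 1 : y * 2;
}
```

Both compute the same value. The RFC is only about when in the pipeline the first shape is turned into the second.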

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

PS: Here's how I can reproduce with clang on linux: clang -S -o tsc.ll -O0 -flto -std=gnu99 tsc.c ; clang -S -o dummy.ll -O0 -flto -std=gnu99 dummy.c ; opt -std-compile-opts tsc.ll -S -o tsc.1.ll ; opt -std-compile-opts dummy.ll -S -o dummy.1.ll ; llvm-link tsc.1.ll dummy.1.ll -S -o total.ll ; opt -std-link-opts total.ll -S -o total.1.ll ; llc total.1.ll ; gcc -o z total.1.s The program

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

Hi Hal, On 05/10/12 20:32, Hal Finkel wrote: > ----- Original Message ----- >> From: "Duncan Sands" <duncan.sands at gmail.com> >> To: "Hal Finkel" <hfinkel at anl.gov> >> Cc: llvmdev at cs.uiuc.edu >> Sent: Friday, October 5, 2012 12:10:03 PM >> Subject: Re: TSVC/Equivalencing-dbl >> >> Oops, I ran the testsuite wrong:

[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization

2013 Sep 14

Hello all, I have evaluated the compile-time and execution-time performance of Polly canonicalization passes. Details can be referred to http://188.40.87.11:8000/db_default/v4/nts/recent_activity. There are four runs: pollyBasic (run 45): clang -O3 -Xclang -load -Xclang LLVMPolly.so pollyNoGenSCEV (run 44): clang -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-codegen-scev

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 15

I'm concerned that we're focusing on one side of this. Let me point out a few concerns w/changing the canonical form here: 1. LICM does not know how to hoist or sink regions. It does know how to hoist and sink selects. 2. InstCombine has limited support for triangles/diamonds, but fairly extensive support for selects. 3. EarlyCSE and GVN do not know how to eliminate fully

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

Hi Hal, I was looking into why this fails with dragonegg, and noticed the following: if I compile with GCC (-O0) then I get as output: Running each loop 3125 times... Loop Time(Sec) Checksum S421 0.00 32010.620068485 S1421 0.00 16000 S422 0.00 3.7377231414078 S423 0.00 32000.736895702 S424 0.00 32822.36069424 This is the same as the reference output. If I run exactly the

[LLVMdev] TSVC/Equivalencing-dbl

2012 Oct 05

Oops, I ran the testsuite wrong: read clang output for dragonegg output.

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 17

> On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote: >> I'm concerned that we're focusing on one side of this. Let me point out a few concerns w/changing the canonical form here: >> >> LICM does not know how to hoist or sink regions. It does know

[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization

2013 Sep 13

At 2013-09-09 13:07:07,"Tobias Grosser" <tobias at grosser.es> wrote: >On 09/09/2013 05:18 AM, Star Tan wrote: >> >> At 2013-09-09 05:52:35,"Tobias Grosser" <tobias at grosser.es> wrote: >> >>> On 09/08/2013 08:03 PM, Star Tan wrote: >>> Also, I wonder if your runs include the dependence analysis. If this is >>> the

Compare test-suite benchmarks performance compiled without TBAA, with default TBAA and with new TBAA struct path

2018 Apr 26

Hello, I was interested in how much Type-Based Alias Analysis helps to optimize code. For that purpose, I've compared three sets of benchmarks: compiled without TBAA, compiled with a default TBAA metadata format, and compiled with new TBAA metadata format. As a set of benchmarks, I've used the LLVM test suite (http://llvm.org/docs/TestingGuide.html#test-suite-overview) which has a lot of
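Here is some background on what TBAA lets the optimizer do. Under C's strict aliasing rules, a store through a `float` lvalue cannot modify an `int` object. With type-based alias information, the optimizer may therefore assume that the two pointers below do not alias and return the constant 1 without reloading `*i`. This is a minimal illustration, not one of the benchmarks from the comparison. Whether a given build performs the fold depends on optimization level and on flags such as `-fstrict-aliasing` / `-fno-strict-aliasing`.

```c
/* With TBAA, the store through `f` is assumed not to clobber `*i`, so
   the final load may be replaced by the constant 1. */
int store_and_reload(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

A conforming program calls this with pointers to distinct objects, and then the result is 1 either way. TBAA only changes whether the reload has to be emitted, which is where the measured performance differences would come from.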

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

On 12 October 2016 at 13:04, Sebastian Pop <sebpop.llvm at gmail.com> wrote: > The other problem is the reference output does not match > at "-O0 -ffp-contract=off". It might be that the reference output was recorded > at "-O3 -ffp-contract=off". I think that this hides either a compiler > bug or a test bug. Ah, yes! You mentioned before and I forgot to

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm at gmail.com> wrote: > Correct me if I misunderstood: you would be ok changing the > reference output to exactly match the output of "-O0 -ffp-contract=off". No, that's not at all what I said. Matching identical outputs to FP tests makes no sense because there's *always* an error bar. The output of O0, O1, O2,
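The "error bar" argument amounts to comparing floating-point outputs within a tolerance instead of bit-for-bit. The following C sketch shows such a check. It accepts a value that lies within either an absolute or a relative tolerance of the reference. The function name `fp_close` and the tolerance values in the example are illustrative assumptions, not the test-suite's actual settings.

```c
/* Absolute value without <math.h>, so no libm is needed. */
static double abs_d(double v) { return v < 0 ? -v : v; }

/* Accept `actual` if it is within abs_tol of `expected`, or within
   rel_tol relative to the larger magnitude of the two values. */
int fp_close(double expected, double actual, double abs_tol, double rel_tol) {
    double diff = abs_d(expected - actual);
    if (diff <= abs_tol)
        return 1;
    double ea = abs_d(expected), aa = abs_d(actual);
    double scale = ea > aa ? ea : aa;
    return diff <= rel_tol * scale;
}
```

The absolute tolerance matters for values near zero, where any relative bound becomes meaninglessly tight. The relative tolerance covers large values whose last few digits legitimately vary between -O0, -O3 and FMA-contracted builds.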

[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization

2013 Sep 08

Hello all, I have done some basic experiments about Polly canonicalization passes and I found the SCEV canonicalization has significant impact on both compile-time and execution-time performance. Detailed results for SCEV and default canonicalization can be viewed on: http://188.40.87.11:8000/db_default/v4/nts/32 (or 33, 34) *pNoGen with SCEV canonicalization (run 32): -O3 -Xclang -load