thr3ads.net - similar to: "My own codegen is 2.5x slower than llc?"

2018 May 29

0

My own codegen is 2.5x slower than llc?

> On 29 May 2018, at 22:02, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64.
>
> If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000

My own codegen is 2.5x slower than llc?

2018 May 29

0

What percentage of performance advantage do you expect to get from having a basic block with 14000 instructions, rather than breaking it up a bit?
On Wed, May 30, 2018 at 12:02 AM, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64.
>
> If I run it on a given sample of IR, it

[LLVMdev] i1 types in MergeConsecutiveStores

2015 May 12

2

Hello LLVM,
In DAGCombiner.cpp, MergeConsecutiveStores uses
    int64_t ElementSizeBytes = MemVT.getSizeInBits()/8;
https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10669
which is broken for i1 types where getSizeInBits() == 1. My out-of-tree target hits this case and eventually LLVM asserts in Type.cpp. Is there some reason MergeConsecutiveStores should
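The arithmetic behind this report can be checked in isolation. The sketch below is not LLVM code: `sizeInBytesTruncating` and `sizeInBytesRoundedUp` are hypothetical helper names, and rounding up is only one conceivable way to avoid a zero-byte element size, not necessarily what was done upstream.

```cpp
#include <cstdint>

// Mirrors the expression quoted in the post: integer division truncates,
// so any type narrower than 8 bits reports a size of 0 bytes.
constexpr std::int64_t sizeInBytesTruncating(std::int64_t sizeInBits) {
    return sizeInBits / 8;
}

// Hypothetical alternative (an assumption, not the upstream fix):
// round up to a whole number of bytes.
constexpr std::int64_t sizeInBytesRoundedUp(std::int64_t sizeInBits) {
    return (sizeInBits + 7) / 8;
}

// An i1 value has a bit width of 1.
static_assert(sizeInBytesTruncating(1) == 0, "i1 truncates to zero bytes");
static_assert(sizeInBytesRoundedUp(1) == 1, "i1 rounds up to one byte");
// For widths that are multiples of 8 the two computations agree.
static_assert(sizeInBytesTruncating(32) == sizeInBytesRoundedUp(32),
              "byte-multiple widths are unaffected");
```

A zero element size is a plausible source of trouble for any later code that divides by it or uses it as a stride, which fits the assertion described in the post.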

large slowdown in DAGCombiner::MergeConsecutiveStores

2020 Mar 19

2

Hello all,
We are seeing a large compiler performance regression in moving from LLVM 6.0.1 to 8.0.1. We have a long function (~50000 instructions) that used to compile in about a minute but now takes at least an hour. All the time is in MergeConsecutiveStores, I believe due to super-linear behavior in analyzing very long chains of stores. For example, this change makes the problem go away:

[LLVMdev] DAGCompiler::MergeConsecutiveStores Question

2013 Nov 22

2

In DAGCombiner::MergeConsecutiveStores, there is this check:
    if (Index->getAlignment() != St->getAlignment())
      break;
Apparently this check ensures that all of the stores have the same alignment. Why is that necessary? This seems very overly restrictive to me.
-David

[LLVMdev] DAGCompiler::MergeConsecutiveStores Question

2013 Nov 22

0

Hi David,
You are right. This check is overly restrictive. We can replace this check with code that uses the alignment of the first store.
Thanks, Nadav
On Nov 22, 2013, at 9:31 AM, dag at cray.com wrote:
> In DAGCombiner::MergeConsecutiveStores, there is this check:
>
> if (Index->getAlignment() != St->getAlignment())
>   break;
>
> Apparently this check
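The difference between the quoted check and the proposed relaxation can be illustrated with a toy model. This is a sketch under stated assumptions, not the DAGCombiner implementation: `StoreCandidate`, `MergedRun`, `mergeableWithEqualAlignment` and `mergeUsingFirstAlignment` are hypothetical names, and the candidates are assumed to be already known to be consecutive in memory and ordered by address.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a candidate store; only its alignment matters here.
struct StoreCandidate {
    unsigned alignment;
};

// Result of merging a run of stores in this toy model.
struct MergedRun {
    std::size_t count;   // number of stores folded into the merged store
    unsigned alignment;  // alignment assigned to the merged store
};

// Models the strict check: the run ends at the first store whose
// alignment differs from that of the first store.
std::size_t mergeableWithEqualAlignment(const std::vector<StoreCandidate>& stores) {
    if (stores.empty()) return 0;
    std::size_t count = 1;
    while (count < stores.size() &&
           stores[count].alignment == stores[0].alignment) {
        ++count;
    }
    return count;
}

// Models the relaxation described in the reply: every candidate stays in
// the run, and the merged store takes the alignment of the first store.
MergedRun mergeUsingFirstAlignment(const std::vector<StoreCandidate>& stores) {
    if (stores.empty()) return {0, 0};
    return {stores.size(), stores[0].alignment};
}
```

For four byte stores with alignments 8, 1, 2, 1 (made-up values), the strict model stops after the first store, while the relaxed model keeps all four and gives the merged store alignment 8.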

[LLVMdev] DAGCombiner::MergeConsecutiveStores

2015 Feb 13

2

Hi,
I'm quite puzzled by a little bit of code in the DAGCombiner where it merges loads in MergeConsecutiveStores. Two 16-bit loads have been merged into one 32-bit load, and two 16-bit stores have been combined into one 32-bit store. And then the code goes like this:
    // Replace one of the loads with the new load.
    LoadSDNode *Ld = cast<LoadSDNode>(LoadNodes[0].MemNode);

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

2

Hey Nadav,
I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.
On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Author:

Testing LLVM XRay

2018 Aug 27

2

Hi All,
I am trying to test-run the clang XRay tool. I was following the steps at [1], but the log file does not seem to get generated. According to the instructions, I used the '-fxray-instrument' switch when compiling and then specified 'patch_premain=true' in XRAY_OPTIONS. Is there anything else that I need to do? I am on a trunk build of clang. Could that be it? I am on clang version

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

0

Hi Daniel,
Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair".
Thanks, Nadav
Sent from my iPhone
> On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Hey Nadav,
> I'd humbly suggest that rather than use 3 directly, you should

rL296252 Made large integer operation codegen significantly worse.

2017 Feb 25

2

Hi,
I'm working with a workload where the bottleneck is cryptographic signature checks or, in compiler terms, mostly large integer operations. Looking at rL296252, the state of affairs in that area has degraded quite significantly; see test/CodeGen/X86/i256-add.ll for instance. Is there some kind of work in progress here, and is it expected to get better? Because if not, that's a big problem.

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

1

Hi Nadav,
Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2.
2. If the day comes when the SelectionDAG store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is

Optimization of successive constant stores

2015 Dec 11

2

Hmm... found an interesting issue:
Given:
    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
    store i8 3, i8* %4, align 2
    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3

How to add assembly instructions in CodeGen

2018 May 09

1

Hi Dean,
I looked at XRay. I had also been thinking along similar lines: adding the assembly instructions as auxiliary template code and jumping to it. However, that may still misalign the stack. I have to think about it. But your XRay code does give me the courage to think about this seriously. Thank you for your help. I also figured out that we can access certain CodeGen features right from the IR

[X-ray] How to check successful instrumentation and generate call trace?

2019 Jan 21

2

Hi all,
I want to test X-ray performance and compare it with other research tools, so I use Clang 7.0.0 to compile and instrument GNU binutils-2.31 with the following commands:
    cd binutils-2.31/
    mkdir build
    cd build/
    CC=$local/clang CXX=$local/clang++ CFLAGS=-fxray-instrument CXXFLAGS=-fxray-instrument ../configure --prefix=/home/zhangysh1995/local
    make
Then I extract instrumentation map

[LLVMdev] JIT compilation 2-3 times slower in latest LLVM snapshot

2015 Jul 11

2

On 11 July 2015 at 13:14, Caldarale, Charles R <Chuck.Caldarale at unisys.com> wrote:
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Dibyendu Majumdar
>> Subject: [LLVMdev] JIT compilation 2-3 times slower in latest LLVM snapshot
>
>> I updated my clone of the LLVM github mirror today and I am finding
>> that

Optimization of successive constant stores

2015 Dec 11

2

Consider the following:
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"
    %UodStructType = type { i8, i8, i8, i8, i32, i8* }
    define void @test(%UodStructType*) {
      %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
      store i8 1, i8* %2, align 8
      %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
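As a rough illustration of what merging successive constant byte stores amounts to, the sketch below packs four byte constants into the single 32-bit value a little-endian target would write (the leading `e` in the datalayout string denotes little-endian). `packLittleEndian` is a hypothetical helper, not an LLVM function. The first constant (1) comes from the snippet; the remaining three values are assumptions for illustration.

```cpp
#include <cstdint>

// Combine four byte constants, given in increasing address order, into the
// 32-bit value that one little-endian store would write to the same memory.
constexpr std::uint32_t packLittleEndian(std::uint8_t b0, std::uint8_t b1,
                                         std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0) |
           (static_cast<std::uint32_t>(b1) << 8) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b3) << 24);
}

// Byte 1 at the lowest address ends up in the least significant position.
// The values 2, 3 and 4 are assumed, not taken from the truncated snippet.
static_assert(packLittleEndian(1, 2, 3, 4) == 0x04030201u,
              "lowest-address byte is the least significant byte");
```

Whether a backend actually performs such a merge depends on further conditions, such as alignment and the legality of the wider store, which this sketch does not model.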

OT: distribution of a pathological random variate

2007 Aug 29

3

Folks,
I wonder if anything could be said about the distribution of a random variate x, where x = N(0,1)/N(0,1). Obviously x is pathological because it could be 0/0. If we exclude this point, so the set is {x/(0/0)}, does x have a well-defined distribution? Or does there exist a distribution that approximates x? (The case could be generalized of course to N(mu1, sigma1)/N(mu2, sigma2) and one
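For the special case of two independent standard normals, the ratio is known to follow the standard Cauchy distribution, for which P(|x| < 1) = 1/2, and the event 0/0 has probability zero. Independence is an assumption here, since the snippet does not state it. The sketch below checks the 1/2 figure by simulation; `fractionWithinUnit` is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Estimate P(|X/Y| < 1) for independent X, Y ~ N(0,1) by simulation.
// For a standard Cauchy variate this probability is exactly 1/2.
double fractionWithinUnit(std::size_t samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = normal(gen);
        const double y = normal(gen);
        // A zero denominator yields inf or NaN; neither counts as a hit.
        if (std::fabs(x / y) < 1.0) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(samples);
}
```

Calling `fractionWithinUnit(200000, 42)` should return a value close to 0.5. The estimate varies with the seed and the standard library, so no exact figure is given.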

alternative to rocks cluster

2007 Feb 09

3

Hi,
I am after a solution where I can easily kickstart many (read: hundreds of) boxes in a short time frame. Perhaps the way I install software is to actually re-kickstart the box with a new software baseline, that type of idea. I have looked at Rocks and it looks good, but it seems a little rigid in that I need to be able to determine certain things like hostname etc., as in our env hostname

Disable optimization on basic block level

2017 Apr 24

3

How do you disable optimization for a function? I ask because my application often compiles machine-generated code that results in pathological structures that take a long time to optimize, for little benefit. As an example, if a basic block has over a million instructions in it, then DSE can take a while, as it is O(n^2) in the number of instructions in the block. In my application (at least),
