similar to: [LLVMdev] i1 types in MergeConsecutiveStores

Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] i1 types in MergeConsecutiveStores"

2020 Mar 19
2
large slowdown in DAGCombiner::MergeConsecutiveStores
Hello all, We are seeing a large compiler performance regression in moving from LLVM 6.0.1 to 8.0.1. We have a long function (~50000 instructions) that used to compile in about a minute but now takes at least an hour. All the time is in MergeConsecutiveStores, I believe due to super-linear behavior in analyzing very long chains of stores. For example, this change makes the problem go away: ```
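The patch itself is cut off in this excerpt, so the following is only a rough sketch of the kind of mitigation being described: capping how many candidate stores the combiner examines per merge attempt so the work stays roughly linear. The StoreNode type and kMaxCandidates constant are invented for the sketch and are not LLVM's own names.
```
// Purely illustrative: bound the candidate search instead of re-scanning an
// arbitrarily long chain of stores for every merge attempt.
#include <cstddef>
#include <vector>

struct StoreNode { /* stand-in for one store node in the DAG chain */ };

constexpr std::size_t kMaxCandidates = 64; // hypothetical limit

std::vector<StoreNode *> collectMergeCandidates(const std::vector<StoreNode *> &Chain) {
  std::vector<StoreNode *> Candidates;
  for (StoreNode *S : Chain) {
    if (Candidates.size() >= kMaxCandidates)
      break; // give up early rather than analyzing the whole ~50000-instruction chain
    Candidates.push_back(S);
  }
  return Candidates;
}
```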
2013 Nov 22
2
[LLVMdev] DAGCombiner::MergeConsecutiveStores Question
In DAGCombiner::MergeConsecutiveStores, there is this check: if (Index->getAlignment() != St->getAlignment()) break; Apparently this check ensures that all of the stores have the same alignment. Why is that necessary? This seems very overly restrictive to me. -David
2013 Nov 22
0
[LLVMdev] DAGCombiner::MergeConsecutiveStores Question
Hi David, You are right. This check is overly restrictive. We can replace this check with code that uses the alignment of the first store. Thanks, Nadav On Nov 22, 2013, at 9:31 AM, dag at cray.com wrote: > In DAGCombiner::MergeConsecutiveStores, there is this check: > > if (Index->getAlignment() != St->getAlignment()) > break; > > Apparently this check
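A self-contained sketch of the relaxation Nadav describes, not the committed code; the Store struct stands in for the DAG store nodes.
```
#include <vector>

struct Store {
  unsigned Alignment; // alignment in bytes
};

// Old behaviour (per the quoted snippet): bail out as soon as a store's
// alignment differs from St's. Relaxed behaviour: keep merging and give the
// combined store the first store's alignment; taking the minimum over the
// chain would be the more conservative variant.
unsigned mergedStoreAlignment(const std::vector<Store> &Chain) {
  return Chain.front().Alignment;
}
```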
2015 Feb 13
2
[LLVMdev] DAGCombiner::MergeConsecutiveStores
Hi, I'm quite puzzled by a little bit of code in the DAGCombiner where it merges loads in MergeConsecutiveStores. Two 16-bit loads have been merged into one 32-bit load, and two 16-bit stores have been combined into one 32-bit store. And then the code goes like this: // Replace one of the loads with the new load. LoadSDNode *Ld = cast<LoadSDNode>(LoadNodes[0].MemNode);
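For readers less familiar with the combine being discussed, here is what it does at the source level, written as plain C++ rather than DAG nodes: two adjacent 16-bit loads feeding two adjacent 16-bit stores become one 32-bit load and one 32-bit store.
```
#include <cstdint>
#include <cstring>

// Before the combine: two 16-bit loads and two 16-bit stores.
void copyPairNarrow(const uint16_t *Src, uint16_t *Dst) {
  uint16_t Lo = Src[0];
  uint16_t Hi = Src[1];
  Dst[0] = Lo;
  Dst[1] = Hi;
}

// After the combine, conceptually: one 32-bit load and one 32-bit store.
// memcpy keeps the example well defined regardless of alignment and aliasing.
void copyPairWide(const uint16_t *Src, uint16_t *Dst) {
  uint32_t Wide;
  std::memcpy(&Wide, Src, sizeof(Wide));
  std::memcpy(Dst, &Wide, sizeof(Wide));
}
```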
2018 May 29
4
My own codegen is 2.5x slower than llc?
My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000 instructions, which is a pathological case for MergeConsecutiveStores.) If, instead, I dump out the LLVM IR, and manually
2017 Sep 17
2
Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
Please open a bugzilla ticket and attach your testcase. It will allow us to debug and fix the problem. Thanks - Elena
2018 May 29
0
My own codegen is 2.5x slower than llc?
> On 29 May 2018, at 22:02, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. > > If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000
2017 Sep 18
1
Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
> so I think we need to use a non-extending load for element sizes of less than 8 bits in DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT, roughly like this:
> if (N->getOperand(0).getValueType().getVectorElementType().getSizeInBits() < 8) {
>   return DAG.getLoad(N->getValueType(0), dl, Store, StackPtr, MachinePointerInfo());
> } else {
>   return
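As a toy model of the policy under discussion (illustrative only, not the legalizer's code): when sub-byte elements are spilled to a stack slot, the store widens each element to a full, zero-extended byte, so reading an element back is a plain byte load with no further extension needed.
```
#include <array>
#include <cstddef>
#include <cstdint>

// "Spill" a 4 x i1 vector: the store zero-extends each i1 element to an i8.
std::array<uint8_t, 4> spill(const std::array<bool, 4> &V) {
  std::array<uint8_t, 4> Slot{};
  for (std::size_t I = 0; I < V.size(); ++I)
    Slot[I] = V[I] ? 1 : 0;
  return Slot;
}

// Extracting an element is then a plain (non-extending) byte load; the value
// is already 0 or 1, so no extra zero-extension is required at the load.
uint32_t extractElement(const std::array<uint8_t, 4> &Slot, std::size_t Idx) {
  return Slot[Idx];
}
```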
2018 May 29
0
My own codegen is 2.5x slower than llc?
What percentage of performance advantage do you expect to get from having a basic block with 14000 instructions, rather than breaking it up a bit? On Wed, May 30, 2018 at 12:02 AM, David Jones via llvm-dev < llvm-dev at lists.llvm.org> wrote: > My back-end code generator uses LLVM 5.0.1 to optimize and generate code > for x86_64. > > If I run it on a given sample of IR, it
2017 Sep 15
2
Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
> extends the elements to 8 bits and stores them on the stack. The store is responsible for the zero-extend. This is the policy... - Elena
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
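A minimal sketch of the shared-constant idea; the namespace and constant names are invented for this example, and the values simply encode the "pairs only" assumption discussed above.
```
// Both the SLP vectorizer and the SelectionDAG store merger would refer to
// this one definition, so the assumption lives in exactly one place.
namespace vectorizer_limits {
// The SelectionDAG store merger currently handles pairs of stores...
constexpr unsigned SelectionDAGStoreMergeWidth = 2;
// ...so the SLP vectorizer only takes chains strictly longer than a pair.
constexpr unsigned MinSLPStoreChainLength = SelectionDAGStoreMergeWidth + 1; // == 3 today
} // namespace vectorizer_limits
```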
2015 Dec 11
2
Optimization of successive constant stores
Hmm... found an interesting issue. Given:
  %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
  store i8 1, i8* %2, align 8
  %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
  store i8 2, i8* %3, align 1
  %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
  store i8 3, i8* %4, align 2
  %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
2016 Mar 25
2
RFC: atomic operations on SI+
Hi Tom, Matt, I'm working on a project that needs a few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. The attached patch implements atomic compare-and-swap for SI+ (untested). I tried to stay within what was available, but there are a few issues that I was unsure how to address: 1.) it currently uses v2i32 for both input and output. This
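For context, these are the three std::atomic_uint operations the patch is meant to support, shown only at the C++ surface; how each maps onto SI+ atomic instructions is exactly what the backend work has to decide. The function names here are illustrative.
```
#include <atomic>

// Claim a slot with compare-and-swap; 0 means "free".
bool claimSlot(std::atomic<unsigned> &Slot, unsigned MyId) {
  unsigned Expected = 0;
  if (Slot.compare_exchange_strong(Expected, MyId)) // atomic compare-and-swap
    return true;                                    // we now own the slot
  return Slot.load() == MyId;                       // atomic load: already ours?
}

void releaseSlot(std::atomic<unsigned> &Slot) {
  Slot.store(0);                                    // atomic store
}
```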
2015 Dec 11
2
Optimization of successive constant stores
Consider the following:
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"
  %UodStructType = type { i8, i8, i8, i8, i32, i8* }
  define void @test(%UodStructType*) {
    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
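Roughly the C++ shape of %UodStructType and the stores in the IR above (field names are invented, and the excerpt cuts off after the fourth getelementptr, so assume the fourth store writes the constant 4). The point of the thread is that the four adjacent i8 constant stores could be folded into a single wider store.
```
#include <cstdint>
#include <cstring>

struct UodStruct {     // layout of { i8, i8, i8, i8, i32, i8* } on x86_64
  uint8_t A, B, C, D;
  uint32_t E;
  void *P;
};

// What the IR does today: four separate one-byte stores.
void initNarrow(UodStruct *S) {
  S->A = 1; S->B = 2; S->C = 3; S->D = 4;
}

// The hoped-for result: one 32-bit store of 0x04030201 (little-endian x86_64),
// legal here because the first byte is 8-byte aligned and the bytes are adjacent.
void initMerged(UodStruct *S) {
  const uint32_t Packed = 0x04030201u;
  std::memcpy(S, &Packed, sizeof(Packed));
}
```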
2016 Mar 28
0
RFC: atomic operations on SI+
On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote: > Hi Tom, Matt, > > I'm working on a project that needs few coherent atomic operations (HSA > mode: load, store, compare-and-swap) for std::atomic_uint in HCC. > > the attached patch implements atomic compare and swap for SI+ > (untested). I tried to stay within what was available, but there are > few issues
2010 Sep 22
2
[LLVMdev] r114523 (convert the last 4 X86ISD...) breaks clang
Hello, After commit r114523, I started getting a crash when compiling with clang (Release+Asserts) for i386. (I know I should file a bug report instead of posting here, but I don't have much time right now.) Trying to compile the following simple code, clang asserts.
  ---------- round.c --------
  #include <math.h>
  float test() { return llround(1); }
  --------------------
  [MacPro:~/Desktop]
2010 Mar 15
1
[LLVMdev] SelectionDAG constant folding leads to assertion failure
My experimental code calls DAG.getNode to construct a unary node with a flag result. Unfortunately the argument turns out to be constant, so lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2332 calls VT.getSizeInBits on the flag type, which isSimple(), so we call V.getSizeInBits at ValueTypes.h:560 and fail at ValueTypes.h:240: clang: .../include/llvm/CodeGen/ValueTypes.h:240: unsigned int
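The usual way out of this kind of assertion is to check whether the result type has a meaningful bit width before asking for it; the sketch below only illustrates that guard with made-up types, and is not the SelectionDAG.cpp fix.
```
#include <optional>

enum class ToyType { i1, i32, i64, Flag };

std::optional<unsigned> sizeInBits(ToyType T) {
  switch (T) {
  case ToyType::i1:   return 1;
  case ToyType::i32:  return 32;
  case ToyType::i64:  return 64;
  case ToyType::Flag: return std::nullopt; // a flag/glue result has no size
  }
  return std::nullopt;
}

// Constant folding should bail out, rather than assert, for sizeless results.
bool canConstantFold(ToyType ResultType) {
  return sizeInBits(ResultType).has_value();
}
```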
2009 Dec 22
2
[LLVMdev] LegalizeDAG Error?
The LegalizeDAG.cpp file has this code in SelectionDAGLegalize::PromoteNode:
  case ISD::BSWAP: {
    unsigned DiffBits = NVT.getSizeInBits() - OVT.getSizeInBits();
    Tmp1 = DAG.getNode(ISD::ZERO_EXTEND, dl, NVT, Tmp1);
    Tmp1 = DAG.getNode(ISD::BSWAP, dl, NVT, Tmp1);
    Tmp1 = DAG.getNode(ISD::SRL, dl, NVT, Tmp1,
                       DAG.getConstant(DiffBits, TLI.getShiftAmountTy()));
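Working through that promotion for OVT = i16 and NVT = i32 shows why the SRL by DiffBits is there: the 32-bit byte swap leaves the interesting bytes in the upper half, and the shift by 32 - 16 = 16 brings them back down. The C++ below just replays that arithmetic.
```
#include <cassert>
#include <cstdint>

uint32_t bswap32(uint32_t V) {
  return ((V & 0x000000FFu) << 24) | ((V & 0x0000FF00u) << 8) |
         ((V & 0x00FF0000u) >> 8)  | ((V & 0xFF000000u) >> 24);
}

uint16_t bswap16ViaPromote(uint16_t V) {
  const unsigned DiffBits = 32 - 16;                  // NVT bits - OVT bits
  uint32_t Ext = V;                                   // ZERO_EXTEND: 0x1234 -> 0x00001234
  uint32_t Swapped = bswap32(Ext);                    // BSWAP (i32): -> 0x34120000
  return static_cast<uint16_t>(Swapped >> DiffBits);  // SRL 16:      -> 0x3412
}

int main() {
  assert(bswap16ViaPromote(0x1234) == 0x3412);
  return 0;
}
```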