thr3ads.net - similar to: "[LLVMdev] DAGCompiler::MergeConsecutiveStores Question"

[LLVMdev] DAGCompiler::MergeConsecutiveStores Question

2013 Nov 22

0

[LLVMdev] DAGCompiler::MergeConsecutiveStores Question

Hi David, You are right. This check is overly restrictive. We can replace this check with code that uses the alignment of the first store. Thanks, Nadav On Nov 22, 2013, at 9:31 AM, dag at cray.com wrote: > In DAGCombiner::MergeConsecutiveStores, there is this check: > > if (Index->getAlignment() != St->getAlignment()) > break; > > Apparently this check

[LLVMdev] i1 types in MergeConsecutiveStores

2015 May 12

2

[LLVMdev] i1 types in MergeConsecutiveStores

Hello LLVM, In DAGCombiner.cpp, MergeConsecutiveStores uses int64_t ElementSizeBytes = MemVT.getSizeInBits()/8; https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10669 which is broken for i1 types where getSizeInBits() == 1. My out-of-tree target hits this case and eventually LLVM asserts in Type.cpp. Is there some reason MergeConsecutiveStores should

large slowdown in DAGCombiner::MergeConsecutiveStores

2020 Mar 19

2

large slowdown in DAGCombiner::MergeConsecutiveStores

Hello all, We are seeing a large compiler performance regression in moving from LLVM 6.0.1 to 8.0.1. We have a long function (~50000 instructions) that used to compile in about a minute but now takes at least an hour. All the time is in MergeConsecutiveStores, I believe due to super-linear behavior in analyzing very long chains of stores. For example, this change makes the problem go away: ```

[LLVMdev] DAGCombiner::MergeConsecutiveStores

2015 Feb 13

2

[LLVMdev] DAGCombiner::MergeConsecutiveStores

Hi, I'm quite puzzled by a little bit of code in the DAGCombiner where it merges loads in MergeConsecutiveStores. Two 16bit loads have been merged to one 32bit load, and two 16bit stores have been combined to one 32bit store. And then the code goes like this: // Replace one of the loads with the new load. LoadSDNode *Ld = cast<LoadSDNode>(LoadNodes[0].MemNode);

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

2

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:

[LLVMdev] Possible DAGCombiner or TargetData Bug

2009 Feb 19

0

[LLVMdev] Possible DAGCombiner or TargetData Bug

I agree, that doesn't look right. It looks like this is what was intended: Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp =================================================================== --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 65000) +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy) @@ -4903,9 +4903,9 @@ // resultant store does not need a higher alignment than

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

4

[LLVMdev] Limit loop vectorizer to SSE

Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?

[LLVMdev] Possible DAGCombiner or TargetData Bug

2009 Feb 19

3

[LLVMdev] Possible DAGCombiner or TargetData Bug

I got bit by this in LLVM 2.4 DagCombiner.cpp and it's still in trunk: SDValue DAGCombiner::visitSTORE(SDNode *N) { [...] // If this is a store of a bit convert, store the input value if the // resultant store does not need a higher alignment than the original. if (Value.getOpcode() == ISD::BIT_CONVERT && !ST->isTruncatingStore() && ST->isUnindexed()) {

[LLVMdev] Possible DAGCombiner or TargetData Bug

2009 Feb 20

2

[LLVMdev] Possible DAGCombiner or TargetData Bug

On Wednesday 18 February 2009 21:43, Dan Gohman wrote: > I agree, that doesn't look right. It looks like this > is what was intended: > > Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp > =================================================================== > --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 65000) > +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

0

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

0

[LLVMdev] Limit loop vectorizer to SSE

----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Joshua Klontz" <josh.klontz at gmail.com> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, November 15, 2013 4:05:53 PM > Subject: Re: [LLVMdev] Limit loop vectorizer to SSE > > > Something like: > > index

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

1

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

2

[LLVMdev] Limit loop vectorizer to SSE

Yes, I was just about to send out: DL->getABITypeAlignment(ScalarDataTy); The question is: “… ABI alignment for the target …" is that getPrefTypeAlignment or getABITypeAlignment I would have thought the latter. On Nov 15, 2013, at 4:12 PM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Arnold Schwaighofer"

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

2013 Mar 11

0

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

> > Line 4501 in trunk DAGCombiner.cpp… I changed the ISD::SELECT to the VT.isVector() ? ISD::VSELECT : ISD::SELECT... > Thanks. From the commit message I think that we should only run this optimization on scalars. >> Can you write down the input SDNode ? What types are inputs ? > > 0x107046d10: v2i8 = vselect 0x107046c10, 0x107046b10, 0x107045e10 [ID=-3]

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

2013 Mar 11

3

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

On Mar 11, 2013, at 9:41 AM, Nadav Rotem <nrotem at apple.com<mailto:nrotem at apple.com>> wrote: Hi Richard, I did… It originates from an icmp ne <2x i8>, zero initializer followed by a sext of the result 2x i1 to 2x i8. When we visit the SIGN_EXTEND, we generate the ISD::SELECT even though the selector and both operands are vectors. It sounds like a bug in the dag combine

Optimization of successive constant stores

2015 Dec 11

2

Optimization of successive constant stores

Hmm... found an interesting issue: Given: %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0 store i8 1, i8* %2, align 8 %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1 store i8 2, i8* %3, align 1 %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2 store i8 3, i8* %4, align 2 %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

0

[LLVMdev] Limit loop vectorizer to SSE

Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

6

[LLVMdev] Limit loop vectorizer to SSE

On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: > Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! > > I think

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

2013 Mar 11

2

[LLVMdev] Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

On Mar 8, 2013, at 2:29 PM, Nadav Rotem <nrotem at apple.com<mailto:nrotem at apple.com>> wrote: Hi Richard, visitSIGN_EXTEND() in DAGCombiner.cpp generates an ISD::SELECT even if VT is a vector, which causes ExpandSELECT() to assert during legalization. I think what's required is to have visitSIGN_EXTEND generate a VSELECT if VT is a vector… ISD::SELECT should be used for

My own codegen is 2.5x slower than llc?

2018 May 29

4

My own codegen is 2.5x slower than llc?

My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000 instructions, which is a pathological case for MergeConsecutiveStores.) If, instead, I dump out the LLVM IR, and manually

similar to: [LLVMdev] DAGCompiler::MergeConsecutiveStores Question