similar to: large slowdown in DAGCombiner::MergeConsecutiveStores

Displaying 20 results from an estimated 200 matches similar to: "large slowdown in DAGCombiner::MergeConsecutiveStores"

2015 May 12
2
[LLVMdev] i1 types in MergeConsecutiveStores
Hello LLVM, In DAGCombiner.cpp, MergeConsecutiveStores uses
  int64_t ElementSizeBytes = MemVT.getSizeInBits()/8;
(https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10669)
which is broken for i1 types where getSizeInBits() == 1. My out-of-tree target hits this case and eventually LLVM asserts in Type.cpp. Is there some reason MergeConsecutiveStores should
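A minimal sketch of one possible guard against this, written against the EVT call quoted above; the helper name is hypothetical and this is not the actual in-tree fix:

  #include "llvm/CodeGen/ValueTypes.h"
  #include <cstdint>

  // Hypothetical helper, not LLVM API: refuse to compute a byte size for i1
  // and other sub-byte memory types instead of silently rounding down to 0.
  static bool getElementSizeInBytes(const llvm::EVT &MemVT,
                                    int64_t &ElementSizeBytes) {
    uint64_t SizeInBits = MemVT.getSizeInBits();
    if (SizeInBits % 8 != 0)
      return false;                  // caller should skip the merge entirely
    ElementSizeBytes = SizeInBits / 8;
    return true;
  }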
2013 Nov 22
2
[LLVMdev] DAGCombiner::MergeConsecutiveStores Question
In DAGCombiner::MergeConsecutiveStores, there is this check:
  if (Index->getAlignment() != St->getAlignment())
    break;
Apparently this check ensures that all of the stores have the same alignment. Why is that necessary? This seems very overly restrictive to me. -David
2015 Feb 13
2
[LLVMdev] DAGCombiner::MergeConsecutiveStores
Hi, I'm quite puzzled by a little bit of code in the DAGCombiner where it merges loads in MergeConsecutiveStores. Two 16bit loads have been merged to one 32bit load, and two 16bit stores have been combined to one 32bit store. And then the code goes like this:
  // Replace one of the loads with the new load.
  LoadSDNode *Ld = cast<LoadSDNode>(LoadNodes[0].MemNode);
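As a standalone illustration of the transformation being described (plain C++, not DAGCombiner code): two adjacent 16-bit loads feeding two adjacent 16-bit stores are equivalent to one 32-bit load plus one 32-bit store when the accesses really are consecutive; memcpy is used here to sidestep alignment and aliasing questions.

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  int main() {
    uint16_t src[2] = {0x1234, 0x5678};
    uint16_t dstNarrow[2], dstWide[2];

    // Original form: two 16-bit loads and two 16-bit stores.
    dstNarrow[0] = src[0];
    dstNarrow[1] = src[1];

    // Merged form: one 32-bit load and one 32-bit store.
    uint32_t wide;
    std::memcpy(&wide, src, sizeof wide);
    std::memcpy(dstWide, &wide, sizeof wide);

    assert(std::memcmp(dstNarrow, dstWide, sizeof dstNarrow) == 0);
    return 0;
  }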
2013 Nov 22
0
[LLVMdev] DAGCombiner::MergeConsecutiveStores Question
Hi David, You are right. This check is overly restrictive. We can replace this check with code that uses the alignment of the first store. Thanks, Nadav
On Nov 22, 2013, at 9:31 AM, dag at cray.com wrote:
> In DAGCombiner::MergeConsecutiveStores, there is this check:
>
>   if (Index->getAlignment() != St->getAlignment())
>     break;
>
> Apparently this check
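A minimal sketch of the relaxation Nadav describes, using hypothetical names rather than the actual patch: instead of rejecting chains with mixed alignments, track an alignment that is safe to put on the merged store. Starting from the first store's alignment and taking the minimum over the chain is always conservative.

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  // Hypothetical stand-in for the per-store data the combiner collects.
  struct StoreInfo {
    unsigned Align; // alignment of this store, in bytes
  };

  // Pick an alignment for the merged store from a non-empty chain: begin with
  // the first store's alignment and never promise more than the weakest store.
  static unsigned chooseMergedAlignment(const std::vector<StoreInfo> &Chain) {
    unsigned Align = Chain.front().Align;
    for (std::size_t I = 1, E = Chain.size(); I != E; ++I)
      Align = std::min(Align, Chain[I].Align);
    return Align;
  }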
2018 May 29
4
My own codegen is 2.5x slower than llc?
My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000 instructions, which is a pathological case for MergeConsecutiveStores.) If, instead, I dump out the LLVM IR, and manually
2018 May 29
0
My own codegen is 2.5x slower than llc?
> On 29 May 2018, at 22:02, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. > > If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000
2018 May 29
0
My own codegen is 2.5x slower than llc?
What percentage of performance advantage do you expect to get from having a basic block with 14000 instructions, rather than breaking it up a bit? On Wed, May 30, 2018 at 12:02 AM, David Jones via llvm-dev < llvm-dev at lists.llvm.org> wrote: > My back-end code generator uses LLVM 5.0.1 to optimize and generate code > for x86_64. > > If I run it on a given sample of IR, it
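One way to "break it up a bit" at the IR level, sketched against the LLVM 5-era SplitBlock utility (the helper and the 500-instruction threshold are assumptions, not an existing LLVM facility): SelectionDAG, and with it MergeConsecutiveStores, runs one basic block at a time, so capping block length also caps the number of store-merge candidates considered per DAG.

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/Transforms/Utils/BasicBlockUtils.h"
  #include <iterator>

  using namespace llvm;

  // Hypothetical helper: split any oversized block so each SelectionDAG stays
  // small. Semantics are preserved; SplitBlock only inserts an unconditional
  // branch to the new tail block.
  static void splitLargeBlocks(Function &F, unsigned MaxInsts = 500) {
    SmallVector<BasicBlock *, 8> Work;
    for (BasicBlock &BB : F)
      Work.push_back(&BB);            // collect first; splitting adds blocks

    for (BasicBlock *BB : Work) {
      while (BB->size() > MaxInsts) {
        auto It = BB->begin();
        std::advance(It, MaxInsts);   // instruction to split in front of
        if (isa<PHINode>(&*It) || It->isTerminator())
          break;                      // don't split among PHIs or at the end
        BB = SplitBlock(BB, &*It);    // keep trimming the new tail block
      }
    }
  }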
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:
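A sketch of that shared-constant idea; the header path and constant name below are made up for illustration and do not exist in LLVM:

  // Hypothetical shared header, e.g. llvm/Analysis/VectorizeThresholds.h,
  // included by both SLPVectorizer.cpp and DAGCombiner.cpp.
  namespace llvm {
  // Store chains shorter than this are left to the SelectionDAG store
  // vectorizer, which currently only merges pairs.
  static const unsigned MinStoreChainLengthForSLP = 3;
  } // namespace llvm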
2013 Apr 06
2
[LLVMdev] Integer divide by zero
A division intrinsic with defined behavior on all arguments would be awesome! Thanks for considering this. On Sat, Apr 6, 2013 at 11:27 AM, Joe Groff <arcata at gmail.com> wrote: > On Saturday, April 6, 2013, Jeff Bezanson wrote: >> >> >> Presumably the optimizer benefits from taking advantage of the >> undefined behavior, but to get a consistent result you need
2013 Apr 06
0
[LLVMdev] Integer divide by zero
On Sat, Apr 6, 2013 at 3:22 PM, Jeff Bezanson <jeff.bezanson at gmail.com> wrote: > A division intrinsic with defined behavior on all arguments would be > awesome! Thanks for considering this. 'Tis a good compromise. If there are no objections/concerns, I would like to move forward with it. Thanks, Joe! -Cameron
2004 Aug 04
4
Using answering machine in my phone
Is this supported? I have a very simple setup where I have 2 X100P cards and a TDM10B. The TDM10B is connected to a phone that has a digital answering machine built into it. If I make an inbound call on either X100P interface it gets transferred to the TDM10B interface. If I let it ring the TDM10B interface answers the call and the greeting message of the answering machine starts. Then shortly
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2.
2. If the day comes when the SelectionDAG store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
2015 Dec 11
2
Optimization of successive constant stores
Hmm... found an interesting issue: Given:
  %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
  store i8 1, i8* %2, align 8
  %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
  store i8 2, i8* %3, align 1
  %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
  store i8 3, i8* %4, align 2
  %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
2012 Dec 25
2
[LLVMdev] 3.2 version string
LLVM 3.2 came as a nice Christmas present. Just one minor question: I noticed that the version string (used to name the shared library etc.) is "3.2svn" instead of the expected "3.2". This violates our build system's expectations of what things are called. It would be easy for us to change, but I want to make sure this is not a mistake. I am fairly certain I downloaded the
2013 Apr 06
3
[LLVMdev] Integer divide by zero
I'm also not fully happy with LLVM's behavior here. There is another undefined case too, which is the minimum integer divided by -1. In Julia I can get "random" answers by doing:
  julia> sdiv_int(-9223372036854775808, -1)
  87106304
  julia> sdiv_int(-9223372036854775808, -1)
  87108096
In other contexts where the arguments are not constant, this typically gives an FPE trap.
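The two undefined cases in this thread, division by zero and the minimum integer divided by -1, are exactly what a division intrinsic with defined behavior on all arguments would have to pin down. A standalone sketch in plain C++; the chosen results are illustrative, not what was proposed on the list:

  #include <cstdint>

  // Both x / 0 and INT64_MIN / -1 are undefined for the built-in operator;
  // this helper assigns them explicit results (0 and the two's-complement
  // wrap) so every input has a defined answer.
  int64_t safe_sdiv(int64_t a, int64_t b) {
    if (b == 0)
      return 0;                 // one possible convention for x / 0
    if (a == INT64_MIN && b == -1)
      return INT64_MIN;         // -INT64_MIN overflows; wrap instead
    return a / b;
  }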
2015 Dec 11
2
Optimization of successive constant stores
Consider the following:
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"
  %UodStructType = type { i8, i8, i8, i8, i32, i8* }
  define void @test(%UodStructType*) {
    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
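The snippet stores small constants into consecutive i8 fields; assuming the truncated IR continues the pattern with stores of 2, 3, and 4, the expected merged form on this little-endian x86_64 target is a single 32-bit store of 0x04030201. A standalone illustration in plain C++; the struct is just a stand-in for the first four fields of %UodStructType:

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  struct UodHeader {
    uint8_t a, b, c, d;         // mirrors the four leading i8 fields
  };

  int main() {
    UodHeader byByte{}, byWord{};

    // What the IR above does: four separate one-byte stores.
    byByte.a = 1; byByte.b = 2; byByte.c = 3; byByte.d = 4;

    // What the merged form does: one 32-bit store of a little-endian constant.
    uint32_t merged = 0x04030201u;
    std::memcpy(&byWord, &merged, sizeof merged);

    assert(std::memcmp(&byByte, &byWord, sizeof byByte) == 0);
    return 0;
  }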
2012 Mar 01
2
Julia
My purpose in mentioning the Julia language (julialang.org) here is not to start a flame war. I find it to be a very interesting development and others who read this list may want to read about it too. It is still very much early days for this language - about the same stage as R was in 1995 or 1996 when only a few people knew about it - but Julia holds much potential. There is a thread about
2013 Apr 06
0
[LLVMdev] Integer divide by zero
On Saturday, April 6, 2013, Jeff Bezanson wrote: > > Presumably the optimizer benefits from taking advantage of the > undefined behavior, but to get a consistent result you need to check > for both zero and this case, which is an awful lot of checks. Yes they > will branch predict well, but this still can't be good, for code size > if nothing else. How much performance can
2013 Apr 07
2
[LLVMdev] Integer divide by zero
Hi Cameron, On 06/04/13 22:52, Cameron McInally wrote: > On Sat, Apr 6, 2013 at 3:22 PM, Jeff Bezanson <jeff.bezanson at gmail.com> wrote: > > A division intrinsic with defined behavior on all arguments would be > awesome! Thanks for considering this. > > > 'Tis a good compromise. If there are no