Displaying 20 results from an estimated 5000 matches similar to: "My own codegen is 2.5x slower than llc?"
2018 May 29
0
My own codegen is 2.5x slower than llc?
> On 29 May 2018, at 22:02, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64.
>
> If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000
2018 May 29
0
My own codegen is 2.5x slower than llc?
What percentage of performance advantage do you expect to get from having a
basic block with 14000 instructions, rather than breaking it up a bit?
On Wed, May 30, 2018 at 12:02 AM, David Jones via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> My back-end code generator uses LLVM 5.0.1 to optimize and generate code
> for x86_64.
>
> If I run it on a given sample of IR, it
2015 May 12
2
[LLVMdev] i1 types in MergeConsecutiveStores
Hello LLVM,
In DAGCombiner.cpp, MergeConsecutiveStores uses
int64_t ElementSizeBytes = MemVT.getSizeInBits()/8;
https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10669
which is broken for i1 types where getSizeInBits() == 1. My
out-of-tree target hits this case and eventually LLVM asserts in
Type.cpp.
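
The truncation described above can be reproduced outside LLVM. This is a minimal sketch, assuming a plain integer stands in for the value returned by `MemVT.getSizeInBits()`. `truncatedSizeBytes` and `roundedUpSizeBytes` are hypothetical helper names, not LLVM APIs, and rounding up is only one possible way to avoid a zero size, not necessarily the fix LLVM adopted.

```cpp
#include <cstdint>

// Mirrors the expression `MemVT.getSizeInBits() / 8` from the message:
// integer division truncates, so any type narrower than 8 bits
// (such as i1) yields a size of 0 bytes.
constexpr int64_t truncatedSizeBytes(uint64_t sizeInBits) {
  return static_cast<int64_t>(sizeInBits / 8);
}

// Hypothetical alternative (an assumption, not LLVM's code):
// round up to whole bytes so a 1-bit type occupies 1 byte.
constexpr int64_t roundedUpSizeBytes(uint64_t sizeInBits) {
  return static_cast<int64_t>((sizeInBits + 7) / 8);
}
```

For byte-multiple widths such as 32 bits, both helpers agree. They differ only for sub-byte or non-byte-multiple types.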
Is there some reason MergeConsecutiveStores should
2020 Mar 19
2
large slowdown in DAGCombiner::MergeConsecutiveStores
Hello all,
We are seeing a large compiler performance regression in moving from LLVM
6.0.1 to 8.0.1. We have a long function (~50000 instructions) that used to
compile in about a minute but now takes at least an hour. All the time is
in MergeConsecutiveStores, I believe due to super-linear behavior in
analyzing very long chains of stores. For example, this change makes the
problem go away:
2013 Nov 22
2
[LLVMdev] DAGCompiler::MergeConsecutiveStores Question
In DAGCombiner::MergeConsecutiveStores, there is this check:
if (Index->getAlignment() != St->getAlignment())
break;
Apparently this check ensures that all of the stores have the same
alignment. Why is that necessary? This seems very overly restrictive
to me.
-David
2013 Nov 22
0
[LLVMdev] DAGCompiler::MergeConsecutiveStores Question
Hi David,
You are right. This check is overly restrictive. We can replace this check with code that uses the alignment of the first store.
Thanks,
Nadav
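
The difference between the strict check and the relaxation suggested in this reply can be sketched outside LLVM. This is only an illustration under stated assumptions: `StoreInfo`, `allSameAlignment` and `mergedStoreAlignment` are hypothetical names, not DAGCombiner code, and "first store" is assumed here to mean the store at the lowest byte offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one candidate store in a consecutive run.
struct StoreInfo {
  int64_t offsetBytes;  // offset from the common base pointer
  unsigned alignment;   // alignment recorded on the store, in bytes
};

// Strict rule in the spirit of the quoted check: every store must carry
// the same alignment as the first one, otherwise merging stops.
bool allSameAlignment(const std::vector<StoreInfo> &stores) {
  return std::all_of(stores.begin(), stores.end(),
                     [&](const StoreInfo &s) {
                       return s.alignment == stores.front().alignment;
                     });
}

// Relaxed rule (assumption): the merged store begins at the address of
// the lowest-offset store, so it takes that store's alignment.
// Returns 0 for an empty run.
unsigned mergedStoreAlignment(const std::vector<StoreInfo> &stores) {
  if (stores.empty())
    return 0;
  auto first = std::min_element(stores.begin(), stores.end(),
                                [](const StoreInfo &a, const StoreInfo &b) {
                                  return a.offsetBytes < b.offsetBytes;
                                });
  return first->alignment;
}
```

With three one-byte stores at offsets 0, 1 and 2 that carry alignments 8, 1 and 2, the strict rule rejects the run, while the relaxed rule reports an alignment of 8 for the merged store.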
On Nov 22, 2013, at 9:31 AM, dag at cray.com wrote:
> In DAGCombiner::MergeConsecutiveStores, there is this check:
>
> if (Index->getAlignment() != St->getAlignment())
> break;
>
> Apparently this check
2015 Feb 13
2
[LLVMdev] DAGCombiner::MergeConsecutiveStores
Hi,
I'm quite puzzled by a little bit of code in the DAGCombiner where it
merges loads in MergeConsecutiveStores.
Two 16bit loads have been merged to one 32bit load, and two 16bit stores
have been combined to one 32bit store.
And then the code goes like this:
// Replace one of the loads with the new load.
LoadSDNode *Ld = cast<LoadSDNode>(LoadNodes[0].MemNode);
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav,
I'd humbly suggest that rather than use 3 directly, you should add a shared
constant between these two passes, so when one changes, the other doesn't
need to be updated. It would also ensure this bit of info about what needs
to be updated isn't only contained in the comments.
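
A shared constant of the kind suggested here might look like the sketch below. All names (`store_chain_limits`, `MaxChainHandledBySelectionDAG`, `MinSLPChainLength`, `slpShouldVectorizeChain`) are hypothetical and do not come from the LLVM tree. The value 2 is an assumption based on the statement elsewhere in this thread that the SelectionDAG store vectorizer handles only pairs, and the threshold is assumed to be a minimum chain length.

```cpp
// Hypothetical shared header that both passes would include.
namespace store_chain_limits {
// Assumption from the thread: the SelectionDAG store vectorizer
// handles pairs only.
constexpr unsigned MaxChainHandledBySelectionDAG = 2;

// "More than a pair": derived from the limit above instead of being
// written as a literal 3 in the SLP vectorizer.
constexpr unsigned MinSLPChainLength = MaxChainHandledBySelectionDAG + 1;
} // namespace store_chain_limits

// Hypothetical SLP-side check: leave short chains to SelectionDAG.
constexpr bool slpShouldVectorizeChain(unsigned chainLength) {
  return chainLength >= store_chain_limits::MinSLPChainLength;
}

static_assert(store_chain_limits::MinSLPChainLength == 3,
              "threshold follows the SelectionDAG limit");
```

If the SelectionDAG side later handled longer chains, only `MaxChainHandledBySelectionDAG` would need to change and the SLP threshold would follow.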
On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Author:
2018 Aug 27
2
Testing LLVM XRay
Hi All,
I am trying to test-run the clang XRay tool. I was following the steps at [1].
But the log file does not seem to get generated. According to the
instructions I used the '-fxray-instrument' switch when compiling and then
specified 'patch_premain=true' at XRAY_OPTIONS. Is there anything else that
I need to do? I am on a trunk build of clang. Could that be it? I am on
clang version
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel,
Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair".
Thanks,
Nadav
> On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Hey Nadav,
> I'd humbly suggest that rather than use 3 directly, you should
2017 Feb 25
2
rL296252 Made large integer operation codegen significantly worse.
Hi,
I'm working with a workload where the bottleneck is cryptographic signature
checks. Or, in compiler terms, mostly large integer operations.
Looking at rL296252, the state of affairs in that area has degraded quite
significantly; see test/CodeGen/X86/i256-add.ll for instance.
Is there some kind of work in progress here, and is it expected to get
better? Because if not, that's a big problem.
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav,
Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum,
updating it to mention pairs specifically, to avoid the issue in #2
2. If the day comes when the SelectionDAG store vectorizer handles more
than pairs, and does so better, is anyone really going to remember this
random 3 exists in the other vectorizer?
I would posit, based on experience, the answer is
2015 Dec 11
2
Optimization of successive constant stores
Hmm... found an interesting issue:
Given:
%2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
store i8 1, i8* %2, align 8
%3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
store i8 2, i8* %3, align 1
%4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
store i8 3, i8* %4, align 2
%5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
2018 May 09
1
How to add assembly instructions in CodeGen
Hi Dean,
I looked at XRay. I had also been thinking along similar lines: adding assembly
instructions as auxiliary template code and jumping to it. However, that
may still misalign the stack. I have to think about it. But your XRay code
does give me the courage to think about this seriously.
Thank you for your help. I also figured out that we can access certain
CodeGen features right from the IR
2019 Jan 21
2
[X-ray] How to check successful instrumentation and generate call trace?
Hi all,
I want to test X-ray performance and compare it with other research tools, so I use Clang 7.0.0 to compile and instrument GNU binutils-2.31 with the following commands:
cd binutils-2.31/
mkdir build
cd build/
CC=$local/clang CXX=$local/clang++ CFLAGS=-fxray-instrument CXXFLAGS=-fxray-instrument ../configure --prefix=/home/zhangysh1995/local
make
Then I extract instrumentation map
2015 Jul 11
2
[LLVMdev] JIT compilation 2-3 times slower in latest LLVM snapshot
On 11 July 2015 at 13:14, Caldarale, Charles R
<Chuck.Caldarale at unisys.com> wrote:
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Dibyendu Majumdar
>> Subject: [LLVMdev] JIT compilation 2-3 times slower in latest LLVM snapshot
>
>> I updated my clone of the LLVM github mirror today and I am finding
>> that
2015 Dec 11
2
Optimization of successive constant stores
Consider the following:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%UodStructType = type { i8, i8, i8, i8, i32, i8* }
define void @test(%UodStructType*) {
%2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
store i8 1, i8* %2, align 8
%3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
2007 Aug 29
3
OT: distribution of a pathological random variate
Folks,
I wonder if anything could be said about the distribution of a random variate x, where
x = N(0,1)/N(0,1)
Obviously x is pathological because it could be 0/0. If we exclude this point, so the set is {x/(0/0)}, does x have a well-defined distribution? Or does there exist a distribution that approximates x?
(The case could be generalized of course to N(mu1, sigma1)/N(mu2, sigma2) and one
2007 Feb 09
3
alternative to rocks cluster
Hi
I am after a solution where I can easily kickstart many (read: hundreds)
of boxes in a short time frame. Perhaps the way I install software is to
actually re-kickstart the box with a new software baseline - that type of idea.
I have looked at Rocks and it looks good, but it seems a little rigid in
that I need to be able to determine certain things like hostname etc., as
in our env hostname
2017 Apr 24
3
Disable optimization on basic block level
How do you disable optimization for a function?
I ask because my application often compiles machine-generated code that
results in pathological structures that take a long time to optimize, for
little benefit. As an example, if a basic block has over a million
instructions in it, then DSE can take a while, as it is O(n^2) in the
number of instructions in the block. In my application (at least),