thr3ads.net - llvm dev - [llvm-dev] My own codegen is 2.5x slower than llc? [May 2018]

David Jones via llvm-dev

2018-May-29 12:02 UTC

[llvm-dev] My own codegen is 2.5x slower than llc?

My back-end code generator uses LLVM 5.0.1 to optimize and generate code
for x86_64.

If I run it on a given sample of IR, it takes almost 5 minutes to generate
object code.  95%+ of this time is spent in MergeConsecutiveStores().  (One
function has a basic block with 14000 instructions, which is a pathological
case for MergeConsecutiveStores.)
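The thread does not describe how MergeConsecutiveStores works internally, so the following is only a hypothetical worst-case cost model. It assumes that a pass compares every store in a basic block with every other store in the same block, and it treats all 14000 instructions as stores. Under those assumptions the cost grows quadratically with block size. That would explain why one huge block can dominate compile time while the same instructions spread over smaller blocks stay cheap. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical worst-case cost model (NOT the real MergeConsecutiveStores
// algorithm): each store in a basic block is compared once with every other
// store in the same block, so a block of n stores costs n*(n-1)/2 comparisons.
std::uint64_t pairwiseComparisons(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// The quadratic term applies per block, so the total is a sum over blocks.
std::uint64_t totalComparisons(const std::vector<std::uint64_t>& blockSizes) {
    std::uint64_t total = 0;
    for (std::uint64_t n : blockSizes) total += pairwiseComparisons(n);
    return total;
}

// Cut `total` instructions into consecutive blocks of at most `maxBlock`
// instructions each (the last block may be smaller).
std::vector<std::uint64_t> splitIntoBlocks(std::uint64_t total, std::uint64_t maxBlock) {
    assert(maxBlock > 0);
    std::vector<std::uint64_t> sizes;
    while (total > 0) {
        std::uint64_t n = total < maxBlock ? total : maxBlock;
        sizes.push_back(n);
        total -= n;
    }
    return sizes;
}
```

With these assumptions, `pairwiseComparisons(14000)` is 97,993,000 comparisons. `totalComparisons(splitIntoBlocks(14000, 1000))` uses 14 blocks of 1000 and comes to 6,993,000, about 14 times fewer. The real pass behaves differently. The point is only the shape of the growth.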

If, instead, I dump out the LLVM IR, and manually run both opt and llc on
it with -O2, the whole affair takes only 2 minutes.

I am using a dynamically linked LLVM library.  I have verified using GDB
that both my code generator and llc are invoking the shared library (i.e.
the exact same code) so I would not expect to see a 2.5x performance
difference.

What could explain this?

Bruce Hoult via llvm-dev

2018-May-29 12:15 UTC

[llvm-dev] My own codegen is 2.5x slower than llc?

What percentage of performance advantage do you expect to get from having a
basic block with 14000 instructions, rather than breaking it up a bit?

David Jones via llvm-dev

2018-May-29 12:30 UTC

[llvm-dev] My own codegen is 2.5x slower than llc?

I don't. Unfortunately, I must compile the code I am given.

I do actually have mitigations in some places that break up long basic
blocks, but they are not universally applicable.  Interestingly, 14000
doesn't seem that big to me.  One of my mitigations was put in to break up
blocks with 2 million instructions.

If you generate a SystemVerilog UVM register model for a large SOC and it
has 70K registers in it, then your build() method will have 2 million
instructions in it.  That is the type of code I am given.
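The thread does not show how David's block-splitting mitigation works. One plausible approach is for the front end to emit a huge generated body, such as build(), as a series of smaller helper functions or blocks. Whole registers are packed into each piece until a size budget is reached. The sketch below shows only the packing step. The name `packRegisters`, the per-register instruction counts, and the budget are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, last) of register indices that would be emitted
// into one hypothetical helper (e.g. build_part_0, build_part_1, ...).
using Chunk = std::pair<std::size_t, std::size_t>;

// Greedily pack consecutive registers into chunks whose total instruction
// count stays at or below maxInstrs. A register is never split across
// chunks; a single register larger than maxInstrs gets a chunk of its own.
std::vector<Chunk> packRegisters(const std::vector<std::size_t>& instrsPerReg,
                                 std::size_t maxInstrs) {
    assert(maxInstrs > 0);
    std::vector<Chunk> chunks;
    std::size_t first = 0, used = 0;
    for (std::size_t i = 0; i < instrsPerReg.size(); ++i) {
        // Close the current chunk if adding this register would overflow it.
        if (i > first && used + instrsPerReg[i] > maxInstrs) {
            chunks.emplace_back(first, i);
            first = i;
            used = 0;
        }
        used += instrsPerReg[i];
    }
    if (first < instrsPerReg.size()) chunks.emplace_back(first, instrsPerReg.size());
    return chunks;
}
```

Two million instructions for 70K registers is roughly 29 instructions per register (an approximation). With an arbitrary budget of 10,000 instructions per helper, `packRegisters(std::vector<std::size_t>(70000, 29), 10000)` yields 204 chunks of at most 344 registers each. Keeping each register's setup inside a single chunk means no register's code straddles two helpers.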


Dean Michael Berris via llvm-dev

2018-May-29 12:41 UTC

[llvm-dev] My own codegen is 2.5x slower than llc?

Without more information, I would guess this has something to do with memory
locality when you use the LLVM API to generate the basic blocks and
instructions, versus when the IR is read in from files (as llc and opt do).
Without seeing how you construct the basic blocks and instructions, I suspect
that you’re building them one instruction at a time and relying on
vectors/lists that grow one element at a time (instead of using an object pool
that pre-allocates elements colocated in the same page of memory).
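Dean's object-pool suggestion can be illustrated with a minimal slab pool. This is a generic C++ sketch, not an LLVM API. The class name `ObjectPool` and the slab size are assumptions. The thread does not confirm whether memory locality actually explains the 2.5x gap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Minimal slab-based object pool (illustrative sketch). Objects are
// constructed in place inside large pre-allocated slabs, so objects created
// one after another sit next to each other in memory instead of each coming
// from a separate heap allocation.
template <typename T, std::size_t SlabSize = 4096>
class ObjectPool {
    static_assert(SlabSize > 0, "slab must hold at least one object");

public:
    ObjectPool() = default;
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    ~ObjectPool() {
        // Destroy every constructed object; unique_ptr then frees the slabs.
        for (std::size_t s = 0; s < slabs_.size(); ++s) {
            std::size_t n = (s + 1 == slabs_.size()) ? used_ : SlabSize;
            for (std::size_t i = 0; i < n; ++i)
                reinterpret_cast<T*>(slabs_[s][i].bytes)->~T();
        }
    }

    // Construct a T in the next free slot, allocating a new slab when full.
    // Existing objects never move, so returned pointers stay valid.
    template <typename... Args>
    T* create(Args&&... args) {
        if (slabs_.empty() || used_ == SlabSize) {
            slabs_.push_back(std::make_unique<Storage[]>(SlabSize));
            used_ = 0;
        }
        T* p = ::new (static_cast<void*>(slabs_.back()[used_].bytes))
            T(std::forward<Args>(args)...);
        ++used_;
        assert(used_ <= SlabSize);
        return p;
    }

    // Number of objects constructed so far.
    std::size_t size() const {
        return slabs_.empty() ? 0 : (slabs_.size() - 1) * SlabSize + used_;
    }

private:
    // Raw storage with the size and alignment of one T.
    struct Storage { alignas(T) unsigned char bytes[sizeof(T)]; };
    std::vector<std::unique_ptr<Storage[]>> slabs_;
    std::size_t used_ = 0;  // objects constructed in the last slab
};
```

Unlike a `std::vector` that grows one element at a time, the pool never reallocates or copies existing objects. It only adds another slab, which is what keeps earlier pointers valid.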

There are a lot of factors that could explain the marked performance
difference you’re seeing. If you’re able, you might want to build your code
generator with XRay and see whether it points out where your latency is coming
from.

https://llvm.org/docs/XRayExample.html
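XRay is the tool recommended here. A much cruder alternative is for a front end to wrap its own phases in RAII timers. This shows whether the extra wall-clock time goes to building IR in-process or to the optimization and code-generation passes. This hand-rolled sketch is not XRay and not part of LLVM. The class names `PhaseTimes` and `ScopedTimer` and the phase labels are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time per label. Unlike XRay, it only measures the
// regions you wrap explicitly.
class PhaseTimes {
public:
    void add(const std::string& label, std::chrono::nanoseconds d) {
        assert(d >= std::chrono::nanoseconds::zero());  // steady_clock is monotonic
        totals_[label] += d;  // missing labels start at zero
    }
    std::chrono::nanoseconds total(const std::string& label) const {
        auto it = totals_.find(label);
        return it == totals_.end() ? std::chrono::nanoseconds::zero() : it->second;
    }
    const std::map<std::string, std::chrono::nanoseconds>& all() const { return totals_; }

private:
    std::map<std::string, std::chrono::nanoseconds> totals_;
};

// RAII guard: starts timing on construction and records the elapsed time
// under its label when it goes out of scope.
class ScopedTimer {
public:
    ScopedTimer(PhaseTimes& sink, std::string label)
        : sink_(sink), label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        sink_.add(label_, std::chrono::duration_cast<std::chrono::nanoseconds>(
                              std::chrono::steady_clock::now() - start_));
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    PhaseTimes& sink_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

For example, wrap the IR-construction code in `{ ScopedTimer t(times, "irgen"); ... }` and the call that runs the passes in a scope labeled `"codegen"`, then print `times.all()`. Repeated scopes with the same label accumulate into one total.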

-- Dean

Nirav Davé via llvm-dev

2018-May-29 16:25 UTC

[llvm-dev] My own codegen is 2.5x slower than llc?

David,

I have no particular insight into the performance variance, but I
recently landed a patch that should remove the vast majority of
pathological cases in MergeConsecutiveStores (r332490). If you can
apply that locally, you'd likely sidestep this issue entirely. Note that
you'll probably want to pick up the earlier associated cleanups as well,
i.e., r328233 and r332489.

If it's still too slow, feel free to send me a test case and I'll take a
look.

-Nirav
