thr3ads.net - llvm dev - [llvm-dev] MCJit Runtine Performance [Feb 2016]

Morten Brodersen via llvm-dev

2016-Feb-05 03:39 UTC

[llvm-dev] MCJit Runtine Performance

Hi Lang,

> MCJIT does not compile lazily (though it sounds like that's not an
> issue here?)

That is not an issue here, since the code is JIT-compiled once (a few
seconds) and the generated machine code then runs for hours.

> Morten - Can you share any test cases that demonstrate the slowdown.
> I'd love to take a look at this.

The code base is massive, so sharing it is not practical. However, I will
try to extract an example function that demonstrates the difference (as
per my previous email).

On 05/02/16 11:52, Lang Hames wrote:
> These are some pretty extreme slowdowns. The legacy JIT shared the
> code generator with MCJIT, and as far as I'm aware there were really 
> only three main differences:
>
> 1) The legacy JIT used a custom instruction encoder, whereas MCJIT 
> uses MC.
> 2) (Related to 1) MCJIT needs to perform runtime linking of the object 
> files produced by MC.
> 3) MCJIT does not compile lazily (though it sounds like that's not an 
> issue here?)
>
> Keno - did you ever look at the codegen pipeline construction for the 
> legacy JIT vs MCJIT? Are we choosing different passes?
>
> Morten - Can you share any test cases that demonstrate the slowdown. 
> I'd love to take a look at this.
>
> Cheers,
> Lang.
>
> On Thu, Feb 4, 2016 at 4:16 PM, Hal Finkel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     ----- Original Message -----
>     > From: "Keno Fischer via llvm-dev" <llvm-dev at lists.llvm.org>
>     > To: "Morten Brodersen" <Morten.Brodersen at constrainttec.com>
>     > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>     > Sent: Thursday, February 4, 2016 6:05:29 PM
>     > Subject: Re: [llvm-dev] MCJit Runtine Performance
>     >
>     >
>     >
>     > Yes, unfortunately, this is very much known. Over in the Julia
>     > project, we've recently gone through this and taken the hit (after
>     > doing some work to fix the very extreme corner cases that we were
>     > hitting). We're not entirely sure why the slowdown is this
>     > noticeable, but at least in our case, profiling didn't reveal any
>     > remaining low-hanging fruit that is responsible. One thing you can
>     > potentially try, if you haven't yet, is to enable fast ISel and see
>     > if that brings you closer to the old runtimes.
>
>     And maybe the register allocator? Are you using the greedy one or
>     the linear one? Are there any other MI-level optimizations running?
>
>      -Hal
>
>     >
>     >
>     > On Thu, Feb 4, 2016 at 7:00 PM, Morten Brodersen via llvm-dev
>     > <llvm-dev at lists.llvm.org> wrote:
>     >
>     >
>     > Hi All,
>     >
>     > We recently upgraded a number of applications from LLVM 3.5.2 (old
>     > JIT) to LLVM 3.7.1 (MCJit).
>     >
>     > We made the minimum changes needed for the switch (no changes to the
>     > IR generated or the IR optimizations applied).
>     >
>     > The resulting code passes all tests (8000+).
>     >
>     > However, the runtime performance dropped significantly: 30% to 40%
>     > for all applications.
>     >
>     > The applications I am talking about optimize airline rosters and
>     > pairings. LLVM is used for compiling high-level business rules to
>     > efficient machine code.
>     >
>     > A typical optimization run takes 6 to 8 hours, so a 30% to 40%
>     > reduction in speed has real impact (=> we can't upgrade from 3.5.2).
>     >
>     > We have triple-checked and reviewed the changes we made from the old
>     > JIT to MCJIT. We also tried different ways to optimize the IR.
>     >
>     > However, all results indicate that the performance drop happens in
>     > the (black box) IR-to-machine-code stage.
>     >
>     > So my question is whether the runtime performance reduction is
>     > known/expected for MCJIT vs. the old JIT, or whether we might be
>     > doing something wrong.
>     >
>     > If you need more information in order to understand the issue,
>     > please tell us so that we can provide you with more details.
>     >
>     > Thanks
>     > Morten
>     >
>
>     --
>     Hal Finkel
>     Assistant Computational Scientist
>     Leadership Computing Facility
>     Argonne National Laboratory

Benoit Belley via llvm-dev

2016-Feb-05 14:34 UTC

[llvm-dev] MCJit Runtine Performance

Hi Morten,

We experienced a similar slowdown in execution performance when upgrading
to LLVM 3.7. The issue for us was that our front-end was emitting alloca
instructions in non-entry basic blocks. After fixing the generation of LLVM IR
in our front-end, we got similar or better performance with LLVM 3.7. See:

http://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas

Maybe, this is something that you can double check.

Here’s a detailed explanation of the cause of the slowdown:


With LLVM 3.7, we have noticed that the MemCpy pass will attempt to copy an
LLVM struct using moves that are as large as possible. For example, a struct of
3 floats is copied using a 64-bit and a 32-bit move. It is therefore important
that such a struct be aligned on an 8-byte boundary, not just 4 bytes!
Otherwise, one runs the risk of triggering pipeline stalls caused by
store-forwarding failures (which we did encounter really badly with one of our
internal performance benchmarks). It is therefore important that the SROA pass
correctly eliminates the loads/stores to the alloca memory regions.

Benoit
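
The alignment point above can be illustrated outside LLVM with a small C++
sketch. This is only an analogy, not the IR Benoit's front-end emits: the
names Vec3, Vec3Aligned, isAlignedTo and copyChecked are hypothetical, and the
layout assumptions (4-byte float, 4-byte natural alignment) hold on mainstream
targets but are not guaranteed by the language. It shows that a struct of 3
floats is only 4-byte aligned by default, so a 64-bit move over its first two
fields may start on a 4-byte boundary unless 8-byte alignment is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Three floats: 12 bytes, naturally aligned to only 4 bytes (assumed target).
struct Vec3 {
    float x, y, z;
};

// Hypothetical variant that explicitly requests 8-byte alignment, so a
// 64-bit move covering x and y always starts on an 8-byte boundary.
struct alignas(8) Vec3Aligned {
    float x, y, z;
};

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
static_assert(alignof(Vec3) == 4, "sketch assumes 4-byte float alignment");
static_assert(sizeof(Vec3) == 12, "three packed floats");
static_assert(alignof(Vec3Aligned) == 8, "alignment was requested explicitly");
static_assert(sizeof(Vec3Aligned) == 16, "size is padded to the alignment");

// Returns true when the address p is a multiple of `alignment` bytes.
inline bool isAlignedTo(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Copies a Vec3Aligned after checking that the source sits on an 8-byte
// boundary, which is the precondition discussed in the message above.
inline Vec3Aligned copyChecked(const Vec3Aligned& src) {
    assert(isAlignedTo(&src, 8));
    Vec3Aligned dst = src;
    return dst;
}
```

The trade-off in this sketch is 4 bytes of padding per struct (16 instead of
12). In an LLVM front-end the corresponding knob would be the alignment set on
the alloca, which is an assumption about how one might apply the advice rather
than something stated in the thread.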



Morten Brodersen via llvm-dev

2016-Feb-07 23:58 UTC

[llvm-dev] MCJit Runtine Performance

Thanks for this Benoit. I will investigate.

Cheers
Morten

