Hi Lang,

> MCJIT does not compile lazily (though it sounds like that's not an
> issue here?)

That is not an issue here, since the code is JIT-compiled once (a few seconds) and the generated machine code then runs for hours.

> Morten - Can you share any test cases that demonstrate the slowdown.
> I'd love to take a look at this.

The code is massive, so sharing it is not practical. However, I will try to extract an example function that demonstrates the difference (as per my previous email).

On 05/02/16 11:52, Lang Hames wrote:
> These are some pretty extreme slowdowns. The legacy JIT shared the
> code generator with MCJIT, and as far as I'm aware there were really
> only three main differences:
>
> 1) The legacy JIT used a custom instruction encoder, whereas MCJIT
>    uses MC.
> 2) (Related to 1) MCJIT needs to perform runtime linking of the object
>    files produced by MC.
> 3) MCJIT does not compile lazily (though it sounds like that's not an
>    issue here?)
>
> Keno - did you ever look at the codegen pipeline construction for the
> legacy JIT vs MCJIT? Are we choosing different passes?
>
> Morten - Can you share any test cases that demonstrate the slowdown.
> I'd love to take a look at this.
>
> Cheers,
> Lang.
>
> On Thu, Feb 4, 2016 at 4:16 PM, Hal Finkel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     ----- Original Message -----
>     > From: "Keno Fischer via llvm-dev" <llvm-dev at lists.llvm.org>
>     > To: "Morten Brodersen" <Morten.Brodersen at constrainttec.com>
>     > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>     > Sent: Thursday, February 4, 2016 6:05:29 PM
>     > Subject: Re: [llvm-dev] MCJit Runtine Performance
>     >
>     > Yes, unfortunately, this is very much known. Over in the julia
>     > project, we've recently gone through this and taken the hit (after
>     > doing some work to fix the very extreme corner cases that we were
>     > hitting). We're not entirely sure why the slowdown is this
>     > noticeable, but at least in our case, profiling didn't reveal any
>     > remaining low-hanging fruit that is responsible. One thing you can
>     > potentially try if you haven't yet is to enable fast ISel and see if
>     > that brings you closer to the old runtimes.
>
>     And maybe the register allocator? Are you using the greedy one or
>     the linear one? Are there any other MI-level optimizations running?
>
>     -Hal
>
>     > On Thu, Feb 4, 2016 at 7:00 PM, Morten Brodersen via llvm-dev
>     > <llvm-dev at lists.llvm.org> wrote:
>     >
>     > Hi All,
>     >
>     > We recently upgraded a number of applications from LLVM 3.5.2 (old
>     > JIT) to LLVM 3.7.1 (MCJit).
>     >
>     > We made the minimum changes needed for the switch (no changes to the
>     > IR generated or the IR optimizations applied).
>     >
>     > The resulting code passes all tests (8000+).
>     >
>     > However the runtime performance dropped significantly: 30% to 40% for
>     > all applications.
>     >
>     > The applications I am talking about optimize airline rosters and
>     > pairings. LLVM is used for compiling high level business rules to
>     > efficient machine code.
>     >
>     > A typical optimization run takes 6 to 8 hours. So a 30% to 40%
>     > reduction in speed has real impact (=> we can't upgrade from 3.5.2).
>     >
>     > We have triple checked and reviewed the changes we made from old JIT
>     > to MCJit. We also tried different ways to optimize the IR.
>     >
>     > However all results indicate that the performance drop happens in the
>     > (black box) IR to machine code stage.
>     >
>     > So my question is if the runtime performance reduction is
>     > known/expected for MCJit vs. old JIT? Or if we might be doing
>     > something wrong?
>     >
>     > If you need more information, in order to understand the issue,
>     > please tell us so that we can provide you with more details.
>     >
>     > Thanks
>     > Morten
>
>     --
>     Hal Finkel
>     Assistant Computational Scientist
>     Leadership Computing Facility
>     Argonne National Laboratory
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
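The figures in the original report ("30% to 40%" on a run of "6 to 8 hours") can be read two ways, and the difference matters for a long batch job. The sketch below is only illustrative arithmetic, not something posted in the thread; the function names are hypothetical. It contrasts "wall-clock time grows by that percentage" with "throughput drops by that percentage".

```cpp
#include <cassert>
#include <cmath>

// Reading 1: wall-clock time grows by `pct` (0.30 means +30%).
inline double hoursIfRuntimeGrows(double baseHours, double pct) {
    return baseHours * (1.0 + pct);
}

// Reading 2: throughput drops by `pct`, so the same amount of work
// takes 1 / (1 - pct) times as long. Requires pct < 1.0.
inline double hoursIfSpeedDrops(double baseHours, double pct) {
    return baseHours / (1.0 - pct);
}
```

Under the first reading a 6-hour run becomes 7.8 to 8.4 hours and an 8-hour run 10.4 to 11.2 hours. Under the second reading the same runs become roughly 8.6 to 10 hours and roughly 11.4 to 13.3 hours. Which reading applies depends on how the regression was measured, which the thread does not state.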
Hi Morten,

We experienced a similar slowdown in execution performance when upgrading to LLVM 3.7. The issue for us was that our front-end was emitting alloca instructions in non-entry basic blocks. After fixing the generation of LLVM IR in our front-end, we got similar or better performance with LLVM 3.7. See:

http://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas

Maybe this is something that you can double-check.

Here's a detailed explanation of the cause of the slowdown:

With LLVM 3.7, we noticed that the MemCpy pass will attempt to copy an LLVM struct using moves that are as large as possible. For example, a struct of 3 floats is copied using a 64-bit and a 32-bit move. It is therefore important that such a struct be aligned on an 8-byte boundary, not just 4 bytes! Otherwise, one runs the risk of triggering store-forwarding failures and the resulting pipeline stalls (which we did encounter really badly with one of our internal performance benchmarks). It is therefore important that the SROA pass correctly eliminates the loads/stores to the alloca memory regions. (As the page linked above notes, SROA and Mem2Reg only attempt to eliminate allocas that are in the entry basic block.)

Benoit

Benoit Belley
Sr Principal Developer
M&E-Product Development Group
MAIN +1 514 393 1616
DIRECT +1 438 448 6304
FAX +1 514 393 0110
Autodesk, Inc.
10 Duke Street
Montreal, Quebec, Canada H3C 2L7
www.autodesk.com

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Morten Brodersen via llvm-dev <llvm-dev at lists.llvm.org>
Reply-To: Morten Brodersen <Morten.Brodersen at constrainttec.com>
Date: Thursday, 4 February 2016 22:39
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] MCJit Runtine Performance

> Hi Lang,
>
> > MCJIT does not compile lazily (though it sounds like that's not an
> > issue here?)
>
> That is not an issue here, since the code is JIT-compiled once (a few
> seconds) and the generated machine code then runs for hours.
>
> > Morten - Can you share any test cases that demonstrate the slowdown.
> > I'd love to take a look at this.
>
> The code is massive, so sharing it is not practical. However, I will
> try to extract an example function that demonstrates the difference
> (as per my previous email).
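The alignment point in Benoit's message (a struct of 3 floats copied as one 64-bit move plus one 32-bit move) can be pictured with a small standalone C++ sketch. This is an illustration only: it does not use LLVM, the type and function names are hypothetical, and it assumes a platform with 4-byte `float`. It shows that a plain 3-float struct is only guaranteed 4-byte alignment, so the 8-byte half of such a copy may start at an address that is not a multiple of 8, while an explicitly 8-byte-aligned variant never does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration types; they do not come from the thread.
struct Vec3 { float x, y, z; };                    // 12 bytes, alignment 4
struct alignas(8) Vec3Aligned { float x, y, z; };  // padded to 16 bytes, alignment 8

// Assumptions of this sketch, checked at compile time.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(sizeof(Vec3) == 12 && alignof(Vec3) == 4, "unexpected Vec3 layout");
static_assert(sizeof(Vec3Aligned) == 16 && alignof(Vec3Aligned) == 8,
              "unexpected Vec3Aligned layout");

// A 12-byte copy split into an 8-byte move at offset 0 and a 4-byte move
// at offset 8: the wide move is naturally aligned only if the base is.
inline bool wideMoveIsAligned(const void* base) {
    return reinterpret_cast<std::uintptr_t>(base) % 8 == 0;
}

// Inside an 8-byte-aligned array, element `index` starts at
// index * elemSize bytes; report whether that start is 8-byte aligned.
inline bool arrayElementIsAligned(std::size_t elemSize, std::size_t index) {
    return (elemSize * index) % 8 == 0;
}
```

With these definitions, `arrayElementIsAligned(sizeof(Vec3), 1)` is false (offset 12) while `arrayElementIsAligned(sizeof(Vec3Aligned), 1)` is true (offset 16). In LLVM IR the corresponding knob would be the alignment requested on the alloca; whether that alone avoids the stalls Benoit describes depends on the target and on whether SROA removes the memory traffic in the first place.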
Thanks for this, Benoit. I will investigate.

Cheers
Morten

On 06/02/16 01:34, Benoit Belley wrote:
> Hi Morten,
>
> We experienced a similar slowdown in execution performance when
> upgrading to LLVM 3.7. The issue for us was that our front-end was
> emitting alloca instructions in non-entry basic blocks. After fixing
> the generation of LLVM IR in our front-end, we got similar or better
> performance with LLVM 3.7. See:
>
> http://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas
>
> Maybe this is something that you can double-check.
>
> Here's a detailed explanation of the cause of the slowdown:
>
> With LLVM 3.7, we noticed that the MemCpy pass will attempt to copy
> an LLVM struct using moves that are as large as possible. For example,
> a struct of 3 floats is copied using a 64-bit and a 32-bit move. It is
> therefore important that such a struct be aligned on an 8-byte
> boundary, not just 4 bytes! Otherwise, one runs the risk of triggering
> store-forwarding failures and the resulting pipeline stalls (which we
> did encounter really badly with one of our internal performance
> benchmarks). It is therefore important that the SROA pass correctly
> eliminates the loads/stores to the alloca memory regions.
>
> Benoit