James Molloy via llvm-dev
2016-Jul-19 14:16 UTC
[llvm-dev] RFC: Enabling Module passes post-ISel
Hi all,

I like all the ideas so far. Here are my thoughts:

I think that fundamentally users of LLVM should be able to opt in to more aggressive or intensive computation at compile time if they wish. Users' needs differ, and while a 33% increase in clang LTO is absolutely out of the question for some people, for those developing microcontrollers or HPC applications it may well be irrelevant: either the volume of code expected is significantly smaller, or they're happy to pay for the extra compile time with expensive server time. That does not mean we shouldn't strive for a solution acceptable to all users. On the other hand, making something opt-in makes it non-default, and that increases the testing surface.

Tangentially, I think LLVM currently doesn't have the right tuning knobs to let the user select their desired tradeoff. We have one optimization flag, -O{s,z,0,1,2,3}, which encodes both the optimization *goal* (a point on the Pareto curve between size and speed) and the amount of effort to expend at compile time achieving that goal. Anyway, that's beside the point.

I like Justin's idea of removing IR from the backend to free up memory. I think it's a very long-term project though, one that requires significant (re)design; alias analysis access in the backend would be completely broken, and BasicAA among others depends on seeing the IR at query time. We'd need to work out a way of providing alias analysis with no IR present. I don't think that is feasible for the near future.

So my suggestion is that we go with Matthias' idea - do the small amount of refactoring needed to allow MachineModulePasses on an opt-in basis. The knobs to enable that opt-in might need some more bikeshedding.
Cheers,

James

On Tue, 19 Jul 2016 at 08:21 Justin Bogner <mail at justinbogner.com> wrote:
> James Molloy via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > In LLVM it is currently not possible to write a Module-level pass (a pass
> > that modifies or analyzes multiple MachineFunctions) after DAG formation.
> > This inhibits some optimizations[1] and is something I'd like to see
> > changed.
> >
> > The problem is that in the backend, we emit a function at a time, from
> > DAG formation to object emission. So no two MachineFunctions ever exist
> > at any one time. Changing this necessarily means increasing memory usage.
> >
> > I've prototyped this change and have measured peak memory usage in the
> > worst case scenario - LTO'ing llc and clang. Without further ado:
> >
> > llvm-lto llc:   before: 1.44GB maximum resident set size
> >                 after:  1.68GB (+17%)
> >
> > llvm-lto clang: before: 2.48GB maximum resident set size
> >                 after:  3.42GB (+33%)
> >
> > The increases are very large. This is worst-case (non-LTO builds would
> > see the peak usage of the backend masked by the peak of the midend) but
> > still - pretty big. Thoughts? Is this completely no-go? Is this something
> > that we *just need* to do? Is crippling the backend architecture to keep
> > memory down justified? Is this something we could enable under an option?
>
> Personally, I think this price is too high. I think that if we want to
> enable machine module passes (which we probably do) we need to turn
> MachineFunction into more of a first class object that isn't just a
> wrapper around IR.
>
> This can and should be designed to work something like Pete's solution,
> where we get rid of the IR and just have machine level stuff in memory.
> This way, we may still increase the memory usage here, but it should be
> far less dramatic.
>
> You'll note that doing this also has tangential benefits - it should be
> helpful for simplifying MIR and generally improving testability of the
> backends.
Renato Golin via llvm-dev
2016-Jul-19 14:59 UTC
[llvm-dev] RFC: Enabling Module passes post-ISel
On 19 July 2016 at 15:16, James Molloy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I think that fundamentally users of LLVM should be able to opt-in to more
> aggressive or intensive computation at compile time if they wish. Users'
> needs differ, and while a 33% increase in clang LTO is absolutely out of
> the question for some people, for those developing microcontrollers or HPC
> applications that may well be irrelevant. Either the volume of code
> expected is significantly smaller or they're happy to trade off compile
> time for expensive server time. That does not mean that we shouldn't
> strive for a solution that can be acceptable by all users. On the other
> hand making something opt-in makes it non-default, and that increases the
> testing surface.

I agree with this reasoning in principle. LTO is already an example of that trade-off. The problem is, as with LTO, how to make sure those options don't bit-rot without duplicating testing infrastructure. I don't have a good solution for that.

Another thing this raises (and we should have done it for LTO) is monitoring not just compile and run times, but also compile-time and run-time memory consumption. I believe some people are already doing it ad hoc, but it would be good to have that, for example, in LNT.

> So my suggestion is that we go with Matthias' idea - do the small amount of
> refactoring needed to allow MachineModulePasses on an opt-in basis. The
> knobs to enable that opt-in might need some more bikeshedding.

We have prior art on that, so I think it should be mostly fine. Bikeshedding won't be necessary, not for the flags, I think. We just have to make sure this is not something that will encumber other changes in the area (and I'm being vague on purpose, as I can't think of anything). :)

cheers,
--renato
Matthias Braun via llvm-dev
2016-Jul-19 16:05 UTC
[llvm-dev] RFC: Enabling Module passes post-ISel
On the idea of deleting the IR after the AsmPrinter phase: it's a good thing to do, but it won't help with MachineModulePass memory consumption, because at the point where a MachineModulePass runs the AsmPrinter hasn't run yet and we have all of the IR and MIR constructed. More notes below:

> On Jul 19, 2016, at 7:16 AM, James Molloy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> I like Justin's idea of removing IR from the backend to free up memory. I
> think it's a very long term project though, one that requires significant
> (re)design; alias analysis access in the backend would be completely
> broken and BasicAA among others depends on seeing the IR at query time.
> We'd need to work out a way of providing alias analysis with no IR
> present. I don't think that is feasible for the near future.

Yep, it would be great to cut the IR ties, but I fear it is a big project we won't just do on the side...

> So my suggestion is that we go with Matthias' idea - do the small amount of
> refactoring needed to allow MachineModulePasses on an opt-in basis. The
> knobs to enable that opt-in might need some more bikeshedding.

My current patches probably need some more work, and I'd prefer it if someone with an actual use case pushes this forward (my prototype was to help out an intern project, and I don't know when that will get upstreamed). I'd be happy to review patches.

As far as the patches go:

- The general idea of moving the ownership of the MachineFunction from the MachineFunctionAnalysis to a Function->MachineFunction map in MachineModuleInfo worked nicely and is the way to go IMO. The API is simply a function on MachineModuleInfo that gives you the corresponding MachineFunction for a given IR Function.
- Currently my patches do a map lookup in MachineFunctionPass::runOnFunction() to find the MachineFunction; we may want to bring MachineFunctionAnalysis back simply as a caching layer to get to the MachineFunction* faster.
- My patches currently disable .mir file loading, but that shouldn't be too hard to fix.
- Adding a pass after the AsmPrinter that deletes the MachineFunction gets us the current behavior, where only one MachineFunction is alive at a time, provided there is no module pass in the codegen pipeline.

Given how simple the approach turned out in the end, I am not worried about correctness problems/testing when we add this code. I do agree, though, that we need more testing infrastructure for compile time / memory consumption! It is hard to notice an accidental 5-10% increase in memory usage or compile time when you are not measuring for it. In this case it would be hard to notice the effects of accidentally adding a module pass into the codegen pipeline.
- Matthias
Hal Finkel via llvm-dev
2016-Jul-27 13:41 UTC
[llvm-dev] RFC: Enabling Module passes post-ISel
----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Justin Bogner" <mail at justinbogner.com>, "James Molloy via llvm-dev" <llvm-dev at lists.llvm.org>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <chandlerc at google.com>, "Matthias Braun" <matze at braunis.de>, "Pete Cooper" <peter_cooper at apple.com>
> Sent: Tuesday, July 19, 2016 9:16:02 AM
> Subject: Re: [llvm-dev] RFC: Enabling Module passes post-ISel

> I think that fundamentally users of LLVM should be able to opt-in to more
> aggressive or intensive computation at compile time if they wish. Users'
> needs differ, and while a 33% increase in clang LTO is absolutely out of
> the question for some people, for those developing microcontrollers or
> HPC applications that may well be irrelevant.

I agree. A 33% increase is absorbable in many environments.

> So my suggestion is that we go with Matthias' idea - do the small amount
> of refactoring needed to allow MachineModulePasses on an opt-in basis.
> The knobs to enable that opt-in might need some more bikeshedding.

This makes sense to me. I expect that targets will be able to opt in in some optimization-level-dependent fashion.

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Matthias Braun via llvm-dev
2016-Aug-20 02:04 UTC
[llvm-dev] RFC: Enabling Module passes post-ISel
I submitted a cleaned up patch here: https://reviews.llvm.org/D23736

- Matthias

> On Jul 27, 2016, at 6:41 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> This makes sense to me. I expect that targets will be able to opt-in in
> some optimization-level-dependent fashion.
>
> -Hal