Amara Emerson via llvm-dev
2017-Nov-10 17:12 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
Hi everyone, This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations happen at various stages of the codegen pipeline. The focus of this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land. Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and GlobalISel components, this may also include features that are currently implemented as part of the target lowerings, and tablegen isel patterns. As we’re starting from a blank slate, we have an opportunity here to think about what we might need from such a framework without the legacy cruft (although we still have the high performance bar to meet). I want to poll the community about what future requirements we have for the GISel G_MI optimizer/combiner. The following are the general requirements we have so far: It should have at least equivalent, but hopefully better runtime/compile time trade off than the DAGCombiner. There needs to be flexibility in the design to allow targets to run subsets of the overall optimizer. For example, some targets may want to avoid trying to run certain types of optimizations like vector or FP combines if they’re either not applicable, or not worth the compile time. Have a reasonably concise way to write most optimizations. Hand written C++ will always be an option, but there’s value in having easy to read and reason about descriptions of transforms. These requirements aren’t set in stone nor complete, but using them as a starting point: a single monolithic “Generic MI combiner” component doesn’t look like the right approach. Our current thinking is that, like we’ve done with the Legalizer, the specific mechanics of the actual optimization should be separated into it’s own unit. This would allow the combines to be re-used at different stages of the pipeline according to target needs. Using the current situation with instcombine as an example, there is no way to explicitly pick and choose a specific subset of IC, it’s only available as a whole pass with all the costs that entails. The reasoning behind req 3 is that there may be compile time savings available if we can describe in a declarative style the combines we want to do, like it’s currently possible with tablegen patterns. This hasn’t been proven out yet, but consider an alternative where we use the machine instruction equivalent of the IR/PatternMatch tooling which allows easy and expressive matching of IR sub-trees. A concern I have with using that as the main approach to writing combines is that it’s easy to add new matchers in an routine which re-computes information that’s previously been computed in previous match() attempts. This form of back-tracking might be avoided if we can reason about a group of combines together automatically (or perhaps we could add caching capabilities to PatternMatch). What would everyone else like to see from this? Thanks, Amara -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171110/06e0bc64/attachment.html>
Hal Finkel via llvm-dev
2017-Nov-10 18:19 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
On 11/10/2017 11:12 AM, Amara Emerson via llvm-dev wrote:> Hi everyone, > > This RFC concerns the design and architecture of a generic machine > instruction combiner/optimizer framework to be developed as part of > the GISel pipeline. As we transition from correctness and reducing the > fallback rate to SelectionDAG at -O0, we’re now starting to think > about using GlobalISel with optimizations enabled. There are obviously > many parts to this story as optimizations happen at various stages of > the codegen pipeline. The focus of this RFC is the replacement of the > equivalent of the DAGCombiner in SDAG land. Despite the focus on the > DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and > GlobalISel components, this may also include features that are > currently implemented as part of the target lowerings, and tablegen > isel patterns. As we’re starting from a blank slate, we have an > opportunity here to think about what we might need from such a > framework without the legacy cruft (although we still have the high > performance bar to meet). > > I want to poll the community about what future requirements we have > for the GISel G_MI optimizer/combiner. The following are the general > requirements we have so far: > > 1. It should have at least equivalent, but hopefully better > runtime/compile time trade off than the DAGCombiner. > 2. There needs to be flexibility in the design to allow targets to > run subsets of the overall optimizer. For example, some targets > may want to avoid trying to run certain types of optimizations > like vector or FP combines if they’re either not applicable, or > not worth the compile time. > 3. Have a reasonably concise way to write most optimizations. Hand > written C++ will always be an option, but there’s value in having > easy to read and reason about descriptions of transforms. > > > These requirements aren’t set in stone nor complete, but using them as > a starting point: a single monolithic “Generic MI combiner” component > doesn’t look like the right approach. Our current thinking is that, > like we’ve done with the Legalizer, the specific mechanics of the > actual optimization should be separated into it’s own unit. This would > allow the combines to be re-used at different stages of the pipeline > according to target needs. Using the current situation with > instcombine as an example, there is no way to explicitly pick and > choose a specific subset of IC, it’s only available as a whole pass > with all the costs that entails. > > The reasoning behind req 3 is that there may be compile time savings > available if we can describe in a declarative style the combines we > want to do, like it’s currently possible with tablegen patterns. This > hasn’t been proven out yet, but consider an alternative where we use > the machine instruction equivalent of the IR/PatternMatch tooling > which allows easy and expressive matching of IR sub-trees. A concern I > have with using that as the main approach to writing combines is that > it’s easy to add new matchers in an routine which re-computes > information that’s previously been computed in previous match() attempts.I share this concern.> This form of back-tracking might be avoided if we can reason about a > group of combines together automatically (or perhaps we could add > caching capabilities to PatternMatch). > > What would everyone else like to see from this?The current DAGCombine, being constructed on top of SDAG, has a kind of built-in CSE and automatic DCE. How will things change, if they'll change, in this new model? Thanks again, Hal> > Thanks, > Amara > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171110/178c7e00/attachment.html>
Daniel Sanders via llvm-dev
2017-Nov-10 21:54 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
Hi Amara,> On 10 Nov 2017, at 09:12, Amara Emerson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi everyone, > > This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations happen at various stages of the codegen pipeline. The focus of this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land. Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and GlobalISel components, this may also include features that are currently implemented as part of the target lowerings, and tablegen isel patterns. As we’re starting from a blank slate, we have an opportunity here to think about what we might need from such a framework without the legacy cruft (although we still have the high performance bar to meet). > > I want to poll the community about what future requirements we have for the GISel G_MI optimizer/combiner. The following are the general requirements we have so far: > > It should have at least equivalent, but hopefully better runtime/compile time trade off than the DAGCombiner. > There needs to be flexibility in the design to allow targets to run subsets of the overall optimizer. For example, some targets may want to avoid trying to run certain types of optimizations like vector or FP combines if they’re either not applicable, or not worth the compile time. > Have a reasonably concise way to write most optimizations. Hand written C++ will always be an option, but there’s value in having easy to read and reason about descriptions of transforms. > > These requirements aren’t set in stone nor complete, but using them as a starting point: a single monolithic “Generic MI combiner” component doesn’t look like the right approach. Our current thinking is that, like we’ve done with the Legalizer, the specific mechanics of the actual optimization should be separated into it’s own unit. This would allow the combines to be re-used at different stages of the pipeline according to target needs. Using the current situation with instcombine as an example, there is no way to explicitly pick and choose a specific subset of IC, it’s only available as a whole pass with all the costs that entails.I agree. If we've just replaced some MIR with different MIR then we should be able to ask the combiner to operate on the new MIR. Much like the legalizer, this would reduce the frequency with which we have to implement the same thing in multiple passes just because we can't run the existing code at the time.> The reasoning behind req 3 is that there may be compile time savings available if we can describe in a declarative style the combines we want to do, like it’s currently possible with tablegen patterns. This hasn’t been proven out yet, but consider an alternative where we use the machine instruction equivalent of the IR/PatternMatch tooling which allows easy and expressive matching of IR sub-trees. A concern I have with using that as the main approach to writing combines is that it’s easy to add new matchers in an routine which re-computes information that’s previously been computed in previous match() attempts. This form of back-tracking might be avoided if we can reason about a group of combines together automatically (or perhaps we could add caching capabilities to PatternMatch).My thinking on this is that (with a few exceptions that I'll get to), combine and select are basically the same thing. You match some MIR, and replace it with other MIR. The main difference being that combine doesn't have to constrain to register classes (unless it wants to) while select does. With that in mind, I was thinking that it makes sense to put a lot of effort into the optimization of the tablegen-erated selection table (as has been started in Quentin's recent patch) and then re-use it for combines too. We'll need to be careful how we define GlobalISel's counterpart to SelectionDAG patterns to make it expressive enough to support combines but that's essentially a second frontend (the other being the SelectionDAG importer) on a common backend Req 2 becomes simple to implement in this approach. You can either use the existing feature-bits mechanism to enable/disable combine rules as a group, or add an equivalent mechanism in tablegen to decide whether a rule makes it into the emitted table or not and have multiple tables which you can run/not-run at will. With the new coverage feedback mechanism, we could potentially organize our tables semi-automatically by highlighting combine rules that never or rarely fire in a particular pass. One feature I think we ought to have that isn't on the requirements list already, is that I think we should have a means to support rules with more than one match root. For example (using SelectionDAG patterns): (set $dst1:GPR32, (i32 (load $ptr:GPR64))) (set $dst2:GPR32, (i32 (load (add $ptr:GPR64 4)))) into: (set $tmp:GPR64, (v2s32 (load $ptr:GPR64))) (set $dst1, (extractelt $tmp:GPR64, 0)) (set $dst2, (extractelt $tmp:GPR64, 1)) Or something along those lines (such as fusing div/mod together). The combiner should be smart enough to make the root the $ptr, and follow the use of $ptr into the load/add, then follow the def to the 4.> What would everyone else like to see from this? > > Thanks, > Amara > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171110/e59ff1d3/attachment.html>
Aditya Nandakumar via llvm-dev
2017-Nov-10 22:04 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
> On Nov 10, 2017, at 10:19 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 11/10/2017 11:12 AM, Amara Emerson via llvm-dev wrote: >> Hi everyone, >> >> This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations happen at various stages of the codegen pipeline. The focus of this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land. Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and GlobalISel components, this may also include features that are currently implemented as part of the target lowerings, and tablegen isel patterns. As we’re starting from a blank slate, we have an opportunity here to think about what we might need from such a framework without the legacy cruft (although we still have the high performance bar to meet). >> >> I want to poll the community about what future requirements we have for the GISel G_MI optimizer/combiner. The following are the general requirements we have so far: >> >> It should have at least equivalent, but hopefully better runtime/compile time trade off than the DAGCombiner. >> There needs to be flexibility in the design to allow targets to run subsets of the overall optimizer. For example, some targets may want to avoid trying to run certain types of optimizations like vector or FP combines if they’re either not applicable, or not worth the compile time. >> Have a reasonably concise way to write most optimizations. Hand written C++ will always be an option, but there’s value in having easy to read and reason about descriptions of transforms. >> >> These requirements aren’t set in stone nor complete, but using them as a starting point: a single monolithic “Generic MI combiner” component doesn’t look like the right approach. Our current thinking is that, like we’ve done with the Legalizer, the specific mechanics of the actual optimization should be separated into it’s own unit. This would allow the combines to be re-used at different stages of the pipeline according to target needs. Using the current situation with instcombine as an example, there is no way to explicitly pick and choose a specific subset of IC, it’s only available as a whole pass with all the costs that entails. >> >> The reasoning behind req 3 is that there may be compile time savings available if we can describe in a declarative style the combines we want to do, like it’s currently possible with tablegen patterns. This hasn’t been proven out yet, but consider an alternative where we use the machine instruction equivalent of the IR/PatternMatch tooling which allows easy and expressive matching of IR sub-trees. A concern I have with using that as the main approach to writing combines is that it’s easy to add new matchers in an routine which re-computes information that’s previously been computed in previous match() attempts. > > I share this concern. > >> This form of back-tracking might be avoided if we can reason about a group of combines together automatically (or perhaps we could add caching capabilities to PatternMatch). >> >> What would everyone else like to see from this? > > The current DAGCombine, being constructed on top of SDAG, has a kind of built-in CSE and automatic DCE. How will things change, if they'll change, in this new model?Hi Hal, I suspect one option is to have a separate CSE pass, and the backends get to choose where exactly they plug in their pipeline. I think DCE should be part of the combine pass (and the legalizer is about to start doing that as well).> > Thanks again, > Hal > >> >> Thanks, >> Amara >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171110/b9cef47d/attachment-0001.html>
Vedant Kumar via llvm-dev
2017-Nov-13 19:53 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
Hi Amara,> On Nov 10, 2017, at 9:12 AM, Amara Emerson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi everyone, > > This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations happen at various stages of the codegen pipeline. The focus of this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land. Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and GlobalISel components, this may also include features that are currently implemented as part of the target lowerings, and tablegen isel patterns. As we’re starting from a blank slate, we have an opportunity here to think about what we might need from such a framework without the legacy cruft (although we still have the high performance bar to meet). > > I want to poll the community about what future requirements we have for the GISel G_MI optimizer/combiner. The following are the general requirements we have so far: > > It should have at least equivalent, but hopefully better runtime/compile time trade off than the DAGCombiner. > There needs to be flexibility in the design to allow targets to run subsets of the overall optimizer. For example, some targets may want to avoid trying to run certain types of optimizations like vector or FP combines if they’re either not applicable, or not worth the compile time. > Have a reasonably concise way to write most optimizations. Hand written C++ will always be an option, but there’s value in having easy to read and reason about descriptions of transforms. > > These requirements aren’t set in stone nor complete, but using them as a starting point: a single monolithic “Generic MI combiner” component doesn’t look like the right approach. Our current thinking is that, like we’ve done with the Legalizer, the specific mechanics of the actual optimization should be separated into it’s own unit. This would allow the combines to be re-used at different stages of the pipeline according to target needs. Using the current situation with instcombine as an example, there is no way to explicitly pick and choose a specific subset of IC, it’s only available as a whole pass with all the costs that entails. > > The reasoning behind req 3 is that there may be compile time savings available if we can describe in a declarative style the combines we want to do, like it’s currently possible with tablegen patterns. This hasn’t been proven out yet, but consider an alternative where we use the machine instruction equivalent of the IR/PatternMatch tooling which allows easy and expressive matching of IR sub-trees. A concern I have with using that as the main approach to writing combines is that it’s easy to add new matchers in an routine which re-computes information that’s previously been computed in previous match() attempts. This form of back-tracking might be avoided if we can reason about a group of combines together automatically (or perhaps we could add caching capabilities to PatternMatch). > > What would everyone else like to see from this?It would be great to provide first-class support for maintaining debug value information as a part of the new combine framework. With SelectionDAG, we don't have a systematic way of preserving debug locations and values across combines. This is a source of optimized debugging bugs. If, as a part of the new framework, we could concisely express that a RAUW-style combine simply transfers debug values from A to B, we might define away some of these bugs [1]. Adrian put in place some infrastructure to do this in SelectionDAG (r317825). However, auditing/fixing debug value transfer issues in hand-written combines is time-consuming. I think it should be a goal of the new framework to make this a bit easier. best, vedant [1] To pick one at random, the '(zext (zextload x)) -> (zext (truncate (zextload x)))' combine should transfer debug values from N to the new ExtLoad, and from N0 to the new (trunc ExtLoad), but currently doesn't.> > Thanks, > Amara > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171113/e88ff862/attachment-0001.html>
Gerolf Hoflehner via llvm-dev
2017-Nov-18 00:41 UTC
[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework
> On Nov 13, 2017, at 11:53 AM, Vedant Kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Amara, > >> On Nov 10, 2017, at 9:12 AM, Amara Emerson via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi everyone, >> >> This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations happen at various stages of the codegen pipeline. The focus of this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land. Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings between SDAG and GlobalISel components, this may also include features that are currently implemented as part of the target lowerings, and tablegen isel patterns. As we’re starting from a blank slate, we have an opportunity here to think about what we might need from such a framework without the legacy cruft (although we still have the high performance bar to meet). >> >> I want to poll the community about what future requirements we have for the GISel G_MI optimizer/combiner. The following are the general requirements we have so far: >> >> It should have at least equivalent, but hopefully better runtime/compile time trade off than the DAGCombiner. >> There needs to be flexibility in the design to allow targets to run subsets of the overall optimizer. For example, some targets may want to avoid trying to run certain types of optimizations like vector or FP combines if they’re either not applicable, or not worth the compile time. >> Have a reasonably concise way to write most optimizations. Hand written C++ will always be an option, but there’s value in having easy to read and reason about descriptions of transforms. >> >> These requirements aren’t set in stone nor complete, but using them as a starting point: a single monolithic “Generic MI combiner” component doesn’t look like the right approach. Our current thinking is that, like we’ve done with the Legalizer, the specific mechanics of the actual optimization should be separated into it’s own unit. This would allow the combines to be re-used at different stages of the pipeline according to target needs. Using the current situation with instcombine as an example, there is no way to explicitly pick and choose a specific subset of IC, it’s only available as a whole pass with all the costs that entails. >> >> The reasoning behind req 3 is that there may be compile time savings available if we can describe in a declarative style the combines we want to do, like it’s currently possible with tablegen patterns. This hasn’t been proven out yet, but consider an alternative where we use the machine instruction equivalent of the IR/PatternMatch tooling which allows easy and expressive matching of IR sub-trees. A concern I have with using that as the main approach to writing combines is that it’s easy to add new matchers in an routine which re-computes information that’s previously been computed in previous match() attempts. This form of back-tracking might be avoided if we can reason about a group of combines together automatically (or perhaps we could add caching capabilities to PatternMatch). >> >> What would everyone else like to see from this? > > It would be great to provide first-class support for maintaining debug value information as a part of the new combine framework. > > With SelectionDAG, we don't have a systematic way of preserving debug locations and values across combines. This is a source of optimized debugging bugs. If, as a part of the new framework, we could concisely express that a RAUW-style combine simply transfers debug values from A to B, we might define away some of these bugs [1]. > > Adrian put in place some infrastructure to do this in SelectionDAG (r317825). However, auditing/fixing debug value transfer issues in hand-written combines is time-consuming. I think it should be a goal of the new framework to make this a bit easier.+1 Do you and/or Adrian also have thought on testing debug values? Verification and testing strategies should/could be part of the design also.> > best, > vedant > > [1] To pick one at random, the '(zext (zextload x)) -> (zext (truncate (zextload x)))' combine should transfer debug values from N to the new ExtLoad, and from N0 to the new (trunc ExtLoad), but currently doesn't. > >> >> Thanks, >> Amara >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/ba43b00b/attachment.html>
Possibly Parallel Threads
- RFC: [GlobalISel] Towards a generic MI combiner framework
- RFC: [GlobalISel] Towards a generic MI combiner framework
- RFC: [GlobalISel] Towards a generic MI combiner framework
- RFC: [GlobalISel] Towards a generic MI combiner framework
- RFC: [GlobalISel] Towards a generic MI combiner framework