thr3ads.net - llvm dev - [llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework [Nov 2017]

Aditya Nandakumar via llvm-dev

2017-Nov-10 22:04 UTC

[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework

> On Nov 10, 2017, at 10:19 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> On 11/10/2017 11:12 AM, Amara Emerson via llvm-dev wrote:
>> Hi everyone,
>> 
>> This RFC concerns the design and architecture of a generic machine
>> instruction combiner/optimizer framework to be developed as part of the GISel
>> pipeline. As we transition from correctness and reducing the fallback rate to
>> SelectionDAG at -O0, we’re now starting to think about using GlobalISel with
>> optimizations enabled. There are obviously many parts to this story as
>> optimizations happen at various stages of the codegen pipeline. The focus of
>> this RFC is the replacement of the equivalent of the DAGCombiner in SDAG land.
>> Despite the focus on the DAGCombiner, since there aren’t perfect 1-1 mappings
>> between SDAG and GlobalISel components, this may also include features that are
>> currently implemented as part of the target lowerings, and tablegen isel
>> patterns. As we’re starting from a blank slate, we have an opportunity here to
>> think about what we might need from such a framework without the legacy cruft
>> (although we still have the high performance bar to meet).
>> 
>> I want to poll the community about what future requirements we have for
>> the GISel G_MI optimizer/combiner. The following are the general requirements we
>> have so far:
>> 
>> 1. It should have at least equivalent, but hopefully better
>>    runtime/compile time trade off than the DAGCombiner.
>> 2. There needs to be flexibility in the design to allow targets to run
>>    subsets of the overall optimizer. For example, some targets may want to avoid
>>    trying to run certain types of optimizations like vector or FP combines if
>>    they’re either not applicable, or not worth the compile time.
>> 3. Have a reasonably concise way to write most optimizations. Hand written
>>    C++ will always be an option, but there’s value in having easy to read and
>>    reason about descriptions of transforms.
>> 
>> These requirements aren’t set in stone nor complete, but using them as
>> a starting point: a single monolithic “Generic MI combiner” component doesn’t
>> look like the right approach. Our current thinking is that, like we’ve done with
>> the Legalizer, the specific mechanics of the actual optimization should be
>> separated into its own unit. This would allow the combines to be re-used at
>> different stages of the pipeline according to target needs. Using the current
>> situation with instcombine as an example, there is no way to explicitly pick and
>> choose a specific subset of IC, it’s only available as a whole pass with all the
>> costs that entails.
>> 
>> The reasoning behind req 3 is that there may be compile time savings
>> available if we can describe in a declarative style the combines we want to do,
>> as is currently possible with tablegen patterns. This hasn’t been proven out
>> yet, but consider an alternative where we use the machine instruction equivalent
>> of the IR/PatternMatch tooling which allows easy and expressive matching of IR
>> sub-trees. A concern I have with using that as the main approach to writing
>> combines is that it’s easy to add new matchers in a routine which re-computes
>> information that has already been computed in previous match() attempts.
> 
> I share this concern.
> 
>> This form of back-tracking might be avoided if we can reason about a
>> group of combines together automatically (or perhaps we could add caching
>> capabilities to PatternMatch).
>> 
>> What would everyone else like to see from this?
> 
> The current DAGCombine, being constructed on top of SDAG, has a kind of
> built-in CSE and automatic DCE. How will things change, if they'll change,
> in this new model?

Hi Hal,

I suspect one option is to have a separate CSE pass, and the backends get to
choose where exactly they plug in their pipeline. I think DCE should be part of
the combine pass (and the legalizer is about to start doing that as
well).

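To make the "separate CSE pass" idea concrete, here is a minimal sketch of what such a pass does. It is not LLVM code: instructions are modelled as hypothetical `(dst, opcode, srcs)` tuples, the opcode strings are only labels, and the sketch assumes a single straight-line block in SSA form where every instruction is free of side effects.

```python
def cse(insts):
    """Toy CSE: reuse the first def for each (opcode, operands) key.

    Assumes straight-line SSA code with side-effect-free instructions.
    Returns the surviving instructions and a map of replaced registers.
    """
    seen = {}      # (opcode, srcs) -> register of the first equivalent def
    rename = {}    # eliminated register -> surviving register
    out = []
    for dst, opcode, srcs in insts:
        # Rewrite operands first so chains of duplicates collapse too.
        srcs = tuple(rename.get(s, s) for s in srcs)
        key = (opcode, srcs)
        if key in seen:
            rename[dst] = seen[key]
        else:
            seen[key] = dst
            out.append((dst, opcode, srcs))
    return out, rename


insts = [
    ("%1", "G_ADD", ("%a", "%b")),
    ("%2", "G_ADD", ("%a", "%b")),   # duplicate of %1
    ("%3", "G_MUL", ("%1", "%2")),
]
out, rename = cse(insts)
```

Because the pass is a plain function over an instruction list, a backend could call it at whichever point in its pipeline it finds worthwhile, which is the flexibility described above.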
> Thanks again,
> Hal
> 
>> 
>> Thanks,
>> Amara
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

Amara Emerson via llvm-dev

2017-Nov-11 18:44 UTC

[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework

> On Nov 10, 2017, at 10:04 PM, Aditya Nandakumar <proaditya at gmail.com> wrote:
>> 
>> The current DAGCombine, being constructed on top of SDAG, has a kind of
>> built-in CSE and automatic DCE. How will things change, if they'll change,
>> in this new model?
> Hi Hal,
> 
> I suspect one option is to have a separate CSE pass, and the backends get
> to choose where exactly they plug in their pipeline. I think DCE should be part
> of the combine pass (and the legalizer is about to start doing that as well).

For SSA form MIR there’s already the MachineCSE pass. How important the CSE/DCE
is at the combine stage I don’t know. As an approximation perhaps we can get an
idea by disabling the behavior in the DAGCombiner and seeing the effects.

> On Nov 10, 2017, at 9:54 PM, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
> 
> My thinking on this is that (with a few exceptions that I'll get to),
> combine and select are basically the same thing. You match some MIR, and replace
> it with other MIR. The main difference being that combine doesn't have to
> constrain to register classes (unless it wants to) while select does.
> 
> With that in mind, I was thinking that it makes sense to put a lot of
> effort into the optimization of the tablegen-erated selection table (as has been
> started in Quentin's recent patch) and then re-use it for combines too.
> We'll need to be careful how we define GlobalISel's counterpart to
> SelectionDAG patterns to make it expressive enough to support combines but
> that's essentially a second frontend (the other being the SelectionDAG
> importer) on a common backend

Agreed that combine and selection are similar processes. It sounds like this is
something we should look at prototyping.
> 
> Req 2 becomes simple to implement in this approach. You can either use the
> existing feature-bits mechanism to enable/disable combine rules as a group, or
> add an equivalent mechanism in tablegen to decide whether a rule makes it into
> the emitted table or not and have multiple tables which you can run/not-run at
> will. With the new coverage feedback mechanism, we could potentially organize
> our tables semi-automatically by highlighting combine rules that never or rarely
> fire in a particular pass.
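As a rough illustration of the grouping idea in the paragraph above, the sketch below tags each rule with a group, builds a table from only the enabled groups, and counts how often each rule fires. Everything here is hypothetical: the `Rule` type, the group names, the two folds and the `(opcode, lhs, rhs)` instruction tuples are stand-ins, not TableGen output or any existing GlobalISel API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    group: str                                  # hypothetical tag: "int", "fp", ...
    apply: Callable[[tuple], Optional[tuple]]   # replacement, or None if no match


def fold_add_zero(inst):
    op, a, b = inst
    return ("COPY", a, None) if op == "G_ADD" and b == 0 else None


def fold_fmul_one(inst):
    op, a, b = inst
    return ("COPY", a, None) if op == "G_FMUL" and b == 1.0 else None


ALL_RULES = [
    Rule("add_zero", "int", fold_add_zero),
    Rule("fmul_one", "fp", fold_fmul_one),
]


def build_table(enabled_groups):
    """Keep only the rules whose group the target asked for."""
    return [r for r in ALL_RULES if r.group in enabled_groups]


def run_table(table, insts, fire_counts):
    """Apply the first matching rule to each instruction, recording coverage."""
    out = []
    for inst in insts:
        for rule in table:
            repl = rule.apply(inst)
            if repl is not None:
                fire_counts[rule.name] = fire_counts.get(rule.name, 0) + 1
                inst = repl
                break
        out.append(inst)
    return out


insts = [("G_ADD", "%x", 0), ("G_FMUL", "%y", 1.0)]
counts = {}
result = run_table(build_table({"int"}), insts, counts)
```

With only the "int" group enabled, the FP fold never runs, and `counts` shows which rules fired in this pass. A rule that never appears in the counts for a given pass would be a candidate for moving to a different table.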
> 
> One feature I think we ought to have that isn't on the requirements
> list already, is that I think we should have a means to support rules with more
> than one match root. For example (using SelectionDAG patterns):
>   (set $dst1:GPR32, (i32 (load $ptr:GPR64)))
>   (set $dst2:GPR32, (i32 (load (add $ptr:GPR64 4))))
> into:
>   (set $tmp:GPR64, (v2s32 (load $ptr:GPR64)))
>   (set $dst1, (extractelt $tmp:GPR64, 0))
>   (set $dst2, (extractelt $tmp:GPR64, 1))
> Or something along those lines (such as fusing div/mod together). The
> combiner should be smart enough to make the root the $ptr, and follow the use of
> $ptr into the load/add, then follow the def to the 4.

This seems like a nice feature, but I wonder about the impact this will have on
the speed of the matching algorithm. I don’t know enough about it to say though.
IMO complex features can be done in C++ code if they’re uncommon, in preference
for fast handling of the common cases. Maybe a few more use cases are needed.

Thanks,
Amara

Hal Finkel via llvm-dev

2017-Nov-11 19:03 UTC

[llvm-dev] RFC: [GlobalISel] Towards a generic MI combiner framework

On 11/11/2017 12:44 PM, Amara Emerson wrote:
>
>> On Nov 10, 2017, at 10:04 PM, Aditya Nandakumar <proaditya at gmail.com> wrote:
>>>
>>> The current DAGCombine, being constructed on top of SDAG, has a kind
>>> of built-in CSE and automatic DCE. How will things change, if
>>> they'll change, in this new model?
>> Hi Hal,
>>
>> I suspect one option is to have a separate CSE pass, and the backends 
>> get to choose where exactly they plug in their pipeline. I think DCE 
>> should be part of the combine pass (and the legalizer is about to 
>> start doing that as well).
> For SSA form MIR there’s already the MachineCSE pass. How important 
> the CSE/DCE is at the combine stage I don’t know. As an approximation 
> perhaps we can get an idea by disabling the behavior in the 
> DAGCombiner and seeing the effects.

My impression is that the automated CSE and DCE is very important to the 
current implementation. There's a lot of code that depends on this 
happening in order to have the expected effects. Otherwise, the 
use-count checks won't do the right thing (because the old unused uses 
of things won't immediately go away).

I'm not entirely sure you can just turn off the uniquing in SDAG and get 
a sensible result.
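A small sketch of the use-count problem described above, under the assumption that instructions are hypothetical `(dst, opcode, srcs)` tuples and that a combine is guarded by a "has exactly one use" check. If an earlier combine leaves a dead instruction behind, that instruction still counts as a user and the guard fails until the dead code is removed.

```python
def use_count(insts, reg):
    """Number of operand slots across all instructions that read reg."""
    return sum(srcs.count(reg) for _, _, srcs in insts)


def remove_dead(insts, live_outs):
    """Repeatedly drop instructions whose result is unused and not live-out."""
    insts = list(insts)
    changed = True
    while changed:
        changed = False
        for inst in insts:
            if inst[0] not in live_outs and use_count(insts, inst[0]) == 0:
                insts.remove(inst)
                changed = True
                break   # restart the scan: removing a user may expose more dead defs
    return insts


insts = [
    ("%t", "G_SHL", ("%x", "%c")),
    ("%old", "G_ADD", ("%t", "%y")),   # left over from an earlier combine, never read
    ("%r", "G_SUB", ("%t", "%z")),
]
before = use_count(insts, "%t")
after = use_count(remove_dead(insts, {"%r"}), "%t")
```

Here `before` is 2, so a one-use guard on `%t` would reject the combine, while `after` is 1 once the stale `%old` is gone. SDAG gets this cleanup for free; a GlobalISel combiner would need to do it explicitly for such checks to behave as expected.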

  -Hal
>
>> On Nov 10, 2017, at 9:54 PM, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
>>
>> My thinking on this is that (with a few exceptions that I'll get to),
>> combine and select are basically the same thing. You match some MIR, 
>> and replace it with other MIR. The main difference being that combine 
>> doesn't have to constrain to register classes (unless it wants to) 
>> while select does.
>>
>> With that in mind, I was thinking that it makes sense to put a lot of 
>> effort into the optimization of the tablegen-erated selection table 
>> (as has been started in Quentin's recent patch) and then re-use it 
>> for combines too. We'll need to be careful how we define GlobalISel's
>> counterpart to SelectionDAG patterns to make it expressive enough to 
>> support combines but that's essentially a second frontend (the other
>> being the SelectionDAG importer) on a common backend
> Agreed that combine and selection are similar processes. It sounds 
> like this is something we should look at prototyping.
>
>>
>> Req 2 becomes simple to implement in this approach. You can either 
>> use the existing feature-bits mechanism to enable/disable combine 
>> rules as a group, or add an equivalent mechanism in tablegen to 
>> decide whether a rule makes it into the emitted table or not and have 
>> multiple tables which you can run/not-run at will. With the new 
>> coverage feedback mechanism, we could potentially organize our tables 
>> semi-automatically by highlighting combine rules that never or rarely 
>> fire in a particular pass.
>>
>> One feature I think we ought to have that isn't on the requirements
>> list already, is that I think we should have a means to support rules 
>> with more than one match root. For example (using SelectionDAG patterns):
>>   (set $dst1:GPR32, (i32 (load $ptr:GPR64)))
>>   (set $dst2:GPR32, (i32 (load (add $ptr:GPR64 4))))
>> into:
>>   (set $tmp:GPR64, (v2s32 (load $ptr:GPR64)))
>>   (set $dst1, (extractelt $tmp:GPR64, 0))
>>   (set $dst2, (extractelt $tmp:GPR64, 1))
>> Or something along those lines (such as fusing div/mod together). The 
>> combiner should be smart enough to make the root the $ptr, and follow 
>> the use of $ptr into the load/add, then follow the def to the 4.
> This seems like a nice feature, but I wonder about the impact this 
> will have on the speed of the matching algorithm. I don’t know enough 
> about it to say though. IMO complex features can be done in C++ code 
> if they’re uncommon, in preference for fast handling of the common 
> cases. Maybe a few more use cases are needed.
>
> Thanks,
> Amara
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
