Sameer Sahasrabuddhe via llvm-dev
2021-Mar-30 16:20 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote ----

> Owen - perhaps you've got some relevant perspective here given the GPU
> context/background, and general middle end optimization design
> questions. Could you take a look?
> (& perhaps Johannes?)
>
> On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Thanks for sending this out!
> >
> > +Mehdi - if you have a chance to look, I'd appreciate your thoughts on
> > this, partly as a general LLVM contributor, but also with an eye to
> > abstractions over multiple IRs given your work on MLIR
> > +Lang - you mentioned maybe some colleagues of yours might be able to
> > take a look at the design here?
> >
> > I'd like to try writing up maybe alternatives to a couple of the patches
> > in the series to see how they compare, but haven't done that as yet.
> >
> > My brief take is that I think the opaque handles make a fair bit of
> > sense/seem high-value/low-cost (if it turned out the handle for one of
> > these IR types needed to become a bit heavier, I don't think it'd be
> > super difficult to revisit this code - make the generic handle large
> > enough to accommodate whatever the largest concrete handle happens to
> > be, etc). Though I'm still a bit less certain about the full runtime
> > polymorphic abstractions.

For context, this is about the options laid out in the following document from Nicolai:

https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing

The above document was first introduced in the following email, which is the starting point of the current thread:

https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html

The main issue is the need for an abstraction that greatly improves the experience of writing analyses that work on LLVM IR and MIR (and potentially also on MLIR).
The way I understand it, Nicolai proposed an abstraction that involves dynamic polymorphism to abstract out the details of the underlying IR. David was more in favour of first unifying the IRs to the point that the dynamic polymorphism is limited to only occasional corner cases instead.

I am currently involved in the activity towards decoupling AMDGPU's dependency on the abstractions being discussed here, and the one thing that immediately jumps out is the ability to traverse the predecessors and successors of a basic block. From all the emails so far and the resulting proposal, it seems that both David and Nicolai think that having a common non-abstract base class for basic blocks is a good idea. Such a base class will provide a vector of predecessors and successors that can be used to traverse the CFG independent of the actual IR.

The current situation is that the basic blocks in LLVM IR and MLIR do not explicitly track their preds and succs. Preds are determined from the uses of the block, while succs are determined from the terminator instruction. MIR has explicit pred and succ vectors, but the succ pointers are duplicated by their presence as operands to the terminator instructions. The MachineBasicBlock is not a value, so there is no notion of traversing its uses for predecessors.

So if we create a common non-abstract base class for basic blocks, then MLIR and LLVM IR will incur the overhead of two vectors, one each for the preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size of LLVM IR in some typical programs:

https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing

Is this overhead the main issue in deciding whether we should introduce the common base class for traversing a CFG? Do note that MIR already duplicates successors. One could argue that duplicating the successor pointers in LLVM IR and MLIR merely brings them to the same level as MIR.
If that argument sticks, then the "real" overhead is only half of what is reported, to account for the duplicate predecessor pointers.

The duplication of predecessor pointers stems from the fact that basic blocks in LLVM IR and MLIR are also values that are (primarily) used as operands to the terminator and PHI instructions. We could redefine these instructions to use non-value basic blocks as operands. CFG edges are not the typical use-def relation that other values represent, and it's reasonable to say that PHINodes and terminators are special in this one way. I have not managed to see how far that rabbit hole goes, but I am not really convinced that this is attractive in any way. Is this even an option?

Sameer.
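[Editor's sketch] To make the trade-off Sameer describes concrete, here is a minimal sketch of what such a common non-abstract base class might look like. All names here (BasicBlockBase, addSuccessor, MyIRBlock) are hypothetical illustrations, not taken from Nicolai's patches or any in-tree class:

```cpp
#include <cassert>
#include <vector>

// Hypothetical common non-abstract base class for basic blocks.
// Every IR (LLVM IR, MIR, MLIR) would pay for these two vectors in every
// block, whether or not a generic analysis ever runs on it - the memory
// overhead discussed in the thread.
class BasicBlockBase {
  std::vector<BasicBlockBase *> Preds;
  std::vector<BasicBlockBase *> Succs;

public:
  // Generic CFG traversal, independent of the concrete IR.
  const std::vector<BasicBlockBase *> &predecessors() const { return Preds; }
  const std::vector<BasicBlockBase *> &successors() const { return Succs; }

  // The concrete IR's terminator-update code would be responsible for
  // keeping these explicit edge vectors in sync on every CFG edit.
  void addSuccessor(BasicBlockBase *S) {
    Succs.push_back(S);
    S->Preds.push_back(this);
  }
};

// A concrete IR's block type would derive from the base; the base is
// non-abstract, so traversal needs no virtual dispatch.
class MyIRBlock : public BasicBlockBase {
  // ... IR-specific members: instruction list, parent function, etc.
};
```

Note that this duplicates exactly the information that LLVM IR today derives from block uses (preds) and terminator operands (succs), which is where the reported size increase comes from.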
Mehdi AMINI via llvm-dev
2021-Mar-31 00:25 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
On Tue, Mar 30, 2021 at 9:20 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:

> ---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote ----
>
> > Owen - perhaps you've got some relevant perspective here given the GPU
> > context/background, and general middle end optimization design
> > questions. Could you take a look?
> > (& perhaps Johannes?)
> >
> > On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com>
> > wrote:
> > >
> > > Thanks for sending this out!
> > >
> > > +Mehdi - if you have a chance to look, I'd appreciate your thoughts
> > > on this, partly as a general LLVM contributor, but also with an eye to
> > > abstractions over multiple IRs given your work on MLIR
> > > +Lang - you mentioned maybe some colleagues of yours might be able to
> > > take a look at the design here?
> > >
> > > I'd like to try writing up maybe alternatives to a couple of the
> > > patches in the series to see how they compare, but haven't done that
> > > as yet.
> > >
> > > My brief take is that I think the opaque handles make a fair bit of
> > > sense/seem high-value/low-cost (if it turned out the handle for one of
> > > these IR types needed to become a bit heavier, I don't think it'd be
> > > super difficult to revisit this code - make the generic handle large
> > > enough to accommodate whatever the largest concrete handle happens to
> > > be, etc). Though I'm still a bit less certain about the full runtime
> > > polymorphic abstractions.
>
> For context, this is about the options laid out in the following document
> from Nicolai:
>
> https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing
>
> The above document was first introduced in the following email, which is
> the starting point of the current thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html
>
> The main issue is the need for an abstraction that greatly improves the
> experience of writing analyses that work on LLVM IR and MIR (and
> potentially also on MLIR). The way I understand it, Nicolai proposed an
> abstraction that involves dynamic polymorphism to abstract out the details
> of the underlying IR. David was more in favour of first unifying the IRs
> to the point that the dynamic polymorphism is limited to only occasional
> corner cases instead.
>
> I am currently involved in the activity towards decoupling AMDGPU's
> dependency on the abstractions being discussed here, and the one thing
> that immediately jumps out is the ability to traverse the predecessors and
> successors of a basic block. From all the emails so far and the resulting
> proposal, it seems that both David and Nicolai think that having a common
> non-abstract base class for basic blocks is a good idea. Such a base class
> will provide a vector of predecessors and successors that can be used to
> traverse the CFG independent of the actual IR.
>
> The current situation is that the basic blocks in LLVM IR and MLIR do not
> explicitly track their preds and succs. Preds are determined from the uses
> of the block, while succs are determined from the terminator instruction.
> MIR has explicit pred and succ vectors, but the succ pointers are
> duplicated by their presence as operands to the terminator instructions.
> The MachineBasicBlock is not a value, so there is no notion of traversing
> its uses for predecessors.
>
> So if we create a common non-abstract base class for basic blocks, then
> MLIR and LLVM IR will incur the overhead of two vectors, one each for the
> preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size
> of LLVM IR in some typical programs:
>
> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
>
> Is this overhead the main issue in deciding whether we should introduce
> the common base class for traversing a CFG? Do note that MIR already
> duplicates successors. One could argue that duplicating the successor
> pointers in LLVM IR and MLIR merely brings them to the same level as MIR.
> If that argument sticks, then the "real" overhead is only half of what is
> reported, to account for the duplicate predecessor pointers.
>
> The duplication of predecessor pointers stems from the fact that basic
> blocks in LLVM IR and MLIR are also values that are (primarily) used as
> operands to the terminator and PHI instructions. We could redefine these
> instructions to use non-value basic blocks as operands. CFG edges are not
> the typical use-def relation that other values represent, and it's
> reasonable to say that PHINodes and terminators are special in this one
> way. I have not managed to see how far that rabbit hole goes, but I am not
> really convinced that this is attractive in any way. Is this even an
> option?

I'm not knowledgeable about everything you're looking at, but one minor
thing: in MLIR, blocks aren't regular value operands and aren't part of the
value hierarchy; they don't participate in the usual use-def chains (and
MLIR does not have PHI nodes either - it uses block arguments instead).
Note that LLVM takes advantage of blocks being values to implement the
`blockaddress` constant, which would be difficult in MLIR at the moment.

Best,

--
Mehdi
David Blaikie via llvm-dev
2021-Mar-31 03:15 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
(FWIW, I'd still really love Mehdi, Owen, or anyone else's opinion on the
original proposal or any parts of this general problem - folks working on
some profile features have been continuing the template direction to
abstract over MIR+LLVM IR, and if there's benefits to be had here, it'd be
great to get them...)

On Tue, Mar 30, 2021 at 9:20 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:

> ---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote ----
>
> > Owen - perhaps you've got some relevant perspective here given the GPU
> > context/background, and general middle end optimization design
> > questions. Could you take a look?
> > (& perhaps Johannes?)
> >
> > On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com>
> > wrote:
> > >
> > > Thanks for sending this out!
> > >
> > > +Mehdi - if you have a chance to look, I'd appreciate your thoughts
> > > on this, partly as a general LLVM contributor, but also with an eye to
> > > abstractions over multiple IRs given your work on MLIR
> > > +Lang - you mentioned maybe some colleagues of yours might be able to
> > > take a look at the design here?
> > >
> > > I'd like to try writing up maybe alternatives to a couple of the
> > > patches in the series to see how they compare, but haven't done that
> > > as yet.
> > >
> > > My brief take is that I think the opaque handles make a fair bit of
> > > sense/seem high-value/low-cost (if it turned out the handle for one of
> > > these IR types needed to become a bit heavier, I don't think it'd be
> > > super difficult to revisit this code - make the generic handle large
> > > enough to accommodate whatever the largest concrete handle happens to
> > > be, etc). Though I'm still a bit less certain about the full runtime
> > > polymorphic abstractions.
>
> For context, this is about the options laid out in the following document
> from Nicolai:
>
> https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing
>
> The above document was first introduced in the following email, which is
> the starting point of the current thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html
>
> The main issue is the need for an abstraction that greatly improves the
> experience of writing analyses that work on LLVM IR and MIR (and
> potentially also on MLIR). The way I understand it, Nicolai proposed an
> abstraction that involves dynamic polymorphism to abstract out the details
> of the underlying IR. David was more in favour of first unifying the IRs
> to the point that the dynamic polymorphism is limited to only occasional
> corner cases instead.

To clarify, I think what I was leaning towards was having the APIs match
syntactically, so it'd be easier to write templates over them without the
need for type traits and adapters.

> I am currently involved in the activity towards decoupling AMDGPU's
> dependency on the abstractions being discussed here, and the one thing
> that immediately jumps out is the ability to traverse the predecessors
> and successors of a basic block. From all the emails so far and the
> resulting proposal, it seems that both David and Nicolai think that having
> a common non-abstract base class for basic blocks is a good idea.

I don't /think/ that's what I was advocating for (but it's been a while,
and I might be misremembering - if you've got pointers to particular parts
of the conversation to refresh my memory/frame this discussion, that'd be
handy) - I'd worry about the cost of trying to introduce a common base
class between these different APIs.
I think what I was advocating for was for them to have syntactically
matching APIs so that templated code could abstract over them without the
need for traits/adapters.

> Such a base class will provide a vector of predecessors and successors
> that can be used to traverse the CFG independent of the actual IR.
>
> The current situation is that the basic blocks in LLVM IR and MLIR do not
> explicitly track their preds and succs. Preds are determined from the uses
> of the block, while succs are determined from the terminator instruction.
> MIR has explicit pred and succ vectors, but the succ pointers are
> duplicated by their presence as operands to the terminator instructions.
> The MachineBasicBlock is not a value, so there is no notion of traversing
> its uses for predecessors.
>
> So if we create a common non-abstract base class for basic blocks, then
> MLIR and LLVM IR will incur the overhead of two vectors, one each for the
> preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size
> of LLVM IR in some typical programs:
>
> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing

That's overhead in the bitcode file size, yes?

> Is this overhead the main issue in deciding whether we should introduce
> the common base class for traversing a CFG?

I expect that would be a pretty significant concern, but also memory usage
and any time overhead keeping these new data structures up to date, or
slowing down compilation due to memory paging in and out, etc.

> Do note that MIR already duplicates successors. One could argue that
> duplicating the successor pointers in LLVM IR and MLIR merely brings them
> to the same level as MIR. If that argument sticks, then the "real"
> overhead is only half of what is reported, to account for the duplicate
> predecessor pointers.
> The duplication of predecessor pointers stems from the fact that basic
> blocks in LLVM IR and MLIR are also values that are (primarily) used as
> operands to the terminator and PHI instructions. We could redefine these
> instructions to use non-value basic blocks as operands. CFG edges are not
> the typical use-def relation that other values represent, and it's
> reasonable to say that PHINodes and terminators are special in this one
> way. I have not managed to see how far that rabbit hole goes, but I am not
> really convinced that this is attractive in any way. Is this even an
> option?
>
> Sameer.
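[Editor's sketch] The alternative David describes - syntactically matching APIs instead of a shared base class - can be illustrated with a small template. The two block types below are hypothetical stand-ins for llvm::BasicBlock and MachineBasicBlock (whose real interfaces differ in exactly the ways under discussion; this sketch assumes they had been unified to both expose a `successors()` range):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two unrelated IRs: no common base class, no virtual dispatch, but the
// same spelling for CFG traversal.
struct BlockA {
  std::vector<BlockA *> Succs;
  const std::vector<BlockA *> &successors() const { return Succs; }
};

struct BlockB {
  std::vector<BlockB *> Succs;
  const std::vector<BlockB *> &successors() const { return Succs; }
};

// A generic analysis written once as a template. Because the APIs match
// syntactically, it needs no type traits or adapter classes: it compiles
// against any block type with a matching successors() method.
template <typename BlockT> unsigned countReachable(BlockT *Entry) {
  std::vector<BlockT *> Worklist{Entry};
  std::vector<BlockT *> Seen;
  while (!Worklist.empty()) {
    BlockT *B = Worklist.back();
    Worklist.pop_back();
    if (std::find(Seen.begin(), Seen.end(), B) != Seen.end())
      continue; // already visited
    Seen.push_back(B);
    for (BlockT *S : B->successors())
      Worklist.push_back(S);
  }
  return static_cast<unsigned>(Seen.size());
}
```

The design trade-off versus the base-class approach: there is no per-block memory cost and no runtime indirection, but every generic analysis must be a template, so it is instantiated (and compiled) once per IR rather than existing as a single polymorphic implementation.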