thr3ads.net - llvm dev - [llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses [Apr 2021]

If this information is useful, please help other people find it:
Share via:

Sameer Sahasrabuddhe via llvm-dev

2021-Mar-31 16:01 UTC

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

David Blaikie writes:
>> I am currently involved in the activity towards decoupling AMDGPU's
>> dependency on the abstractions being discussed here, and the one
>> thing that immediately jumps out is the ability to traverse the
>> predecessors and successors of a basic block. From all the emails so
>> far and the resulting proposal, it seems that both David and Nicolai
>> think that having a common non-abstract base class for basic blocks
>> is a good idea.
>
> I don't /think/ that's what I was advocating for (but it's been
a
> while, and I might be misremembering - if you've got pointers to
> particular parts of the conversation to refresh my memory/frame this
> discussion, that'd be handy) - I'd worry about the cost of trying
to
> introduce a common base class between these different APIs. I think
> what I was advocating for was for them to have syntactically matching
> APIs so that templated code could abstract over them without the need
> for traits/adapters.
You're right. Turns out that you didn't specifically advocate what I
said above. The following is what was actually said, but I misunderstood
something else from the thread:

https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html

  "I meant for a different direction - if we got buy-in for making LLVM IR
and
   MIR implement the same C++ concept (& wrote down what that concept was -
   which APIs, how they look textually) then I think it would be relatively
   easy to get the small/mechanical renaming patches reviewed. Incremental
   from day 1, not big new abstraction, then incremental."
>> So if we create a common non-abstract base class for basic blocks,
>> then MLIR and LLVM IR will incur the overhead of two vectors, one
>> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%
>> increase in the size LLVM IR in some typical programs:
>>
>>
https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
>
> That's overhead in the bitcode file size, yes?
It's the memory overhead of storing those pointers in a vector. I don't
think it needs to affect the bitcode, since it's just redundant
information that can be populated after reading the whole function.
>> Is this overhead the main issue in deciding whether we should
>> introduce the common base class for traversing a CFG?
>
> I expect that would be a pretty significant concern, but also memory
> usage and any time overhead keeping these new data structures up to
> date, or slowing down compilation due to memory paging in and out,
> etc.
Right. If the discussion progressed to those details, then we will be in
a good place :)

Everyone seems to agree that we need an abstraction that makes it easier
to write analyses, and Nicolai's document captures all the options. But
in particular the document makes a case against templates, and instead
advocates a way to write analyses without referring to the actual types
in the underlying IR. This is achieved with a combination of opaque
handles and dynamic polymorphism. The document also strongly recommends
minimizing the dynamic polymorphism by designing "common non-abstract
base classes for SSA basic block and instructions concepts".

My focus right now is on the base class for basic blocks. It represents
a frequently used property of the CFG, and the problem is concise, and
already familiar to anyone writing an analysis. Our current experience
with writing new analyses in AMDGPU provides a motivation for moving
away from templates. But this is only a specific part of the bigger
problem. In particular, it only represents the "CFG" part of the
problem, but not the "SSA IR" part.

The actual use-case can be seen in the branch below, which implements
"CycleInfo" as a template over LLVM IR and MIR. It's a work in
progress.

https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template

For now, we can consider a simpler analysis "CfgInfo" that reports
various properties such as outgoing edges per node.

Example A: CfgInfo over LLVM IR

    CfgInfo::compute(BasicBlock *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : successors(B)) {
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

Example B: CfgInfo over Machine IR

    CfgInfo::compute(MachineBasicBlock *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : B->successors()) { // <-- diff
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

Example C: Template over both IRs

    template <typename BlockType>
    CfgInfo::compute(BlockType *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : successors(B)) { // <-- needs an overload
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

In this example, any instance of BlockType needs to provide a global
function "successors()". The following is sufficient with Machine IR
in
this particular case, but in general, this would be provided by traits:

namespace llvm {
    auto successors(MachineBasicBlock *B) { return B->successors(); }
}

Example D: CfgInfo over a common base class

    CfgInfo::compute(BasicBlockBase *entry) {
      worklist.enqueue(entry);
      while (!worklist.empty) {
        auto B = worklist.dequeue();
        for (auto S : B->successors()) { // <-- not virtual
           worklist.enqueue_if_new(S);
           collectStats(B, S);
        }
      }
    }

This last version is not a template. It's an implementation that gets
reused for both LLVM IR and Machine IR by simply casting the respective
basic block type to the base type.

There are two main problems with the template approach:

1. Just for that one function "successors()", the type of the block
has
   to be passed around as a template parameter. Starting from the entry
   point of the analysis, this has to go all the way to every leaf
   function that wants to call "successors()". This can especially be
a
   problem if the template has other dependent types such as iterator
   adaptors.

2. The actual analysis can be quite complicated, but because it's a
   template, it has to be in a header file so that it can be
   instantiated at the point where the analysis is actually computed.

The CycleInfo patch demonstrates both these problems. There are multiple
files as follows:

GenericCycleInfo.h

This file defines the templates GenericCycleInfo<BlockType> and
GenericCycle<BlockType>. It also contains the entire implementation of
the analysis.

CycleInfo.h

- This file declares a thin wrapper CycleInfo around
  GenericCycleInfo<BasicBlock> and another wrapper Cycle around
  GenericCycle<BasicBlock>. These are the LLVM IR instances of the
  templates.
- The wrappers avoid including GenericCycleInfo.h by only needing the
  pointer type to the wrapped template instances.
- In particular, notice how the child_iterator explicitly exposes the
  details of the wrapped type, but then returns the wrapper with the `*'
  operator.

  using const_child_iterator_base     typename
std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;
  struct const_child_iterator
      : iterator_adaptor_base<const_child_iterator,
const_child_iterator_base> {
    using Base         iterator_adaptor_base<const_child_iterator,
const_child_iterator_base>;

    Cycle operator *() const;
    const_child_iterator(const_child_iterator_base I) : Base(I) {}
  };

CycleInfo.cpp

This file is where the template is actually instantiated, along with the
implementation of the wrapper classes.

A similar pair of files will be needed for MachineCycleInfo later. These
will be almost identical to the CycleInfo files, except for the name of
the block type and the overload for "succesors()" and
"predecessors()".

All these files are just boilerplate designed to hide a template and
separately expose two instances of that template. Every type introduced
by the template needs to be carefully wrapped for use in other
files. Instead, if there was an interface (and not just a concept) that
exposed CFG edges, there would be just one CycleInfo.h file to declare
the analysis classes and one CycleInfo.cpp file to implement them.

So the argument goes somewhat like this:

1. Static polymorphism through templates creates a development overhead
   in the form of boilerplate, and also some runtime overhead of indirecting
   through wrapper types.

2. Dynamic polymorphism through an abstract base class or an abstract
   context class removes the boilerplate, but it introduces the runtime
   overhead of indirect calls. It also retains the runtime overhead of
   indirecting through opaque wrappers.

3. A non-abstract base class eliminates all the above overheads (for
   this specific problem of CFG traversal), but introduces a memory
   overhead because some information from the derived classes has to be
   duplicated in the base class.

The CycleInfo analysis exemplifies our motivation for moving away from
templates and traits. Assuming virtual calls are considered strongly
undesirable, that leaves us with the option of a non-abstract base
class.

Sameer.

Nicolai Hähnle via llvm-dev

2021-Mar-31 22:03 UTC

head link

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Thank you for picking this up, Sameer, and thank you David for pinging some
folks. As context for others, I'm currently on an extended parental leave
and consciously not editing any code that would usually be work-related.

A few high-level remarks from what I remember. Let's always keep in mind
that there are a whole bunch of puzzle pieces in flight here, and there's
always a risk that the discussion gets off track by losing sight of that
bigger picture. Some of those puzzle pieces can hopefully be
non-controversial. In particular, I'm thinking here of:

1. Aligning the different IR classes so that they implement common concepts
and allow writing templates idiomatically.
2. Allowing the use of opaque handles
<https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?ts=5fd124e3#heading=h.adqu14d3b7jd>
as described in the document.

An observation about the performance implications of common base classes
for basic blocks. I'm taking it as a given that a common base class would
have predecessor and successor vectors. In addition to making predecessor
and successor iteration simpler, this could actually end up _saving_ memory
for LLVM IR in many cases if we make the following additional changes:

1. Change the definition of phi nodes so that they only contain the
predecessor _values_ in their argument lists (removing all the arguments
referring to the predecessor basic blocks).
2. Change the definition of terminator instructions so that they no longer
contain the successor basic block(s) in their argument lists and instead
implicitly refer to the basic blocks successor list. For example, a
conditional branch would be defined such that its only argument is the i1
condition. If the condition is true, the program branches to
brInst->getParent()->getSuccessors()[0]; if the condition is false, it
branches to brInst->getParent()->getSuccessors()[1].

This is an option that I hadn't taken into account when I calculated those
3-4% of memory overhead, but it should end up saving memory whenever a
basic block has >= 2 phi nodes (the vectors themselves have an overhead, so
it depends). My experiment would have to be re-done to get a better idea of
the final balance.

The changes mentioned above are admittedly very invasive and complicated
changes. It would be good to get an idea of what sort of acceptance
criterion the LLVM community has here.

Cheers,
Nicolai


On Wed, Mar 31, 2021 at 6:01 PM Sameer Sahasrabuddhe <sameer at
sbuddhe.net>
wrote:
>
> David Blaikie writes:
>
> >> I am currently involved in the activity towards decoupling
AMDGPU's
> >> dependency on the abstractions being discussed here, and the one
> >> thing that immediately jumps out is the ability to traverse the
> >> predecessors and successors of a basic block. From all the emails
so
> >> far and the resulting proposal, it seems that both David and
Nicolai
> >> think that having a common non-abstract base class for basic
blocks
> >> is a good idea.
> >
> > I don't /think/ that's what I was advocating for (but it's
been a
> > while, and I might be misremembering - if you've got pointers to
> > particular parts of the conversation to refresh my memory/frame this
> > discussion, that'd be handy) - I'd worry about the cost of
trying to
> > introduce a common base class between these different APIs. I think
> > what I was advocating for was for them to have syntactically matching
> > APIs so that templated code could abstract over them without the need
> > for traits/adapters.
>
> You're right. Turns out that you didn't specifically advocate what
I
> said above. The following is what was actually said, but I misunderstood
> something else from the thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html
>
>   "I meant for a different direction - if we got buy-in for making
LLVM IR
> and
>    MIR implement the same C++ concept (& wrote down what that concept
was -
>    which APIs, how they look textually) then I think it would be relatively
>    easy to get the small/mechanical renaming patches reviewed. Incremental
>    from day 1, not big new abstraction, then incremental."
>
> >> So if we create a common non-abstract base class for basic blocks,
> >> then MLIR and LLVM IR will incur the overhead of two vectors, one
> >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%
> >> increase in the size LLVM IR in some typical programs:
> >>
> >>
>
https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
> >
> > That's overhead in the bitcode file size, yes?
>
> It's the memory overhead of storing those pointers in a vector. I
don't
> think it needs to affect the bitcode, since it's just redundant
> information that can be populated after reading the whole function.
>
> >> Is this overhead the main issue in deciding whether we should
> >> introduce the common base class for traversing a CFG?
> >
> > I expect that would be a pretty significant concern, but also memory
> > usage and any time overhead keeping these new data structures up to
> > date, or slowing down compilation due to memory paging in and out,
> > etc.
>
> Right. If the discussion progressed to those details, then we will be in
> a good place :)
>
> Everyone seems to agree that we need an abstraction that makes it easier
> to write analyses, and Nicolai's document captures all the options. But
> in particular the document makes a case against templates, and instead
> advocates a way to write analyses without referring to the actual types
> in the underlying IR. This is achieved with a combination of opaque
> handles and dynamic polymorphism. The document also strongly recommends
> minimizing the dynamic polymorphism by designing "common non-abstract
> base classes for SSA basic block and instructions concepts".
>
> My focus right now is on the base class for basic blocks. It represents
> a frequently used property of the CFG, and the problem is concise, and
> already familiar to anyone writing an analysis. Our current experience
> with writing new analyses in AMDGPU provides a motivation for moving
> away from templates. But this is only a specific part of the bigger
> problem. In particular, it only represents the "CFG" part of the
> problem, but not the "SSA IR" part.
>
> The actual use-case can be seen in the branch below, which implements
> "CycleInfo" as a template over LLVM IR and MIR. It's a work
in progress.
>
> https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template
>
> For now, we can consider a simpler analysis "CfgInfo" that
reports
> various properties such as outgoing edges per node.
>
> Example A: CfgInfo over LLVM IR
>
>     CfgInfo::compute(BasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) {
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example B: CfgInfo over Machine IR
>
>     CfgInfo::compute(MachineBasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- diff
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example C: Template over both IRs
>
>     template <typename BlockType>
>     CfgInfo::compute(BlockType *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) { // <-- needs an overload
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> In this example, any instance of BlockType needs to provide a global
> function "successors()". The following is sufficient with Machine
IR in
> this particular case, but in general, this would be provided by traits:
>
> namespace llvm {
>     auto successors(MachineBasicBlock *B) { return B->successors(); }
> }
>
> Example D: CfgInfo over a common base class
>
>     CfgInfo::compute(BasicBlockBase *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- not virtual
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> This last version is not a template. It's an implementation that gets
> reused for both LLVM IR and Machine IR by simply casting the respective
> basic block type to the base type.
>
> There are two main problems with the template approach:
>
> 1. Just for that one function "successors()", the type of the
block has
>    to be passed around as a template parameter. Starting from the entry
>    point of the analysis, this has to go all the way to every leaf
>    function that wants to call "successors()". This can
especially be a
>    problem if the template has other dependent types such as iterator
>    adaptors.
>
> 2. The actual analysis can be quite complicated, but because it's a
>    template, it has to be in a header file so that it can be
>    instantiated at the point where the analysis is actually computed.
>
> The CycleInfo patch demonstrates both these problems. There are multiple
> files as follows:
>
> GenericCycleInfo.h
>
> This file defines the templates GenericCycleInfo<BlockType> and
> GenericCycle<BlockType>. It also contains the entire implementation
of
> the analysis.
>
> CycleInfo.h
>
> - This file declares a thin wrapper CycleInfo around
>   GenericCycleInfo<BasicBlock> and another wrapper Cycle around
>   GenericCycle<BasicBlock>. These are the LLVM IR instances of the
>   templates.
> - The wrappers avoid including GenericCycleInfo.h by only needing the
>   pointer type to the wrapped template instances.
> - In particular, notice how the child_iterator explicitly exposes the
>   details of the wrapped type, but then returns the wrapper with the
`*'
>   operator.
>
>   using const_child_iterator_base >     typename
>
std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;
>   struct const_child_iterator
>       : iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base> {
>     using Base >         iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base>;
>
>     Cycle operator *() const;
>     const_child_iterator(const_child_iterator_base I) : Base(I) {}
>   };
>
> CycleInfo.cpp
>
> This file is where the template is actually instantiated, along with the
> implementation of the wrapper classes.
>
> A similar pair of files will be needed for MachineCycleInfo later. These
> will be almost identical to the CycleInfo files, except for the name of
> the block type and the overload for "succesors()" and
"predecessors()".
>
> All these files are just boilerplate designed to hide a template and
> separately expose two instances of that template. Every type introduced
> by the template needs to be carefully wrapped for use in other
> files. Instead, if there was an interface (and not just a concept) that
> exposed CFG edges, there would be just one CycleInfo.h file to declare
> the analysis classes and one CycleInfo.cpp file to implement them.
>
> So the argument goes somewhat like this:
>
> 1. Static polymorphism through templates creates a development overhead
>    in the form of boilerplate, and also some runtime overhead of
> indirecting
>    through wrapper types.
>
> 2. Dynamic polymorphism through an abstract base class or an abstract
>    context class removes the boilerplate, but it introduces the runtime
>    overhead of indirect calls. It also retains the runtime overhead of
>    indirecting through opaque wrappers.
>
> 3. A non-abstract base class eliminates all the above overheads (for
>    this specific problem of CFG traversal), but introduces a memory
>    overhead because some information from the derived classes has to be
>    duplicated in the base class.
>
> The CycleInfo analysis exemplifies our motivation for moving away from
> templates and traits. Assuming virtual calls are considered strongly
> undesirable, that leaves us with the option of a non-abstract base
> class.
>
> Sameer.
>

-- 
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210401/563f7bda/attachment.html>

Mehdi AMINI via llvm-dev

2021-Mar-31 23:30 UTC

head link

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

On Wed, Mar 31, 2021 at 9:01 AM Sameer Sahasrabuddhe <sameer at
sbuddhe.net>
wrote:
>
> David Blaikie writes:
>
> >> I am currently involved in the activity towards decoupling
AMDGPU's
> >> dependency on the abstractions being discussed here, and the one
> >> thing that immediately jumps out is the ability to traverse the
> >> predecessors and successors of a basic block. From all the emails
so
> >> far and the resulting proposal, it seems that both David and
Nicolai
> >> think that having a common non-abstract base class for basic
blocks
> >> is a good idea.
> >
> > I don't /think/ that's what I was advocating for (but it's
been a
> > while, and I might be misremembering - if you've got pointers to
> > particular parts of the conversation to refresh my memory/frame this
> > discussion, that'd be handy) - I'd worry about the cost of
trying to
> > introduce a common base class between these different APIs. I think
> > what I was advocating for was for them to have syntactically matching
> > APIs so that templated code could abstract over them without the need
> > for traits/adapters.
>
> You're right. Turns out that you didn't specifically advocate what
I
> said above. The following is what was actually said, but I misunderstood
> something else from the thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html
>
>   "I meant for a different direction - if we got buy-in for making
LLVM IR
> and
>    MIR implement the same C++ concept (& wrote down what that concept
was -
>    which APIs, how they look textually) then I think it would be relatively
>    easy to get the small/mechanical renaming patches reviewed. Incremental
>    from day 1, not big new abstraction, then incremental."
>
> >> So if we create a common non-abstract base class for basic blocks,
> >> then MLIR and LLVM IR will incur the overhead of two vectors, one
> >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%
> >> increase in the size LLVM IR in some typical programs:
> >>
> >>
>
https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
> >
> > That's overhead in the bitcode file size, yes?
>
> It's the memory overhead of storing those pointers in a vector. I
don't
> think it needs to affect the bitcode, since it's just redundant
> information that can be populated after reading the whole function.
>
> >> Is this overhead the main issue in deciding whether we should
> >> introduce the common base class for traversing a CFG?
> >
> > I expect that would be a pretty significant concern, but also memory
> > usage and any time overhead keeping these new data structures up to
> > date, or slowing down compilation due to memory paging in and out,
> > etc.
>
> Right. If the discussion progressed to those details, then we will be in
> a good place :)
>
> Everyone seems to agree that we need an abstraction that makes it easier
> to write analyses, and Nicolai's document captures all the options. But
> in particular the document makes a case against templates, and instead
> advocates a way to write analyses without referring to the actual types
> in the underlying IR. This is achieved with a combination of opaque
> handles and dynamic polymorphism. The document also strongly recommends
> minimizing the dynamic polymorphism by designing "common non-abstract
> base classes for SSA basic block and instructions concepts".
>
> My focus right now is on the base class for basic blocks. It represents
> a frequently used property of the CFG, and the problem is concise, and
> already familiar to anyone writing an analysis. Our current experience
> with writing new analyses in AMDGPU provides a motivation for moving
> away from templates. But this is only a specific part of the bigger
> problem. In particular, it only represents the "CFG" part of the
> problem, but not the "SSA IR" part.
>
> The actual use-case can be seen in the branch below, which implements
> "CycleInfo" as a template over LLVM IR and MIR. It's a work
in progress.
>
> https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template
>
> For now, we can consider a simpler analysis "CfgInfo" that
reports
> various properties such as outgoing edges per node.
>
> Example A: CfgInfo over LLVM IR
>
>     CfgInfo::compute(BasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) {
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example B: CfgInfo over Machine IR
>
>     CfgInfo::compute(MachineBasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- diff
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example C: Template over both IRs
>
>     template <typename BlockType>
>     CfgInfo::compute(BlockType *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) { // <-- needs an overload
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> In this example, any instance of BlockType needs to provide a global
> function "successors()". The following is sufficient with Machine
IR in
> this particular case, but in general, this would be provided by traits:
>
> namespace llvm {
>     auto successors(MachineBasicBlock *B) { return B->successors(); }
> }
>
> Example D: CfgInfo over a common base class
>
>     CfgInfo::compute(BasicBlockBase *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- not virtual
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> This last version is not a template. It's an implementation that gets
> reused for both LLVM IR and Machine IR by simply casting the respective
> basic block type to the base type.
>
> There are two main problems with the template approach:
>
> 1. Just for that one function "successors()", the type of the
block has
>    to be passed around as a template parameter. Starting from the entry
>    point of the analysis, this has to go all the way to every leaf
>    function that wants to call "successors()". This can
especially be a
>    problem if the template has other dependent types such as iterator
>    adaptors.
>
> 2. The actual analysis can be quite complicated, but because it's a
>    template, it has to be in a header file so that it can be
>    instantiated at the point where the analysis is actually computed.
>
> The CycleInfo patch demonstrates both these problems. There are multiple
> files as follows:
>
> GenericCycleInfo.h
>
> This file defines the templates GenericCycleInfo<BlockType> and
> GenericCycle<BlockType>. It also contains the entire implementation
of
> the analysis.
>
> CycleInfo.h
>
> - This file declares a thin wrapper CycleInfo around
>   GenericCycleInfo<BasicBlock> and another wrapper Cycle around
>   GenericCycle<BasicBlock>. These are the LLVM IR instances of the
>   templates.
> - The wrappers avoid including GenericCycleInfo.h by only needing the
>   pointer type to the wrapped template instances.
> - In particular, notice how the child_iterator explicitly exposes the
>   details of the wrapped type, but then returns the wrapper with the
`*'
>   operator.
>
>   using const_child_iterator_base >     typename
>
std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;
>   struct const_child_iterator
>       : iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base> {
>     using Base >         iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base>;
>
>     Cycle operator *() const;
>     const_child_iterator(const_child_iterator_base I) : Base(I) {}
>   };
>
> CycleInfo.cpp
>
> This file is where the template is actually instantiated, along with the
> implementation of the wrapper classes.
>
> A similar pair of files will be needed for MachineCycleInfo later. These
> will be almost identical to the CycleInfo files, except for the name of
> the block type and the overload for "succesors()" and
"predecessors()".
>
> All these files are just boilerplate designed to hide a template and
> separately expose two instances of that template. Every type introduced
> by the template needs to be carefully wrapped for use in other
> files. Instead, if there was an interface (and not just a concept) that
> exposed CFG edges, there would be just one CycleInfo.h file to declare
> the analysis classes and one CycleInfo.cpp file to implement them.
>
> So the argument goes somewhat like this:
>
> 1. Static polymorphism through templates creates a development overhead
>    in the form of boilerplate, and also some runtime overhead of
> indirecting
>    through wrapper types.
>
> 2. Dynamic polymorphism through an abstract base class or an abstract
>    context class removes the boilerplate, but it introduces the runtime
>    overhead of indirect calls. It also retains the runtime overhead of
>    indirecting through opaque wrappers.
>
> 3. A non-abstract base class eliminates all the above overheads (for
>    this specific problem of CFG traversal), but introduces a memory
>    overhead because some information from the derived classes has to be
>    duplicated in the base class.
>
> The CycleInfo analysis exemplifies our motivation for moving away from
> templates and traits. Assuming virtual calls are considered strongly
> undesirable, that leaves us with the option of a non-abstract base
> class.


Thanks, that was an excellent summary and the example was really helpful
(at least to me)!
I'm fairly old school here I guess, as I'd have naively written an
abstract
base class with virtual methods (or function_ref callback into the
implementation), but there are concerns with the indirection compared to a
templated implementation.
But aren't we trading the cost of this indirection with even more cost if
we have to duplicate information in a base class?

Best,

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210331/3e5164c6/attachment.html>

Sameer Sahasrabuddhe via llvm-dev

2021-Apr-01 08:54 UTC

head link

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Nicolai Hähnle writes:
> A few high-level remarks from what I remember. Let's always keep in
mind
> that there are a whole bunch of puzzle pieces in flight here, and
there's
> always a risk that the discussion gets off track by losing sight of that
> bigger picture. Some of those puzzle pieces can hopefully be
> non-controversial. In particular, I'm thinking here of:
>
> 1. Aligning the different IR classes so that they implement common concepts
> and allow writing templates idiomatically.
> 2. Allowing the use of opaque handles
>
<https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?ts=5fd124e3#heading=h.adqu14d3b7jd>
> as described in the document.
I agree if you mean to say that the above two things are not mutually
exclusive, and both are beneficial. At least from my point of view,
templates over a common set of concepts are very valuable for writing
utility functions, but become cumbersome for writing large
analyses. Opaque handles (combined with a context class) can make it
easier to separate the IR-independent implementation from the
IR-dependent interface.

The context class provides a way to look up properties on the
opaque handle (such as the successors for a BlockHandle object) through
virtual functions. But having a non-abstract base class quickly reduce
the surface of these virtual functions, since the most common functions
like successors() are readily available in the base class itself.

That is related to what Mehdi asked in a sibling email:
> I'm fairly old school here I guess, as I'd have naively written an
> abstract base class with virtual methods (or function_ref callback
> into the implementation), but there are concerns with the indirection
> compared to a templated implementation.
> But aren't we trading the cost of this indirection with even more cost
> if we have to duplicate information in a base class?
Yes, we are avoiding indirect calls (not to be confused with just
indirection through a data pointer). It's useful to ask whether it is
okay to have indirect calls (through virtual functions) when computing
an analysis instead of doing something that affects the IR
itself. Is it an option to let each analysis worry about its own
performance, perhaps by caching internal results of the virtual calls?
Presumably, using the results of an analysis or updating the analysis
will only have an incremental cost from indirect calls?

If we want to move away from templates, then dynamic polymorphism cannot
be completely avoided, like when looking up the definition of a
value. But having a common base class can take out big chunks of the
surface area. The posterboy for this is CFG traversal along the
successor/predecessor pointers.
> An observation about the performance implications of common base classes
> for basic blocks. I'm taking it as a given that a common base class
would
> have predecessor and successor vectors. In addition to making predecessor
> and successor iteration simpler, this could actually end up _saving_ memory
> for LLVM IR in many cases if we make the following additional changes:
>
> 1. Change the definition of phi nodes so that they only contain the
> predecessor _values_ in their argument lists (removing all the arguments
> referring to the predecessor basic blocks).
> 2. Change the definition of terminator instructions so that they no longer
> contain the successor basic block(s) in their argument lists and instead
> implicitly refer to the basic blocks successor list. For example, a
> conditional branch would be defined such that its only argument is the i1
> condition. If the condition is true, the program branches to
> brInst->getParent()->getSuccessors()[0]; if the condition is false,
it
> branches to brInst->getParent()->getSuccessors()[1].
>
> This is an option that I hadn't taken into account when I calculated
those
> 3-4% of memory overhead, but it should end up saving memory whenever a
> basic block has >= 2 phi nodes (the vectors themselves have an overhead,
so
> it depends). My experiment would have to be re-done to get a better idea of
> the final balance.
>
> The changes mentioned above are admittedly very invasive and complicated
> changes. It would be good to get an idea of what sort of acceptance
> criterion the LLVM community has here.
Guess what, I had actually attempted this before reviving this thread!
My first experiment was to put an "assert(getParent())" in the
{get,set}Successor() methods. That didn't result in any interesting
failures. The real problem is with the terminator constructors. Usually
any Instruction::Create() sort of function optionally takes an
InsertBefore or InsertAtEnd argument. But if we require terminators to
access successors from the parent block, then variants of say
BranchInst::Create() that take successor pointers as arguments must now
require either an InsertBefore or an InsertAtEnd. That breaks all sorts
of code that manipulates the CFG.

Requiring a parent block when creating a terminator is a lot of work,
and I wasn't sure that the gain was worth the effort. It would need
refactoring many places that have been built around the assumption that
anyone can create an instruction in a vacuum. Instead, redefining the
BasicBlock to not be derived from the Value class seemed more practical
to me. It reduces the same overhead of redundantly tracking CFG
edges. For the special uses of llvm::BasicBlock as a value, we could
provide a new BasicBlockValue class. I tried commenting out the Value
class parent from the BasicBlock declaration, but that experiment is too
big to try on my own, especially without some consensus forming in this
discussion.

Sameer.

llvm dev - Apr 2021 - [RFC] Abstracting over SSA form IRs to implement generic analyses

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses