Sameer Sahasrabuddhe via llvm-dev
2021-Mar-31 16:01 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
David Blaikie writes:>> I am currently involved in the activity towards decoupling AMDGPU's >> dependency on the abstractions being discussed here, and the one >> thing that immediately jumps out is the ability to traverse the >> predecessors and successors of a basic block. From all the emails so >> far and the resulting proposal, it seems that both David and Nicolai >> think that having a common non-abstract base class for basic blocks >> is a good idea. > > I don't /think/ that's what I was advocating for (but it's been a > while, and I might be misremembering - if you've got pointers to > particular parts of the conversation to refresh my memory/frame this > discussion, that'd be handy) - I'd worry about the cost of trying to > introduce a common base class between these different APIs. I think > what I was advocating for was for them to have syntactically matching > APIs so that templated code could abstract over them without the need > for traits/adapters.You're right. Turns out that you didn't specifically advocate what I said above. The following is what was actually said, but I misunderstood something else from the thread: https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html "I meant for a different direction - if we got buy-in for making LLVM IR and MIR implement the same C++ concept (& wrote down what that concept was - which APIs, how they look textually) then I think it would be relatively easy to get the small/mechanical renaming patches reviewed. Incremental from day 1, not big new abstraction, then incremental.">> So if we create a common non-abstract base class for basic blocks, >> then MLIR and LLVM IR will incur the overhead of two vectors, one >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1% >> increase in the size LLVM IR in some typical programs: >> >> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing > > That's overhead in the bitcode file size, yes?It's the memory overhead of storing those pointers in a vector. I don't think it needs to affect the bitcode, since it's just redundant information that can be populated after reading the whole function.>> Is this overhead the main issue in deciding whether we should >> introduce the common base class for traversing a CFG? > > I expect that would be a pretty significant concern, but also memory > usage and any time overhead keeping these new data structures up to > date, or slowing down compilation due to memory paging in and out, > etc.Right. If the discussion progressed to those details, then we will be in a good place :) Everyone seems to agree that we need an abstraction that makes it easier to write analyses, and Nicolai's document captures all the options. But in particular the document makes a case against templates, and instead advocates a way to write analyses without referring to the actual types in the underlying IR. This is achieved with a combination of opaque handles and dynamic polymorphism. The document also strongly recommends minimizing the dynamic polymorphism by designing "common non-abstract base classes for SSA basic block and instructions concepts". My focus right now is on the base class for basic blocks. It represents a frequently used property of the CFG, and the problem is concise, and already familiar to anyone writing an analysis. Our current experience with writing new analyses in AMDGPU provides a motivation for moving away from templates. But this is only a specific part of the bigger problem. In particular, it only represents the "CFG" part of the problem, but not the "SSA IR" part. The actual use-case can be seen in the branch below, which implements "CycleInfo" as a template over LLVM IR and MIR. It's a work in progress. https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template For now, we can consider a simpler analysis "CfgInfo" that reports various properties such as outgoing edges per node. Example A: CfgInfo over LLVM IR CfgInfo::compute(BasicBlock *entry) { worklist.enqueue(entry); while (!worklist.empty) { auto B = worklist.dequeue(); for (auto S : successors(B)) { worklist.enqueue_if_new(S); collectStats(B, S); } } } Example B: CfgInfo over Machine IR CfgInfo::compute(MachineBasicBlock *entry) { worklist.enqueue(entry); while (!worklist.empty) { auto B = worklist.dequeue(); for (auto S : B->successors()) { // <-- diff worklist.enqueue_if_new(S); collectStats(B, S); } } } Example C: Template over both IRs template <typename BlockType> CfgInfo::compute(BlockType *entry) { worklist.enqueue(entry); while (!worklist.empty) { auto B = worklist.dequeue(); for (auto S : successors(B)) { // <-- needs an overload worklist.enqueue_if_new(S); collectStats(B, S); } } } In this example, any instance of BlockType needs to provide a global function "successors()". The following is sufficient with Machine IR in this particular case, but in general, this would be provided by traits: namespace llvm { auto successors(MachineBasicBlock *B) { return B->successors(); } } Example D: CfgInfo over a common base class CfgInfo::compute(BasicBlockBase *entry) { worklist.enqueue(entry); while (!worklist.empty) { auto B = worklist.dequeue(); for (auto S : B->successors()) { // <-- not virtual worklist.enqueue_if_new(S); collectStats(B, S); } } } This last version is not a template. It's an implementation that gets reused for both LLVM IR and Machine IR by simply casting the respective basic block type to the base type. There are two main problems with the template approach: 1. Just for that one function "successors()", the type of the block has to be passed around as a template parameter. Starting from the entry point of the analysis, this has to go all the way to every leaf function that wants to call "successors()". This can especially be a problem if the template has other dependent types such as iterator adaptors. 2. The actual analysis can be quite complicated, but because it's a template, it has to be in a header file so that it can be instantiated at the point where the analysis is actually computed. The CycleInfo patch demonstrates both these problems. There are multiple files as follows: GenericCycleInfo.h This file defines the templates GenericCycleInfo<BlockType> and GenericCycle<BlockType>. It also contains the entire implementation of the analysis. CycleInfo.h - This file declares a thin wrapper CycleInfo around GenericCycleInfo<BasicBlock> and another wrapper Cycle around GenericCycle<BasicBlock>. These are the LLVM IR instances of the templates. - The wrappers avoid including GenericCycleInfo.h by only needing the pointer type to the wrapped template instances. - In particular, notice how the child_iterator explicitly exposes the details of the wrapped type, but then returns the wrapper with the `*' operator. using const_child_iterator_base typename std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator; struct const_child_iterator : iterator_adaptor_base<const_child_iterator, const_child_iterator_base> { using Base iterator_adaptor_base<const_child_iterator, const_child_iterator_base>; Cycle operator *() const; const_child_iterator(const_child_iterator_base I) : Base(I) {} }; CycleInfo.cpp This file is where the template is actually instantiated, along with the implementation of the wrapper classes. A similar pair of files will be needed for MachineCycleInfo later. These will be almost identical to the CycleInfo files, except for the name of the block type and the overload for "succesors()" and "predecessors()". All these files are just boilerplate designed to hide a template and separately expose two instances of that template. Every type introduced by the template needs to be carefully wrapped for use in other files. Instead, if there was an interface (and not just a concept) that exposed CFG edges, there would be just one CycleInfo.h file to declare the analysis classes and one CycleInfo.cpp file to implement them. So the argument goes somewhat like this: 1. Static polymorphism through templates creates a development overhead in the form of boilerplate, and also some runtime overhead of indirecting through wrapper types. 2. Dynamic polymorphism through an abstract base class or an abstract context class removes the boilerplate, but it introduces the runtime overhead of indirect calls. It also retains the runtime overhead of indirecting through opaque wrappers. 3. A non-abstract base class eliminates all the above overheads (for this specific problem of CFG traversal), but introduces a memory overhead because some information from the derived classes has to be duplicated in the base class. The CycleInfo analysis exemplifies our motivation for moving away from templates and traits. Assuming virtual calls are considered strongly undesirable, that leaves us with the option of a non-abstract base class. Sameer.
Nicolai Hähnle via llvm-dev
2021-Mar-31 22:03 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
Thank you for picking this up, Sameer, and thank you David for pinging some folks. As context for others, I'm currently on an extended parental leave and consciously not editing any code that would usually be work-related. A few high-level remarks from what I remember. Let's always keep in mind that there are a whole bunch of puzzle pieces in flight here, and there's always a risk that the discussion gets off track by losing sight of that bigger picture. Some of those puzzle pieces can hopefully be non-controversial. In particular, I'm thinking here of: 1. Aligning the different IR classes so that they implement common concepts and allow writing templates idiomatically. 2. Allowing the use of opaque handles <https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?ts=5fd124e3#heading=h.adqu14d3b7jd> as described in the document. An observation about the performance implications of common base classes for basic blocks. I'm taking it as a given that a common base class would have predecessor and successor vectors. In addition to making predecessor and successor iteration simpler, this could actually end up _saving_ memory for LLVM IR in many cases if we make the following additional changes: 1. Change the definition of phi nodes so that they only contain the predecessor _values_ in their argument lists (removing all the arguments referring to the predecessor basic blocks). 2. Change the definition of terminator instructions so that they no longer contain the successor basic block(s) in their argument lists and instead implicitly refer to the basic blocks successor list. For example, a conditional branch would be defined such that its only argument is the i1 condition. If the condition is true, the program branches to brInst->getParent()->getSuccessors()[0]; if the condition is false, it branches to brInst->getParent()->getSuccessors()[1]. This is an option that I hadn't taken into account when I calculated those 3-4% of memory overhead, but it should end up saving memory whenever a basic block has >= 2 phi nodes (the vectors themselves have an overhead, so it depends). My experiment would have to be re-done to get a better idea of the final balance. The changes mentioned above are admittedly very invasive and complicated changes. It would be good to get an idea of what sort of acceptance criterion the LLVM community has here. Cheers, Nicolai On Wed, Mar 31, 2021 at 6:01 PM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:> > David Blaikie writes: > > >> I am currently involved in the activity towards decoupling AMDGPU's > >> dependency on the abstractions being discussed here, and the one > >> thing that immediately jumps out is the ability to traverse the > >> predecessors and successors of a basic block. From all the emails so > >> far and the resulting proposal, it seems that both David and Nicolai > >> think that having a common non-abstract base class for basic blocks > >> is a good idea. > > > > I don't /think/ that's what I was advocating for (but it's been a > > while, and I might be misremembering - if you've got pointers to > > particular parts of the conversation to refresh my memory/frame this > > discussion, that'd be handy) - I'd worry about the cost of trying to > > introduce a common base class between these different APIs. I think > > what I was advocating for was for them to have syntactically matching > > APIs so that templated code could abstract over them without the need > > for traits/adapters. > > You're right. Turns out that you didn't specifically advocate what I > said above. The following is what was actually said, but I misunderstood > something else from the thread: > > https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html > > "I meant for a different direction - if we got buy-in for making LLVM IR > and > MIR implement the same C++ concept (& wrote down what that concept was - > which APIs, how they look textually) then I think it would be relatively > easy to get the small/mechanical renaming patches reviewed. Incremental > from day 1, not big new abstraction, then incremental." > > >> So if we create a common non-abstract base class for basic blocks, > >> then MLIR and LLVM IR will incur the overhead of two vectors, one > >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1% > >> increase in the size LLVM IR in some typical programs: > >> > >> > https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing > > > > That's overhead in the bitcode file size, yes? > > It's the memory overhead of storing those pointers in a vector. I don't > think it needs to affect the bitcode, since it's just redundant > information that can be populated after reading the whole function. > > >> Is this overhead the main issue in deciding whether we should > >> introduce the common base class for traversing a CFG? > > > > I expect that would be a pretty significant concern, but also memory > > usage and any time overhead keeping these new data structures up to > > date, or slowing down compilation due to memory paging in and out, > > etc. > > Right. If the discussion progressed to those details, then we will be in > a good place :) > > Everyone seems to agree that we need an abstraction that makes it easier > to write analyses, and Nicolai's document captures all the options. But > in particular the document makes a case against templates, and instead > advocates a way to write analyses without referring to the actual types > in the underlying IR. This is achieved with a combination of opaque > handles and dynamic polymorphism. The document also strongly recommends > minimizing the dynamic polymorphism by designing "common non-abstract > base classes for SSA basic block and instructions concepts". > > My focus right now is on the base class for basic blocks. It represents > a frequently used property of the CFG, and the problem is concise, and > already familiar to anyone writing an analysis. Our current experience > with writing new analyses in AMDGPU provides a motivation for moving > away from templates. But this is only a specific part of the bigger > problem. In particular, it only represents the "CFG" part of the > problem, but not the "SSA IR" part. > > The actual use-case can be seen in the branch below, which implements > "CycleInfo" as a template over LLVM IR and MIR. It's a work in progress. > > https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template > > For now, we can consider a simpler analysis "CfgInfo" that reports > various properties such as outgoing edges per node. > > Example A: CfgInfo over LLVM IR > > CfgInfo::compute(BasicBlock *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : successors(B)) { > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > Example B: CfgInfo over Machine IR > > CfgInfo::compute(MachineBasicBlock *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : B->successors()) { // <-- diff > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > Example C: Template over both IRs > > template <typename BlockType> > CfgInfo::compute(BlockType *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : successors(B)) { // <-- needs an overload > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > In this example, any instance of BlockType needs to provide a global > function "successors()". The following is sufficient with Machine IR in > this particular case, but in general, this would be provided by traits: > > namespace llvm { > auto successors(MachineBasicBlock *B) { return B->successors(); } > } > > Example D: CfgInfo over a common base class > > CfgInfo::compute(BasicBlockBase *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : B->successors()) { // <-- not virtual > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > This last version is not a template. It's an implementation that gets > reused for both LLVM IR and Machine IR by simply casting the respective > basic block type to the base type. > > There are two main problems with the template approach: > > 1. Just for that one function "successors()", the type of the block has > to be passed around as a template parameter. Starting from the entry > point of the analysis, this has to go all the way to every leaf > function that wants to call "successors()". This can especially be a > problem if the template has other dependent types such as iterator > adaptors. > > 2. The actual analysis can be quite complicated, but because it's a > template, it has to be in a header file so that it can be > instantiated at the point where the analysis is actually computed. > > The CycleInfo patch demonstrates both these problems. There are multiple > files as follows: > > GenericCycleInfo.h > > This file defines the templates GenericCycleInfo<BlockType> and > GenericCycle<BlockType>. It also contains the entire implementation of > the analysis. > > CycleInfo.h > > - This file declares a thin wrapper CycleInfo around > GenericCycleInfo<BasicBlock> and another wrapper Cycle around > GenericCycle<BasicBlock>. These are the LLVM IR instances of the > templates. > - The wrappers avoid including GenericCycleInfo.h by only needing the > pointer type to the wrapped template instances. > - In particular, notice how the child_iterator explicitly exposes the > details of the wrapped type, but then returns the wrapper with the `*' > operator. > > using const_child_iterator_base > typename > std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator; > struct const_child_iterator > : iterator_adaptor_base<const_child_iterator, > const_child_iterator_base> { > using Base > iterator_adaptor_base<const_child_iterator, > const_child_iterator_base>; > > Cycle operator *() const; > const_child_iterator(const_child_iterator_base I) : Base(I) {} > }; > > CycleInfo.cpp > > This file is where the template is actually instantiated, along with the > implementation of the wrapper classes. > > A similar pair of files will be needed for MachineCycleInfo later. These > will be almost identical to the CycleInfo files, except for the name of > the block type and the overload for "succesors()" and "predecessors()". > > All these files are just boilerplate designed to hide a template and > separately expose two instances of that template. Every type introduced > by the template needs to be carefully wrapped for use in other > files. Instead, if there was an interface (and not just a concept) that > exposed CFG edges, there would be just one CycleInfo.h file to declare > the analysis classes and one CycleInfo.cpp file to implement them. > > So the argument goes somewhat like this: > > 1. Static polymorphism through templates creates a development overhead > in the form of boilerplate, and also some runtime overhead of > indirecting > through wrapper types. > > 2. Dynamic polymorphism through an abstract base class or an abstract > context class removes the boilerplate, but it introduces the runtime > overhead of indirect calls. It also retains the runtime overhead of > indirecting through opaque wrappers. > > 3. A non-abstract base class eliminates all the above overheads (for > this specific problem of CFG traversal), but introduces a memory > overhead because some information from the derived classes has to be > duplicated in the base class. > > The CycleInfo analysis exemplifies our motivation for moving away from > templates and traits. Assuming virtual calls are considered strongly > undesirable, that leaves us with the option of a non-abstract base > class. > > Sameer. >-- Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210401/563f7bda/attachment.html>
Mehdi AMINI via llvm-dev
2021-Mar-31 23:30 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
On Wed, Mar 31, 2021 at 9:01 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:> > David Blaikie writes: > > >> I am currently involved in the activity towards decoupling AMDGPU's > >> dependency on the abstractions being discussed here, and the one > >> thing that immediately jumps out is the ability to traverse the > >> predecessors and successors of a basic block. From all the emails so > >> far and the resulting proposal, it seems that both David and Nicolai > >> think that having a common non-abstract base class for basic blocks > >> is a good idea. > > > > I don't /think/ that's what I was advocating for (but it's been a > > while, and I might be misremembering - if you've got pointers to > > particular parts of the conversation to refresh my memory/frame this > > discussion, that'd be handy) - I'd worry about the cost of trying to > > introduce a common base class between these different APIs. I think > > what I was advocating for was for them to have syntactically matching > > APIs so that templated code could abstract over them without the need > > for traits/adapters. > > You're right. Turns out that you didn't specifically advocate what I > said above. The following is what was actually said, but I misunderstood > something else from the thread: > > https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html > > "I meant for a different direction - if we got buy-in for making LLVM IR > and > MIR implement the same C++ concept (& wrote down what that concept was - > which APIs, how they look textually) then I think it would be relatively > easy to get the small/mechanical renaming patches reviewed. Incremental > from day 1, not big new abstraction, then incremental." > > >> So if we create a common non-abstract base class for basic blocks, > >> then MLIR and LLVM IR will incur the overhead of two vectors, one > >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1% > >> increase in the size LLVM IR in some typical programs: > >> > >> > https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing > > > > That's overhead in the bitcode file size, yes? > > It's the memory overhead of storing those pointers in a vector. I don't > think it needs to affect the bitcode, since it's just redundant > information that can be populated after reading the whole function. > > >> Is this overhead the main issue in deciding whether we should > >> introduce the common base class for traversing a CFG? > > > > I expect that would be a pretty significant concern, but also memory > > usage and any time overhead keeping these new data structures up to > > date, or slowing down compilation due to memory paging in and out, > > etc. > > Right. If the discussion progressed to those details, then we will be in > a good place :) > > Everyone seems to agree that we need an abstraction that makes it easier > to write analyses, and Nicolai's document captures all the options. But > in particular the document makes a case against templates, and instead > advocates a way to write analyses without referring to the actual types > in the underlying IR. This is achieved with a combination of opaque > handles and dynamic polymorphism. The document also strongly recommends > minimizing the dynamic polymorphism by designing "common non-abstract > base classes for SSA basic block and instructions concepts". > > My focus right now is on the base class for basic blocks. It represents > a frequently used property of the CFG, and the problem is concise, and > already familiar to anyone writing an analysis. Our current experience > with writing new analyses in AMDGPU provides a motivation for moving > away from templates. But this is only a specific part of the bigger > problem. In particular, it only represents the "CFG" part of the > problem, but not the "SSA IR" part. > > The actual use-case can be seen in the branch below, which implements > "CycleInfo" as a template over LLVM IR and MIR. It's a work in progress. > > https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template > > For now, we can consider a simpler analysis "CfgInfo" that reports > various properties such as outgoing edges per node. > > Example A: CfgInfo over LLVM IR > > CfgInfo::compute(BasicBlock *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : successors(B)) { > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > Example B: CfgInfo over Machine IR > > CfgInfo::compute(MachineBasicBlock *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : B->successors()) { // <-- diff > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > Example C: Template over both IRs > > template <typename BlockType> > CfgInfo::compute(BlockType *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : successors(B)) { // <-- needs an overload > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > In this example, any instance of BlockType needs to provide a global > function "successors()". The following is sufficient with Machine IR in > this particular case, but in general, this would be provided by traits: > > namespace llvm { > auto successors(MachineBasicBlock *B) { return B->successors(); } > } > > Example D: CfgInfo over a common base class > > CfgInfo::compute(BasicBlockBase *entry) { > worklist.enqueue(entry); > while (!worklist.empty) { > auto B = worklist.dequeue(); > for (auto S : B->successors()) { // <-- not virtual > worklist.enqueue_if_new(S); > collectStats(B, S); > } > } > } > > This last version is not a template. It's an implementation that gets > reused for both LLVM IR and Machine IR by simply casting the respective > basic block type to the base type. > > There are two main problems with the template approach: > > 1. Just for that one function "successors()", the type of the block has > to be passed around as a template parameter. Starting from the entry > point of the analysis, this has to go all the way to every leaf > function that wants to call "successors()". This can especially be a > problem if the template has other dependent types such as iterator > adaptors. > > 2. The actual analysis can be quite complicated, but because it's a > template, it has to be in a header file so that it can be > instantiated at the point where the analysis is actually computed. > > The CycleInfo patch demonstrates both these problems. There are multiple > files as follows: > > GenericCycleInfo.h > > This file defines the templates GenericCycleInfo<BlockType> and > GenericCycle<BlockType>. It also contains the entire implementation of > the analysis. > > CycleInfo.h > > - This file declares a thin wrapper CycleInfo around > GenericCycleInfo<BasicBlock> and another wrapper Cycle around > GenericCycle<BasicBlock>. These are the LLVM IR instances of the > templates. > - The wrappers avoid including GenericCycleInfo.h by only needing the > pointer type to the wrapped template instances. > - In particular, notice how the child_iterator explicitly exposes the > details of the wrapped type, but then returns the wrapper with the `*' > operator. > > using const_child_iterator_base > typename > std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator; > struct const_child_iterator > : iterator_adaptor_base<const_child_iterator, > const_child_iterator_base> { > using Base > iterator_adaptor_base<const_child_iterator, > const_child_iterator_base>; > > Cycle operator *() const; > const_child_iterator(const_child_iterator_base I) : Base(I) {} > }; > > CycleInfo.cpp > > This file is where the template is actually instantiated, along with the > implementation of the wrapper classes. > > A similar pair of files will be needed for MachineCycleInfo later. These > will be almost identical to the CycleInfo files, except for the name of > the block type and the overload for "succesors()" and "predecessors()". > > All these files are just boilerplate designed to hide a template and > separately expose two instances of that template. Every type introduced > by the template needs to be carefully wrapped for use in other > files. Instead, if there was an interface (and not just a concept) that > exposed CFG edges, there would be just one CycleInfo.h file to declare > the analysis classes and one CycleInfo.cpp file to implement them. > > So the argument goes somewhat like this: > > 1. Static polymorphism through templates creates a development overhead > in the form of boilerplate, and also some runtime overhead of > indirecting > through wrapper types. > > 2. Dynamic polymorphism through an abstract base class or an abstract > context class removes the boilerplate, but it introduces the runtime > overhead of indirect calls. It also retains the runtime overhead of > indirecting through opaque wrappers. > > 3. A non-abstract base class eliminates all the above overheads (for > this specific problem of CFG traversal), but introduces a memory > overhead because some information from the derived classes has to be > duplicated in the base class. > > The CycleInfo analysis exemplifies our motivation for moving away from > templates and traits. Assuming virtual calls are considered strongly > undesirable, that leaves us with the option of a non-abstract base > class.Thanks, that was an excellent summary and the example was really helpful (at least to me)! I'm fairly old school here I guess, as I'd have naively written an abstract base class with virtual methods (or function_ref callback into the implementation), but there are concerns with the indirection compared to a templated implementation. But aren't we trading the cost of this indirection with even more cost if we have to duplicate information in a base class? Best, -- Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210331/3e5164c6/attachment.html>
Sameer Sahasrabuddhe via llvm-dev
2021-Apr-01 08:54 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
Nicolai Hähnle writes:> A few high-level remarks from what I remember. Let's always keep in mind > that there are a whole bunch of puzzle pieces in flight here, and there's > always a risk that the discussion gets off track by losing sight of that > bigger picture. Some of those puzzle pieces can hopefully be > non-controversial. In particular, I'm thinking here of: > > 1. Aligning the different IR classes so that they implement common concepts > and allow writing templates idiomatically. > 2. Allowing the use of opaque handles > <https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?ts=5fd124e3#heading=h.adqu14d3b7jd> > as described in the document.I agree if you mean to say that the above two things are not mutually exclusive, and both are beneficial. At least from my point of view, templates over a common set of concepts are very valuable for writing utility functions, but become cumbersome for writing large analyses. Opaque handles (combined with a context class) can make it easier to separate the IR-independent implementation from the IR-dependent interface. The context class provides a way to look up properties on the opaque handle (such as the successors for a BlockHandle object) through virtual functions. But having a non-abstract base class quickly reduce the surface of these virtual functions, since the most common functions like successors() are readily available in the base class itself. That is related to what Mehdi asked in a sibling email:> I'm fairly old school here I guess, as I'd have naively written an > abstract base class with virtual methods (or function_ref callback > into the implementation), but there are concerns with the indirection > compared to a templated implementation. > But aren't we trading the cost of this indirection with even more cost > if we have to duplicate information in a base class?Yes, we are avoiding indirect calls (not to be confused with just indirection through a data pointer). It's useful to ask whether it is okay to have indirect calls (through virtual functions) when computing an analysis instead of doing something that affects the IR itself. Is it an option to let each analysis worry about its own performance, perhaps by caching internal results of the virtual calls? Presumably, using the results of an analysis or updating the analysis will only have an incremental cost from indirect calls? If we want to move away from templates, then dynamic polymorphism cannot be completely avoided, like when looking up the definition of a value. But having a common base class can take out big chunks of the surface area. The posterboy for this is CFG traversal along the successor/predecessor pointers.> An observation about the performance implications of common base classes > for basic blocks. I'm taking it as a given that a common base class would > have predecessor and successor vectors. In addition to making predecessor > and successor iteration simpler, this could actually end up _saving_ memory > for LLVM IR in many cases if we make the following additional changes: > > 1. Change the definition of phi nodes so that they only contain the > predecessor _values_ in their argument lists (removing all the arguments > referring to the predecessor basic blocks). > 2. Change the definition of terminator instructions so that they no longer > contain the successor basic block(s) in their argument lists and instead > implicitly refer to the basic blocks successor list. For example, a > conditional branch would be defined such that its only argument is the i1 > condition. If the condition is true, the program branches to > brInst->getParent()->getSuccessors()[0]; if the condition is false, it > branches to brInst->getParent()->getSuccessors()[1]. > > This is an option that I hadn't taken into account when I calculated those > 3-4% of memory overhead, but it should end up saving memory whenever a > basic block has >= 2 phi nodes (the vectors themselves have an overhead, so > it depends). My experiment would have to be re-done to get a better idea of > the final balance. > > The changes mentioned above are admittedly very invasive and complicated > changes. It would be good to get an idea of what sort of acceptance > criterion the LLVM community has here.Guess what, I had actually attempted this before reviving this thread! My first experiment was to put an "assert(getParent())" in the {get,set}Successor() methods. That didn't result in any interesting failures. The real problem is with the terminator constructors. Usually any Instruction::Create() sort of function optionally takes an InsertBefore or InsertAtEnd argument. But if we require terminators to access successors from the parent block, then variants of say BranchInst::Create() that take successor pointers as arguments must now require either an InsertBefore or an InsertAtEnd. That breaks all sorts of code that manipulates the CFG. Requiring a parent block when creating a terminator is a lot of work, and I wasn't sure that the gain was worth the effort. It would need refactoring many places that have been built around the assumption that anyone can create an instruction in a vacuum. Instead, redefining the BasicBlock to not be derived from the Value class seemed more practical to me. It reduces the same overhead of redundantly tracking CFG edges. For the special uses of llvm::BasicBlock as a value, we could provide a new BasicBlockValue class. I tried commenting out the Value class parent from the BasicBlock declaration, but that experiment is too big to try on my own, especially without some consensus forming in this discussion. Sameer.