Sameer Sahasrabuddhe via llvm-dev
2021-Mar-30 16:20 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote ----

> Owen - perhaps you've got some relevant perspective here given the GPU
> context/background, and general middle end optimization design
> questions. Could you take a look?
> (& perhaps Johannes?)
>
> On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Thanks for sending this out!
> >
> > +Mehdi - if you have a chance to look, I'd appreciate your thoughts on
> > this, partly as a general LLVM contributor, but also with an eye to
> > abstractions over multiple IRs given your work on MLIR
> > +Lang - you mentioned maybe some colleagues of yours might be able to
> > take a look at the design here?
> >
> > I'd like to try writing up maybe alternatives to a couple of the patches
> > in the series to see how they compare, but haven't done that as yet.
> >
> > My brief take is that I think the opaque handles make a fair bit of
> > sense/seem high-value/low-cost (if it turned out the handle for one of
> > these IR types needed to become a bit heavier, I don't think it'd be
> > super difficult to revisit this code - make the generic handle large
> > enough to accommodate whatever the largest concrete handle happens to
> > be, etc). Though I'm still a bit less certain about the full runtime
> > polymorphic abstractions.

For context, this is about the options laid out in the following document from Nicolai:

https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing

The above document was first introduced in the following email, which is the starting point of the current thread:

https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html

The main issue is the need for an abstraction that greatly improves the experience of writing analyses that work on LLVM IR and MIR (and potentially also on MLIR).
The way I understand it, Nicolai proposed an abstraction that involves dynamic polymorphism to abstract out the details of the underlying IR. David was more in favour of first unifying the IRs to the point that the dynamic polymorphism is limited to only occasional corner cases instead.

I am currently involved in the activity towards decoupling AMDGPU's dependency on the abstractions being discussed here, and the one thing that immediately jumps out is the ability to traverse the predecessors and successors of a basic block. From all the emails so far and the resulting proposal, it seems that both David and Nicolai think that having a common non-abstract base class for basic blocks is a good idea. Such a base class will provide a vector of predecessors and successors that can be used to traverse the CFG independent of the actual IR.

The current situation is that the basic blocks in LLVM IR and MLIR do not explicitly track their preds and succs. Preds are determined from the uses of the block, while succs are determined from the terminator instruction. MIR has explicit pred and succ vectors, but the succ pointers are duplicated by their presence as operands to the terminator instructions. The MachineBasicBlock is not a value, so there is no notion of traversing its uses for predecessors.

So if we create a common non-abstract base class for basic blocks, then MLIR and LLVM IR will incur the overhead of two vectors, one each for the preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size of LLVM IR in some typical programs:

https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing

Is this overhead the main issue in deciding whether we should introduce the common base class for traversing a CFG? Do note that MIR already duplicates successors. One could argue that duplicating the successor pointers in LLVM IR and MLIR merely brings them to the same level as MIR.
If that argument sticks, then the "real" overhead is only half of what is reported, to account for the duplicate predecessor pointers.

The duplication of predecessor pointers stems from the fact that basic blocks in LLVM IR and MLIR are also values that are (primarily) used as operands to the terminator and PHI instructions. We could redefine these instructions to use non-value basic blocks as operands. CFG edges are not the typical use-def relation that other values represent, and it's reasonable to say that PHINodes and terminators are special in this one way. I have not managed to see how far that rabbit hole goes, but I am not really convinced that this is attractive in any way. Is this even an option?

Sameer.
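[Editor's sketch] To make the trade-off Sameer describes concrete, here is a minimal sketch of what such a common non-abstract base class might look like. All names here (BasicBlockBase, addSuccessor, MyIRBlock) are hypothetical illustrations, not taken from Nicolai's patches or any in-tree class:

```cpp
#include <cassert>
#include <vector>

// Hypothetical common non-abstract base class for basic blocks.
// Every IR (LLVM IR, MIR, MLIR) would pay for these two vectors in every
// block, whether or not a generic analysis ever runs on it - the memory
// overhead discussed in the thread.
class BasicBlockBase {
  std::vector<BasicBlockBase *> Preds;
  std::vector<BasicBlockBase *> Succs;

public:
  // Generic CFG traversal, independent of the concrete IR.
  const std::vector<BasicBlockBase *> &predecessors() const { return Preds; }
  const std::vector<BasicBlockBase *> &successors() const { return Succs; }

  // The concrete IR's terminator-update code would be responsible for
  // keeping these explicit edge vectors in sync on every CFG edit.
  void addSuccessor(BasicBlockBase *S) {
    Succs.push_back(S);
    S->Preds.push_back(this);
  }
};

// A concrete IR's block type would derive from the base; the base is
// non-abstract, so traversal needs no virtual dispatch.
class MyIRBlock : public BasicBlockBase {
  // ... IR-specific members: instruction list, parent function, etc.
};
```

Note that this duplicates exactly the information that LLVM IR today derives from block uses (preds) and terminator operands (succs), which is where the reported size increase comes from.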
Mehdi AMINI via llvm-dev
2021-Mar-31 00:25 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
On Tue, Mar 30, 2021 at 9:20 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:

> ---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote ----
>
> > Owen - perhaps you've got some relevant perspective here given the GPU
> > context/background, and general middle end optimization design
> > questions. Could you take a look?
> > (& perhaps Johannes?)
> >
> > On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com>
> > wrote:
> > >
> > > Thanks for sending this out!
> > >
> > > +Mehdi - if you have a chance to look, I'd appreciate your thoughts
> > > on this, partly as a general LLVM contributor, but also with an eye to
> > > abstractions over multiple IRs given your work on MLIR
> > > +Lang - you mentioned maybe some colleagues of yours might be able to
> > > take a look at the design here?
> > >
> > > I'd like to try writing up maybe alternatives to a couple of the
> > > patches in the series to see how they compare, but haven't done that
> > > as yet.
> > >
> > > My brief take is that I think the opaque handles make a fair bit of
> > > sense/seem high-value/low-cost (if it turned out the handle for one of
> > > these IR types needed to become a bit heavier, I don't think it'd be
> > > super difficult to revisit this code - make the generic handle large
> > > enough to accommodate whatever the largest concrete handle happens to
> > > be, etc). Though I'm still a bit less certain about the full runtime
> > > polymorphic abstractions.
>
> For context, this is about the options laid out in the following document
> from Nicolai:
>
> https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing
>
> The above document was first introduced in the following email, which is
> the starting point of the current thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html
>
> The main issue is the need for an abstraction that greatly improves the
> experience of writing analyses that work on LLVM IR and MIR (and
> potentially also on MLIR). The way I understand it, Nicolai proposed an
> abstraction that involves dynamic polymorphism to abstract out the details
> of the underlying IR. David was more in favour of first unifying the IRs
> to the point that the dynamic polymorphism is limited to only occasional
> corner cases instead.
>
> I am currently involved in the activity towards decoupling AMDGPU's
> dependency on the abstractions being discussed here, and the one thing
> that immediately jumps out is the ability to traverse the predecessors and
> successors of a basic block. From all the emails so far and the resulting
> proposal, it seems that both David and Nicolai think that having a common
> non-abstract base class for basic blocks is a good idea. Such a base class
> will provide a vector of predecessors and successors that can be used to
> traverse the CFG independent of the actual IR.
>
> The current situation is that the basic blocks in LLVM IR and MLIR do not
> explicitly track their preds and succs. Preds are determined from the uses
> of the block, while succs are determined from the terminator instruction.
> MIR has explicit pred and succ vectors, but the succ pointers are
> duplicated by their presence as operands to the terminator instructions.
> The MachineBasicBlock is not a value, so there is no notion of traversing
> its uses for predecessors.
>
> So if we create a common non-abstract base class for basic blocks, then
> MLIR and LLVM IR will incur the overhead of two vectors, one each for the
> preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size
> of LLVM IR in some typical programs:
>
> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
>
> Is this overhead the main issue in deciding whether we should introduce
> the common base class for traversing a CFG? Do note that MIR already
> duplicates successors. One could argue that duplicating the successor
> pointers in LLVM IR and MLIR merely brings them to the same level as MIR.
> If that argument sticks, then the "real" overhead is only half of what is
> reported, to account for the duplicate predecessor pointers.
>
> The duplication of predecessor pointers stems from the fact that basic
> blocks in LLVM IR and MLIR are also values that are (primarily) used as
> operands to the terminator and PHI instructions. We could redefine these
> instructions to use non-value basic blocks as operands. CFG edges are not
> the typical use-def relation that other values represent, and it's
> reasonable to say that PHINodes and terminators are special in this one
> way. I have not managed to see how far that rabbit hole goes, but I am not
> really convinced that this is attractive in any way. Is this even an
> option?

I'm not knowledgeable about everything you're looking at, but one minor
thing: in MLIR, blocks aren't regular value operands and aren't part of the
value hierarchy; they don't participate in the usual use-def chains (and
MLIR does not have PHI nodes either - it uses block arguments instead).
Note that LLVM takes advantage of blocks being values to implement the
`blockaddress` constant, which would be difficult in MLIR at the moment.

Best,

--
Mehdi
David Blaikie via llvm-dev
2021-Mar-31 03:15 UTC
[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses
(FWIW, I'd still really love Mehdi, Owen, or anyone else's opinion on the
original proposal or any parts of this general problem - folks working on
some profile features have been continuing the template direction to
abstract over MIR+LLVM IR, and if there's benefits to be had here, it'd be
great to get them...)

On Tue, Mar 30, 2021 at 9:20 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net> wrote:

> ---- On Thu, 11 Feb 2021 06:07:17 +0530 David Blaikie via llvm-dev
> <llvm-dev at lists.llvm.org> wrote ----
>
> > Owen - perhaps you've got some relevant perspective here given the GPU
> > context/background, and general middle end optimization design
> > questions. Could you take a look?
> > (& perhaps Johannes?)
> >
> > On Mon, Feb 1, 2021 at 11:36 AM David Blaikie <dblaikie at gmail.com>
> > wrote:
> > >
> > > Thanks for sending this out!
> > >
> > > +Mehdi - if you have a chance to look, I'd appreciate your thoughts
> > > on this, partly as a general LLVM contributor, but also with an eye to
> > > abstractions over multiple IRs given your work on MLIR
> > > +Lang - you mentioned maybe some colleagues of yours might be able to
> > > take a look at the design here?
> > >
> > > I'd like to try writing up maybe alternatives to a couple of the
> > > patches in the series to see how they compare, but haven't done that
> > > as yet.
> > >
> > > My brief take is that I think the opaque handles make a fair bit of
> > > sense/seem high-value/low-cost (if it turned out the handle for one of
> > > these IR types needed to become a bit heavier, I don't think it'd be
> > > super difficult to revisit this code - make the generic handle large
> > > enough to accommodate whatever the largest concrete handle happens to
> > > be, etc). Though I'm still a bit less certain about the full runtime
> > > polymorphic abstractions.
>
> For context, this is about the options laid out in the following document
> from Nicolai:
>
> https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing
>
> The above document was first introduced in the following email, which is
> the starting point of the current thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html
>
> The main issue is the need for an abstraction that greatly improves the
> experience of writing analyses that work on LLVM IR and MIR (and
> potentially also on MLIR). The way I understand it, Nicolai proposed an
> abstraction that involves dynamic polymorphism to abstract out the details
> of the underlying IR. David was more in favour of first unifying the IRs
> to the point that the dynamic polymorphism is limited to only occasional
> corner cases instead.

To clarify, I think what I was leaning towards was having the APIs match
syntactically, so it'd be easier to write templates over them without the
need for type traits and adapters.

> I am currently involved in the activity towards decoupling AMDGPU's
> dependency on the abstractions being discussed here, and the one thing
> that immediately jumps out is the ability to traverse the predecessors
> and successors of a basic block. From all the emails so far and the
> resulting proposal, it seems that both David and Nicolai think that having
> a common non-abstract base class for basic blocks is a good idea.

I don't /think/ that's what I was advocating for (but it's been a while,
and I might be misremembering - if you've got pointers to particular parts
of the conversation to refresh my memory/frame this discussion, that'd be
handy) - I'd worry about the cost of trying to introduce a common base
class between these different APIs.
I think what I was advocating for was for them to have syntactically
matching APIs so that templated code could abstract over them without the
need for traits/adapters.

> Such a base class will provide a vector of predecessors and successors
> that can be used to traverse the CFG independent of the actual IR.
>
> The current situation is that the basic blocks in LLVM IR and MLIR do not
> explicitly track their preds and succs. Preds are determined from the uses
> of the block, while succs are determined from the terminator instruction.
> MIR has explicit pred and succ vectors, but the succ pointers are
> duplicated by their presence as operands to the terminator instructions.
> The MachineBasicBlock is not a value, so there is no notion of traversing
> its uses for predecessors.
>
> So if we create a common non-abstract base class for basic blocks, then
> MLIR and LLVM IR will incur the overhead of two vectors, one each for the
> preds and succs. Nicolai had reported a 3.1% to 4.1% increase in the size
> of LLVM IR in some typical programs:
>
> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing

That's overhead in the bitcode file size, yes?

> Is this overhead the main issue in deciding whether we should introduce
> the common base class for traversing a CFG?

I expect that would be a pretty significant concern, but also memory usage
and any time overhead keeping these new data structures up to date, or
slowing down compilation due to memory paging in and out, etc.

> Do note that MIR already duplicates successors. One could argue that
> duplicating the successor pointers in LLVM IR and MLIR merely brings them
> to the same level as MIR. If that argument sticks, then the "real"
> overhead is only half of what is reported, to account for the duplicate
> predecessor pointers.
> The duplication of predecessor pointers stems from the fact that basic
> blocks in LLVM IR and MLIR are also values that are (primarily) used as
> operands to the terminator and PHI instructions. We could redefine these
> instructions to use non-value basic blocks as operands. CFG edges are not
> the typical use-def relation that other values represent, and it's
> reasonable to say that PHINodes and terminators are special in this one
> way. I have not managed to see how far that rabbit hole goes, but I am not
> really convinced that this is attractive in any way. Is this even an
> option?
>
> Sameer.
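[Editor's sketch] The alternative David describes - syntactically matching APIs instead of a shared base class - can be illustrated with a small template. The two block types below are hypothetical stand-ins for llvm::BasicBlock and MachineBasicBlock (whose real interfaces differ in exactly the ways under discussion; this sketch assumes they had been unified to both expose a `successors()` range):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two unrelated IRs: no common base class, no virtual dispatch, but the
// same spelling for CFG traversal.
struct BlockA {
  std::vector<BlockA *> Succs;
  const std::vector<BlockA *> &successors() const { return Succs; }
};

struct BlockB {
  std::vector<BlockB *> Succs;
  const std::vector<BlockB *> &successors() const { return Succs; }
};

// A generic analysis written once as a template. Because the APIs match
// syntactically, it needs no type traits or adapter classes: it compiles
// against any block type with a matching successors() method.
template <typename BlockT> unsigned countReachable(BlockT *Entry) {
  std::vector<BlockT *> Worklist{Entry};
  std::vector<BlockT *> Seen;
  while (!Worklist.empty()) {
    BlockT *B = Worklist.back();
    Worklist.pop_back();
    if (std::find(Seen.begin(), Seen.end(), B) != Seen.end())
      continue; // already visited
    Seen.push_back(B);
    for (BlockT *S : B->successors())
      Worklist.push_back(S);
  }
  return static_cast<unsigned>(Seen.size());
}
```

The design trade-off versus the base-class approach: there is no per-block memory cost and no runtime indirection, but every generic analysis must be a template, so it is instantiated (and compiled) once per IR rather than existing as a single polymorphic implementation.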