Andrey Bokhanko via llvm-dev
2021-Mar-16 11:23 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
Let me add my modest +1 vote to committing BOLT as it is, and *then* restructuring it as part of the LLVM development process -- with proper reviews, etc.

This is how flang and the OpenMP runtime were added to the LLVM project. This is a sure way to get things going; otherwise we may end up with a project preparing for inclusion into LLVM ad infinitum.

Yours,
Andrey

On Tue, Mar 16, 2021 at 7:16 AM Xinliang David Li <xinliangli at gmail.com> wrote:

> On Fri, Mar 12, 2021 at 11:57 AM Rafael Auler <rafaelauler at fb.com> wrote:
>>
>> Chris, the approach of living under /bolt sounds reasonable to me.
>>
>> Mehdi and David, the difference between doing things in-tree vs out-of-tree is that, currently, BOLT out-of-tree has
>>
>> (1) different legal requirements for accepting contributions (external contributions require devs to sign a CLA). So I agree with Mehdi that the same forks will get broken as we refactor code, but once BOLT is in the llvm monorepo, at least they will have the chance to upstream it with different legal requirements. If they don’t want to upstream it, that’s fine too, but I would like to give them a chance.
>> (2) a different development workflow that is less open than LLVM’s. Because we want the input of the community on a refactoring that reflects how they want to use the libraries too, it would be more natural for this to happen in-tree in LLVM.
>>
>> David, if we try to coordinate this refactoring happening in both repos (library part in LLVM while the client part in our separate repo), that will be challenging to do because we wouldn’t be able to easily test the LLVM diffs – a problem we are already facing with upstreaming our changes to LLVM without BOLT being there to easily show devs how our changes are actually used and tested. Moreover, other contributors who don’t have easy access to our github repo will have a hard time working with us on the refactor, as they wouldn’t be able to do work on the tool (just the open library).
>
> Hi Rafael, I am not actually proposing an intermediate state where parts of BOLT live in LLVM while the client lives in a separate repo. What I meant is a restructuring step within BOLT before dropping it into LLVM. For instance, in bolt's top directory, there are lots of different things -- different driver programs, profile readers/writers, debug info handling, exception handling code, BOLT IR/core data structures (BB, Loop, Function), pass managers, etc. The Pass directory is also pretty flat. Some preliminary reorganization with more tests added can reduce a lot of churn in the future. WDYT?
>
> thanks,
>
> David
>
>> Mehdi, your suggestion looks good, I intend to show everyone the monorepo snapshot. We are making sure it is ready to be published and that’s why I’ve been referring to our snapshot as “imagine our github repo contents are under /bolt” because that is pretty much it, but I will present it soon.
>>
>> From: Xinliang David Li <xinliangli at gmail.com>
>> Date: Thursday, March 11, 2021 at 11:33 PM
>> To: Chris Lattner <clattner at nondot.org>
>> Cc: Rafael Auler <rafaelauler at fb.com>, llvm-dev <llvm-dev at lists.llvm.org>, Andrey Bokhanko <andreybokhanko at gmail.com>
>> Subject: Re: [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
>>
>> Dropping Bolt into the top level directory sounds reasonable, but perhaps a hybrid approach similar to what is mentioned by Mehdi can be applied. Basically, Bolt first goes through a round of refactoring in github upstream, with a design that is close to the future structure in LLVM, and then drops in as a monolithic piece initially. This will make future restructuring much easier. There are other benefits: 1) it is a good opportunity to clean up Bolt's internal APIs; 2) it is time to beef up unittests; 3) it makes code review easier.
>>
>> David
>>
>> On Thu, Mar 11, 2021 at 10:34 PM Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Mehdi and David,
>>
>> Indeed, we share similar concerns. We do intend to move functionality of BOLT to live as a library, but the timeline is unclear. In fact, most of BOLT could live in a library already, it’s just a matter of moving some files into separate components. Instead of the files living in tools/llvm-bolt, most could just be moved under lib/something, and we already have a llvm-bolt.cpp file that instantiates the driver that coordinates the binary rewriting process, which is the entry point of BOLT as a library. People could already leverage this to use BOLT in different ways (for example, I wrote some time ago a different utility that runs the driver for two different binaries and compares the two – this was named boltdiff later).
>>
>> My main reason for committing the project as a whole first, in the same way as flang did (as a project merged into the monorepo), is that BOLT has been open source for a while, and it is a 6-year-old project with about 800 commits and 50K lines of code, and we know we have people who forked the project and would like to contribute to it. If I commit into LLVM a different BOLT (not just rebased), then I (a) break or make it hard for any work on top of it from other contributors, (b) lose the original history or make it harder to preserve it. That’s why I was going for a smoother transition. I, as a developer, put value in the ability to blame and to understand why things were built a certain way, and not bringing BOLT’s history (in the same way as flang did) would mean we and the community lose a lot of context on the decisions of the project. And I guess that’s also the rationale for a monorepo, to have multiple projects merged together.
>>
>> Because of that, I initially put bolt under /bolt, following flang’s model of merging the history so every developer has the right context. But the original location was under llvm/tools.
>>
>> As with others, I’m not very aware of the internal architecture of bolt, so take this with a grain of salt:
>>
>> From what I understand, I have a slight preference for starting this out as a /bolt top level “subproject”, because the code currently sounds monolithic. As the implementation logic is refactored into more reusable units, those libraries can be cleanly moved within the monorepo, e.g. under the llvm-project/llvm directory if appropriate.
>>
>> The advantage of doing this is that nothing in the llvm-project/llvm repo can come to depend on the bolt code unless and until it gets refactored. This is also how things like LLDB started out (and it would be great for more of the reusable libraries in LLDB to be merged into LLVM over time).
>>
>> Does anyone have any concerns about this approach?
>>
>> Unrelatedly, I’d also love to see the llvm repository exploded a bit into more top level repos, e.g. splitting support/adt out to their own thing. It is also worth considering splitting the MC layer out to its own thing as well, LLVM IR and the mid-level optimizer into its own thing, and CodeGen and the targets into its own thing.
>>
>> The major constraint we need is that we want the dependences between top-level subprojects to be a strong DAG between the subprojects now and defensible into the future, and we don’t want minor evolution of the codebase to cause libraries to have to be moved around. The benefits of splitting it up are that layering is easier to enforce, LLVM developers are encouraged to work across subprojects a bit more, and it becomes easier for subprojects to depend on slices of “the big llvm directory”.
>>
>> -Chris
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Xinliang David Li via llvm-dev
2021-Mar-16 17:46 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
I think one thing we can all agree upon is that the community wants a good balance between velocity and quality (ensured by proper reviews). I believe doing some preliminary restructuring and cleanup can help not only quality but velocity as well. A good structure serves the purpose of 'self-documentation' and will greatly help code reviewers be more effective.

thanks,

David
Mehdi AMINI via llvm-dev
2021-Mar-17 19:53 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Let me add my modest +1 vote to committing BOLT as it is, and *then*
> restructuring it as a part of LLVM development process -- with proper
> reviews, etc.
>
> This is how flang and OpenMP runtime had been added to LLVM project.

Actually, if I remember correctly, flang went through multiple months of preparatory upgrades that were asked for by some people in the community, and they did so out-of-tree before getting ready to land in a single merge.

> This is a sure way to start things going; otherwise we may end up with
> a project preparing for inclusion into LLVM ad infinitum.

We just have to make the expectations very clear and avoid a "moving goalposts" situation, and it should work fine. Any particular reason that would put us in an "ad infinitum" situation?

--
Mehdi