Mehdi AMINI via llvm-dev
2021-Mar-17 19:53 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> Let me add my modest +1 vote to committing BOLT as it is, and *then*
> restructuring it as a part of LLVM development process -- with proper
> reviews, etc.
>
> This is how flang and OpenMP runtime had been added to LLVM project.

Actually, if I remember correctly, flang went through multiple months of
preparatory upgrades that were asked for by some people in the community,
and they did so out-of-tree before getting ready to land in a single merge.

> This is a sure way to start things going; otherwise we may end up with
> a project preparing for inclusion into LLVM ad infinitum.

We just have to make the expectations very clear and avoid a "moving
goalposts" situation, and it should work fine. Any particular reason that
would put us in an "ad infinitum" situation?

--
Mehdi

> Yours,
> Andrey
>
> On Tue, Mar 16, 2021 at 7:16 AM Xinliang David Li <xinliangli at gmail.com> wrote:
> >
> > On Fri, Mar 12, 2021 at 11:57 AM Rafael Auler <rafaelauler at fb.com> wrote:
> >>
> >> Chris, the approach of living under /bolt sounds reasonable to me.
> >>
> >> Mehdi and David, the difference of doing things in-tree vs out-of-tree
> >> is that, currently, BOLT out-of-tree has
> >>
> >> (1) different legal requirements for accepting contributions
> >> (external contributions require devs to sign a CLA). So I agree with
> >> Mehdi that the same forks will get broken as we refactor code, but once
> >> BOLT is in the llvm monorepo, at least they will have the chance to
> >> upstream it with different legal requirements. If they don’t want to
> >> upstream it, that’s fine too, but I would like to give them a chance.
> >> (2) a different development workflow that is less open than LLVM’s.
> >> Because we want the input of the community on a refactoring that
> >> reflects how they want to use the libraries too, it would be more
> >> natural for this to happen inside in-tree LLVM.
> >>
> >> David, if we try to coordinate this refactoring happening in both repos
> >> (library part in LLVM while the client part in our separate repo), that
> >> will be challenging to do because we wouldn’t be able to easily test the
> >> LLVM’s diffs – a problem we are already facing with upstreaming our
> >> changes to LLVM without BOLT being there to easily show devs how our
> >> changes are actually used and tested. Moreover, other contributors who
> >> don’t have easy access to our github repo will have a hard time working
> >> with us in the refactor as they wouldn’t be able to do work on the tool
> >> (just the open library).
> >
> > Hi Rafael, I am not actually proposing an intermediate state where parts
> > of BOLT lives in LLVM while the client lives in a separate repo. What I
> > meant is a restructuring step within BOLT before dropping in LLVM. For
> > instance, in the bolt's top directory, there are lots of different things
> > -- different driver programs, profile reader/writers, debug info handling,
> > exception handling code, BOLT IR/core data structures (BB, Loop, Function)
> > etc, pass managers etc. The Pass directory is also pretty flat. Some
> > preliminary reorganization with more tests added can reduce a lot of
> > churns in the future. WDYT?
> >
> > thanks,
> >
> > David
> >
> >>
> >> Mehdi, your suggestion looks good, I intend to show everyone the
> >> monorepo snapshot. We are making sure it is ready to be published and
> >> that’s why I’ve been referring to our snapshot as “imagine our github
> >> repo contents are under /bolt” because that is pretty much it, but I
> >> will present it soon.
> >>
> >> From: Xinliang David Li <xinliangli at gmail.com>
> >> Date: Thursday, March 11, 2021 at 11:33 PM
> >> To: Chris Lattner <clattner at nondot.org>
> >> Cc: Rafael Auler <rafaelauler at fb.com>, llvm-dev
> >> <llvm-dev at lists.llvm.org>, Andrey Bokhanko <andreybokhanko at gmail.com>
> >> Subject: Re: [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis,
> >> Transformation, and Optimization
> >>
> >> Dropping Bolt to the top level directory sounds reasonable, but perhaps
> >> a hybrid approach similar to what is mentioned by Medhi can be applied.
> >> Basically Bolt first goes through a round of refactoring in github
> >> upstream first with design that is close to the future structure in
> >> LLVM, and then drops in as a monolithic piece initially. This will make
> >> future restructuring much easier. There are other benefits: 1) it is a
> >> good opportunity to clean up Bolt's internal APIs 2) It is time to beef
> >> up unittests; 3) it makes code review easier.
> >>
> >> David
> >>
> >> On Thu, Mar 11, 2021 at 10:34 PM Chris Lattner via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Hi Mehdi and David,
> >>
> >> Indeed, we share similar concerns. We do intend to move functionality
> >> of BOLT to live as a library, but the timeline is unclear. In fact, most
> >> of BOLT could live in a library already, it’s just a matter of moving
> >> some files into separate components. Instead of the files living in
> >> tools/llvm-bolt, most could just be moved under lib/something, and we
> >> already have a llvm-bolt.cpp file that instantiates the driver that
> >> coordinates the binary rewriting process, which is the entry point of
> >> BOLT as a library. People could already leverage this to use BOLT in
> >> different ways (for example, I wrote some time ago a different utility
> >> that runs the driver for two different binaries and compares the two –
> >> this was named boltdiff later).
> >>
> >> My main reason for committing the project as a whole first, in the same
> >> way as flang did, though, (as a project merged into the monorepo), is
> >> because BOLT is already opensource for a while, and it is a 6-year old
> >> project with about 800 commits and 50K lines of code and we know we have
> >> people who forked the project and would like to contribute to it. If I
> >> commit into LLVM a different BOLT (not just rebased), then I (a) break
> >> or make it hard for any work on top of it from other contributors, (b)
> >> lose the original history or make it harder to preserve it. That’s why I
> >> was going for a more smoother transition. I, as a developer, put value
> >> in the ability to blame and to understand why things were built a
> >> certain way, and not bringing BOLT’s history (in the same way as flang
> >> did) would mean we and the community loses a lot of context on the
> >> decisions of the project. And I guess that’s also the rationale for a
> >> monorepo, to have multiple projects merged together.
> >>
> >> Because of that, I initially put bolt under /bolt, following flang’s
> >> model of merging the history so every developer has the right context.
> >> But the original location was under llvm/tools.
> >>
> >> As with others, I’m not very aware of the internal architecture of
> >> bolt, so take this with a grain of salt:
> >>
> >> From what I understand, I have a slight preference for starting this
> >> out as a /bolt top level “subproject”, because the code currently sounds
> >> monolithic. As the implementation logic is refactored into more reusable
> >> units, those library can be cleanly movable within the monorepo, e.g.
> >> under the llvm-project/llvm directory if appropriate.
> >>
> >> The advantage of doing this is that nothing in the llvm-project/llvm
> >> repo can come to depend on the bolt code until and if it gets
> >> refactored. This is also how things like LLDB started out (and it would
> >> be great for more of the reusable libraries in LLDB to be merged into
> >> LLVM over time).
> >>
> >> Does anyone have any concerns about this approach?
> >>
> >> Unrelatedly, I’d also love to see the llvm repository exploded a bit
> >> into more top level repos, e.g. splitting support/adt out to their own
> >> thing. It is also worth considering splitting the MC layer out to its
> >> own thing as well, LLVM IR and the mid-level optimizer into its own
> >> thing, and CodeGen and the targets into its own thing.
> >>
> >> The major constraint we need is that we want the dependences between
> >> top-level subproject to be a strong DAG between the subproject now and
> >> defensible into the future, and we don’t want minor evolution of the
> >> codebase to cause libraries to have to be moved around. The benefit of
> >> splitting it up is easier to enforce layering, encouraging LLVM
> >> developers to work across subproject a bit more, and making it easier
> >> for subproject to depend on slices of “the big llvm directory”.
> >>
> >> -Chris
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Eric Christopher via llvm-dev
2021-Mar-18 00:13 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
On Wed, Mar 17, 2021 at 3:55 PM Mehdi AMINI via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Let me add my modest +1 vote to committing BOLT as it is, and *then*
>> restructuring it as a part of LLVM development process -- with proper
>> reviews, etc.
>>
>> This is how flang and OpenMP runtime had been added to LLVM project.
>
> Actually, if I remember correctly, flang went through multiple months of
> preparatory upgrades that were asked for by some people in the community,
> and they did so out-of-tree before getting ready to land in a single merge.

As the person who requested the most changes for flang, I concur here.
There was some negotiation as to what was reasonable to expect before and
what was easier to add after. I think we should get a proposal and a change
that shows what we're looking at as far as inclusion, and we can make our
evaluations at this point.

Thanks!

-eric

>> This is a sure way to start things going; otherwise we may end up with
>> a project preparing for inclusion into LLVM ad infinitum.
>
> We just have to make the expectations very clear and avoid a "moving
> goalposts" situation, and it should work fine. Any particular reason that
> would put us in an "ad infinitum" situation?
>
> --
> Mehdi
Andrey Bokhanko via llvm-dev
2021-Mar-18 08:48 UTC
[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization
On Wed, Mar 17, 2021 at 10:54 PM Mehdi AMINI <joker.eph at gmail.com> wrote:

> Actually, if I remember correctly, flang went through multiple months of
> preparatory upgrades that were asked for by some people in the community,
> and they did so out-of-tree before getting ready to land in a single merge.

I have to admit that, unlike OpenMP, which I followed very closely, I only
superficially followed flang development. Thus, I stand corrected by Mehdi
and Eric here.

> We just have to make the expectations very clear and avoid a "moving
> goalposts" situation, and it should work fine. Any particular reason that
> would put us in an "ad infinitum" situation?

I said "we may end up" -- or we may not. :-) No particular reason apart
from the history of software engineering. As you said, clear expectations
from the very start are a key ingredient in avoiding this.

IMHO, it's infinitely better to start project development in a wide and
mature open source community ASAP -- at the expense of some potential
refactoring work -- rather than delay until the code is "good enough".

This comes from a man who spent most of his life working on proprietary
projects and used to argue with Chandler that the "proprietary development
model is less expensive and leads to higher quality" (now I know better).

Just one man's opinion. It's fine to disagree.

Yours,
Andrey