thr3ads.net - llvm dev - [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization [Mar 2021]

Mehdi AMINI via llvm-dev

2021-Mar-17 19:53 UTC

[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization

On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Let me add my modest +1 vote to committing BOLT as it is, and *then*
> restructuring it as a part of LLVM development process -- with proper
> reviews, etc.
>
> This is how flang and OpenMP runtime had been added to LLVM project.
>
Actually if I remember correctly flang went through multiple months of
preparatory upgrade that were asked for by some people in the community,
and they did so out-of-tree before getting ready to land in a single merge.


> This is a sure way to start things going; otherwise we may end up with
> a project preparing for inclusion into LLVM ad infinitum.
>
We just have to make the expectations very clear and avoid having a "moving
goalposts" situation, and it should work fine. Is there any particular reason
that would put us in an "ad infinitum" situation?

-- 
Mehdi


>
> Yours,
> Andrey
>
>
>
>
> On Tue, Mar 16, 2021 at 7:16 AM Xinliang David Li <xinliangli at gmail.com> wrote:
> >
> > On Fri, Mar 12, 2021 at 11:57 AM Rafael Auler <rafaelauler at fb.com> wrote:
> >>
> >> Chris, the approach of living under /bolt sounds reasonable to me.
> >>
> >>
> >>
> >> Mehdi and David, the difference of doing things in-tree vs out-of-tree
> >> is that, currently, BOLT out-of-tree has
> >>
> >>   (1) different legal requirements for accepting contributions
> >> (external contributions require devs to sign a CLA). So I agree with Mehdi
> >> that the same forks will get broken as we refactor code, but once BOLT is
> >> in the llvm monorepo, at least they will have the chance to upstream it
> >> with different legal requirements. If they don’t want to upstream it,
> >> that’s fine too, but I would like to give them a chance.
> >>   (2) a different development workflow that is less open than LLVM’s.
> >> Because we want the input of the community on a refactoring that reflects
> >> how they want to use the libraries too, it would be more natural for this
> >> to happen inside in-tree LLVM.
> >>
> >>
> >>
> >> David, if we try to coordinate this refactoring happening in both repos
> >> (library part in LLVM while the client part in our separate repo), that
> >> will be challenging to do because we wouldn’t be able to easily test the
> >> LLVM’s diffs – a problem we are already facing with upstreaming our changes
> >> to LLVM without BOLT being there to easily show devs how our changes are
> >> actually used and tested. Moreover, other contributors who don’t have easy
> >> access to our github repo will have a hard time working with us in the
> >> refactor as they wouldn’t be able to do work on the tool (just the open
> >> library).
> >
> >
> > Hi Rafael, I am not actually proposing an intermediate state where parts
> > of BOLT live in LLVM while the client lives in a separate repo. What I
> > meant is a restructuring step within BOLT before dropping it into LLVM. For
> > instance, in BOLT's top directory, there are lots of different things
> > -- different driver programs, profile readers/writers, debug info handling,
> > exception handling code, BOLT IR/core data structures (BB, Loop, Function),
> > pass managers, etc. The Pass directory is also pretty flat. Some
> > preliminary reorganization with more tests added can reduce a lot of churn
> > in the future. WDYT?
> >
> > thanks,
> >
> > David
> >
> >
> >
> >>
> >>
> >>
> >> Mehdi, your suggestion looks good, I intend to show everyone the
> >> monorepo snapshot. We are making sure it is ready to be published and
> >> that’s why I’ve been referring to our snapshot as “imagine our github repo
> >> contents are under /bolt” because that is pretty much it, but I will
> >> present it soon.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> From: Xinliang David Li <xinliangli at gmail.com>
> >> Date: Thursday, March 11, 2021 at 11:33 PM
> >> To: Chris Lattner <clattner at nondot.org>
> >> Cc: Rafael Auler <rafaelauler at fb.com>, llvm-dev <llvm-dev at lists.llvm.org>,
> >> Andrey Bokhanko <andreybokhanko at gmail.com>
> >> Subject: Re: [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis,
> >> Transformation, and Optimization
> >>
> >> Dropping Bolt to the top level directory sounds reasonable, but perhaps
> >> a hybrid approach similar to what is mentioned by Mehdi can be applied.
> >> Basically, Bolt first goes through a round of refactoring in github upstream,
> >> with a design that is close to the future structure in LLVM, and then
> >> drops in as a monolithic piece initially. This will make future
> >> restructuring much easier. There are other benefits: 1) it is a good
> >> opportunity to clean up Bolt's internal APIs; 2) it is time to beef up
> >> unittests; 3) it makes code review easier.
> >>
> >>
> >>
> >> David
> >>
> >>
> >>
> >> On Thu, Mar 11, 2021 at 10:34 PM Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>
> >> On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >> Hi Mehdi and David,
> >>
> >>
> >>
> >> Indeed, we share similar concerns. We do intend to move functionality
> >> of BOLT to live as a library, but the timeline is unclear. In fact, most of
> >> BOLT could live in a library already, it’s just a matter of moving some
> >> files into separate components. Instead of the files living in
> >> tools/llvm-bolt, most could just be moved under lib/something, and we
> >> already have a llvm-bolt.cpp file that instantiates the driver that
> >> coordinates the binary rewriting process, which is the entry point of BOLT
> >> as a library. People could already leverage this to use BOLT in different
> >> ways (for example, I wrote some time ago a different utility that runs the
> >> driver for two different binaries and compares the two – this was named
> >> boltdiff later).
> >>
> >>
> >>
> >> My main reason for committing the project as a whole first, in the same
> >> way as flang did (as a project merged into the monorepo), is that BOLT
> >> has already been open source for a while: it is a 6-year-old project
> >> with about 800 commits and 50K lines of code, and we know we have
> >> people who forked the project and would like to contribute to it. If I
> >> commit into LLVM a different BOLT (not just rebased), then I (a) break or
> >> make it hard for any work on top of it from other contributors, (b) lose
> >> the original history or make it harder to preserve it. That’s why I was
> >> going for a smoother transition. I, as a developer, put value in the
> >> ability to blame and to understand why things were built a certain way, and
> >> not bringing BOLT’s history (in the same way as flang did) would mean we
> >> and the community lose a lot of context on the decisions of the project.
> >> And I guess that’s also the rationale for a monorepo, to have multiple
> >> projects merged together.
> >>
> >>
> >>
> >> Because of that, I initially put bolt under /bolt, following flang’s
> >> model of merging the history so every developer has the right context. But
> >> the original location was under llvm/tools.
> >>
> >>
> >>
> >> As with others, I’m not very aware of the internal architecture of
> >> bolt, so take this with a grain of salt:
> >>
> >>
> >>
> >> From what I understand, I have a slight preference for starting this
> >> out as a /bolt top level “subproject”, because the code currently sounds
> >> monolithic. As the implementation logic is refactored into more reusable
> >> units, those libraries can be cleanly moved within the monorepo, e.g. under
> >> the llvm-project/llvm directory if appropriate.
> >>
> >>
> >>
> >> The advantage of doing this is that nothing in the llvm-project/llvm
> >> repo can come to depend on the bolt code until and if it gets refactored.
> >> This is also how things like LLDB started out (and it would be great for
> >> more of the reusable libraries in LLDB to be merged into LLVM over time).
> >>
> >>
> >>
> >> Does anyone have any concerns about this approach?
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Unrelatedly, I’d also love to see the llvm repository exploded a bit
> >> into more top level repos, e.g. splitting support/adt out to their own
> >> thing. It is also worth considering splitting the MC layer out to its own
> >> thing as well, LLVM IR and the mid-level optimizer into its own thing, and
> >> CodeGen and the targets into its own thing.
> >>
> >>
> >>
> >> The major constraint we need is that we want the dependences between
> >> top-level subprojects to be a strong DAG between the subprojects, both now
> >> and defensible into the future, and we don’t want minor evolution of the
> >> codebase to cause libraries to have to be moved around. The benefit of
> >> splitting it up is that it is easier to enforce layering, encourages LLVM
> >> developers to work across subprojects a bit more, and makes it easier for
> >> subprojects to depend on slices of “the big llvm directory”.
> >>
> >>
> >>
> >> -Chris
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Eric Christopher via llvm-dev

2021-Mar-18 00:13 UTC

On Wed, Mar 17, 2021 at 3:55 PM Mehdi AMINI via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Mar 16, 2021 at 4:24 AM Andrey Bokhanko via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Let me add my modest +1 vote to committing BOLT as it is, and *then*
>> restructuring it as a part of LLVM development process -- with proper
>> reviews, etc.
>>
>> This is how flang and OpenMP runtime had been added to LLVM project.
>>
>
> Actually if I remember correctly flang went through multiple months of
> preparatory upgrade that were asked for by some people in the community,
> and they did so out-of-tree before getting ready to land in a single merge.
>
As the person who requested the most changes for flang, I concur here. There
was some negotiation as to what was reasonable to expect before and what
was easier to add after. I think we should get a proposal and a change that
shows what we're looking at as far as inclusion and we can make our
evaluations at this point.

Thanks!

-eric

Andrey Bokhanko via llvm-dev

2021-Mar-18 08:48 UTC

On Wed, Mar 17, 2021 at 10:54 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> Actually if I remember correctly flang went through multiple months of
> preparatory upgrade that were asked for by some people in the community, and
> they did so out-of-tree before getting ready to land in a single merge.

I have to admit that, contrary to OpenMP, which I followed very closely,
I only superficially followed flang development. Thus, I stand
corrected by Mehdi and Eric here.

> We just have to make the expectations very clear and avoid having a "moving
> goalposts" situation, and it should work fine. Is there any particular reason
> that would put us in an "ad infinitum" situation?

I said "we may end up" -- or we may not. :-) No particular reason
apart from the history of software engineering. As you said, clear
expectations from the very start are a key ingredient to avoid this
happening.

IMHO, it's infinitely better to start project development in a wide
and mature open source community ASAP -- at the expense of some potential
refactoring work -- rather than delay until the code is "good enough".
This comes from a man who spent most of his life working on proprietary
projects and used to argue with Chandler that "proprietary development
model is less expensive and leads to higher quality" (now I know
better). Just one man's opinion. It's fine to disagree.

Yours,
Andrey
