thr3ads.net - llvm dev - [llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization [Mar 2021]

Chris Lattner via llvm-dev

2021-Mar-12 06:34 UTC

[llvm-dev] [RFC] BOLT: A Framework for Binary Analysis, Transformation, and Optimization

On Mar 11, 2021, at 9:40 PM, Rafael Auler via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> Hi Mehdi and David,
>
> Indeed, we share similar concerns. We do intend to move functionality of
> BOLT to live as a library, but the timeline is unclear. In fact, most of
> BOLT could live in a library already; it’s just a matter of moving some
> files into separate components. Instead of the files living in
> tools/llvm-bolt, most could just be moved under lib/something, and we
> already have an llvm-bolt.cpp file that instantiates the driver that
> coordinates the binary rewriting process, which is the entry point of BOLT
> as a library. People could already leverage this to use BOLT in different
> ways (for example, some time ago I wrote a different utility that runs the
> driver for two different binaries and compares the two; this was later
> named boltdiff).
>
> My main reason for committing the project as a whole first (as a project
> merged into the monorepo, in the same way as flang did) is that BOLT has
> been open source for a while: it is a 6-year-old project with about 800
> commits and 50K lines of code, and we know we have people who forked the
> project and would like to contribute to it. If I commit into LLVM a
> different BOLT (not just rebased), then I (a) break or make it hard for any
> work on top of it from other contributors, and (b) lose the original
> history or make it harder to preserve it. That’s why I was going for a
> smoother transition. I, as a developer, put value in the ability to blame
> and to understand why things were built a certain way, and not bringing
> BOLT’s history (in the same way as flang did) would mean we and the
> community lose a lot of context on the decisions of the project. And I
> guess that’s also the rationale for a monorepo, to have multiple projects
> merged together.
>
> Because of that, I initially put bolt under /bolt, following flang’s model
> of merging the history so every developer has the right context. But the
> original location was under llvm/tools.

Like others, I’m not very familiar with the internal architecture of BOLT, so
take this with a grain of salt:

From what I understand, I have a slight preference for starting this out as a
/bolt top-level “subproject”, because the code currently sounds monolithic.  As
the implementation logic is refactored into more reusable units, those libraries
can be cleanly moved within the monorepo, e.g. under the llvm-project/llvm
directory if appropriate.

The advantage of doing this is that nothing in the llvm-project/llvm directory
can come to depend on the BOLT code unless and until it gets refactored.  This
is also how things like LLDB started out (and it would be great for more of the
reusable libraries in LLDB to be merged into LLVM over time).

Does anyone have any concerns about this approach?

Unrelatedly, I’d also love to see the llvm repository exploded a bit into more
top-level repos, e.g. splitting support/adt out to their own thing.  It is also
worth considering splitting the MC layer out to its own thing as well, LLVM IR
and the mid-level optimizer into their own thing, and CodeGen and the targets
into their own thing.

The major constraint is that we want the dependences between top-level
subprojects to form a strict DAG, both now and defensibly into the future, and
we don’t want minor evolution of the codebase to cause libraries to have to be
moved around.  The benefits of splitting it up are that layering becomes easier
to enforce, LLVM developers are encouraged to work across subprojects a bit
more, and it becomes easier for subprojects to depend on slices of “the big
llvm directory”.

-Chris


Xinliang David Li via llvm-dev

2021-Mar-12 07:33 UTC

Dropping BOLT into a top-level directory sounds reasonable, but perhaps a
hybrid approach similar to what Mehdi mentioned can be applied.
Basically, BOLT first goes through a round of refactoring in its upstream
GitHub repository, with a design that is close to its future structure in
LLVM, and then drops in as a monolithic piece initially. This will make
future restructuring much easier. There are other benefits: 1) it is a good
opportunity to clean up BOLT's internal APIs; 2) it is a good time to beef up
unit tests; 3) it makes code review easier.

David

