thr3ads.net - llvm dev - [llvm-dev] [RFC] llvm-bisectd: a bisection daemon for supporting bisection with parallel builds [Nov 2021]

Amara Emerson via llvm-dev

2021-Nov-02 17:22 UTC

[llvm-dev] [RFC] llvm-bisectd: a bisection daemon for supporting bisection with parallel builds

Hi all,

I’d just like to draw attention to some patch reviews for a new tool I’m
proposing, llvm-bisectd: https://reviews.llvm.org/D113030

There’s documentation in the Bisection.md file in the patch, which I’ll just
paste here for convenience.

# Bisection with llvm-bisectd

## Introduction

The `llvm-bisectd` tool allows LLVM developers to rapidly bisect miscompiles in
clang or other tools running in parallel. This document explains how the tool
works and how to leverage it for bisecting your own specific issues.

Bisection as a general debugging technique can be done in multiple ways. We can
bisect across the *time* dimension, which usually means that we're bisecting
commits made to LLVM. We could instead bisect across the dimension of the LLVM
codebase itself, disabling some optimizations and leaving others enabled, to
narrow down the configuration that reproduces the issue. We can also bisect in
the dimension of the target program being compiled, e.g. compiling some parts
with a known good configuration to narrow down the problematic location in the
program. The `llvm-bisectd` tool is intended to help with this last approach to
debugging: finding the place where a bug is introduced. It does so with the aim
of being minimally intrusive to the build system of the target program.

## High level design

The bisection process with `llvm-bisectd` uses a client/server model, where all
the state about the bisection is maintained by the `llvm-bisectd` daemon. The
compilation tools (e.g. clang) send requests and get responses back telling
them what to do. As a developer, debugging using this methodology is intended
to be simple, with the daemon taking care of most of the complexity.

### Bisection keys

This process relies on a user-defined key that represents a particular action
being done at a unique place in the target program's build. The key is a string
to allow the most flexibility of data representation. `llvm-bisectd` doesn't
care what the meaning of the key is, as long as it has the following
properties:
 1. The key maps onto a specific place in the source program in a stable manner.
    Even if the software is being built with multiple compilers running
    concurrently, the key should not be affected.
 2. Between one build of the target software and the next (clean) build, the
    same set of keys should be generated exactly.

For our example of bisecting a novel optimization pass, a good choice of key
would be the module + function name of the target program being compiled. This
key meets requirement 1 because each module + function string refers to a
unique place in the target program (a module may not have two functions with
the same symbol name). Including the module name in the key disambiguates two
local linkage functions with the same name in two different translation units.
The key also satisfies requirement 2 because function names are static between
one build and the next (e.g. no random auto-generation going on).
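To make the two requirements concrete, here is a minimal standalone sketch. The helper names `makeBisectKey` and `sameKeySet` are hypothetical and not part of the patch. The sketch only illustrates that a module + function key separates same-named local functions, and that "the same set of keys" should hold regardless of the order in which parallel compiler jobs report them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): joins the module name and the
// function name into a single string key, separated by a space.
std::string makeBisectKey(const std::string &ModuleName,
                          const std::string &FunctionName) {
  assert(!FunctionName.empty() && "a key needs a function name");
  return ModuleName + " " + FunctionName;
}

// Requirement 2, checked offline: two builds generate the same set of keys.
// Order is ignored because parallel compiler jobs may report keys in any
// order.
bool sameKeySet(const std::vector<std::string> &BuildA,
                const std::vector<std::string> &BuildB) {
  std::set<std::string> A(BuildA.begin(), BuildA.end());
  std::set<std::string> B(BuildB.begin(), BuildB.end());
  return A == B;
}
```

A key derived from something unstable, such as a randomly generated temporary name, would make `sameKeySet` return false between two clean builds, which is exactly the situation requirement 2 rules out.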

## Bisection workflow

The bisection process has two stages. The first is called the *learning* stage,
and the second is the main *bisection* stage. The purpose of the learning
stage is for the bisection daemon to *learn* about all the keys that will be
bisected through during each bisection round.

First, `llvm-bisectd` needs to be started as a daemon.

```console
$ llvm-bisectd
bisectd > _
```

On start, `llvm-bisectd` initializes into the learning phase, so nothing else
needs to be done.

Then, the software project being debugged is built with the client tools like
clang having the bisection mode enabled. This can be a compiler flag or some
other mechanism. For example, to bisect GlobalISel across target functions,
we can pass `-mllvm -gisel-bisect-selection` to clang.

During the first build of the project, the client tools send a bisection
request to `llvm-bisectd` for each key. In the learning phase, `llvm-bisectd`
just replies to the clients with the answer "YES". In the background, it
stores each unique key it receives in a vector for later.
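The learning-phase behaviour described above can be sketched as follows. This is an illustrative model only: the class name `LearningState` and its methods are assumptions, not the daemon's actual code, and a real daemon serving concurrent clients would also need to synchronize access to this state.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of the daemon's learning-phase state (names are
// illustrative, not taken from the patch).
class LearningState {
  std::vector<std::string> Keys;        // unique keys, in first-seen order
  std::unordered_set<std::string> Seen; // used only for de-duplication

public:
  // Handle one client request. During learning the answer is always "YES"
  // (true), and the key is recorded the first time it is seen.
  bool handleRequest(const std::string &Key) {
    assert(!Key.empty() && "clients are expected to send a non-empty key");
    if (Seen.insert(Key).second)
      Keys.push_back(Key);
    return true;
  }

  const std::vector<std::string> &keys() const { return Keys; }
};
```

Answering "YES" to everything means the first build performs every action, so it also doubles as a check that the bug reproduces with bisection support enabled.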

### Bisection phase

After the first build is done, the learning phase is over, and `llvm-bisectd`
should know about all the keys that will be requested in future builds.
We can start the bisection phase now by using the `start-bisect` command in
the `llvm-bisectd` command interpreter.

```console
bisectd > start-bisect
Starting bisection with 17306 total keys to search
bisectd > _
```

We're now in the bisection phase. We perform the following actions repeatedly
until `llvm-bisectd` terminates with an answer.
 1. Do a clean build of the project (with the bisection flags as before).
 2. Test the resulting build to see if it still exhibits the bug.
 3. If the bug remains, type the command `bad` into the `llvm-bisectd`
    interpreter. If the bug has disappeared, type the `good` command instead.

And that's it! Eventually the bisection will finish and `llvm-bisectd` will
print the *key* that, when enabled, triggers the bug.

```console
Bisection completed. Failing key was:
/work/testing/llvm-test-suite/CTMark/tramp3d-v4/tramp3d-v4.cpp
_ZN17MultiArgEvaluatorI16MainEvaluatorTagE13createIterateI9MultiArg3I5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd10BrickViewUESC_SC_EN4Adv51Z13MomentumfluxZILi3EEELi3E15EvaluateLocLoopISH_Li3EEEEvRKT_RKT0_RK8IntervalIXT1_EER14ScalarCodeInfoIXT1_EXsrSK_4sizeEERKT2_
Exiting...
```
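The text above does not spell out how the daemon decides which keys to enable in each round. One plausible reading is a plain binary search over the learned keys, assuming a single culprit key that triggers the bug whenever it is enabled. The sketch below models that assumption; `BisectState` and all of its members are hypothetical names, not the daemon's real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bisection state. Invariant: the failing key is in Keys[Lo, Hi).
// In each round only the keys in Keys[Lo, mid()) are enabled.
class BisectState {
  std::vector<std::string> Keys;
  std::unordered_map<std::string, std::size_t> Index; // key -> position
  std::size_t Lo = 0;
  std::size_t Hi;

  std::size_t mid() const { return Lo + (Hi - Lo) / 2; }

public:
  explicit BisectState(std::vector<std::string> K)
      : Keys(std::move(K)), Hi(Keys.size()) {
    assert(!Keys.empty() && "learning phase must have collected keys");
    for (std::size_t I = 0; I < Keys.size(); ++I)
      Index[Keys[I]] = I;
  }

  // Answer a client request for the current round.
  bool shouldPerformAction(const std::string &Key) const {
    auto It = Index.find(Key);
    if (It == Index.end())
      return false; // Assumption: keys never seen while learning are skipped.
    return It->second >= Lo && It->second < mid();
  }

  bool done() const { return Hi - Lo == 1; }

  // `bad`: bug still present, so the culprit is among the enabled keys.
  void bad() { if (!done()) Hi = mid(); }
  // `good`: bug gone, so the culprit is among the disabled candidates.
  void good() { if (!done()) Lo = mid(); }

  const std::string &failingKey() const {
    assert(done() && "bisection has not converged yet");
    return Keys[Lo];
  }
};
```

Under this assumed strategy the candidate range roughly halves each round, so the 17306 keys from the session above would need at most 15 build-and-test rounds.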

## Adding bisection support in clients

Adding support for bisecting a new type of action is simple. The client only
needs to generate a key at the point where bisection is needed, and then use
client utilities in `lib/Support/RemoteBisectorClient.cpp` to talk to the
daemon. For example, if the bisection is to be done for a `FunctionPass`
optimization, then one place to add the code would be to the `runOnFunction()`
method, using the function name as a key.

```C++
bool runOnFunction(Function &F) {
  // ...
  if (EnableBisectForNewOptimization) {
    // Key = source file name of the module + " " + function name.
    std::string Key = F.getParent()->getSourceFileName() + " "
                        + F.getName().str();
    RemoteBisectClient BisectClient;
    if (!BisectClient.shouldPerformAction(Key))
      return false; // Bisector daemon told us to skip this action.
  }
  // Continue with the optimization
  // ...
}
```

Thanks,
Amara

David Blaikie via llvm-dev

2021-Nov-02 17:40 UTC

Nifty! Look forward to seeing how that shakes out/is built upon/etc.

