Amara Emerson via llvm-dev
2021-Nov-02 17:22 UTC
[llvm-dev] [RFC] llvm-bisectd: a bisection daemon for supporting bisection with parallel builds
Hi all, I’d just like to draw attention to some patch reviews for a new tool I’m proposing, llvm-bisectd: https://reviews.llvm.org/D113030 There’s documentation in the Bisection.md file in the patch, which I’ll just paste here for convenience. # Bisection with llvm-bisectd ## Introduction The `llvm-bisectd` tool allows LLVM developers to rapidly bisect miscompiles in clang or other tools running in parallel. This document explains how the tool works and how to leverage it for bisecting your own specific issues. Bisection as a general debugging technique can be done in multiple ways. We can bisect across the *time* dimension, which usually means that we're bisecting commits made to LLVM. We could instead bisect across the dimension of the LLVM codebase itself, disabling some optimizations and leaving others enabled, to narrow down the configuration that reproduces the issue. We can also bisect in the dimension of the target program being compiled, e.g. compiling some parts with a known good configuration to narrow down the problematic location in the program. The `llvm-bisectd` tool is intended to help with this last approach to debugging: finding the place where a bug is introduced. It does so with the aim of being minimally intrusive to the build system of the target program. ## High level design The bisection process with `llvm-bisectd` uses a client/server model, where all the state about the bisection is maintained by the `llvm-bisectd` daemon. The compilation tools (e.g. clang) send requests and get responses back telling them what to do. As a developer, debugging using this methodology is intended to be simple, with the daemon taking care of most of the complexity. ### Bisection keys This process relies on a user-defined key that's used to represent a particular action being done at a unique place in the target program's build. The key is a string to allow the most flexibility of data representation. `llvm-bisectd` doesn't care what the meaning of the key is, as long as has the following properties: 1. The key maps onto a specific place in the source program in a stable manner. Even if the software is being built with multiple compilers running concurrently, the key should not be affected. 2. Between one build of the target software and the next (clean) build, the same set of keys should be generated exactly. For our example of bisecting a novel optimization pass, a good choice of key would be the module + function name of the target program being compiled. The function name meets requirement 1. because each module + function string refers to a unique place in the target program. (A module may not have two functions with the same symbol name). The inclusion of the module name in the key helps to disambiguate two local linkage functions with the same name in two different translation units. The key also satisfies requirement 2. because the function names are static between one build and the next (e.g. no random auto-generation going on). ## Bisection workflow The bisection process has two stages. The first is called the *learning* stage, and the second is the main *bisection* stage. The purpose of the learning stage is for the bisection daemon to *learn* about all the keys that will be bisected through during each bisection round. The first thing that needs to be done is that `llvm-bisectd` needs to be started as a daemon. ```console $ llvm-bisectd bisectd > _ ``` On start, `llvm-bisectd` initializes into the learning phase, so nothing else needs to be done. Then, the software project being debugged is built with the client tools like clang having the bisection mode enabled. This can be a compiler flag or some other mechanism. For example, to bisect GlobalISel across target functions, we can pass `-mllvm -gisel-bisect-selection` to clang. During the first build of the project, the client tools are sending a bisection request to `llvm-bisectd` for each key. `llvm-bisectd` in the learning phase just replies to the clients with the answer "YES". In the background, it's storing each unique key it receives into a vector for later. ### Bisection phase After the first build is done, the learning phase is over, and `llvm-bisectd` should know about all the keys that will be requested in future builds. We can start the bisection phase now by using the `start-bisection` command in the `llvm-bisectd` command interpreter. ``` bisectd > start-bisect Starting bisection with 17306 total keys to search bisectd > _ ``` We're now in the bisection phase. Now, we perform the following actions in a repeatedly until `llvm-bisectd` terminates with an answer. 1. Do a clean build of the project (with the bisection flags as before) 2. Test the resulting build to see if it still exhibits the bug. 3. If the bug remains, then we type the command `bad` into the `llvm-bisectd` interpreter. If the bug has disappeared, we type the `good` command instead. And that's it! Eventually the bisection will finish and `llvm-bisectd` will print the *key* that, when enabled, triggers the bug. ``` console Bisection completed. Failing key was: /work/testing/llvm-test-suite/CTMark/tramp3d-v4/tramp3d-v4.cpp _ZN17MultiArgEvaluatorI16MainEvaluatorTagE13createIterateI9MultiArg3I5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd10BrickViewUESC_SC_EN4Adv51Z13MomentumfluxZILi3EEELi3E15EvaluateLocLoopISH_Li3EEEEvRKT_RKT0_RK8IntervalIXT1_EER14ScalarCodeInfoIXT1_EXsrSK_4sizeEERKT2_ Exiting... ``` ## Adding bisection support in clients Adding support for bisecting a new type of action is simple. The client only needs to generate a key at the point where bisection is needed, and then use client utilities in `lib/Support/RemoteBisectorClient.cpp` to talk to the daemon. For example, if the bisection is to done for a `FunctionPass` optimization, then one place to add the code would be to the `runOnFunction()` method, using the function name as a key. ```C++ bool runOnFunction(Function &F) { // ... if (EnableBisectForNewOptimization) { std::string Key = F.getParent()->getSourceFileName() + " " + F.getName().str(); RemoteBisectClient BisectClient; if (!BisectClient.shouldPerformAction(Key)) return false; // Bisector daemon told us to skip this action. } // Continue with the optimization // ... } ``` Thanks, Amara
David Blaikie via llvm-dev
2021-Nov-02 17:40 UTC
[llvm-dev] [RFC] llvm-bisectd: a bisection daemon for supporting bisection with parallel builds
Nifty! Look forward to seeing how that shakes out/is built upon/etc. On Tue, Nov 2, 2021 at 10:23 AM Amara Emerson via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi all, > > I’d just like to draw attention to some patch reviews for a new tool I’m > proposing, llvm-bisectd: https://reviews.llvm.org/D113030 > > There’s documentation in the Bisection.md file in the patch, which I’ll > just paste here for convenience. > > # Bisection with llvm-bisectd > > ## Introduction > > The `llvm-bisectd` tool allows LLVM developers to rapidly bisect > miscompiles in > clang or other tools running in parallel. This document explains how the > tool > works and how to leverage it for bisecting your own specific issues. > > Bisection as a general debugging technique can be done in multiple ways. > We can > bisect across the *time* dimension, which usually means that we're > bisecting > commits made to LLVM. We could instead bisect across the dimension of the > LLVM > codebase itself, disabling some optimizations and leaving others enabled, > to > narrow down the configuration that reproduces the issue. We can also > bisect in > the dimension of the target program being compiled, e.g. compiling some > parts > with a known good configuration to narrow down the problematic location in > the > program. The `llvm-bisectd` tool is intended to help with this last > approach to > debugging: finding the place where a bug is introduced. It does so with > the aim > of being minimally intrusive to the build system of the target program. > > ## High level design > > The bisection process with `llvm-bisectd` uses a client/server model, > where all > the state about the bisection is maintained by the `llvm-bisectd` daemon. > The > compilation tools (e.g. clang) send requests and get responses back telling > them what to do. As a developer, debugging using this methodology is > intended > to be simple, with the daemon taking care of most of the complexity. > > ### Bisection keys > > This process relies on a user-defined key that's used to represent a > particular > action being done at a unique place in the target program's build. The key > is a > string to allow the most flexibility of data representation. `llvm-bisectd` > doesn't care what the meaning of the key is, as long as has the following > properties: > 1. The key maps onto a specific place in the source program in a stable > manner. > Even if the software is being built with multiple compilers running > concurrently, the key should not be affected. > 2. Between one build of the target software and the next (clean) build, > the > same set of keys should be generated exactly. > > For our example of bisecting a novel optimization pass, a good choice of > key > would be the module + function name of the target program being compiled. > The > function name meets requirement 1. because each module + function string > refers > to a unique place in the target program. (A module may not have two > functions > with the same symbol name). The inclusion of the module name in the key > helps > to disambiguate two local linkage functions with the same name in two > different > translation units. The key also satisfies requirement 2. because the > function > names are static between one build and the next (e.g. no random > auto-generation > going on). > > ## Bisection workflow > > The bisection process has two stages. The first is called the *learning* > stage, > and the second is the main *bisection* stage. The purpose of the learning > stage is for the bisection daemon to *learn* about all the keys that will > be > bisected through during each bisection round. > > The first thing that needs to be done is that `llvm-bisectd` needs to be > started as a daemon. > > ```console > $ llvm-bisectd > bisectd > _ > ``` > > On start, `llvm-bisectd` initializes into the learning phase, so nothing > else > needs to be done. > > Then, the software project being debugged is built with the client tools > like > clang having the bisection mode enabled. This can be a compiler flag or > some > other mechanism. For example, to bisect GlobalISel across target functions, > we can pass `-mllvm -gisel-bisect-selection` to clang. > > During the first build of the project, the client tools are sending a > bisection > request to `llvm-bisectd` for each key. `llvm-bisectd` in the learning > phase > just replies to the clients with the answer "YES". In the background, it's > storing each unique key it receives into a vector for later. > > ### Bisection phase > > After the first build is done, the learning phase is over, and > `llvm-bisectd` > should know about all the keys that will be requested in future builds. > We can start the bisection phase now by using the `start-bisection` > command in > the `llvm-bisectd` command interpreter. > > ``` > bisectd > start-bisect > Starting bisection with 17306 total keys to search > bisectd > _ > ``` > > We're now in the bisection phase. Now, we perform the following actions in > a > repeatedly until `llvm-bisectd` terminates with an answer. > 1. Do a clean build of the project (with the bisection flags as before) > 2. Test the resulting build to see if it still exhibits the bug. > 3. If the bug remains, then we type the command `bad` into the > `llvm-bisectd` > interpreter. If the bug has disappeared, we type the `good` command > instead. > > And that's it! Eventually the bisection will finish and `llvm-bisectd` will > print the *key* that, when enabled, triggers the bug. > > ``` console > Bisection completed. Failing key was: > /work/testing/llvm-test-suite/CTMark/tramp3d-v4/tramp3d-v4.cpp > _ZN17MultiArgEvaluatorI16MainEvaluatorTagE13createIterateI9MultiArg3I5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd10BrickViewUESC_SC_EN4Adv51Z13MomentumfluxZILi3EEELi3E15EvaluateLocLoopISH_Li3EEEEvRKT_RKT0_RK8IntervalIXT1_EER14ScalarCodeInfoIXT1_EXsrSK_4sizeEERKT2_ > Exiting... > ``` > > ## Adding bisection support in clients > > Adding support for bisecting a new type of action is simple. The client > only > needs to generate a key at the point where bisection is needed, and then > use > client utilities in `lib/Support/RemoteBisectorClient.cpp` to talk to the > daemon. For example, if the bisection is to done for a `FunctionPass` > optimization, then one place to add the code would be to the > `runOnFunction()` > method, using the function name as a key. > > ```C++ > bool runOnFunction(Function &F) { > // ... > if (EnableBisectForNewOptimization) { > std::string Key = F.getParent()->getSourceFileName() + " " > + F.getName().str(); > RemoteBisectClient BisectClient; > if (!BisectClient.shouldPerformAction(Key)) > return false; // Bisector daemon told us to skip this action. > } > // Continue with the optimization > // ... > } > ``` > > Thanks, > Amara > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211102/c665519a/attachment.html>