Thank you for sharing your experience, Sam! I'd be interested in taking a
look at your test runner if it's something you could publish.
I started looking into this topic recently, since we're now investigating
ways to run lit tests on Fuchsia. I've been experimenting with the remote
execution support in libc++, but using SCP and SSH for each test doesn't
really scale.
In Fuchsia, the unit of distribution is a package that's completely
hermetic. We then run these packages as components, where each component
has its own filesystem and doesn't have any unnecessary privileges. It's
similar to containers in many ways.
My idea was to extend lit to separate configuration from execution,
which would allow us to package up all tests on the host, push them to the
target and run each of them as a separate component using our test runner
(we already have a Fuchsia test runner that runs tests as components).
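
To make this concrete, here's a minimal sketch of what the host-side
"configure" half could emit; the script name, schema, and helper are all
hypothetical, since no such lit interface exists today:

    # configure_tests.py - host-side half of a hypothetical split lit.
    # Evaluate the lit config once on the host and write a plain-data
    # manifest per test, so a thin runner on the target needs neither
    # Python nor any config logic.
    import json
    import os

    def write_manifest(test_path, features, substitutions, out_dir):
        """Emit one self-contained manifest describing how to run a test."""
        manifest = {
            "test": test_path,
            "features": sorted(features),    # e.g. ["x86-registered-target"]
            "substitutions": substitutions,  # e.g. {"%clang": "bin/clang"}
        }
        name = os.path.basename(test_path) + ".json"
        with open(os.path.join(out_dir, name), "w") as f:
            json.dump(manifest, f, indent=2)

The manifests, together with the test files and binaries, would be what
gets packaged and pushed to the target.
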
It sounds very similar to what you already did and I'd be interested in
seeing if we could reuse some of your tooling. Furthermore, it'd be great
if we could come up with a way to support this workflow directly in lit and
LLVM.
Nico also looked into this area in the past, experimenting with a custom
test runner written in Go (github.com/nico/glitch) and with using Ninja as
a test runner (reviews.llvm.org/D47506), both of which may be worth
checking out.
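
For reference, the Ninja approach essentially generates one build edge per
test and lets Ninja handle scheduling, parallelism, and incrementality; a
rough sketch (the single-test runner command is a placeholder):

    # gen_test_ninja.py - driving lit tests through Ninja, in the spirit
    # of D47506: one build edge per test, with a stamp file as output so
    # unchanged tests aren't re-run.
    def emit_ninja(test_files, runner="run-one-lit-test", out="tests.ninja"):
        with open(out, "w") as f:
            f.write("rule littest\n")
            f.write("  command = %s $in && touch $out\n\n" % runner)
            for t in test_files:
                f.write("build %s.passed: littest %s\n" % (t, t))

    # Running everything then becomes: ninja -f tests.ninja
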
On Fri, Mar 12, 2021 at 6:56 AM Sam McCall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi James,
>
> We run lit tests at Google using a custom runner on a distributed build
> system similar to Bazel.
> In particular we run most of the llvm-project tests both when pulling in
> upstream revisions, and for any change to our internal repository that
> touches nearby files.
>
> I wanted to share some of our experiences in case they're useful, and in
> the hope that this project may result in something we can use too :-)
> I'm being brief here, but happy to provide more details.
>
> Our build system wants to run each test in isolation (separate process,
> sandboxed).
> Making each test hermetic separates concerns nicely (the same distributed
> runner is used for all kinds of testing, not just lit).
> This model is also easier to fit into other containers (e.g. I imagine
> Ninja could make a good local test driver).
> Compared to e.g. a custom driver that talks to a custom worker server
> that runs many tests per subprocess... there's not very much of that we
> would be able to reuse.
> I know there are OSS Bazel projects that want to run lit tests and would
> struggle with this model too.
>
> The biggest problem with using the standard lit tool for hermetic tests
> is that it's too slow to start up just to run a single test.
> Fundamentally, the slow parts are the config system and the startup cost
> of Python programs.
>
> We had a greatly simplified time with the config system, because our
> tests (mostly) run in a single config, so we could flatten it out into a
> list of features and substitutions.
> But in a more general system, if we can produce the config data from
> config logic as a *build* step, then it can be cached in the usual way and
> simply fed into each test.
> You'll need to untangle config specific to the machine running the test
> from config specific to the machine driving the tests.
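>
> As a sketch of that flattening (the file name, schema, and function names
> here are made up), a build step could evaluate the config logic once and
> dump plain data that each test run just loads:
>
>     # flatten_config.py - config-as-a-build-step: run the lit.cfg
>     # logic once, cache the result as data, and feed it to every test.
>     import json
>
>     def flatten(target_cfg, driver_cfg, path="lit_config.json"):
>         """Untangle config for the machine running the tests (target)
>         from config for the machine driving them (driver)."""
>         with open(path, "w") as f:
>             json.dump({"target": target_cfg, "driver": driver_cfg}, f,
>                       indent=2)
>
>     # e.g.: flatten({"features": ["shell"],
>     #                "substitutions": {"%clang": "bin/clang"}},
>     #               {"lit_tool": "utils/lit/lit.py"})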
>
> I wrote a hermetic test runner in Go - not my favorite language but it
> starts up fast and has good subprocess support.
> It's greatly simplifying to be able to assume you can fork a real shell,
> and that only limited state (CWD, exported vars) can leak from one RUN
> line to the next; this works fine for us in practice (but we don't test
> on Windows).
> It has some nice features like printing a transcript of the test run,
> highlighting directives and stderr output, showing pre/post expansion
> lines, annotating each line with the result.
> I should be able to share the code of this; it's nothing terribly
> surprising.
> It's less than 1000 LOC and runs almost all LLVM tests - IMO it would be
> worthwhile to keep the lit spec very simple and to remove some of the
> marginal features that have crept in over the years. We chose to simply
> drop some tests rather than deal with all the corners.
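>
> For the curious, the core of that execution model fits in a few lines; a
> toy Python equivalent (emphatically not the real runner, and skipping
> line continuations and other corners) might look like:
>
>     # toy_runner.py - sketch of the model above: collect RUN lines,
>     # expand substitutions, then feed them all to one real shell so
>     # limited state (CWD, exported vars) leaks between lines naturally.
>     import subprocess
>     import sys
>
>     def run_test(path, substitutions):
>         script = []
>         for line in open(path):
>             _, sep, cmd = line.partition("RUN:")
>             if sep:
>                 for k, v in substitutions.items():
>                     cmd = cmd.replace(k, v)
>                 script.append(cmd.strip())
>         # -e makes the test fail on the first failing RUN line.
>         return subprocess.run(
>             ["/bin/sh", "-e", "-c", "\n".join(script)]).returncode == 0
>
>     if __name__ == "__main__":
>         sys.exit(0 if run_test(sys.argv[1], {"%s": sys.argv[1]}) else 1)
>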
> (Before this existed, we ran sed over the lit tests to turn them into
> shell scripts, which worked but was hard to maintain, and the output on
> failure was hard to read... actually the upstream lit runner has the
> latter problem too!)
>
> I'm sure I've forgotten things, but I think those were my biggest
> takeaways. Needing to solve the config problem + the Go dependency were the
> main reasons I didn't push to make these changes upstream :-(
> Hope this is useful or maybe at least interesting :-)
>
> Cheers, Sam
>
> On Wed, Feb 24, 2021 at 9:54 AM James Henderson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Victor,
>>
>> The lit test framework is the main testing framework used by LLVM. You
>> can find the source code for it in the LLVM github repository (see in
>> particular https://github.com/llvm/llvm-project/tree/main/llvm/utils/lit),
>> and there is documentation available for it on the LLVM website -
>> https://llvm.org/docs/TestingGuide.html gives the high-level picture of
>> how LLVM is tested, whilst https://llvm.org/docs/CommandGuide/lit.html is
>> more focused on lit specifically.
>>
>> Examples of where lit is used include the individual test files located
>> in places like llvm/test, clang/test and lld/test within the github tree.
>> These test directories include additional configuration files, some of
>> which are configured when CMake is used to generate the build files for
>> the LLVM project. If you aren't already familiar with LLVM, I highly
>> recommend reading up on https://llvm.org/docs/GettingStarted.html, and
>> following the steps to make sure you can build and run LLVM components
>> locally.
>>
>> Lit works as a Python process which spawns many child processes, each of
>> which runs one or more of the tests located in the directory under test.
>> These tests are typically a sequence of commands that use components of
>> LLVM that have already been built. You can build the test dependencies
>> and run the tests by building one of the CMake-generated targets called
>> check-* (where * might be llvm, lld, clang, etc.) to run a test subset,
>> or "check-all" to run all known tests. Currently, the tests run in
>> parallel on the user's machine, using the Python multiprocessing library
>> to do this. There also exist the --num-shards and related options, which
>> allow multiple computers to each run a subset of the tests. I am not too
>> familiar with how this option is used in practice, but I believe it
>> requires the computers to all have access to some shared filesystem
>> which contains the tests and build artifacts, or to each have the same
>> version checked out and to have been sent the full set of build
>> artifacts to use. Others on this list might be able to clarify further.
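>>
>> For illustration, the idea behind those options is a deterministic
>> partition of a single sorted test list, so each machine independently
>> picks a disjoint slice. A sketch of the mechanism (not lit's actual
>> code):
>>
>>     # shard.py - pick this machine's slice of the shared test list,
>>     # given --num-shards=N and --run-shard=K (1-based).
>>     def shard(tests, run_shard, num_shards):
>>         assert 1 <= run_shard <= num_shards
>>         # Round-robin, so slow tests don't all land on one shard.
>>         return sorted(tests)[run_shard - 1::num_shards]
>>
>>     # e.g. machine 2 of 4: mine = shard(all_tests, 2, 4)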
>>
>> The project goal is to provide a framework for distributing these tests
>> across multiple computers in a more flexible manner than the existing
>> sharding mechanism. I can think of two different high-level options -
>> either a layer on top of lit which uses the existing sharding mechanism
>> somehow, or something built into the existing lit code that goes wide
>> with the tests across the machines. It would be up to you to identify
>> and implement a way forward for doing this. The hope would be that this
>> framework could be used for multiple different distributed systems, as
>> described in the original project description on the Open Projects page.
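>>
>> As a sketch of the first option, a thin driver could launch one lit
>> shard per machine over ssh; the hostnames and the llvm-lit path below
>> are placeholders, and each worker is assumed to already have the tests
>> and build artifacts:
>>
>>     # drive_shards.py - a layer on top of lit's existing sharding:
>>     # run one shard per worker remotely and join the results.
>>     import subprocess
>>
>>     MACHINES = ["worker1", "worker2", "worker3"]  # placeholders
>>
>>     def run_distributed(test_dir, lit="llvm-lit"):
>>         procs = []
>>         for i, host in enumerate(MACHINES, start=1):
>>             cmd = "%s --num-shards=%d --run-shard=%d %s" % (
>>                 lit, len(MACHINES), i, test_dir)
>>             procs.append(subprocess.Popen(["ssh", host, cmd]))
>>         return all(p.wait() == 0 for p in procs)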
>>
>> This project is intended to be a possible Google Summer of Code project.
>> As such, to participate in it, you'd need to sign up on the GSoC
>> website, and provide a project proposal there which details how you plan
>> to solve the challenge. It would help your proposal get accepted if you
>> can show some understanding of the lit test suite, and some evidence of
>> contributions to LLVM (perhaps in the form of additional testing you
>> might identify that is missing in some tests, or by fixing one or more
>> bugs from the LLVM bugzilla page, perhaps labelled with the "beginner"
>> keyword). I am happy to work with you on your proposal if you are
>> uncertain about anything, but the core of the proposal needs to come
>> from you.
>>
>> I hope that gives you the information you are looking for. Please feel
>> free to ask any further questions that you may have.
>>
>> James
>>
>> On Tue, 23 Feb 2021 at 17:28, Victor Kukshiev via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello, I am Victor Kukshiev (cetjs2 on IRC), a second-year student at
>>> PetrSU university.
>>> The distributed lit testing idea is interesting, and I think it's
>>> feasible for me.
>>> Could you tell us more about this project?
>>> What is the lit test suite?
>>> I know the Python language.
>>> How do I participate in this project?