thr3ads.net - llvm dev - [llvm-dev] Timeout tests timing out [Sep 2020]

If this information is useful, please help other people find it:
Share via:

Dan Liew via llvm-dev

2020-Sep-17 04:31 UTC

[llvm-dev] Timeout tests timing out

Hi David,

Unfortunately writing a reliable test is tricky given that the
functionality we're trying to test involves timing. I would advise
against disabling the test entirely because it actually tests
functionality that people use. I'd suggest bumping up the time limits.
This is what I've done in the past. See

commit 6dfcc78364fa3e8104d6e6634733863eb0bf4be8
Author: Dan Liew <dan at su-root.co.uk>
Date:   Tue May 22 15:06:29 2018 +0000

    [lit] Try to make `shtest-timeout.py` test more reliable by using a
    larger timeout value. This really isn't very good because it will
    still be susceptible to machine performance.

    While we are here also fix a bug in validation of
    `maxIndividualTestTime` where previously it wasn't checked if the
    type was an int.

    rdar://problem/40221572

    llvm-svn: 332987

HTH,
Dan.

On Wed, 16 Sep 2020 at 09:37, David Blaikie <dblaikie at gmail.com>
wrote:>
> Ping on this
>
> On Wed, Sep 9, 2020 at 8:27 PM David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > The clang-cmake-armv8-lld (linaro-toolchain owners) buildbot is timing
out trying to run some timeout tests (Dan Liew author):
> >
> > Pass:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5672
> > Fail:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5673
> >
> > Is there anything we can do to the buildbot? Or the tests? (bump up
the time limits or maybe remove the tests as unreliable?)

David Blaikie via llvm-dev

2020-Sep-17 05:23 UTC

head link

[llvm-dev] Timeout tests timing out

I appreciate the value of the feature - but it's possible the test
doesn't pull its weight. Is the code that implements the feature
liable to failure/often touched? If it's pretty static/failure is
unlikely, possibly the time and flaky failures aren't worth the value
of possibly catching a low-chance bug.

Another option might be to reduce how often/in which configurations
the test is run - LLVM_ENABLE_EXPENSIVE_CHECKS presumably only works
for code within LLVM itself, and not test cases - but maybe I'm wrong
there & this parameter could be used (& then the timing bumped up
quite a bit to try to make it much more reliable), or something
similar could be implemented at the lit check level?

Ah, compiler-rt tests use EXPENSIVE_CHECKS to disable certain tests:

./compiler-rt/test/lit.common.configured.in:set_default("expensive_checks",
@LLVM_ENABLE_EXPENSIVE_CHECKS_PYBOOL@)
./compiler-rt/test/fuzzer/large.test:UNSUPPORTED: expensive_checks

Could you bump the timeouts a fair bit and disable the tests except
under expensive checks?

On Wed, Sep 16, 2020 at 9:31 PM Dan Liew <dan at su-root.co.uk>
wrote:>
> Hi David,
>
> Unfortunately writing a reliable test is tricky given that the
> functionality we're trying to test involves timing. I would advise
> against disabling the test entirely because it actually tests
> functionality that people use. I'd suggest bumping up the time limits.
> This is what I've done in the past. See
>
> commit 6dfcc78364fa3e8104d6e6634733863eb0bf4be8
> Author: Dan Liew <dan at su-root.co.uk>
> Date:   Tue May 22 15:06:29 2018 +0000
>
>     [lit] Try to make `shtest-timeout.py` test more reliable by using a
>     larger timeout value. This really isn't very good because it will
>     still be susceptible to machine performance.
>
>     While we are here also fix a bug in validation of
>     `maxIndividualTestTime` where previously it wasn't checked if the
>     type was an int.
>
>     rdar://problem/40221572
>
>     llvm-svn: 332987
>
> HTH,
> Dan.
>
> On Wed, 16 Sep 2020 at 09:37, David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > Ping on this
> >
> > On Wed, Sep 9, 2020 at 8:27 PM David Blaikie <dblaikie at
gmail.com> wrote:
> > >
> > > The clang-cmake-armv8-lld (linaro-toolchain owners) buildbot is
timing out trying to run some timeout tests (Dan Liew author):
> > >
> > > Pass:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5672
> > > Fail:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5673
> > >
> > > Is there anything we can do to the buildbot? Or the tests? (bump
up the time limits or maybe remove the tests as unreliable?)

Mehdi AMINI via llvm-dev

2020-Sep-17 05:31 UTC

head link

[llvm-dev] Timeout tests timing out

Hi,

Depending on timing in a test is quite brittle in general: can we mock the
timeout instead and make this fully deterministic somehow?

The test has this comment:
> FIXME: This test is fragile because it relies on time which can
> be affected by system performance. In particular we are currently
> assuming that `short.py` can be successfully executed within 2
> seconds of wallclock time.
Maybe "short.py" can be replaced by adding into lit itself a "no
op" which
would just not really spawn a process and instead mark the task as
completed immediately internally?

-- 
Mehdi


On Wed, Sep 16, 2020 at 10:24 PM David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I appreciate the value of the feature - but it's possible the test
> doesn't pull its weight. Is the code that implements the feature
> liable to failure/often touched? If it's pretty static/failure is
> unlikely, possibly the time and flaky failures aren't worth the value
> of possibly catching a low-chance bug.
>
> Another option might be to reduce how often/in which configurations
> the test is run - LLVM_ENABLE_EXPENSIVE_CHECKS presumably only works
> for code within LLVM itself, and not test cases - but maybe I'm wrong
> there & this parameter could be used (& then the timing bumped up
> quite a bit to try to make it much more reliable), or something
> similar could be implemented at the lit check level?
>
> Ah, compiler-rt tests use EXPENSIVE_CHECKS to disable certain tests:
>
> ./compiler-rt/test/lit.common.configured.in:
> set_default("expensive_checks",
> @LLVM_ENABLE_EXPENSIVE_CHECKS_PYBOOL@)
> ./compiler-rt/test/fuzzer/large.test:UNSUPPORTED: expensive_checks
>
> Could you bump the timeouts a fair bit and disable the tests except
> under expensive checks?
>
> On Wed, Sep 16, 2020 at 9:31 PM Dan Liew <dan at su-root.co.uk>
wrote:
> >
> > Hi David,
> >
> > Unfortunately writing a reliable test is tricky given that the
> > functionality we're trying to test involves timing. I would advise
> > against disabling the test entirely because it actually tests
> > functionality that people use. I'd suggest bumping up the time
limits.
> > This is what I've done in the past. See
> >
> > commit 6dfcc78364fa3e8104d6e6634733863eb0bf4be8
> > Author: Dan Liew <dan at su-root.co.uk>
> > Date:   Tue May 22 15:06:29 2018 +0000
> >
> >     [lit] Try to make `shtest-timeout.py` test more reliable by using
a
> >     larger timeout value. This really isn't very good because it
will
> >     still be susceptible to machine performance.
> >
> >     While we are here also fix a bug in validation of
> >     `maxIndividualTestTime` where previously it wasn't checked if
the
> >     type was an int.
> >
> >     rdar://problem/40221572
> >
> >     llvm-svn: 332987
> >
> > HTH,
> > Dan.
> >
> > On Wed, 16 Sep 2020 at 09:37, David Blaikie <dblaikie at
gmail.com> wrote:
> > >
> > > Ping on this
> > >
> > > On Wed, Sep 9, 2020 at 8:27 PM David Blaikie <dblaikie at
gmail.com>
> wrote:
> > > >
> > > > The clang-cmake-armv8-lld (linaro-toolchain owners) buildbot
is
> timing out trying to run some timeout tests (Dan Liew author):
> > > >
> > > > Pass:
> http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5672
> > > > Fail:
> http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5673
> > > >
> > > > Is there anything we can do to the buildbot? Or the tests?
(bump up
> the time limits or maybe remove the tests as unreliable?)
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200916/b09deaaf/attachment.html>

Dan Liew via llvm-dev

2020-Sep-18 23:31 UTC

head link

[llvm-dev] Timeout tests timing out

On Wed, 16 Sep 2020 at 22:24, David Blaikie <dblaikie at gmail.com>
wrote:>
> I appreciate the value of the feature - but it's possible the test
> doesn't pull its weight. Is the code that implements the feature
> liable to failure/often touched? If it's pretty static/failure is
> unlikely, possibly the time and flaky failures aren't worth the value
> of possibly catching a low-chance bug.
I don't have many data points here. I haven't needed to touch the code
or tests since 2018. I haven't been asked to review any patches
related to the timeout feature since then so I presume nobody has
touched it since.

> Another option might be to reduce how often/in which configurations
> the test is run - LLVM_ENABLE_EXPENSIVE_CHECKS presumably only works
> for code within LLVM itself, and not test cases - but maybe I'm wrong
> there & this parameter could be used (& then the timing bumped up
> quite a bit to try to make it much more reliable), or something
> similar could be implemented at the lit check level?
>
> Ah, compiler-rt tests use EXPENSIVE_CHECKS to disable certain tests:
>
>
./compiler-rt/test/lit.common.configured.in:set_default("expensive_checks",
> @LLVM_ENABLE_EXPENSIVE_CHECKS_PYBOOL@)
> ./compiler-rt/test/fuzzer/large.test:UNSUPPORTED: expensive_checks
>
> Could you bump the timeouts a fair bit and disable the tests except
> under expensive checks?
Only running that test in the "expensive_checks" configuration kind of
abuses that configuration option because the test is actually not an
expensive check. It's just that the test is sensitive to the
performance of the running system. If the system is under heavy load
when the test is running it's more likely to fail.

One thing we could do to remove fragility in the test is to remove the
running of `short.py` in the test. This is only invoked to check that
it's possible for a command to run to completion in the presence of a
fixed timeout. If we can live without testing that part (i.e. we only
test that a timeout can be reached) then the test should be much more
robust.

If for some reason that's not enough another thing that might help
with this problem is to not run the lit tests at the same time as all
the other tests. The hope here is that the lit tests would run while
the system load is lower. We could do this by changing the build
system to

* Make all `check-*` (except `check-lit`) targets depend on the
`check-lit` target. This would ensure that lit gets tested before we
test everything else in LLVM.
* Remove the lit test directory from inclusion in all `check-*`
targets (except `check-lit`) so we don't test lit twice.

This also has a nice benefit of testing lit **before** we use it to
test everything else in LLVM. Today we don't do that, we just run all
the tests in one big lit invocation (i.e. we test lit at the same time
that we are using it). It does have some downsides though

* If any of the lit tests fail we won't run the rest of the tests so
you won't get the results of running the other tests until the lit
tests are fixed.
* When running any of the `check-*` targets (apart from `check-lit`)
you have to wait for the lit tests to run before any other tests run.

What do you think?
> On Wed, Sep 16, 2020 at 9:31 PM Dan Liew <dan at su-root.co.uk>
wrote:
> >
> > Hi David,
> >
> > Unfortunately writing a reliable test is tricky given that the
> > functionality we're trying to test involves timing. I would advise
> > against disabling the test entirely because it actually tests
> > functionality that people use. I'd suggest bumping up the time
limits.
> > This is what I've done in the past. See
> >
> > commit 6dfcc78364fa3e8104d6e6634733863eb0bf4be8
> > Author: Dan Liew <dan at su-root.co.uk>
> > Date:   Tue May 22 15:06:29 2018 +0000
> >
> >     [lit] Try to make `shtest-timeout.py` test more reliable by using
a
> >     larger timeout value. This really isn't very good because it
will
> >     still be susceptible to machine performance.
> >
> >     While we are here also fix a bug in validation of
> >     `maxIndividualTestTime` where previously it wasn't checked if
the
> >     type was an int.
> >
> >     rdar://problem/40221572
> >
> >     llvm-svn: 332987
> >
> > HTH,
> > Dan.
> >
> > On Wed, 16 Sep 2020 at 09:37, David Blaikie <dblaikie at
gmail.com> wrote:
> > >
> > > Ping on this
> > >
> > > On Wed, Sep 9, 2020 at 8:27 PM David Blaikie <dblaikie at
gmail.com> wrote:
> > > >
> > > > The clang-cmake-armv8-lld (linaro-toolchain owners) buildbot
is timing out trying to run some timeout tests (Dan Liew author):
> > > >
> > > > Pass:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5672
> > > > Fail:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5673
> > > >
> > > > Is there anything we can do to the buildbot? Or the tests?
(bump up the time limits or maybe remove the tests as unreliable?)

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Sep 2020 - Timeout tests timing out

[llvm-dev] Timeout tests timing out

[llvm-dev] Timeout tests timing out

[llvm-dev] Timeout tests timing out

[llvm-dev] Timeout tests timing out

Apparently Analagous Threads