thr3ads.net - llvm dev - [llvm-dev] Flakey failure on clang-ppc64le-linux-multistage [Sep 2020]

Nemanja Ivanovic via llvm-dev

2020-Sep-03 12:02 UTC

[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage

Sure.
I didn't use lit or ninja. I simply copied the script produced by lit
(/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/tools/clang/test/Driver/Output/target-override.c.script)
into a temporary directory (along with a deep copy of the build directory).
I modified the paths in the script to point to the temporary directory.
Then I ran the script in a loop.
For running a bunch in parallel, I just produced a wrapper script to invoke
that one:
target-override.c.script $LINENO &
target-override.c.script $LINENO &
target-override.c.script $LINENO &
...
wait
And ran that in a loop. For thousands of iterations...
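
The wrapper-and-loop procedure described above could be sketched roughly as follows. This is an illustration only, not the script that was actually used: the function name `stress` and the use of a job index as the argument are assumptions (the wrapper in the message passes `$LINENO`, presumably so that each copy gets a distinct argument).

```shell
# Hypothetical helper (not from the thread): run "$1 <job-index>" as $2
# parallel copies, repeat that batch $3 times, and print how many runs
# exited with a non-zero status.
stress() {
  cmd=$1 jobs=$2 iters=$3 failures=0
  i=0
  while [ "$i" -lt "$iters" ]; do
    pids=""
    j=0
    while [ "$j" -lt "$jobs" ]; do
      # Start one copy in the background and remember its PID.
      "$cmd" "$j" >/dev/null 2>&1 &
      pids="$pids $!"
      j=$((j + 1))
    done
    # Wait for each copy individually so every exit status is checked.
    for pid in $pids; do
      wait "$pid" || failures=$((failures + 1))
    done
    i=$((i + 1))
  done
  echo "$failures"
}
```

Something like `stress ./target-override.c.script 1000 1000` would then approximate "1000 at a time, in a loop". Waiting on each PID separately, rather than a bare `wait`, is a design choice so that a single failing copy is not lost.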

On Wed, Sep 2, 2020 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
> Thanks for looking into it!
>
> Could you describe your test process in more detail? Were you running lit
> from your script? Running the build system (ninja?)?
>
> On Wed, Sep 2, 2020 at 10:47 AM Nemanja Ivanovic
> <nemanja.i.ibm at gmail.com> wrote:
>
>> Well, I am at my wit's end. I have copied over the script and directories
>> for this test case and run it a few million times. First I was running one
>> at a time, then I switched to kicking off 1000 at a time. All the while,
>> the bots continued to run on the same machine. The script never failed even
>> once. I am not sure if this has something to do with Python as part of
>> llvm-lit or what is going on.
>> I am thinking that the best course of action for us is to mark this test
>> case UNSUPPORTED for PPC.
>>
>> On Wed, Sep 2, 2020 at 12:41 PM Nemanja Ivanovic via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>>> Interesting, thanks for bringing this to our attention. I just took a
>>> quick look through the last 100 builds and this test has failed 13 times.
>>> This is certainly something we need to look at. We will investigate and see
>>> if we can make any sense of this.
>>>
>>> Nemanja Ivanovic
>>> LLVM PPC Backend Development
>>> IBM Toronto Lab
>>> Email: nemanjai at ca.ibm.com
>>> Phone: 905-413-3388
>>>
>>> ----- Original message -----
>>> From: David Blaikie <dblaikie at gmail.com>
>>> To: llvm-dev <llvm-dev at lists.llvm.org>, Nico Weber
>>> <thakis at chromium.org>, Serge Pavlov <sepavloff at gmail.com>,
>>> powerllvm at ca.ibm.com
>>> Cc:
>>> Subject: [EXTERNAL] Flakey failure on clang-ppc64le-linux-multistage
>>> Date: Tue, Sep 1, 2020 6:10 PM
>>>
>>> Seems there were a couple of correlated failures that appear to be
>>> flakes on this buildbot recently:
>>>
>>> green:
>>> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
>>> red:
>>> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
>>> (target-override.c during stage 1, seems to be missing the
>>> directory/symlink it just created)
>>> red:
>>> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13976
>>> (same test failure as the last, but during stage 2, not stage 1)
>>> green:
>>> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13977
>>>
>>> Including Nico & Pavlov as the people who wrote/edited the test, but I'm
>>> guessing this is something interesting going on on the buildbot itself?
>>>
>>> powerllvm at ca.ibm.com, whoever you are on the end of that mailing list -
>>> could you take a look at this? Possibly manually running that test in a
>>> loop a bunch of times to see if it fails sometimes & try to help us
>>> understand why?
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

2020-Sep-03 16:59 UTC

Is the machine running any jobs in parallel? Would it be worth running lit
in the loop, rather than the script (perhaps lit's doing something
interesting)? Or maybe the full test run from ninja, though I appreciate
that that is expensive.

Are there other PPC bots? Any idea if they are experiencing this failure?

There are also other tests that do similar mkdir/symlink things, I think -
yet they are not failing? Maybe they do it in some slightly different
manner?
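
Running lit itself in a loop, as suggested above, might look like the sketch below. The helper name `repeat_until_fail` is hypothetical, and the test path in the usage note is an assumption inferred from the build directory quoted in this thread.

```shell
# Hypothetical helper (not from the thread): run the given command up to
# $1 times and stop at the first failure. Prints the number of the failing
# iteration, or "none" if every run passed.
repeat_until_fail() {
  max=$1
  shift
  n=1
  while [ "$n" -le "$max" ]; do
    "$@" >/dev/null 2>&1 || { echo "$n"; return 1; }
    n=$((n + 1))
  done
  echo none
}
```

For example, `repeat_until_fail 1000 llvm-lit -v clang/test/Driver/target-override.c`, run from a checkout with `llvm-lit` on the `PATH`. Unlike looping over the generated `.script` file, this keeps lit's own setup and substitutions in the picture.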


Fāng-ruì Sòng via llvm-dev

2020-Sep-03 18:13 UTC

This is likely due to a race condition (%T is a shared parent
directory). I'll put up a patch to fix it.
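
To illustrate the kind of race meant here, the sketch below replays one bad interleaving by hand. It is not the actual test: the `testbin` directory and `tool` symlink names, the `interleave` helper, and the exact sequence of steps are all assumptions chosen to show how two tests sharing one parent directory can delete each other's files.

```shell
# Hypothetical sketch: $1 and $2 are the scratch directories used by
# test A and test B. Prints "present" if A's symlink survives B's cleanup
# step, "missing" otherwise.
interleave() {
  dir_a=$1 dir_b=$2
  # Test A: clean up, then create its directory and symlink.
  rm -rf "$dir_a/testbin"
  mkdir -p "$dir_a/testbin"
  ln -s /bin/sh "$dir_a/testbin/tool"
  # Test B starts before A has finished and runs its own cleanup step.
  rm -rf "$dir_b/testbin"
  mkdir -p "$dir_b/testbin"
  # Test A resumes and looks for the symlink it just created.
  if [ -L "$dir_a/testbin/tool" ]; then echo present; else echo missing; fi
}

scratch=$(mktemp -d)
interleave "$scratch/shared" "$scratch/shared"  # shared parent: prints "missing"
interleave "$scratch/a.tmp" "$scratch/b.tmp"    # per-test dirs: prints "present"
rm -rf "$scratch"
```

With a shared directory the symlink is gone by the time test A looks for it, which matches the "missing the directory/symlink it just created" symptom; giving each test its own scratch directory avoids the collision.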



-- 
宋方睿
