Nemanja Ivanovic via llvm-dev
2020-Sep-04 14:40 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
Thanks everyone for this discussion. Turns out that in my effort to make it possible to run multiple instances of this script in parallel, I inadvertently hid the issue. I made each instance use a directory that has $1 appended to the name and the wrapper script provided a unique value with $LINENO. :(

MaskRay, thanks for fixing the problem. All the PPC bots are back to green now.

On Thu, Sep 3, 2020 at 6:37 PM Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Thank you! It sounds like libcxx is rolling their own, so it seems like
> finishing the removal of %T should be fine.
>
> But it should still be announced separately.
>
> --paulr
>
> *From:* Nico Weber <thakis at chromium.org>
> *Sent:* Thursday, September 3, 2020 6:23 PM
> *To:* Robinson, Paul <paul.robinson at sony.com>
> *Cc:* David Blaikie <dblaikie at gmail.com>; LLVM on Power <powerllvm at ca.ibm.com>; Nemanja Ivanovic <nemanjai at ca.ibm.com>; llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
>
> I think that was maybe the discussion on https://reviews.llvm.org/D78245
>
> On Thu, Sep 3, 2020 at 6:22 PM Robinson, Paul <paul.robinson at sony.com> wrote:
>
> I have a vague memory that libcxx wanted it for something, and claimed it
> would be hard to work around not having it.
>
> Anyone else remember that? I can’t dredge up the details, sorry…
>
> In any event, a separate properly-titled thread on llvm-dev would be the
> right way to decide this.
>
> --paulr
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of* Nico Weber via llvm-dev
> *Sent:* Thursday, September 3, 2020 4:16 PM
> *To:* David Blaikie <dblaikie at gmail.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; LLVM on Power <powerllvm at ca.ibm.com>; Nemanja Ivanovic <nemanjai at ca.ibm.com>
> *Subject:* Re: [llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
>
> https://llvm.org/docs/CommandGuide/lit.html already lists %T as "parent
> directory of %t (not unique, deprecated, do not use)". See also
> https://reviews.llvm.org/D35396
>
> On Thu, Sep 3, 2020 at 3:37 PM David Blaikie <dblaikie at gmail.com> wrote:
>
> Yeah, I think I'd be up for considering deprecation of %T due to the risk
> of race conditions/conflicts between tests. %t gives a unique name you can
> do whatever you want with - only need one file, use %t as a file, need a
> directory full of files, mkdir %t and use that, etc.
>
> But will depend a bit on what the uses of %T look like, maybe there are
> some good uses of it that we haven't thought of until we see them.
>
> On Thu, Sep 3, 2020 at 12:33 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>
> Should be fixed by https://reviews.llvm.org/D87103
>
> Shall we consider deprecating(emitting a warning)/removing %T from
> lit? lldb, lld/COFF and clang-tools-extra are the three major users of
> %T. There are a few other %T in other places but there are not too
> many. We will also investigate whether other projects using lit are
> using %T.
>
> On Thu, Sep 3, 2020 at 11:25 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Oh yeah, good catch! Thanks!
> >
> > On Thu, Sep 3, 2020 at 11:13 AM Fāng-ruì Sòng <maskray at google.com> wrote:
> >>
> >> This is likely due to a race condition (%T is a shared parent
> >> directory). I'll put up a patch to fix it.
> >>
> >> On Thu, Sep 3, 2020 at 10:00 AM David Blaikie via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >
> >> > Is the machine running any jobs in parallel? Would it be worth trying running lit in the loop, rather than the script? (perhaps lit's doing something interesting) or maybe the full test run from ninja, but I appreciate that that is expensive.
> >> >
> >> > Are there other PPC bots? Any idea if they are experiencing this failure?
> >> >
> >> > There are also other tests that do similar mkdir/symlink things, I think - yet they are not failing? Maybe they do it in some slightly different manner?
> >> >
> >> > On Thu, Sep 3, 2020 at 5:03 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >> >>
> >> >> Sure.
> >> >> I didn't use lit or ninja. I simply copied the script produced by lit (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/tools/clang/test/Driver/Output/target-override.c.script) into a temporary directory (along with a deep copy of the build directory). I modified the paths in the script to point to the temporary directory.
> >> >> Then I ran the script in a loop.
> >> >> For running a bunch in parallel, I just produced a wrapper script to invoke that one:
> >> >> target-override.c.script $LINENO &
> >> >> target-override.c.script $LINENO &
> >> >> target-override.c.script $LINENO &
> >> >> ...
> >> >> wait
> >> >> And ran that in a loop. For thousands of iterations...
> >> >>
> >> >> On Wed, Sep 2, 2020 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
> >> >>>
> >> >>> Thanks for looking into it!
> >> >>>
> >> >>> Could you describe your test process in more detail? Were you running lit from your script? Running the build system (ninja?)?
> >> >>>
> >> >>> On Wed, Sep 2, 2020 at 10:47 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >> >>>>
> >> >>>> Well, I am at my wit's end. I have copied over the script and directories for this test case and run it a few million times. First I was running one at a time, then I switched to kicking off 1000 at a time. All the while, the bots continued to run on the same machine. The script never failed even once. I am not sure if this has something to do with Python as part of llvm-lit or what is going on.
> >> >>>> I am thinking that the best course of action for us is to mark this test case UNSUPPORTED for PPC.
> >> >>>>
> >> >>>> On Wed, Sep 2, 2020 at 12:41 PM Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >> >>>>>
> >> >>>>> Interesting, thanks for bringing this to our attention. I just took a quick look through the last 100 builds and this test has failed 13 times. This is certainly something we need to look at. We will investigate and see if we can make any sense of this.
> >> >>>>>
> >> >>>>> Nemanja Ivanovic
> >> >>>>> LLVM PPC Backend Development
> >> >>>>> IBM Toronto Lab
> >> >>>>> Email: nemanjai at ca.ibm.com
> >> >>>>> Phone: 905-413-3388
> >> >>>>>
> >> >>>>> ----- Original message -----
> >> >>>>> From: David Blaikie <dblaikie at gmail.com>
> >> >>>>> To: llvm-dev <llvm-dev at lists.llvm.org>, Nico Weber <thakis at chromium.org>, Serge Pavlov <sepavloff at gmail.com>, powerllvm at ca.ibm.com
> >> >>>>> Cc:
> >> >>>>> Subject: [EXTERNAL] Flakey failure on clang-ppc64le-linux-multistage
> >> >>>>> Date: Tue, Sep 1, 2020 6:10 PM
> >> >>>>>
> >> >>>>> Seems there were a couple of correlated failures that appear to be flakes on this buildbot recently:
> >> >>>>>
> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975 (target-override.c during stage 1, seems to be missing the directory/symlink it just created)
> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13976 (same test failure as the last, but during stage 2, not stage 1)
> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13977
> >> >>>>>
> >> >>>>> Including Nico & Pavlov as the people who wrote/edited the test, but I'm guessing this is something interesting going on on the buildbot itself?
> >> >>>>>
> >> >>>>> powerllvm at ca.ibm.com, whoever you are on the end of that mailing list - could you take a look at this? Possibly manually running that test in a loop a bunch of times to see if it fails sometimes & try to help us understand why?
> >> >>>>>
> >> >>>>> _______________________________________________
> >> >>>>> LLVM Developers mailing list
> >> >>>>> llvm-dev at lists.llvm.org
> >> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >> --
> >> 宋方睿
>
> --
> 宋方睿
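To illustrate the masking effect described at the top of this message, here is a minimal sketch, not the actual reproduction setup. It assumes a hypothetical stand-in script `instance.sh` (the contents of the real target-override.c.script are not shown in the thread) that works in a directory named `shared.<suffix>`, where the suffix is passed by the wrapper. Because `$LINENO` expands to the current line number, each wrapper line hands over a different suffix, so the parallel instances never touch the same directory and a race on a shared directory cannot show up.

```shell
#!/bin/bash
# Hypothetical stand-in for the copied test script; the directory layout
# and file names here are assumptions made for illustration only.
work=$(mktemp -d)
cat > "$work/instance.sh" <<'EOF'
#!/bin/bash
dir="$1/shared.$2"   # "$2" is the suffix supplied by the wrapper
mkdir -p "$dir"
: > "$dir/marker"
EOF
chmod +x "$work/instance.sh"

# Wrapper in the style quoted in the thread: each line sees a different
# $LINENO, so each instance gets its own directory.
"$work/instance.sh" "$work" $LINENO &
"$work/instance.sh" "$work" $LINENO &
"$work/instance.sh" "$work" $LINENO &
wait

# Count how many distinct working directories were created.
count=$(ls -d "$work"/shared.* | wc -l | tr -d ' ')
echo "distinct directories: $count"
rm -rf "$work"
```

With three wrapper lines the sketch should report three separate directories. Passing the same suffix on every line would instead make all instances share one directory, which is closer to what tests using %T do under lit.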
Florian Hahn via llvm-dev
2020-Sep-04 16:29 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
> On Sep 4, 2020, at 15:40, Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Thanks everyone for this discussion. Turns out that in my effort to make it possible to run multiple instances of this script in parallel, I inadvertently hid the issue. I made each instance use a directory that has $1 appended to the name and the wrapper script provided a unique value with $LINENO. :(
>
> MaskRay, thanks for fixing the problem. All the PPC bots are back to green now.

That’s great, thank you for taking care of this!

It looks like there may still be a problem with the clang-ppc64le-linux bot, which is failing due to gcc getting killed when building llvm/clang/unittests/Tooling/RecursiveASTVisitorTests/Callbacks.cpp.

It seems like it sometimes passes, but fails due to getting killed most of the time. I took a look at the build history and it has been happening at least since http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/32287

Cheers,
Florian
Nemanja Ivanovic via llvm-dev
2020-Sep-04 16:51 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
Right, I have left a message for the original author of that test case in the review where it was added (https://reviews.llvm.org/D82485) in the hope that the test case can be broken up into multiple files to avoid this problem.

On Fri, Sep 4, 2020 at 12:29 PM Florian Hahn <florian_hahn at apple.com> wrote:
> On Sep 4, 2020, at 15:40, Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Thanks everyone for this discussion. Turns out that in my effort to make
> it possible to run multiple instances of this script in parallel, I
> inadvertently hid the issue. I made each instance use a directory that has
> $1 appended to the name and the wrapper script provided a unique value with
> $LINENO. :(
>
> MaskRay, thanks for fixing the problem. All the PPC bots are back to green
> now.
>
> That’s great, thank you for taking care of this!
>
> It looks like there may still be a problem with the clang-ppc64le-linux
> bot, which is failing due to gcc getting killed when
> building llvm/clang/unittests/Tooling/RecursiveASTVisitorTests/Callbacks.cpp.
>
> It seems like it sometimes passes, but fails due to getting killed most of
> the time. I took a look at the build history and it has been happening at
> least since
> http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/32287
>
> Cheers,
> Florian