Nico Weber via llvm-dev
2020-Sep-03 22:23 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
I think that was maybe the discussion on https://reviews.llvm.org/D78245

On Thu, Sep 3, 2020 at 6:22 PM Robinson, Paul <paul.robinson at sony.com> wrote:

> I have a vague memory that libcxx wanted it for something, and claimed it
> would be hard to work around not having it.
>
> Anyone else remember that? I can’t dredge up the details, sorry…
>
> In any event, a separate properly-titled thread on llvm-dev would be the
> right way to decide this.
>
> --paulr
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Nico Weber via llvm-dev
> *Sent:* Thursday, September 3, 2020 4:16 PM
> *To:* David Blaikie <dblaikie at gmail.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; LLVM on Power <powerllvm at ca.ibm.com>; Nemanja Ivanovic <nemanjai at ca.ibm.com>
> *Subject:* Re: [llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
>
> https://llvm.org/docs/CommandGuide/lit.html already lists %T as "parent
> directory of %t (not unique, deprecated, do not use)". See also
> https://reviews.llvm.org/D35396
>
> On Thu, Sep 3, 2020 at 3:37 PM David Blaikie <dblaikie at gmail.com> wrote:
>
> Yeah, I think I'd be up for considering deprecation of %T due to the risk
> of race conditions/conflicts between tests. %t gives a unique name you can
> do whatever you want with - only need one file, use %t as a file, need a
> directory full of files, mkdir %t and use that, etc.
>
> But will depend a bit on what the uses of %T look like, maybe there are
> some good uses of it that we haven't thought of until we see them.
>
> On Thu, Sep 3, 2020 at 12:33 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>
> Should be fixed by https://reviews.llvm.org/D87103
>
> Shall we consider deprecating (emitting a warning)/removing %T from
> lit? lldb, lld/COFF and clang-tools-extra are the three major users of
> %T. There are a few other %T in other places but there are not too
> many. We will also investigate whether other projects using lit are
> using %T.
>
> On Thu, Sep 3, 2020 at 11:25 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Oh yeah, good catch! Thanks!
> >
> > On Thu, Sep 3, 2020 at 11:13 AM Fāng-ruì Sòng <maskray at google.com> wrote:
> >>
> >> This is likely due to a race condition (%T is a shared parent
> >> directory). I'll put up a patch to fix it.
> >>
> >> On Thu, Sep 3, 2020 at 10:00 AM David Blaikie via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >
> >> > Is the machine running any jobs in parallel? Would it be worth trying
> >> > running lit in the loop, rather than the script? (perhaps lit's doing
> >> > something interesting) or maybe the full test run from ninja, but I
> >> > appreciate that that is expensive.
> >> >
> >> > Are there other PPC bots? Any idea if they are experiencing this failure?
> >> >
> >> > There are also other tests that do similar mkdir/symlink things, I
> >> > think - yet they are not failing? Maybe they do it in some slightly
> >> > different manner?
> >> >
> >> > On Thu, Sep 3, 2020 at 5:03 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >> >>
> >> >> Sure.
> >> >> I didn't use lit or ninja. I simply copied the script produced by lit
> >> >> (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/tools/clang/test/Driver/Output/target-override.c.script)
> >> >> into a temporary directory (along with a deep copy of the build directory).
> >> >> I modified the paths in the script to point to the temporary directory.
> >> >> Then I ran the script in a loop.
> >> >> For running a bunch in parallel, I just produced a wrapper script to invoke that one:
> >> >> target-override.c.script $LINENO &
> >> >> target-override.c.script $LINENO &
> >> >> target-override.c.script $LINENO &
> >> >> ...
> >> >> wait
> >> >> And ran that in a loop. For thousands of iterations...
> >> >>
> >> >> On Wed, Sep 2, 2020 at 3:51 PM David Blaikie <dblaikie at gmail.com> wrote:
> >> >>>
> >> >>> Thanks for looking into it!
> >> >>>
> >> >>> Could you describe your test process in more detail? Were you
> >> >>> running lit from your script? Running the build system (ninja?)?
> >> >>>
> >> >>> On Wed, Sep 2, 2020 at 10:47 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> >> >>>>
> >> >>>> Well, I am at my wit's end. I have copied over the script and
> >> >>>> directories for this test case and run it a few million times. First I was
> >> >>>> running one at a time, then I switched to kicking off 1000 at a time. All
> >> >>>> the while, the bots continued to run on the same machine. The script never
> >> >>>> failed even once. I am not sure if this has something to do with Python as
> >> >>>> part of llvm-lit or what is going on.
> >> >>>> I am thinking that the best course of action for us is to mark
> >> >>>> this test case UNSUPPORTED for PPC.
> >> >>>>
> >> >>>> On Wed, Sep 2, 2020 at 12:41 PM Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >> >>>>>
> >> >>>>> Interesting, thanks for bringing this to our attention. I just
> >> >>>>> took a quick look through the last 100 builds and this test has failed 13
> >> >>>>> times. This is certainly something we need to look at. We will investigate
> >> >>>>> and see if we can make any sense of this.
> >> >>>>>
> >> >>>>> Nemanja Ivanovic
> >> >>>>> LLVM PPC Backend Development
> >> >>>>> IBM Toronto Lab
> >> >>>>> Email: nemanjai at ca.ibm.com
> >> >>>>> Phone: 905-413-3388
> >> >>>>>
> >> >>>>> ----- Original message -----
> >> >>>>> From: David Blaikie <dblaikie at gmail.com>
> >> >>>>> To: llvm-dev <llvm-dev at lists.llvm.org>, Nico Weber <thakis at chromium.org>, Serge Pavlov <sepavloff at gmail.com>, powerllvm at ca.ibm.com
> >> >>>>> Cc:
> >> >>>>> Subject: [EXTERNAL] Flakey failure on clang-ppc64le-linux-multistage
> >> >>>>> Date: Tue, Sep 1, 2020 6:10 PM
> >> >>>>>
> >> >>>>> Seems there were a couple of correlated failures that appear to
> >> >>>>> be flakes on this buildbot recently:
> >> >>>>>
> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975 (target-override.c during stage 1, seems to be missing the directory/symlink it just created)
> >> >>>>> red: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13976 (same test failure as the last, but during stage 2, not stage 1)
> >> >>>>> green: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13977
> >> >>>>>
> >> >>>>> Including Nico & Pavlov as the people who wrote/edited the test,
> >> >>>>> but I'm guessing this is something interesting going on on the buildbot
> >> >>>>> itself?
> >> >>>>>
> >> >>>>> powerllvm at ca.ibm.com, whoever you are on the end of that mailing
> >> >>>>> list - could you take a look at this? Possibly manually running that test
> >> >>>>> in a loop a bunch of times to see if it fails sometimes & try to help us
> >> >>>>> understand why?
> >> >>>>>
> >> >>>>> _______________________________________________
> >> >>>>> LLVM Developers mailing list
> >> >>>>> llvm-dev at lists.llvm.org
> >> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >> --
> >> 宋方睿
>
> --
> 宋方睿
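The difference Nico and David point to can be shown with a small Python model. This is a hypothetical sketch, not lit's implementation: `substitutions` and `expand` are made-up helpers that assume only what the lit documentation quoted above says, namely that %t is unique per test while %T is the parent directory of %t and therefore shared. The RUN lines, the `other-test.c` file, and the `testbin` directory name are illustrative, loosely modeled on the mkdir/symlink steps described in this thread.

```python
import os.path


def substitutions(output_dir, test_file):
    """Hypothetical, simplified model of lit's %t and %T.

    Assumes %t is a per-test path derived from the test's file name and
    %T is the parent directory of %t, as the lit documentation describes.
    """
    tmp = os.path.join(output_dir, os.path.basename(test_file) + ".tmp")
    return {"%t": tmp, "%T": os.path.dirname(tmp)}


def expand(run_line, subs):
    """Apply the substitutions to one RUN line (plain text replacement)."""
    for key, value in subs.items():
        run_line = run_line.replace(key, value)
    return run_line


# Two tests whose outputs land in the same directory ("other-test.c" is made up).
a = substitutions("Driver/Output", "target-override.c")
b = substitutions("Driver/Output", "other-test.c")

# Shared scratch space: both tests expand to the *same* directory.
print(expand("rm -rf %T/testbin && mkdir -p %T/testbin", a))
print(expand("rm -rf %T/testbin && mkdir -p %T/testbin", b))

# Pattern David suggests: make %t itself the directory; the expansions differ.
print(expand("rm -rf %t && mkdir -p %t/testbin", a))
print(expand("rm -rf %t && mkdir -p %t/testbin", b))
```

Because every test in the same output directory expands %T identically, one test's `rm -rf %T/testbin` can delete a directory another test has just created; using %t as the directory gives each test its own tree.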
Robinson, Paul via llvm-dev
2020-Sep-03 22:36 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
Thank you! It sounds like libcxx is rolling their own, so it seems like finishing the removal of %T should be fine.

But it should still be announced separately.

--paulr

From: Nico Weber <thakis at chromium.org>
Sent: Thursday, September 3, 2020 6:23 PM
To: Robinson, Paul <paul.robinson at sony.com>
Cc: David Blaikie <dblaikie at gmail.com>; LLVM on Power <powerllvm at ca.ibm.com>; Nemanja Ivanovic <nemanjai at ca.ibm.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Flakey failure on clang-ppc64le-linux-multistage

I think that was maybe the discussion on https://reviews.llvm.org/D78245
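Before the removal is announced separately, as Paul suggests, it may help to know how many tests still use %T. A hypothetical audit helper (not part of lit) might look like the sketch below. It assumes tests mark commands with a literal `RUN:` and flags only the literal token `%T`, so it will miss indirect uses, for example through custom substitutions defined in a lit configuration file.

```python
import os
import re
import sys

# Hypothetical audit helper (not part of lit): report RUN lines that use %T.
# %T is matched case-sensitively, so %t lines are not reported.
RUN_WITH_PERCENT_T = re.compile(r"RUN:.*%T\b")


def find_percent_T(lines):
    """Yield (line_number, line) for RUN lines that reference %T."""
    for lineno, line in enumerate(lines, start=1):
        if RUN_WITH_PERCENT_T.search(line):
            yield lineno, line.rstrip("\n")


def scan_tree(root):
    """Walk a test directory and collect (path, line_number, line) hits."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    hits.extend((path, n, l) for n, l in find_percent_T(f))
            except OSError:
                continue  # unreadable file; skip it
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: find_percent_T.py <test-directory>")
    else:
        for path, n, line in scan_tree(sys.argv[1]):
            print(f"{path}:{n}: {line}")
```

For example, `python3 find_percent_T.py clang-tools-extra/test` would list each remaining use with its file and line number (the script name is made up).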
Nemanja Ivanovic via llvm-dev
2020-Sep-04 14:40 UTC
[llvm-dev] Flakey failure on clang-ppc64le-linux-multistage
Thanks everyone for this discussion. It turns out that in my effort to make it possible to run multiple instances of this script in parallel, I inadvertently hid the issue: I made each instance use a directory with $1 appended to its name, and the wrapper script supplied a unique value for $1 via $LINENO. :(

MaskRay, thanks for fixing the problem. All the PPC bots are back to green now.

On Thu, Sep 3, 2020 at 6:37 PM Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Thank you! It sounds like libcxx is rolling their own, so it seems like
> finishing the removal of %T should be fine.
>
> But it should still be announced separately.
>
> --paulr