thr3ads.net - llvm dev - [llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner [Jan 2022]

Stella Stamenova via llvm-dev

2022-Jan-11 17:59 UTC

[llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

The windows lldb bot is running on a Hyper-V virtual machine, so it would make
sense that if watchpoints don't work correctly in virtual environments they
would be failing there. On the rare occasion I've had to run these tests
locally, I have also seen them fail though, so that's not the only source of
issues.

Since I disabled the couple of tests yesterday, there's only one watchpoint
test that is still failing randomly. One option would be to disable just this
test and let the remaining few watchpoint tests continue to run on Windows (I
prefer this option since some tests would continue to run). Alternatively, all
the watchpoint tests can be skipped via the category flag, but in that case,
I'd like us to undo the individual skips.
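As a rough illustration of the first option, the sketch below skips only the one flaky test on Windows hosts while the other watchpoint tests keep running. It uses the standard library's `unittest.skipIf` rather than lldb's own test decorators, assumes the Windows host is detected through `sys.platform`, and the class and test names are hypothetical placeholders.

```python
import sys
import unittest


# Hypothetical stand-ins for two watchpoint tests; the names are
# illustrative and do not come from the lldb source tree.
class WatchpointTests(unittest.TestCase):

    # Option 1: skip only the test that still fails randomly, and only on
    # Windows hosts.
    @unittest.skipIf(sys.platform == "win32",
                     "randomly misses watchpoint hits on Windows")
    def test_flaky_watchpoint(self):
        self.assertTrue(True)  # placeholder for the real test body

    # The remaining watchpoint tests keep running on every platform.
    def test_stable_watchpoint(self):
        self.assertTrue(True)  # placeholder for the real test body


def run_watchpoint_tests():
    """Run the suite and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WatchpointTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    r = run_watchpoint_tests()
    print(f"run={r.testsRun} skipped={len(r.skipped)} failed={len(r.failures)}")
```

The second option would instead drop every test in the watchpoint category, which is why Stella asks that the individual skips be undone in that case: they would become redundant.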

I did notice while going through the watchpoint tests to see what is still
enabled on Windows, that the same watchpoint tests that are disabled/failing on
Windows are disabled on multiple other platforms as well. The tests passing on
Windows are also the ones that are not disabled on other platforms. A third
option would be to add a separate category for the watchpoint tests that
don't run correctly everywhere and use that to disable them instead. This
would be a more generic way to disable the tests instead of adding multiple
`skipIf` statements to each test.
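The third option can be modeled as a lookup table: each test carries a set of category tags, and one per-platform table lists the categories to disable, so a non-portable test needs a single tag instead of several `skipIf` conditions. This is a self-contained sketch, not lldb's actual category machinery; the category name `watchpoint-nonportable`, the platform keys and the test names are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical tests and their category tags. "watchpoint-nonportable" stands
# in for the proposed new category and does not exist in lldb.
TEST_CATEGORIES: Dict[str, FrozenSet[str]] = {
    "test_watch_read": frozenset({"watchpoint"}),
    "test_watch_write": frozenset({"watchpoint"}),
    "test_watch_many": frozenset({"watchpoint", "watchpoint-nonportable"}),
    "test_watch_threads": frozenset({"watchpoint", "watchpoint-nonportable"}),
}

# A single table says which categories are disabled on which host platform,
# replacing per-test platform conditions. Platform keys are illustrative.
DISABLED_CATEGORIES: Dict[str, FrozenSet[str]] = {
    "windows": frozenset({"watchpoint-nonportable"}),
}


def should_skip(test: str, platform: str) -> bool:
    """Skip a test if any of its categories is disabled on this platform."""
    disabled = DISABLED_CATEGORIES.get(platform, frozenset())
    return not TEST_CATEGORIES[test].isdisjoint(disabled)


def runnable_tests(platform: str) -> List[str]:
    """Sorted names of the tests that would still run on the platform."""
    return sorted(t for t in TEST_CATEGORIES if not should_skip(t, platform))


if __name__ == "__main__":
    print(runnable_tests("windows"))  # ['test_watch_read', 'test_watch_write']
    print(runnable_tests("linux"))    # all four tests
```

Disabling the new category on another platform then means editing one table entry, while the portable watchpoint tests keep running everywhere.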

Thanks,
-Stella

-----Original Message-----
From: Pavel Labath <pavel at labath.sk> 
Sent: Tuesday, January 11, 2022 9:46 AM
To: Philip Reames <listmail at philipreames.com>; Stella Stamenova
<stilis at microsoft.com>; Jim Ingham <jingham at apple.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; zturner at google.com
Subject: Re: [llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

On 11/01/2022 18:22, Philip Reames wrote:
> On 1/11/22 3:32 AM, Pavel Labath wrote:
>> I am afraid I too have to say that I believe the real problem here is 
>> the lack of active developers with interest in/commitment to the windows 
>> port of lldb. While I appreciate having Stella's windows buildbot 
>> around, and it prevents windows from bitrotting completely, it would 
>> take a much more active involvement to resolve the multitude of 
>> systemic issues affecting windows support. Like, if we tried to apply 
>> the current llvm support policy guidelines to the windows (host-side, 
>> at least) support code, I don't think it would even meet the criteria
>> for inclusion in the peripheral tier (active sub-community).
>>
>> Now for something slightly more constructive:
>>
>> While I am not familiar with the windows-specific parts of the 
>> watchpoint code, I think I can say without exaggerating that I have a
>> *lot* of experience in fixing flaky tests. That experience tells me 
>> that flaky watchpoint tests are often/usually caused by factors 
>> outside lldb (due to watchpoints being a global, scarce hardware
>> resource). Virtualization is particularly tricky here -- every 
>> virtualization technology that I've tried has had (at some point in
>> time at least) a watchpoint-related bug. The problem described here 
>> sounds a lot like the issue I observed on Google Compute Engine, 
>> which could also miss some watchpoints "randomly". So, if this bot is
>> running in any kind of a virtualized environment, the first thing I'd
>> do is check whether the issue happens on physical hardware.
>>
>> Relatedly, I also want to mention that we have the 
>> ability to skip categories of tests in lldb. All the watchpoint tests 
>> are (should be) annotated by the watchpoint category, and so you can 
>> easily skip all of them, either by hard-disabling the category for 
>> windows in the source code (if this is an lldb issue) or externally 
>> through the buildbot config (if this is due to the bot environment =>
>> LLDB_TEST_USER_ARGS="--skip-category watchpoint").
> 
> Would it be reasonable to recommend that all of our windows bots 
> testing lldb add this flag?  Or maybe even check something in so that 
> all builds default to not running these tests on Windows? The former 
> would make sense if we primarily think this is virtualization related, 
> the latter if we think it's more likely a code problem.
> 
If that question was meant for me, then my answer is yes. I think those tests
should be disabled regardless of the cause. I actually tried to say the same
thing, but I may not have succeeded in getting it across.
Stella, can you share what kind of environment that bot is running in?

> I noticed last night that we have a couple of other windows bots which
> seem to be hitting the same false positives. Much lower frequencies,
> but it does seem this is not specific to the particular bot.

Hmm... do you have a link to those bots or something? Stella's bot is the
only windows (lldb) bot I am aware of and I'd be surprised if there were
more of them.
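Pavel's suggestion passes `--skip-category watchpoint` through LLDB_TEST_USER_ARGS in the bot configuration, while Philip's alternative would check the skip into the source tree. The sketch below models how the two could combine: the flag string is split with `shlex` and parsed with `argparse` (both Python standard library), and the result is unioned with a checked-in per-platform default. The parser is an illustrative stand-in, not the LLDB test driver's real argument parser, and `SOURCE_DISABLED`, `parse_skip_categories` and `effective_skips` are hypothetical names.

```python
import argparse
import shlex
from typing import Set

# Hypothetical checked-in defaults: categories disabled per host platform in
# the source tree (Philip's "check something in" option).
SOURCE_DISABLED = {"windows": {"watchpoint"}}


def parse_skip_categories(user_args: str) -> Set[str]:
    """Collect every --skip-category value from a flag string.

    Illustrative stand-in, not the LLDB test driver's parser: it recognizes
    only --skip-category and sets any other flags aside as unknown.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # action="append" lets the flag be repeated, one category per use.
    parser.add_argument("--skip-category", action="append", default=[])
    known, _unknown = parser.parse_known_args(shlex.split(user_args))
    return set(known.skip_category)


def effective_skips(platform: str, user_args: str) -> Set[str]:
    """Union of checked-in platform skips and per-bot --skip-category flags."""
    return SOURCE_DISABLED.get(platform, set()) | parse_skip_categories(user_args)


if __name__ == "__main__":
    # Value a bot could put in LLDB_TEST_USER_ARGS, per the thread.
    print(parse_skip_categories("--skip-category watchpoint"))  # {'watchpoint'}
```

With a checked-in default, every Windows build skips the category without any bot changes; with only the bot flag, the skip stays local to the environments that need it.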

Pavel Labath via llvm-dev

2022-Jan-11 18:31 UTC

On 11/01/2022 18:59, Stella Stamenova wrote:
> The windows lldb bot is running on a Hyper-V virtual machine, so it would
> make sense that if watchpoints don't work correctly in virtual environments
> they would be failing there. On the rare occasion I've had to run these
> tests locally, I have also seen them fail though, so that's not the only
> source of issues.
> 
> Since I disabled the couple of tests yesterday, there's only one
> watchpoint test that is still failing randomly. One option would be to disable
> just this test and let the remaining few watchpoint tests continue to run on
> Windows (I prefer this option since some tests would continue to run).
> Alternatively, all the watchpoint tests can be skipped via the category flag,
> but in that case, I'd like us to undo the individual skips.

For better or worse, you're currently the most (only?) interested person 
in keeping windows host support working, so I think you can manage the 
windows skips/fails in any way you see fit. The rest of us are mostly 
interested in having green builds. :)

Hyper-V is _not_ among the virtualization systems I've tried using with 
lldb, so I cannot conclusively say anything about it (though I still 
have my doubts).
> 
> I did notice while going through the watchpoint tests to see what is still
> enabled on Windows, that the same watchpoint tests that are disabled/failing on
> Windows are disabled on multiple other platforms as well. The tests passing on
> Windows are also the ones that are not disabled on other platforms. A third
> option would be to add a separate category for the watchpoint tests that
> don't run correctly everywhere and use that to disable them instead. This
> would be a more generic way to disable the tests instead of adding multiple
> `skipIf` statements to each test.

On non-x86 architectures, watchpoints tend to be available only on 
special (developer) hardware or similar (x86 is the outlier in having 
universal support), which is why these tests tend to accumulate various 
annotations. However, I don't think we need to solve this problem (how 
to skip the tests "nicely") here...
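To make the "global, scarce hardware resource" point concrete: on x86, hardware watchpoints are backed by the four debug address registers DR0 to DR3, and each one can watch a naturally aligned region of 1, 2, 4 or 8 bytes (8 assumes 64-bit mode). The sketch below models only those capacity and alignment limits as a small slot allocator. It is a simplified illustration, not lldb's implementation: it ignores DR7 encoding and per-thread state, does not split larger regions across several registers, and does not model the virtualization bugs discussed above. `WatchpointSlots` and its methods are hypothetical names.

```python
from typing import List, Optional, Tuple

NUM_DEBUG_REGS = 4            # x86 has four debug address registers, DR0-DR3
VALID_LENGTHS = (1, 2, 4, 8)  # 8-byte regions assume 64-bit mode


class WatchpointSlots:
    """Hypothetical, simplified model of x86 hardware watchpoint slots."""

    def __init__(self) -> None:
        # Each slot holds (address, length), or None when free.
        self.slots: List[Optional[Tuple[int, int]]] = [None] * NUM_DEBUG_REGS

    def add(self, addr: int, length: int) -> Optional[int]:
        """Reserve a slot for [addr, addr + length); return its index or None."""
        if length not in VALID_LENGTHS or addr % length != 0:
            # One debug register can only watch a naturally aligned region.
            return None
        for i, used in enumerate(self.slots):
            if used is None:
                self.slots[i] = (addr, length)
                return i
        return None  # every debug register is already in use

    def remove(self, index: int) -> None:
        """Free a slot so another watchpoint can use it."""
        self.slots[index] = None


if __name__ == "__main__":
    wp = WatchpointSlots()
    print([wp.add(0x1000 + 8 * i, 8) for i in range(5)])  # [0, 1, 2, 3, None]
    wp.remove(2)
    print(wp.add(0x2001, 4))  # None: not 4-byte aligned
    print(wp.add(0x2000, 4))  # 2: reuses the freed slot
```

A test that assumes a free, correctly behaving slot is therefore sensitive to anything else in its environment that touches these registers.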

pl
