thr3ads.net - llvm dev - [llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner [Jan 2022]

Pavel Labath via llvm-dev

2022-Jan-11 17:45 UTC

[llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

On 11/01/2022 18:22, Philip Reames wrote:
>
> On 1/11/22 3:32 AM, Pavel Labath wrote:
>> I am afraid I too have to say that I believe the real problem here is
>> the lack of active developers with interest in/commitment to the
>> windows port of lldb. While I appreciate having Stella's windows
>> buildbot around, and it prevents windows from bitrotting completely, it
>> would take a much more active involvement to resolve the multitude of
>> systemic issues affecting windows support. Like, if we tried to apply
>> the current llvm support policy guidelines to the windows (host-side,
>> at least) support code, I don't think it would even meet the criteria
>> for inclusion in the peripheral tier (active sub-community).
>>
>> Now for something slightly more constructive:
>>
>> While I am not familiar with the windows-specific parts of the
>> watchpoint code, I think I can say without exaggerating that I have a
>> *lot* of experience in fixing flaky tests. That experience tells me
>> that flaky watchpoint tests are often/usually caused by factors
>> outside lldb (due to watchpoints being a global, scarce, hardware
>> resource). Virtualization is particularly tricky here -- every
>> virtualization technology that I've tried has had (at some point in
>> time at least) a watchpoint-related bug. The problem described here
>> sounds a lot like the issue I observed on Google Compute Engine, which
>> could also miss some watchpoints "randomly". So, if this bot is
>> running in any kind of a virtualized environment, the first thing I'd
>> do is check whether the issue happens on physical hardware.
>>
>> Related to that, I also want to mention that we have the ability to
>> skip categories of tests in lldb. All the watchpoint tests are (should
>> be) annotated by the watchpoint category, and so you can easily skip
>> all of them, either by hard-disabling the category for windows in the
>> source code (if this is an lldb issue) or externally through the
>> buildbot config (if this is due to the bot environment =>
>> LLDB_TEST_USER_ARGS="--skip-category watchpoint").
> 
> Would it be reasonable to recommend that all of our windows bots testing 
> lldb add this flag?  Or maybe even check something in so that all builds 
> default to not running these tests on Windows? The former would make 
> sense if we primarily think this is virtualization related, the latter if 
> we think it's more likely a code problem.
> 
If that question was meant for me, then my answer is yes. I think those
tests should be disabled regardless of the cause. I actually tried to
say the same thing, but I may not have succeeded in getting it across.
Stella, can you share what kind of environment that bot is running in?

> I noticed last night that we have a couple of other windows bots which
> seem to be hitting the same false positives.  Much lower frequencies,
> but it does seem this is not specific to the particular bot.

Hmm.. do you have a link to those bots or something? Stella's bot is the
only windows (lldb) bot I am aware of and I'd be surprised if there were
more of them.
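
Editorial sketch, not part of the original message: the category skip discussed above can be pictured as a filter over tagged tests. The Python below is a toy model under stated assumptions, not lldb's actual test driver. The test names, the `TESTS` table, and the helpers `parse_user_args` and `select_tests` are all hypothetical; only the `--skip-category watchpoint` string comes from the thread.

```python
import argparse
import shlex

# Hypothetical inventory: test name -> categories it is annotated with.
TESTS = {
    "test_watch_read": {"watchpoint"},
    "test_watch_write": {"watchpoint"},
    "test_breakpoint_hit": {"breakpoint"},
}


def parse_user_args(user_args):
    """Turn a user-args string into the set of categories to skip."""
    parser = argparse.ArgumentParser()
    # The flag may be repeated; each occurrence adds one category.
    parser.add_argument("--skip-category", action="append", default=[],
                        dest="skip_categories")
    args = parser.parse_args(shlex.split(user_args))
    return set(args.skip_categories)


def select_tests(tests, skipped):
    """Keep only the tests that share no category with the skipped set."""
    return sorted(name for name, cats in tests.items() if not (cats & skipped))


skipped = parse_user_args("--skip-category watchpoint")
print(select_tests(TESTS, skipped))
```

The point of the model is that the skip lives in the bot's configuration, so it can be dropped again without touching any test source.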

Philip Reames via llvm-dev

2022-Jan-11 17:51 UTC

[llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

On 1/11/22 9:45 AM, Pavel Labath wrote:
> On 11/01/2022 18:22, Philip Reames wrote:
>>
>> On 1/11/22 3:32 AM, Pavel Labath wrote:
>>> I am afraid I too have to say that I believe the real problem here
>>> is the lack of active developers with interest in/commitment to the
>>> windows port of lldb. While I appreciate having Stella's windows
>>> buildbot around, and it prevents windows from bitrotting completely,
>>> it would take a much more active involvement to resolve the
>>> multitude of systemic issues affecting windows support. Like, if we
>>> tried to apply the current llvm support policy guidelines to the
>>> windows (host-side, at least) support code, I don't think it would
>>> even meet the criteria for inclusion in the peripheral tier (active
>>> sub-community).
>>>
>>> Now for something slightly more constructive:
>>>
>>> While I am not familiar with the windows-specific parts of the
>>> watchpoint code, I think I can say without exaggerating that I have
>>> a *lot* of experience in fixing flaky tests. That experience tells
>>> me that flaky watchpoint tests are often/usually caused by factors
>>> outside lldb (due to watchpoints being a global, scarce, hardware
>>> resource). Virtualization is particularly tricky here -- every
>>> virtualization technology that I've tried has had (at some point in
>>> time at least) a watchpoint-related bug. The problem described here
>>> sounds a lot like the issue I observed on Google Compute Engine,
>>> which could also miss some watchpoints "randomly". So, if this bot
>>> is running in any kind of a virtualized environment, the first thing
>>> I'd do is check whether the issue happens on physical hardware.
>>>
>>> Related to that, I also want to mention that we have the ability to
>>> skip categories of tests in lldb. All the watchpoint tests are
>>> (should be) annotated by the watchpoint category, and so you can
>>> easily skip all of them, either by hard-disabling the category for
>>> windows in the source code (if this is an lldb issue) or externally
>>> through the buildbot config (if this is due to the bot environment
>>> => LLDB_TEST_USER_ARGS="--skip-category watchpoint").
>>
>> Would it be reasonable to recommend that all of our windows bots 
>> testing lldb add this flag?  Or maybe even check something in so that 
>> all builds default to not running these tests on Windows? The former 
>> would make sense if we primarily think this is virtualization 
>> related, the latter if we think it's more likely a code problem.
>>
>
> If that question was meant for me, then my answer is yes. I think 
> those tests should be disabled regardless of the cause. I actually 
> tried to say the same thing, but I may not have succeeded in getting 
> it across. Stella, can you share what kind of environment that bot is
> running in?
>
>> I noticed last night that we have a couple of other windows bots 
>> which seem to be hitting the same false positives.  Much lower 
>> frequencies, but it does seem this is not specific to the particular 
>> bot.
> Hmm.. do you have a link to those bots or something? Stella's bot is 
> the only windows (lldb) bot I am aware of and I'd be surprised if 
> there were more of them.

I went back and checked.  Turns out I was wrong here.  I had a couple of 
build failures with similar messages, but they were from this bot.

Stella Stamenova via llvm-dev

2022-Jan-11 17:59 UTC

[llvm-dev] [EXTERNAL] Re: Responsibilities of a buildbot owner

The windows lldb bot is running on a Hyper-V virtual machine, so it would make
sense that if watchpoints don't work correctly in virtual environments they
would be failing there. On the rare occasion I've had to run these tests
locally, I have also seen them fail though, so that's not the only source of
issues.

Since I disabled the couple of tests yesterday, there's only one watchpoint
test that is still failing randomly. One option would be to disable just this
test and let the remaining few watchpoint tests continue to run on Windows (I
prefer this option since some tests would continue to run). Alternatively, all
the watchpoint tests can be skipped via the category flag, but in that case,
I'd like us to undo the individual skips.

While going through the watchpoint tests to see what is still enabled on
Windows, I did notice that the same watchpoint tests that are disabled/failing
on Windows are disabled on multiple other platforms as well. The tests passing on
Windows are also the ones that are not disabled on other platforms. A third
option would be to add a separate category for the watchpoint tests that
don't run correctly everywhere and use that to disable them instead. This
would be a more generic way to disable the tests instead of adding multiple
`skipIf` statements to each test.

Thanks,
-Stella
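
Editorial sketch, not part of the original message: the third option above (one extra category for unreliable watchpoint tests, instead of a `skipIf` per platform on each test) could look roughly like the following. This is a standalone illustration built on Python's `unittest`, not lldb's decorator API. The `categories` decorator, the `SKIPPED_CATEGORIES` table, the `watchpoint-unreliable` category name, and the hard-coded `CURRENT_PLATFORM` are all assumptions made for the example.

```python
import unittest

# Hypothetical table: which categories are disabled on which platform.
SKIPPED_CATEGORIES = {
    "windows": {"watchpoint-unreliable"},
    "linux": set(),
}

# Assumed for the demo; a real suite would detect the platform.
CURRENT_PLATFORM = "windows"


def categories(*names):
    """Tag a test; skip it if any of its categories is disabled here."""
    def decorate(func):
        disabled = set(names) & SKIPPED_CATEGORIES.get(CURRENT_PLATFORM, set())
        if disabled:
            reason = "category disabled on %s: %s" % (
                CURRENT_PLATFORM, ", ".join(sorted(disabled)))
            return unittest.skip(reason)(func)
        return func
    return decorate


class WatchpointTests(unittest.TestCase):
    @categories("watchpoint")
    def test_stable(self):
        self.assertTrue(True)

    # One extra category replaces a stack of per-platform skips.
    @categories("watchpoint", "watchpoint-unreliable")
    def test_unreliable(self):
        self.fail("stands in for a test that fails randomly")


def run_suite():
    """Run the demo tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WatchpointTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
print(len(result.skipped), len(result.failures))
```

With this shape, the list of platforms where the unreliable tests are off lives in one table, and the stable watchpoint tests keep running everywhere.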

