On 10/18/2012 01:39 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 11:19 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>> On 10/18/2012 10:36 AM, David Blaikie wrote:
>>> On Thu, Oct 18, 2012 at 12:48 AM, Renato Golin <rengolin at systemcall.org> wrote:
>>>> On 18 October 2012 05:11, Rick Foos <rfoos at codeaurora.org> wrote:
>>>>> I don't think GDB testsuite should block a commit, it can vary by a few
>>>>> tests, they rarely if ever all pass 100%. Tracking the results over time
>>>>> can catch big regressions, as well as the ones that slowly increase the
>>>>> failed tests.
>>>> Agreed. It should at least serve as comparison between two branches,
>>>> but hopefully, being actively monitored.
>>>>
>>>> Maybe would be good to add a directory (if there isn't one yet) to the
>>>> testsuite repository, or at least the code necessary to make it run on
>>>> LLVM.
>>>
>>> The clang-tests repository (
>>> http://llvm.org/viewvc/llvm-project/clang-tests/ ) includes an Apple
>>> GCC 4.2 compatible version of the GCC and GDB test suites that Apple
>>> run internally. I'm working on bringing up an equivalent public
>>> buildbot at least for the GDB suite here (
>>> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-gdb-gcc ) -
>>> just a few timing out tests I need to look at to get that green.
>>> Apparently it's fairly stable.
>>>
>>> Beyond that I'll be trying to bring up one with the latest suite (7.4
>>> is what I've been playing with) on Linux as well.
>>>
>> Since you're going to bring a bot up in zorg, I'll stop working on bringing
>> my testsuite runner forward.
> I'm still interested in any details you have about issues you've
> resolved/learnt, etc.
>
>> A couple thoughts:
>>
>> 1) I've been running on the latest test suite, polling once a day. I think
>> Eric and anyone working on DWARF 4/5 should be running against the upstream
>> testsuite. (I have no problems with running 7.4 too)
> Interesting thought. (just so we're all on the same page when you say
> "test suite" you're talking about the GDB dejagnu test suite (the same
> one (well, more recent version of it) that's in clang-tests)) Though I
> hesitate to have such a moving target, I can see how it could be
> useful.
>
Yes, the sourceware.org site. I hesitated as well, but I tried it and
it's OK.
>> It's been stable to run at the tip of GDB this way, the test results
>> aren't varying much.
> With the right heuristics I suppose this could be valuable, but will
> require more work to find the right signal in the (even small) noise.
>
I wrote a 10-line awk script to create a CSV file out of the test
summaries, to make a one-to-one comparison of tests. It's over 70 lines
now. It's not that I should have used Python; everything is an
exception to the rules. You can get close, but I can't say you can get
perfect signals.
From a compiler developer's point of view, the spreadsheet was worthless.
We're not testing GDB, but rather what the compiler feeds to GDB.
The actual workflow is: take the log file, check out the suite, rerun a
failing test, use dwarfdump and llvm-dwarfdump, and find the "bad" DWARF
records produced by the compiler.
All the eventual bugs are about DWARF records, plus a GDB testsuite test
to reproduce them.
A bad or confused DWARF record fails multiple tests, with no way to map a
failure back to the DWARF.
In the end, a fine-grained signal doesn't do what you might want.
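For reference, a minimal sketch of that kind of one-to-one comparison, in
Python rather than awk. This is not the script mentioned above. It assumes
the usual DejaGnu .sum layout, where each result line starts with a keyword
such as PASS: or FAIL: followed by the test name. The function names
(parse_sum, compare_csv, regressions) and the MISSING placeholder are
made up for illustration.

```python
import csv
import io
import re

# Keywords DejaGnu puts at the start of each per-test line in a .sum file.
RESULT_LINE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): (.*)$"
)


def parse_sum(text):
    """Map each test name to its result for the contents of one .sum file."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            status, name = match.groups()
            # Duplicate test names are one of the "exceptions": keep the last.
            results[name] = status
    return results


def compare_csv(baseline, candidate):
    """Return CSV text with one row per test seen in either run."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["test", "baseline", "candidate"])
    for name in sorted(set(baseline) | set(candidate)):
        writer.writerow(
            [name, baseline.get(name, "MISSING"), candidate.get(name, "MISSING")]
        )
    return out.getvalue()


def regressions(baseline, candidate):
    """Tests that PASS in the baseline run but do not PASS in the candidate."""
    return sorted(
        name
        for name, status in baseline.items()
        if status == "PASS" and candidate.get(name) != "PASS"
    )
```

Feeding it the .sum contents from a GCC run as the baseline and a Clang run
as the candidate gives the per-test table. As said above, the table only
says which tests differ, not which DWARF record caused it.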
>> 2) A surprise benefit of running this way is that hundreds of obsolete
>> tests, or broken tests are getting removed. This hasn't resulted in any
>> broken backwards compatibility here at least. Saves tons of time debugging
>> tests that don't work, and developing around compatible things that
>> reasonable people have decided no longer matter.
> Fair point.
>
>> 3) Running the testsuite against two compilers at a time makes it easier to see
>> regressions. By comparing against a known stable compiler, or GCC,
>> regressions are visible by summary numbers.
> I assume GDB runs their own test suite against some version (or the
> simultaneous latest) of GCC? If we can't scrape those existing results
> we can reproduce them (running the full suite with both GCC & Clang
> side-by-side).
>
gdb-testers at sourceware.org has a run every night. Yes, I reproduce them. A
non-x86 target has very different results, so I look for a good GCC
cross compiler to establish a baseline.
In the case of Clang, all the architectures share the DWARF processing, so
an x86 run covers a lot of the DWARF processing without worrying too much
about a cross-compiler run. (But some worry about limiting fixes to
regressions from a cross-compiled GCC, so I have to run that as
well.)
>> 4) I have plots of the summary numbers online with a window of a month or
>> two. The trend allows you to see regressions occurring, and remaining as
>> regressions. Sometimes GDB Testsuite or a compiler has a bad day. The trend
>> lets you see a stable regression, and when you get a round toit, tells you
>> when the regression started.
> Yep. Also, if we're trying to address all these issues, the way would be to
> prioritize the very stable failures (where Clang fails a test that GCC
> passes & does so consistently for a long time) first. Then look at the
> unstable ones last - figure out which compiler's to blame, "XFAIL:
> clang" them or whatever is necessary.
>
I avoid doing the XFAIL thing. Plotting all the lines from the
summary makes more sense, and it's what you see when you run the test
manually.
When you move a FAIL artificially to XFAIL, the plot just has a few V's
in it where the FAIL line drops and the XFAIL line goes up. No new
information.
I prefer leaving the actual summary numbers in place. All the data you
need is there.
As you might have guessed, I like tests that fail, and want to get rid
of the ones that pass too often :)
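To illustrate reading the trend from the raw summary numbers, here is a
rough sketch. It is not what the Jenkins plots do. It assumes DejaGnu .sum
result lines start with keywords such as PASS: and FAIL:, and it treats a
regression as "stable" once the FAIL count stays above the prior run for
`hold` consecutive runs. The function names and the default of 3 runs are
arbitrary choices for the example.

```python
import re
from collections import Counter

# Leading keywords of per-test result lines in a DejaGnu .sum file.
RESULT = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): "
)


def summary_counts(sum_text):
    """Count each result keyword, leaving FAIL and XFAIL as reported."""
    counts = Counter()
    for line in sum_text.splitlines():
        match = RESULT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def stable_regression_start(fail_counts, hold=3):
    """Index of the first run whose FAIL count rose and stayed up.

    A run qualifies when it and the next hold - 1 runs are all above the
    run just before it. A one-day spike (a "bad day") does not qualify.
    Returns None when no run qualifies.
    """
    for i in range(1, len(fail_counts) - hold + 1):
        previous = fail_counts[i - 1]
        window = fail_counts[i:i + hold]
        if all(count > previous for count in window):
            return i
    return None
```

With daily FAIL counts of 10, 10, 25, 10, 10, 14, 14, 15, 14 the spike to
25 is skipped and the function returns index 5, the first of the runs at 14.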
>> <soapbox>
>> I've been doing this with Jenkins. It's fairly easy to set up, and does the
>> plotting. Developers can grab a copy of the script to duplicate a run on
>> their broken compiler. Running the testsuite under JNLP increased the number
>> of executed tests. I don't know why, it just did.
>> </soapbox>
> I wouldn't mind seeing your jenkins setup/config/tweaks/etc as a
> reference point, if you've got it lying around.
>
I'll see what I can send, or it's just as easy to walk through it. Jenkins
isn't really like buildbot. Do you have Jenkins running there?
> - David
--
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation