On 10/18/2012 10:36 AM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 12:48 AM, Renato Golin <rengolin at systemcall.org> wrote:
>> On 18 October 2012 05:11, Rick Foos <rfoos at codeaurora.org> wrote:
>>> I don't think the GDB testsuite should block a commit; it can vary by a
>>> few tests, and the tests rarely if ever all pass 100%. Tracking the
>>> results over time can catch big regressions, as well as the ones that
>>> slowly increase the failed tests.
>>
>> Agreed. It should at least serve as a comparison between two branches,
>> but hopefully be actively monitored.
>>
>> Maybe it would be good to add a directory (if there isn't one yet) to
>> the testsuite repository, or at least the code necessary to make it run
>> on LLVM.
>
> The clang-tests repository (
> http://llvm.org/viewvc/llvm-project/clang-tests/ ) includes an Apple
> GCC 4.2 compatible version of the GCC and GDB test suites that Apple
> run internally. I'm working on bringing up an equivalent public
> buildbot, at least for the GDB suite, here (
> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-gdb-gcc ) -
> just a few timing-out tests I need to look at to get that green.
> Apparently it's fairly stable.
>
> Beyond that I'll be trying to bring up one with the latest suite (7.4
> is what I've been playing with) on Linux as well.

Since you're going to bring a bot up in zorg, I'll stop working on bringing
my testsuite runner forward. A couple of thoughts:

1) I've been running on the latest test suite, polling once a day. I think
Eric and anyone working on DWARF 4/5 should be running against the upstream
testsuite. (I have no problem with running 7.4 too.) It's been stable to
run at the tip of GDB this way; the test results aren't varying much.

2) A surprise benefit of running this way is that hundreds of obsolete or
broken tests are getting removed. This hasn't resulted in any broken
backwards compatibility here, at least. It saves tons of time debugging
tests that don't work, and developing around compatibility things that
reasonable people have decided no longer matter.

3) Running the testsuite against two compilers at a time makes it easier to
see regressions. By comparing against a known stable compiler, or GCC,
regressions are visible from the summary numbers alone.

4) I have plots of the summary numbers online with a window of a month or
two. The trend allows you to see regressions occurring, and remaining as
regressions. Sometimes the GDB testsuite or a compiler has a bad day; the
trend lets you see a stable regression and, when you get around to it,
tells you when the regression started.

<soapbox>
I've been doing this with Jenkins. It's fairly easy to set up, and it does
the plotting. Developers can grab a copy of the script to duplicate a run
on their broken compiler. Running the testsuite under JNLP increased the
number of executed tests - I don't know why, it just did.
</soapbox>

> - David

-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
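(Editor's note: a minimal sketch of the summary-number comparison described in point 3 above. The parsing of the "# of ..." counter lines matches the standard DejaGnu gdb.sum summary format; the function names and the idea of diffing a Clang run against a GCC baseline are illustrative, not taken from Rick's actual script.)

```python
import re

# DejaGnu's gdb.sum summary section contains counter lines such as:
#   # of expected passes            1234
#   # of unexpected failures          56
SUMMARY_RE = re.compile(r"# of (\w[\w ]*\w)\s+(\d+)")

def parse_summary(text):
    """Return {category: count} from the summary section of a .sum file."""
    return {m.group(1): int(m.group(2)) for m in SUMMARY_RE.finditer(text)}

def compare_summaries(baseline, candidate):
    """Per-category delta (candidate - baseline).

    A positive delta in 'unexpected failures' is the kind of movement
    that shows up as a regression on the trend plots."""
    cats = set(baseline) | set(candidate)
    return {c: candidate.get(c, 0) - baseline.get(c, 0) for c in sorted(cats)}
```

Feeding the summary sections of a GCC run and a Clang run (e.g. `gcc.sum` and `clang.sum`) through `compare_summaries` gives the same at-a-glance regression signal as comparing the raw summary numbers by hand.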
On Thu, Oct 18, 2012 at 11:19 AM, Rick Foos <rfoos at codeaurora.org> wrote:
> Since you're going to bring a bot up in zorg, I'll stop working on
> bringing my testsuite runner forward.

I'm still interested in any details you have about issues you've
resolved/learnt, etc.

> A couple of thoughts:
>
> 1) I've been running on the latest test suite, polling once a day. I
> think Eric and anyone working on DWARF 4/5 should be running against the
> upstream testsuite. (I have no problem with running 7.4 too.)

Interesting thought. (Just so we're all on the same page: when you say
"test suite" you're talking about the GDB DejaGnu test suite - the same
one, well, a more recent version of the one, that's in clang-tests.)
Though I hesitate to have such a moving target, I can see how it could be
useful.

> It's been stable to run at the tip of GDB this way; the test results
> aren't varying much.

With the right heuristics I suppose this could be valuable, but it will
require more work to find the right signal in the (even small) noise.

> 2) A surprise benefit of running this way is that hundreds of obsolete
> or broken tests are getting removed. This hasn't resulted in any broken
> backwards compatibility here, at least. It saves tons of time debugging
> tests that don't work, and developing around compatibility things that
> reasonable people have decided no longer matter.

Fair point.

> 3) Running the testsuite against two compilers at a time makes it easier
> to see regressions. By comparing against a known stable compiler, or
> GCC, regressions are visible from the summary numbers alone.

I assume GDB runs their own test suite against some version (or the
simultaneous latest) of GCC? If we can't scrape those existing results we
can reproduce them (running the full suite with both GCC & Clang
side-by-side).

> 4) I have plots of the summary numbers online with a window of a month
> or two. The trend allows you to see regressions occurring, and remaining
> as regressions. Sometimes the GDB testsuite or a compiler has a bad day;
> the trend lets you see a stable regression and, when you get around to
> it, tells you when the regression started.

Yep. Also, if we're trying to address all these issues, one approach would
be to prioritize the very stable failures (where Clang fails a test that
GCC passes, and does so consistently for a long time) first. Then look at
the unstable ones last - figure out which compiler's to blame, "XFAIL:
clang" them, or whatever is necessary.

> <soapbox>
> I've been doing this with Jenkins. It's fairly easy to set up, and it
> does the plotting. Developers can grab a copy of the script to duplicate
> a run on their broken compiler. Running the testsuite under JNLP
> increased the number of executed tests - I don't know why, it just did.
> </soapbox>

I wouldn't mind seeing your Jenkins setup/config/tweaks/etc. as a
reference point, if you've got it lying around.

- David
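(Editor's note: a sketch of how the side-by-side GCC/Clang run mentioned above might be driven. It assumes the usual DejaGnu convention of passing `CC_FOR_TARGET` through `RUNTESTFLAGS` to pick the compiler that builds the test programs; the directory names and compiler paths are placeholders, not part of either poster's actual setup.)

```python
import subprocess

def check_command(compiler, builddir):
    """Build a 'make check' invocation that runs the GDB testsuite with
    one particular compiler building the test executables."""
    return [
        "make", "-C", builddir, "check",
        # DejaGnu convention: override the compiler used for test programs.
        "RUNTESTFLAGS=CC_FOR_TARGET=%s" % compiler,
    ]

def run_side_by_side(compilers, dry_run=True):
    """Run (or, in dry_run mode, just print) the testsuite once per
    compiler, each in its own build directory so the gdb.sum files
    don't clobber each other."""
    cmds = [check_command(cc, "gdb-build-%s" % name) for name, cc in compilers]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Individual test FAILs are expected; don't abort the driver.
            subprocess.run(cmd, check=False)
    return cmds
```

The two resulting `gdb.sum` files can then be compared by summary numbers, exactly as described earlier in the thread.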
On 10/18/2012 01:39 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 11:19 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>> Since you're going to bring a bot up in zorg, I'll stop working on
>> bringing my testsuite runner forward.
>
> I'm still interested in any details you have about issues you've
> resolved/learnt, etc.
>
>> 1) I've been running on the latest test suite, polling once a day. I
>> think Eric and anyone working on DWARF 4/5 should be running against
>> the upstream testsuite. (I have no problem with running 7.4 too.)
>
> Interesting thought. (Just so we're all on the same page: when you say
> "test suite" you're talking about the GDB DejaGnu test suite - the same
> one, well, a more recent version of it, that's in clang-tests.) Though
> I hesitate to have such a moving target, I can see how it could be
> useful.

Yes, the sourceware.org site. I hesitated as well, but I tried it and it's
OK.

>> It's been stable to run at the tip of GDB this way; the test results
>> aren't varying much.
>
> With the right heuristics I suppose this could be valuable, but it will
> require more work to find the right signal in the (even small) noise.

I wrote a 10-line awk script to create a CSV file out of the test
summaries, to make a one-to-one comparison of tests. It's over 70 lines
now... It's not that I should have used Python; it's that everything is an
exception to the rules. You can get close, but I can't say you can get
perfect signals.

From a compiler developer's point of view, the spreadsheet was worthless.
We're not testing GDB, but rather what the compiler feeds to GDB. Take the
log file, check out the suite, rerun a failing test, use dwarfdump and
llvm-dwarfdump, and find the "bad" DWARF records produced by the compiler.
All the eventual bugs are about DWARF records, plus a GDB testsuite test
to duplicate them. A bad/confused DWARF record fails multiple tests
without a way to map a failure back to DWARF. In the end, a fine-grained
signal doesn't do what you might want.

>> 2) A surprise benefit of running this way is that hundreds of obsolete
>> or broken tests are getting removed. It saves tons of time debugging
>> tests that don't work.
>
> Fair point.
>
>> 3) Running the testsuite against two compilers at a time makes it
>> easier to see regressions. By comparing against a known stable
>> compiler, or GCC, regressions are visible from the summary numbers
>> alone.
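(Editor's note: a sketch of what the summary-to-CSV comparison Rick describes might start out as. It parses the per-test `PASS:`/`FAIL:` lines of two `.sum` files and emits only the tests whose result differs. As Rick notes, the real suite is full of exceptions - duplicate and nondeterministic test names, for one - which this simplified dict-based version deliberately ignores; the function names are illustrative.)

```python
import csv
import io
import re

# Each per-test line in a gdb.sum file looks like:
#   FAIL: gdb.base/break.exp: run to main
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KFAIL|UNTESTED|UNSUPPORTED|UNRESOLVED): (.*)$",
    re.M)

def results(sum_text):
    """Map test name -> result for one .sum file.

    Simplification: duplicate test names overwrite each other, which the
    real testsuite output does produce."""
    return {m.group(2): m.group(1) for m in RESULT_RE.finditer(sum_text)}

def diff_csv(baseline_text, candidate_text):
    """Return CSV text listing tests whose result differs between runs."""
    base, cand = results(baseline_text), results(candidate_text)
    out = io.StringIO()
    w = csv.writer(out)
    w.writerow(["test", "baseline", "candidate"])
    for name in sorted(set(base) | set(cand)):
        b = base.get(name, "MISSING")
        c = cand.get(name, "MISSING")
        if b != c:
            w.writerow([name, b, c])
    return out.getvalue()
```

Run over a GCC `.sum` as the baseline and a Clang `.sum` as the candidate, the rows that come out are exactly the stable-failure candidates David suggests prioritizing.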
> I assume GDB runs their own test suite against some version (or the
> simultaneous latest) of GCC? If we can't scrape those existing results
> we can reproduce them (running the full suite with both GCC & Clang
> side-by-side).

gdb-testers at sourceware.org has a run every night. Yes, I reproduce
them.

A non-x86 target has very different results, so I look for a good GCC
cross compiler to establish a baseline. In the case of Clang, all the
architectures share the DWARF processing, so an x86 run covers a lot more
of the DWARF processing than worrying too much about a cross-compiler run.
(But some worry about limiting fixes to regressions from a cross-compiled
GCC, so I have to run that as well.)

>> 4) I have plots of the summary numbers online with a window of a month
>> or two. The trend allows you to see regressions occurring, and
>> remaining as regressions.
>
> Yep. Also, if we're trying to address all these issues, one approach
> would be to prioritize the very stable failures (where Clang fails a
> test that GCC passes, and does so consistently for a long time) first.
> Then look at the unstable ones last - figure out which compiler's to
> blame, "XFAIL: clang" them, or whatever is necessary.

I avoid the XFAIL thing. Plotting all the lines from the summary makes
more sense, and it's what you see when you run the test manually. When you
move a FAIL artificially to XFAIL, the plot just has a few V's in it where
the FAIL line drops and the XFAIL line goes up. No new information. I
prefer leaving the actual summary numbers in place; all the data you need
is there.

As you might have guessed, I like tests that fail, and want to get rid of
the ones that pass too often :)

> I wouldn't mind seeing your Jenkins setup/config/tweaks/etc. as a
> reference point, if you've got it lying around.

I'll see what I can send, or it's just as easy to walk through it. Jenkins
isn't really like buildbot. Do you have Jenkins running there?

> - David

-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation