thr3ads.net - llvm dev - [LLVMdev] DWARF 2/3 backwards compatibility? [Oct 2012]


Rick Foos

2012-Oct-18 21:11 UTC

[LLVMdev] DWARF 2/3 backwards compatibility?

On 10/18/2012 01:39 PM, David Blaikie wrote:
> On Thu, Oct 18, 2012 at 11:19 AM, Rick Foos <rfoos at codeaurora.org> wrote:
>> On 10/18/2012 10:36 AM, David Blaikie wrote:
>>> On Thu, Oct 18, 2012 at 12:48 AM, Renato Golin <rengolin at systemcall.org> wrote:
>>>> On 18 October 2012 05:11, Rick Foos <rfoos at codeaurora.org> wrote:
>>>>> I don't think GDB testsuite should block a commit, it can vary by a few
>>>>> tests, they rarely if ever all pass 100%. Tracking the results over time
>>>>> can catch big regressions, as well as the ones that slowly increase the
>>>>> failed tests.
>>>> Agreed. It should at least serve as comparison between two branches,
>>>> but hopefully, being actively monitored.
>>>>
>>>> Maybe would be good to add a directory (if there isn't one yet) to the
>>>> testsuite repository, or at least the code necessary to make it run on
>>>> LLVM.
>>>
>>> The clang-tests repository (
>>> http://llvm.org/viewvc/llvm-project/clang-tests/ ) includes an Apple
>>> GCC 4.2 compatible version of the GCC and GDB test suites that Apple
>>> run internally. I'm working on bringing up an equivalent public
>>> buildbot at least for the GDB suite here (
>>> http://lab.llvm.org:8011/builders/clang-x86_64-darwin10-gdb-gcc ) -
>>> just a few timing out tests I need to look at to get that green.
>>> Apparently it's fairly stable.
>>>
>>> Beyond that I'll be trying to bring up one with the latest suite (7.4
>>> is what I've been playing with) on Linux as well.
>>>
>> Since you're going to bring a bot up in zorg, I'll stop working on bringing
>> my testsuite runner forward.
> I'm still interested in any details you have about issues you've
> resolved/learnt, etc.
>
>> A couple thoughts:
>>
>> 1) I've been running on the latest test suite, polling once a day. I think
>> Eric and anyone working on DWARF 4/5 should be running against the upstream
>> testsuite. (I have no problems with running 7.4 too)
> Interesting thought. (just so we're all on the same page when you say
> "test suite" you're talking about the GDB dejagnu test suite (the same
> one (well, more recent version of it) that's in clang-tests)) Though I
> hesitate to have such a moving target, I can see how it could be
> useful.
>
Yes, the sourceware.org site. I hesitated as well, but I tried it and
it's OK.
>> It's been stable to run at the tip of GDB this way, the test results aren't
>> varying much.
> With the right heuristics I suppose this could be valuable, but will
> require more work to find the right signal in the (even small) noise.
>
I wrote a 10 line awk script to create a CSV file out of the test
summaries to make a one-to-one comparison of tests. It's over 70 lines
now... It's not a matter of whether I should have used Python; everything
is an exception to the rules. You can get close, but I can't say you can
get perfect signals.
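
For illustration only (this is not the awk script mentioned above, which was not posted to the list): a minimal Python sketch of the same idea, assuming the usual DejaGnu `.sum` layout in which each result line reads `STATUS: test name`. The function names, the column labels and the two sample summaries are made up for the example.

```python
import csv
import io
import re

# DejaGnu result lines look like "FAIL: gdb.base/break.exp: run to main".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): (.+)$"
)


def parse_sum(text):
    """Map each test name to its result keyword from .sum-style text."""
    results = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            status, name = match.groups()
            results[name] = status  # a duplicated test name keeps its last result
    return results


def compare_csv(left, right, left_label="gcc", right_label="clang"):
    """Return CSV text with one row per test and one column per run."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["test", left_label, right_label])
    for name in sorted(set(left) | set(right)):
        writer.writerow(
            [name, left.get(name, "MISSING"), right.get(name, "MISSING")]
        )
    return out.getvalue()


# Hypothetical excerpts from two summary files for the same suite checkout.
GCC_SUM = """\
PASS: gdb.base/break.exp: run to main
PASS: gdb.base/list.exp: list line 1
"""
CLANG_SUM = """\
PASS: gdb.base/break.exp: run to main
FAIL: gdb.base/list.exp: list line 1
"""

if __name__ == "__main__":
    print(compare_csv(parse_sum(GCC_SUM), parse_sum(CLANG_SUM)), end="")
```

The "exceptions to the rules" are where a sketch like this falls short: duplicated test names, tests that only exist in one run, and names that change between suite versions all need case-by-case handling.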

From a compiler developer point of view, the spreadsheet was worthless.
We're not testing GDB, but rather what the compiler feeds to GDB.

Take the log file, check out the suite, rerun a failing test, use 
dwarfdump and llvm-dwarfdump, find the "bad" dwarf records produced by
the compiler.

All the eventual bugs are about dwarf records, and a gdb testsuite test 
to duplicate.

A bad/confused dwarf record fails multiple tests without a way to map a 
failure back to dwarf.

In the end, a fine grained signal doesn't do what you might want.

>> 2) A surprise benefit of running this way is that hundreds of obsolete
>> tests, or broken tests are getting removed. This hasn't resulted in any
>> broken backwards compatibility here at least. Saves tons of time debugging
>> tests that don't work, and developing around compatible things that
>> reasonable people have decided no longer matter.
> Fair point.
>
>> 3) Testsuite runs against two compilers at a time makes it easier to see
>> regressions. By comparing against a known stable compiler, or GCC,
>> regressions are visible by summary numbers.
> I assume GDB runs their own test suite against some version (or the
> simultaneous latest) of GCC? If we can't scrape those existing results
> we can reproduce them (running the full suite with both GCC & Clang
> side-by-side).
>
gdb-testers at sourceware.org has a run every night. Yes, I reproduce. A
non-x86 target has very different results, so I look for a good gcc
cross compiler to establish a baseline.
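
A rough Python sketch of the baseline comparison, assuming each run has already been reduced to a mapping from test name to result. The function names and the sample results are hypothetical; nothing here comes from the actual runner.

```python
from collections import Counter


def summarize(results):
    """Count results per status, like the totals at the end of a summary."""
    return Counter(results.values())


def regressions(baseline, candidate):
    """Tests the baseline compiler passes that the candidate does not pass."""
    return sorted(
        name
        for name, status in baseline.items()
        if status == "PASS" and candidate.get(name) != "PASS"
    )


# Hypothetical per-test results for the same testsuite checkout.
GCC = {
    "gdb.base/break.exp: run to main": "PASS",
    "gdb.base/list.exp: list line 1": "PASS",
    "gdb.threads/schedlock.exp: step": "FAIL",
}
CLANG = {
    "gdb.base/break.exp: run to main": "PASS",
    "gdb.base/list.exp: list line 1": "FAIL",
    "gdb.threads/schedlock.exp: step": "FAIL",
}

if __name__ == "__main__":
    print("gcc:  ", dict(summarize(GCC)))
    print("clang:", dict(summarize(CLANG)))
    for name in regressions(GCC, CLANG):
        print("regression vs. baseline:", name)
```

A test that fails with both compilers is deliberately not reported: it says more about the suite or the host than about the compiler under test.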

In the case of clang, all the arch's share the Dwarf processing. So an
x86 run covers a lot more of the dwarf processing than worrying too much
about a cross compiler run. (But some worry about limiting fixes to
regressions from a cross-compiled GCC, so have to run that as well)
>> 4) I have plots of the summary numbers online with a window of a month or
>> two. The trend allows you to see regressions occurring, and remaining as
>> regressions. Sometimes GDB Testsuite or a compiler has a bad day. The trend
>> lets you see a stable regression, and when you get a round toit, tells you
>> when the regression started.
> Yep, also if we're trying to address all these issues, would be to
> prioritize the very stable failures (where Clang fails a test that GCC
> passes & does so consistently for a long time) first. Then look at the
> unstable ones last - figure out which compiler's to blame, "XFAIL:
> clang" them or whatever is necessary.
>
I avoid doing the XFAIL thing. Plotting all the lines from the
summary makes more sense, and it's what you see when you run the test
manually.

When you move a Fail artificially to Xfail, the plot just has a few V's
in it where the Fail line drops, and the Xfail line goes up. No new
information.

I prefer leaving the actual summary numbers in place. All the data you
need is there.

As you might have guessed, I like tests that fail, and want to get rid
of the ones that pass too often :)
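
One way to read "when the regression started" off the summary numbers, as a sketch under assumptions: the input is one FAIL count per daily run, `baseline` is whatever count is considered normal, and `min_days` is a made-up threshold for how long a rise must last before it counts as stable. None of these names come from the Jenkins setup.

```python
def stable_regression_start(daily_fails, baseline, min_days=3):
    """Return the index of the first day of a run of at least `min_days`
    consecutive days whose FAIL count exceeds `baseline`, or None."""
    run_start = None
    for day, fails in enumerate(daily_fails):
        if fails > baseline:
            if run_start is None:
                run_start = day
            if day - run_start + 1 >= min_days:
                return run_start
        else:
            run_start = None  # a bad day that recovered; start over
    return None


if __name__ == "__main__":
    # Hypothetical FAIL totals, one per nightly run.
    history = [120, 121, 180, 120, 119, 135, 136, 135, 137]
    print(stable_regression_start(history, baseline=125))
```

With the sample history the one-day spike to 180 is ignored and the run that begins at index 5 is reported, which is the day to start bisecting from.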
>> <soapbox>
>> I've been doing this with Jenkins. It's fairly easy to set up, and does the
>> plotting. Developers can grab a copy of the script to duplicate a run on
>> their broken compiler. Running the testsuite under JNLP increased the number
>> of executed tests - don't know why, it just did.
>> </soapbox>
> I wouldn't mind seeing your jenkins setup/config/tweaks/etc as a
> reference point, if you've got it lying around.
>
I'll see what I can send, or it's just as easy to walk through it. Jenkins
isn't really like buildbot. Do you have Jenkins running there?
> - David

-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation

Eric Christopher

2012-Oct-18 21:23 UTC


[LLVMdev] DWARF 2/3 backwards compatibility?

> Take the log file, check out the suite, rerun a failing test, use dwarfdump
> and llvm-dwarfdump, find the "bad" dwarf records produced by the compiler.
>
> All the eventual bugs are about dwarf records, and a gdb testsuite test to
> duplicate.
>
That's not necessarily the case.

> A bad/confused dwarf record fails multiple tests without a way to map a
> failure back to dwarf.
>
I'm not sure what you mean here.

> In the case of clang, all the arch's share the Dwarf processing. So an x86
> run covers a lot more of the dwarf processing than worrying too much about a
> cross compiler run. (But some worry about limiting fixes to regressions from
> a cross-compiled GCC, so have to run that as well)
>
For what it's worth you're going to miss some aspects since location
information is going to be dependent upon how various instructions are
lowered and could lead to missing/different location information than
would be correct.

> I avoid doing the XFAIL thing. Plotting all the lines from the summary
> makes more sense, and it's what you see when you run the test manually.
>
> When you move a Fail artificially to Xfail, the plot just has a few V's in
> it where the Fail line drops, and the Xfail line goes up. No new
> information.
>
xfailing tests or some such is a good starting way of detecting
regressions, which is one of the advantages of a buildbot-like system:
it gives quick feedback on changes that make things start failing.
Ideally, of course, we'd like to have a testsuite full of only passing
tests, where any failure is an immediate problem. As we move
forward on debug information, though, a good first step is to make sure
there are no new regressions as we do work and to see the trend line of
failures going down over time.
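
A minimal sketch of that kind of gate in Python, assuming a hand-maintained set of tests that are currently expected to fail. The function name and the sample data are hypothetical, and this is not how DejaGnu's own xfail mechanism is implemented.

```python
def check_against_expected(results, expected_failures):
    """Split a run into new failures and expected failures that now pass."""
    failing = {name for name, status in results.items() if status == "FAIL"}
    new_failures = sorted(failing - set(expected_failures))
    now_passing = sorted(
        name for name in expected_failures if results.get(name) == "PASS"
    )
    return new_failures, now_passing


# Hypothetical list of known failures and one hypothetical run.
EXPECTED_FAILURES = {
    "gdb.base/list.exp: list line 1",
    "gdb.base/step.exp: step over call",
}
RUN = {
    "gdb.base/break.exp: run to main": "FAIL",
    "gdb.base/list.exp: list line 1": "FAIL",
    "gdb.base/step.exp: step over call": "PASS",
}

if __name__ == "__main__":
    new_failures, now_passing = check_against_expected(RUN, EXPECTED_FAILURES)
    for name in new_failures:
        print("new failure:", name)
    for name in now_passing:
        print("now passing, drop from expected list:", name)
    # A bot would go red only when something new fails.
    raise SystemExit(1 if new_failures else 0)
```

Shrinking the expected-failure set over time is then the "trend line going down", while any entry in the first list is the quick feedback on a specific change.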

-eric

Robinson, Paul

2012-Oct-18 21:58 UTC


[LLVMdev] DWARF 2/3 backwards compatibility?

> > Since GDB already has a good and standard test infrastructure, it'd
> > likely get a good chunk of bad Dwarf out of the way before you start
> > worrying about Lauterbach's specifics.
> The gdb testsuite is pretty good as a "what's expected" set of tests,
> however, one thing to keep in mind is that a lot of the checks aren't
> particularly fuzzy. I.e. it checks what's expected but it's not
> necessarily valid dwarf that it's looking for but a particular
> behavior.

And that behavior is not necessarily related to the DWARF.  My Ubuntu
workstation consistently gets flakey results from the multi-threaded
GDB tests, which have exactly zero relation to what's in the DWARF.

(consistently flakey... ooh, dear.)

> > A bad/confused dwarf record fails multiple tests without a way to map a
> > failure back to dwarf.
> >
>
> I'm not sure what you mean here.

I don't know what Rick has experienced, but I've seen that a DWARF
problem can lead to an arbitrary number of GDB failures, none of which
necessarily make it obvious what the problem is with the DWARF.
For example, tweaking the placement of the end_prologue flag
in the line table made the "list" command break.  Eh???  In that case at
least I knew what I'd tweaked; coming on it cold, there's no way to know
what's going on.

--paulr

Rick Foos

2012-Oct-18 22:09 UTC


[LLVMdev] DWARF 2/3 backwards compatibility?

On 10/18/2012 04:23 PM, Eric Christopher wrote:
>> Take the log file, check out the suite, rerun a failing test, use dwarfdump
>> and llvm-dwarfdump, find the "bad" dwarf records produced by the compiler.
>>
>> All the eventual bugs are about dwarf records, and a gdb testsuite test to
>> duplicate.
>>
> That's not necessarily the case.
>
Fair enough. Bugs against dwarf records were what we ended up with here
after working it down to what was to be fixed in the compiler.
>> A bad/confused dwarf record fails multiple tests without a way to map a
>> failure back to dwarf.
>>
> I'm not sure what you mean here.
>
Building on a fine grained signal, a bug or two in GDB Testsuite, might
not be the most important measure.
>> In the case of clang, all the arch's share the Dwarf processing. So an x86
>> run covers a lot more of the dwarf processing than worrying too much about a
>> cross compiler run. (But some worry about limiting fixes to regressions from
>> a cross-compiled GCC, so have to run that as well)
> For what it's worth you're going to miss some aspects since location
> information is going to be dependent upon how various instructions are
> lowered and could lead to missing/different location information than
> would be correct.
>
Agreed
>> I avoid doing the XFAIL thing. Plotting all the lines from the summary
>> makes more sense, and it's what you see when you run the test manually.
>>
>> When you move a Fail artificially to Xfail, the plot just has a few V's in
>> it where the Fail line drops, and the Xfail line goes up. No new
>> information.
>>
> xfailing tests or some such is a good starting way of detecting
> regressions, which is one of the advantages of a buildbot-like system:
> it gives quick feedback on changes that make things start failing.
> Ideally, of course, we'd like to have a testsuite full of only passing
> tests, where any failure is an immediate problem. As we move
> forward on debug information, though, a good first step is to make sure
> there are no new regressions as we do work and to see the trend line of
> failures going down over time.
>
OK, that may work. Running GDB Testsuite, I often see a few failures
that go away the next day.

My concern is that an absolute XFAIL measure would slow down the rate of
development changes. If that becomes a problem it's an easy adjustment
to fix.

Trend lines are just another way to do the same thing, and yes that is
easier in Jenkins than buildbot :)
> -eric

-- 
Rick Foos
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation
