On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
> This seems excessive and unrealistic. We're never going to come up with
> a testsuite that satisfies everyone's needs and doing so could well be
> counter-productive. If no one can commit anything unless it passes
> every test (including performance) for every target under multiple
> option combinations, nothing will ever get committed. Especially if no
> one has access to systems to debug on.

As I see it, there are regularly commits that introduce performance and
code size regressions. There doesn't seem to be any formal testing in
place. Not for X86, not for ARM. Hunting down regressions like
enable-iv-rewrite=false, which added 130 bytes to a piece of code that
can only be 8KB large in total, is painful and slow. From my point of
view, the only way to ensure that the compiler does a good job is
providing a test infrastructure to monitor this. This is not about
forcing pre-commit tests; it is about ensuring that the testing is done
at all in a timely manner.

Joerg
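For illustration, the kind of automated check that would flag such growth
could be as small as the sketch below. It is only a sketch: the file name,
the clang invocation, and the 8 KB budget are placeholders taken from the
bootloader example above, not an existing LLVM test.

#!/usr/bin/env python
# Sketch of a code-size check for a size-constrained translation unit.
# Assumes `clang` and the binutils `size` tool are in PATH; the source
# file and the 8 KB budget are hypothetical, based on the example above.
import subprocess
import sys

SOURCE = "bootloader.c"   # hypothetical size-critical translation unit
BUDGET = 8 * 1024         # hard limit from the example: 8 KB

def text_plus_data(obj):
    # Berkeley-format `size` output: "text data bss dec hex filename".
    out = subprocess.check_output(["size", obj]).decode()
    text, data = out.splitlines()[1].split()[:2]
    return int(text) + int(data)

if __name__ == "__main__":
    subprocess.check_call(["clang", "-Os", "-c", SOURCE, "-o", "bootloader.o"])
    used = text_plus_data("bootloader.o")
    print("%d of %d bytes used" % (used, BUDGET))
    sys.exit(0 if used <= BUDGET else 1)

Run against two compiler revisions, the delta between the reported numbers
is exactly the kind of regression described above.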
The need for ARM hardware can be partially satisfied by using ARM
emulators like softgun or QEMU, and I think there is an ARM emulator
that can be built as part of GDB. Of course the ARM Holdings development
system comes with an emulator.

You could run several emulator processes on an x86 or x86_64 server that
has more RAM than your typical ARM boards do and get a full test run in
a short amount of time.

Of course this risks that test failures may actually be bugs in the
emulators, but I assert those would be useful results. If we can't
explain our own test failures as bugs in LLVM, then maybe we have a
useful bug to report to the emulator developers.

Real ARM boards can be had cheap these days. The lower-end Raspberry Pi
model will cost just $25.00. I own a Gumstix Overo Fire COM that cost me
about $200, plus about $200 for a Tobi add-on board for I/O. Gumstix
also sells boards for making distributed computers that are connected
via 100 Mbps Ethernet.

Don Quixote
--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
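To make the emulator suggestion concrete, a single test could be driven by
something like the sketch below. It rests on assumptions not stated in the
thread: an ARM-capable clang, user-mode qemu-arm installed, and an ARM
sysroot at a made-up path.

#!/usr/bin/env python
# Sketch: cross-compile one test case for ARM and run it under user-mode
# QEMU.  The target triple and sysroot path are assumptions about a local
# setup, not something prescribed in this thread.
import subprocess

TRIPLE = "armv7-linux-gnueabihf"   # hypothetical target triple
SYSROOT = "/opt/arm-sysroot"       # hypothetical ARM sysroot for qemu -L

def run_on_qemu(source, binary="test-arm"):
    # Cross-compile the test (assumes clang was built with the ARM backend).
    subprocess.check_call(["clang", "-target", TRIPLE,
                           "--sysroot=" + SYSROOT,
                           "-O2", source, "-o", binary])
    # -L points qemu-arm at the ARM dynamic linker and libraries.
    return subprocess.call(["qemu-arm", "-L", SYSROOT, "./" + binary])

if __name__ == "__main__":
    status = run_on_qemu("testcase.c")
    print("test exited with status %d" % status)

Several such processes can run in parallel on one x86 server, which is the
point made above about getting a full test run in a short amount of time.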
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
>> This seems excessive and unrealistic. We're never going to come up with
>> a testsuite that satisfies everyone's needs and doing so could well be
>> counter-productive. If no one can commit anything unless it passes
>> every test (including performance) for every target under multiple
>> option combinations, nothing will ever get committed. Especially if no
>> one has access to systems to debug on.
>
> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

I'm sorry you feel that way. Perhaps you could elaborate on what you
want as testing and how to make it work, who would do the work, and how
to integrate it in with everyone's disparate desires?

-eric
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

In a world of multiple developers with conflicting priorities, this
simply isn't realistic. I know that those 130 bytes are very important
to those concerned with the NetBSD bootloader, but the patch that added
them was worth significant performance improvements on important
benchmarks (see Jack Howarth's posting for 9/6/11, for instance), which
lots of other developers consider an obviously good tradeoff.

A policy of "never regress anything" is not tenable, because ANY change
in code generation has the possibility to regress something. We end up
in a world where either we never make any forward progress, or where
developers hoard up trivial improvements they can use to "negate" the
regressions caused by real development work. Neither of these is a
desirable direction.

The existing modus operandi on X86 and other targets has been that
there is a core of functionality (what is represented by the LLVM
regression tests and test-suite) that all developers implicitly agree
to avoid regressing on a set of "blessed" configurations. We are
deliberately cautious in expanding the range of functionality that
cannot be regressed, or in widening the set of configurations (beyond
those easily accessible to all developers) on which those regressions
must not occur. This allows us to improve quality over time without
preventing forward progress.

While I do think it would be a good idea to consider expanding the
blessed configurations to include some ARM targets, the heterogeneity
of ARM targets makes defining a configuration that is easily accessible
to all developers quite difficult. Apple developers obviously care
strongly about the processors on which Darwin runs, and those targets
are easily the best supported, but other developers can't easily
replicate that for pre-commit testing. Blessing a target whose support
is "fragile" may create problems down the road if it needs significant,
possibly regression-causing work to improve the target in the future.

In summary, we can only commit to a no-regressions policy on targets
that are already well supported (unlikely to need drastic breaking work
in the future), easily testable by all developers, and on a controlled
body of testcases that is universally acceptable. Defining those targets
and those testcases is a hard but necessary job to ensure quality while
continuing to improve the compiler. Simply freezing code generation
as-is is not an acceptable solution.

--Owen
On Tue, Oct 11, 2011 at 05:20:43PM -0700, Owen Anderson wrote:
>
> On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:
> > As I see it, there are regulary commits that introduce performance and
> > code size regressions. There doesn't seem to be any formal testing in
> > place. Not for X86, not for ARM. Hunting down regressions like
> > enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> > can only be 8KB large in total is painful and slow. From my point of
> > view, the only way to ensure that the compiler does a good job is
> > providing a test infrastructure to monitor this. This is about forcing
                                                               ^^^ not
> > pre-commit test, it is about ensuring that the testing is done at all
> > in a timely manner.
>
> In a world of multiple developers with conflicting priorities, this
> simply isn't realistic. I know that those 130 bytes are very important
> to those concerned with the NetBSD bootloader, but the patch that added
> them was worth significant performance improvements on important
> benchmarks (see Jack Howarth's posting for 9/6/11, for instance), which
> lots of other developers consider an obviously good tradeoff.

Don't get me wrong, my problem is not the patch by itself. LLVM at the
moment is relatively bad at creating compact code on x86. I'm not sure
what the status is on ARM for that, but there are use cases where it
matters a lot. Boot loaders are one of them. So disabling some
optimisations when using -Os or -Oz is fine.

The bigger issue is that accepting a size/performance trade-off here,
another one there, and yet another trade-off in that corner adds up. It
can get to the point where each trade-off by itself is fine, but the
total result overflows the CPU instruction cache and completely kills
performance. More importantly, at some point that will happen with
completely harmless-looking changes.

> A policy of "never regress anything" is not tenable, because ANY change
> in code generation has the possibility to regress something. We end up
> in a world where either we never make any forward progress, or where
> developers hoard up trivial improvements they can use to "negate" the
> regressions caused by real development work. Neither of these is a
> desirable direction.

This is not what I was asking for. For GCC there are not only build bots
and functional regression tests, but also regular runs of benchmarks
like SPEC etc. Consider it a call for the community to identify useful
real-world test cases to measure:

(1) Changes in the performance of compiled code, both with and without
    LTO.
(2) Changes in the size of compiled code, both with and without
    explicitly optimising for it.
(3) Changes in compilation time.

I know that for many bigger changes at least (1) and (3) are often
checked. This is about doing general testing over a long period of time.
When a regression on one of the metrics occurs, it can be evaluated. But
that's a separate discussion, e.g. whether to disable an optimisation
for -Os/-Oz or move it to a higher optimiser level etc. (A rough sketch
of the kind of per-revision measurement I mean follows below.)

> The existing modus operandi on X86 and other targets has been that
> there is a core of functionality (what is represented by the LLVM
> regression tests and test-suite) that all developers implicitly agree
> to avoid regressing on set of "blessed" configurations. We are
> deliberately cautious in expanding the range of functionality that
> cannot be regressed, or on widening the set of configurations (beyond
> those easily accessible to all developers) on which those regressions
> must not occur.
> This allows us to improve quality over time without
> preventing forward progress.

As I see it, the current regression test suite is aimed at preventing
bad compilation. It's not that useful for the other cases above. Of
course, checking for compile-time or runtime regressions is a lot harder
to do, as it requires a reproducible environment. So my request can't
replace the existing tests, and it isn't meant to.

I hope I made myself a bit clearer.

Joerg
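As a rough illustration of the per-revision measurement referred to above,
recording the three metrics for one benchmark could look like the sketch
below. The benchmark source, flags, and results file are made up; binary
size here is just the file size, which is cruder than what `size` or
llvm-size would report.

#!/usr/bin/env python
# Sketch: record compile time, code size and run time for one benchmark so
# that changes between compiler revisions show up as numbers.  Benchmark
# name, flags and the results file are placeholders.
import json
import os
import subprocess
import time

BENCH_SRC = "bench.c"       # hypothetical benchmark source
RESULTS = "metrics.jsonl"   # one JSON record appended per measured build

def measure(cc="clang", flags=("-O2",)):
    binary = "bench.bin"

    start = time.time()
    subprocess.check_call([cc] + list(flags) + [BENCH_SRC, "-o", binary])
    compile_time = time.time() - start

    size = os.path.getsize(binary)  # crude; `size` would separate sections

    start = time.time()
    subprocess.check_call(["./" + binary])
    run_time = time.time() - start

    return {"compiler": cc, "flags": list(flags),
            "compile_time_s": compile_time,
            "binary_bytes": size,
            "run_time_s": run_time}

if __name__ == "__main__":
    record = measure()
    with open(RESULTS, "a") as f:
        f.write(json.dumps(record) + "\n")
    print(json.dumps(record, indent=2))

Appending one record per compiler revision gives a history in which a
regression on any of the three metrics can be spotted and then evaluated.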
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
>> This seems excessive and unrealistic. We're never going to come up with
>> a testsuite that satisfies everyone's needs and doing so could well be
>> counter-productive. If no one can commit anything unless it passes
>> every test (including performance) for every target under multiple
>> option combinations, nothing will ever get committed. Especially if no
>> one has access to systems to debug on.
>
> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

The change you refer to was not intended to improve performance at the
expense of code size. Has this been fixed on trunk, or did you discover
the workaround and move on without filing a bug?

Thanks,
-Andy
On Thu, Oct 13, 2011 at 03:56:55PM -0700, Andrew Trick wrote:
> On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:
>
> > On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
> >> This seems excessive and unrealistic. We're never going to come up with
> >> a testsuite that satisfies everyone's needs and doing so could well be
> >> counter-productive. If no one can commit anything unless it passes
> >> every test (including performance) for every target under multiple
> >> option combinations, nothing will ever get committed. Especially if no
> >> one has access to systems to debug on.
> >
> > As I see it, there are regulary commits that introduce performance and
> > code size regressions. There doesn't seem to be any formal testing in
> > place. Not for X86, not for ARM. Hunting down regressions like
> > enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> > can only be 8KB large in total is painful and slow. From my point of
> > view, the only way to ensure that the compiler does a good job is
> > providing a test infrastructure to monitor this. This is about forcing
> > pre-commit test, it is about ensuring that the testing is done at all
> > in a timely manner.
>
> The change you refer to was not intended to improve performance at the
> expense of code size. Has this been fixed on trunk, or did you discover
> the workaround and move on without filing a bug?

I haven't filed a bug yet, since I haven't had time to produce a proper
test case. I have disabled the option for now explicitly.

Joerg
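For readers hitting the same problem, the workaround presumably amounts to
something like the build step sketched below. This assumes the
-enable-iv-rewrite option mentioned earlier in the thread is still present
as an LLVM backend option and can be forwarded through clang with -mllvm;
the source file name is a placeholder.

#!/usr/bin/env python
# Sketch of explicitly forcing the old IV-rewrite behaviour for one build.
# Assumes the -enable-iv-rewrite backend option still exists in the
# installed LLVM; the source file name is hypothetical.
import subprocess

def build_size_critical(source, out="boot.o"):
    # -mllvm forwards the following option to the LLVM backend.
    subprocess.check_call(["clang", "-Os", "-c",
                           "-mllvm", "-enable-iv-rewrite=true",
                           source, "-o", out])

if __name__ == "__main__":
    build_size_critical("bootloader.c")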