On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
> This seems excessive and unrealistic. We're never going to come up with
> a testsuite that satisfies everyone's needs and doing so could well be
> counter-productive. If no one can commit anything unless it passes
> every test (including performance) for every target under multiple
> option combinations, nothing will ever get committed. Especially if no
> one has access to systems to debug on.

As I see it, there are regularly commits that introduce performance and
code size regressions. There doesn't seem to be any formal testing in
place. Not for X86, not for ARM. Hunting down regressions like
enable-iv-rewrite=false, which added 130 bytes to a piece of code that
can only be 8KB large in total, is painful and slow. From my point of
view, the only way to ensure that the compiler does a good job is
providing a test infrastructure to monitor this. This is not about
forcing pre-commit tests; it is about ensuring that the testing is done
at all in a timely manner.

Joerg
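For illustration, the kind of automated check that would flag such growth
could be as small as the sketch below. It is only a sketch: the file name,
the clang invocation, and the 8 KB budget are placeholders taken from the
bootloader example above, not an existing LLVM test.

#!/usr/bin/env python
# Sketch of a code-size check for a size-constrained translation unit.
# Assumes `clang` and the binutils `size` tool are in PATH; the source
# file and the 8 KB budget are hypothetical, based on the example above.
import subprocess
import sys

SOURCE = "bootloader.c"   # hypothetical size-critical translation unit
BUDGET = 8 * 1024         # hard limit from the example: 8 KB

def text_plus_data(obj):
    # Berkeley-format `size` output: "text data bss dec hex filename".
    out = subprocess.check_output(["size", obj]).decode()
    text, data = out.splitlines()[1].split()[:2]
    return int(text) + int(data)

if __name__ == "__main__":
    subprocess.check_call(["clang", "-Os", "-c", SOURCE, "-o", "bootloader.o"])
    used = text_plus_data("bootloader.o")
    print("%d of %d bytes used" % (used, BUDGET))
    sys.exit(0 if used <= BUDGET else 1)

Run against two compiler revisions, the delta between the reported numbers
is exactly the kind of regression described above.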
The need for ARM hardware can be partially satisfied by using ARM
emulators like softgun or QEMU, and I think there is an ARM emulator
that can be built as part of GDB. Of course the ARM Holdings development
system comes with an emulator.

You could run several emulator processes on an x86 or x86_64 server that
has more RAM than your typical ARM boards do and get a full test run in
a short amount of time.

Of course this risks that test failures may actually be bugs in the
emulators, but I assert those would be useful results. If we can't
explain our own test failures as bugs in LLVM, then maybe we have a
useful bug to report to the emulator developers.

Real ARM boards can be had cheap these days. The lower-end Raspberry Pi
model will cost just $25.00. I own a Gumstix Overo Fire COM that cost me
about $200, plus about $200 for a Tobi add-on board for I/O. Gumstix
also sells boards for making distributed computers that are connected
via 100 Mbps Ethernet.

Don Quixote
--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
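To make the emulator suggestion concrete, a single test could be driven by
something like the sketch below. It rests on assumptions not stated in the
thread: an ARM-capable clang, user-mode qemu-arm installed, and an ARM
sysroot at a made-up path.

#!/usr/bin/env python
# Sketch: cross-compile one test case for ARM and run it under user-mode
# QEMU.  The target triple and sysroot path are assumptions about a local
# setup, not something prescribed in this thread.
import subprocess

TRIPLE = "armv7-linux-gnueabihf"   # hypothetical target triple
SYSROOT = "/opt/arm-sysroot"       # hypothetical ARM sysroot for qemu -L

def run_on_qemu(source, binary="test-arm"):
    # Cross-compile the test (assumes clang was built with the ARM backend).
    subprocess.check_call(["clang", "-target", TRIPLE,
                           "--sysroot=" + SYSROOT,
                           "-O2", source, "-o", binary])
    # -L points qemu-arm at the ARM dynamic linker and libraries.
    return subprocess.call(["qemu-arm", "-L", SYSROOT, "./" + binary])

if __name__ == "__main__":
    status = run_on_qemu("testcase.c")
    print("test exited with status %d" % status)

Several such processes can run in parallel on one x86 server, which is the
point made above about getting a full test run in a short amount of time.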
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
>> This seems excessive and unrealistic. We're never going to come up with
>> a testsuite that satisfies everyone's needs and doing so could well be
>> counter-productive. If no one can commit anything unless it passes
>> every test (including performance) for every target under multiple
>> option combinations, nothing will ever get committed. Especially if no
>> one has access to systems to debug on.
>
> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

I'm sorry you feel that way. Perhaps you could elaborate on what you
want as testing and how to make it work, who would do the work, and how
to integrate it in with everyone's disparate desires?

-eric
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

In a world of multiple developers with conflicting priorities, this
simply isn't realistic. I know that those 130 bytes are very important
to those concerned with the NetBSD bootloader, but the patch that added
them was worth significant performance improvements on important
benchmarks (see Jack Howarth's posting for 9/6/11, for instance), which
lots of other developers consider an obviously good tradeoff.

A policy of "never regress anything" is not tenable, because ANY change
in code generation has the possibility to regress something. We end up
in a world where either we never make any forward progress, or where
developers hoard up trivial improvements they can use to "negate" the
regressions caused by real development work. Neither of these is a
desirable direction.

The existing modus operandi on X86 and other targets has been that
there is a core of functionality (what is represented by the LLVM
regression tests and test-suite) that all developers implicitly agree
to avoid regressing on a set of "blessed" configurations. We are
deliberately cautious in expanding the range of functionality that
cannot be regressed, or in widening the set of configurations (beyond
those easily accessible to all developers) on which those regressions
must not occur. This allows us to improve quality over time without
preventing forward progress.

While I do think it would be a good idea to consider expanding the
blessed configurations to include some ARM targets, the heterogeneity
of ARM targets makes defining a configuration that is easily accessible
to all developers quite difficult. Apple developers obviously care
strongly about the processors on which Darwin runs, and those targets
are easily the best supported, but other developers can't easily
replicate that for pre-commit testing. Blessing a target whose support
is "fragile" may create problems down the road if it needs significant,
possibly regression-causing work to improve the target in the future.

In summary, we can only commit to a no-regressions policy on targets
that are already well supported (unlikely to need drastic breaking work
in the future), easily testable by all developers, and on a controlled
body of testcases that is universally acceptable. Defining those targets
and those testcases is a hard but necessary job to ensure quality while
continuing to improve the compiler. Simply freezing code generation
as-is is not an acceptable solution.

--Owen
On Tue, Oct 11, 2011 at 05:20:43PM -0700, Owen Anderson wrote:
>
> On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:
> > As I see it, there are regulary commits that introduce performance and
> > code size regressions. There doesn't seem to be any formal testing in
> > place. Not for X86, not for ARM. Hunting down regressions like
> > enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> > can only be 8KB large in total is painful and slow. From my point of
> > view, the only way to ensure that the compiler does a good job is
> > providing a test infrastructure to monitor this. This is about forcing
                                                               ^^^ not
> > pre-commit test, it is about ensuring that the testing is done at all
> > in a timely manner.
>
> In a world of multiple developers with conflicting priorities, this
> simply isn't realistic. I know that those 130 bytes are very important
> to those concerned with the NetBSD bootloader, but the patch that added
> them was worth significant performance improvements on important
> benchmarks (see Jack Howarth's posting for 9/6/11, for instance), which
> lots of other developers consider an obviously good tradeoff.

Don't get me wrong, my problem is not the patch by itself. LLVM at the
moment is relatively bad at creating compact code on x86. I'm not sure
what the status is on ARM for that, but there are use cases where it
matters a lot. Boot loaders are one of them. So disabling some
optimisations when using -Os or -Oz is fine.

The bigger issue is that accepting a size/performance trade-off here,
another one there, and yet another trade-off in that corner adds up. It
can get to the point where each trade-off by itself is fine, but the
total result overflows the CPU instruction cache and completely kills
performance. More importantly, at some point that will happen with
completely harmless-looking changes.

> A policy of "never regress anything" is not tenable, because ANY change
> in code generation has the possibility to regress something. We end up
> in a world where either we never make any forward progress, or where
> developers hoard up trivial improvements they can use to "negate" the
> regressions caused by real development work. Neither of these is a
> desirable direction.

This is not what I was asking for. For GCC there are not only build bots
and functional regression tests, but also regular runs of benchmarks
like SPEC etc. Consider it a call for the community to identify useful
real-world test cases to measure:

(1) Changes in the performance of compiled code, both with and without
    LTO.
(2) Changes in the size of compiled code, both with and without
    explicitly optimising for it.
(3) Changes in compilation time.

I know that for many bigger changes at least (1) and (3) are often
checked. This is about doing general testing over a long period of time.
When a regression on one of the metrics occurs, it can be evaluated. But
that's a separate discussion, e.g. whether to disable an optimisation
for -Os/-Oz or move it to a higher optimiser level etc. (A rough sketch
of the kind of per-revision measurement I mean follows below.)

> The existing modus operandi on X86 and other targets has been that
> there is a core of functionality (what is represented by the LLVM
> regression tests and test-suite) that all developers implicitly agree
> to avoid regressing on set of "blessed" configurations. We are
> deliberately cautious in expanding the range of functionality that
> cannot be regressed, or on widening the set of configurations (beyond
> those easily accessible to all developers) on which those regressions
> must not occur.
> This allows us to improve quality over time without
> preventing forward progress.

As I see it, the current regression test suite is aimed at preventing
bad compilation. It's not that useful for the other cases above. Of
course, checking for compile-time or runtime regressions is a lot harder
to do, as it requires a reproducible environment. So my request can't
replace the existing tests, and it isn't meant to.

I hope I made myself a bit clearer.

Joerg
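As a rough illustration of the per-revision measurement referred to above,
recording the three metrics for one benchmark could look like the sketch
below. The benchmark source, flags, and results file are made up; binary
size here is just the file size, which is cruder than what `size` or
llvm-size would report.

#!/usr/bin/env python
# Sketch: record compile time, code size and run time for one benchmark so
# that changes between compiler revisions show up as numbers.  Benchmark
# name, flags and the results file are placeholders.
import json
import os
import subprocess
import time

BENCH_SRC = "bench.c"       # hypothetical benchmark source
RESULTS = "metrics.jsonl"   # one JSON record appended per measured build

def measure(cc="clang", flags=("-O2",)):
    binary = "bench.bin"

    start = time.time()
    subprocess.check_call([cc] + list(flags) + [BENCH_SRC, "-o", binary])
    compile_time = time.time() - start

    size = os.path.getsize(binary)  # crude; `size` would separate sections

    start = time.time()
    subprocess.check_call(["./" + binary])
    run_time = time.time() - start

    return {"compiler": cc, "flags": list(flags),
            "compile_time_s": compile_time,
            "binary_bytes": size,
            "run_time_s": run_time}

if __name__ == "__main__":
    record = measure()
    with open(RESULTS, "a") as f:
        f.write(json.dumps(record) + "\n")
    print(json.dumps(record, indent=2))

Appending one record per compiler revision gives a history in which a
regression on any of the three metrics can be spotted and then evaluated.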
On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:

> On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
>> This seems excessive and unrealistic. We're never going to come up with
>> a testsuite that satisfies everyone's needs and doing so could well be
>> counter-productive. If no one can commit anything unless it passes
>> every test (including performance) for every target under multiple
>> option combinations, nothing will ever get committed. Especially if no
>> one has access to systems to debug on.
>
> As I see it, there are regulary commits that introduce performance and
> code size regressions. There doesn't seem to be any formal testing in
> place. Not for X86, not for ARM. Hunting down regressions like
> enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> can only be 8KB large in total is painful and slow. From my point of
> view, the only way to ensure that the compiler does a good job is
> providing a test infrastructure to monitor this. This is about forcing
> pre-commit test, it is about ensuring that the testing is done at all
> in a timely manner.

The change you refer to was not intended to improve performance at the
expense of code size. Has this been fixed on trunk, or did you discover
the workaround and move on without filing a bug?

Thanks,
-Andy
On Thu, Oct 13, 2011 at 03:56:55PM -0700, Andrew Trick wrote:
> On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:
>
> > On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:
> >> This seems excessive and unrealistic. We're never going to come up with
> >> a testsuite that satisfies everyone's needs and doing so could well be
> >> counter-productive. If no one can commit anything unless it passes
> >> every test (including performance) for every target under multiple
> >> option combinations, nothing will ever get committed. Especially if no
> >> one has access to systems to debug on.
> >
> > As I see it, there are regulary commits that introduce performance and
> > code size regressions. There doesn't seem to be any formal testing in
> > place. Not for X86, not for ARM. Hunting down regressions like
> > enable-iv-rewrite=false, which added 130 Bytes to a piece of code that
> > can only be 8KB large in total is painful and slow. From my point of
> > view, the only way to ensure that the compiler does a good job is
> > providing a test infrastructure to monitor this. This is about forcing
> > pre-commit test, it is about ensuring that the testing is done at all
> > in a timely manner.
>
> The change you refer to was not intended to improve performance at the
> expense of code size. Has this been fixed on trunk, or did you discover
> the workaround and move on without filing a bug?

I haven't filed a bug yet, since I haven't had time to produce a proper
test case. I have disabled the option for now explicitly.

Joerg
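For readers hitting the same problem, the workaround presumably amounts to
something like the build step sketched below. This assumes the
-enable-iv-rewrite option mentioned earlier in the thread is still present
as an LLVM backend option and can be forwarded through clang with -mllvm;
the source file name is a placeholder.

#!/usr/bin/env python
# Sketch of explicitly forcing the old IV-rewrite behaviour for one build.
# Assumes the -enable-iv-rewrite backend option still exists in the
# installed LLVM; the source file name is hypothetical.
import subprocess

def build_size_critical(source, out="boot.o"):
    # -mllvm forwards the following option to the LLVM backend.
    subprocess.check_call(["clang", "-Os", "-c",
                           "-mllvm", "-enable-iv-rewrite=true",
                           source, "-o", out])

if __name__ == "__main__":
    build_size_critical("bootloader.c")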