I think we need to think along two dimensions: breadth of testing and depth of testing.

1. Breadth: What are the best-supported ARM ISA versions in LLVM ARM? Say it's armv6 and armv7. We need to:
   - regression-test ARM mode, Thumb-2, and Thumb-1 mode (armv6)
   - performance/code-size test ARM mode, Thumb-2, and Thumb-1 modes

We need to agree on an optimization level for regression as well as performance testing (such as -O3 for performance and -Os for code size).

2. Depth:
(a) Adding more regression tests: Every new commit comes with a set of tests, but these are just regression tests. We need global access to validation suites; unfortunately, most validation suites are commercial, and their licensing prohibits even proxy public use. What about leveraging some other open-source test suites?
(b) Adding more performance tests: We need to identify performance and code-size regressions before committing. Currently there are wrappers for SPEC. What other performance/code-size suites can we get? Should there be guidelines for performance reporting on SPEC and/or other suites? A lot of users depend on LLVM ARM performance/code size remaining stable or getting better, so any degradation will trigger extra work for all consumers.

3. Reporting: We need a more formal reporting process for the validation done on a commit. Currently, the validation process for ARM is the same as for x86 (just run the tests and make sure they pass). We need to expand reporting to include the breadth and depth dimensions above, to reduce the work the community spends tracking down regressions.

Of course, all this is going to increase the threshold for committing. Either the committer pays early by running all these tests, or the community pays late by fixing them. The risk of paying late is also an unstable LLVM tip for ARM. Availability of ARM hardware is certainly an issue.
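The breadth matrix above (three modes crossed with two optimization goals) can be enumerated mechanically. A minimal sketch in Python; the exact clang flag spellings are assumptions for illustration, not a vetted configuration:

```python
# Enumerate the mode x optimization-goal matrix proposed above.
# Flag spellings (-march/-marm/-mthumb) are illustrative assumptions.
from itertools import product

MODES = {
    "arm":    ["-march=armv7-a", "-marm"],
    "thumb2": ["-march=armv7-a", "-mthumb"],
    "thumb1": ["-march=armv6", "-mthumb"],   # Thumb-1 via armv6
}
OPT_LEVELS = {"performance": "-O3", "code-size": "-Os"}

def build_matrix():
    """Yield (mode, goal, flags) for every configuration to be tested."""
    for (mode, mode_flags), (goal, opt) in product(MODES.items(),
                                                   OPT_LEVELS.items()):
        yield mode, goal, mode_flags + [opt]

for mode, goal, flags in build_matrix():
    print(mode, goal, " ".join(flags))
```

A real harness would feed each flag set to the regression and benchmark runs; the point is that the six configurations fall out of two small tables rather than being maintained by hand.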
I can make ARM hardware available to run regressions through buildbot (or some other bot mechanism), but making login access to that hardware available (for debugging) raises firewall and security issues.

I would like to hear the community's thoughts on these.

Thanks
--Raja

> -----Original Message-----
> From: David A. Greene [mailto:greened at obbligato.org]
> Sent: Tuesday, October 11, 2011 2:43 PM
> To: Bill Wendling
> Cc: rajav at codeaurora.org; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] ARM Qualification
>
> Bill Wendling <wendling at apple.com> writes:
>
> > Improving the test suite is always welcome.
>
> Do we have an idea of what sorts of improvements we'd like? Any codes
> that we want to add, for example? What would be useful for ARM?
>
> > In addition, we send out pre-release tarballs and have people in the
> > community build and test their programs with it. This is not a perfect
> > system, but it's one which works for us given the number of testers
> > available, the amount of time and resources they have, and whatever
> > fixes need to be merged into the release.
> >
> > ARM qualification is a bit trickier, because of the different specific
> > chips out there, different OSes, and having to verify ARM, Thumb1, and
> > Thumb2 for the same configurations. And the tests tend to run a bit
> > slower than, say, an x86 chip. So it's mostly a matter of time and
> > resources. Unless we can get people who are willing to perform these
> > tests, we won't be able to release ARM as an official supported
> > platform.
>
> Resources isn't the only problem. I've asked several times about adding
> my personal machines to the testing pool but I never get a reply. So
> there they sit idle every day, processing the occasional e-mail when
> they could be chewing LLVM tests.
>
> It is in fact highly in my own interest to get them running.
> I just need to be pointed to the document that tells me what the
> buildbot master expects to see and defines a procedure for adding them
> as slaves.
>
> One thing that could help these situations is virtualization.
>
> I've toyed with the idea of setting up various virtual machines to test
> various OS/architecture combinations. With QEMU I can imagine testing
> various ISAs as well.
>
> If there are any ARM full-system simulators we could use those as well.
> I'd be happy to run them.
>
> -Dave
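Dave's QEMU idea could be scripted along these lines: a harness assembles a headless qemu-system-arm invocation per configuration and boots a test image in it. A minimal sketch; the machine model, memory size, and image names are placeholders, not a tested setup:

```python
# Assemble a headless qemu-system-arm command line for a test run.
# Board model ("versatilepb") and image paths are illustrative assumptions.
import shlex

def qemu_command(kernel, initrd, machine="versatilepb", mem="256M"):
    """Build the argv for one emulated ARM test boot."""
    return [
        "qemu-system-arm",
        "-M", machine,              # board model to emulate (assumed)
        "-m", mem,                  # guest RAM
        "-kernel", kernel,          # test kernel image
        "-initrd", initrd,          # root filesystem with the test payload
        "-nographic",               # run headless, console on stdio
        "-append", "console=ttyAMA0",
    ]

print(shlex.join(qemu_command("zImage", "rootfs.cpio")))
```

A buildbot step could run this under a timeout and scrape the serial console for a pass/fail marker, which sidesteps the hardware-access problem Raja raises.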
Hello Raja,

> 1. Breadth: What the best supported ARM ISA versions in LLVM ARM? Say its
> armv6 and armv7; We need to
> - regression test ARM mode, Thumb-2 and Thumb-1 mode (armv6)
> - Performance/code-size test ARM mode, Thumb-2 and Thumb-1 modes

You forgot about different platforms, e.g. arm/linux vs arm/darwin. Also, even for ARMv7 we can get different results, since we have different schedulers for Cortex-A8 and Cortex-A9 :)

> 2. Depth:
> (a) Adding more regression tests: Every new commit comes with a set of
> tests, but these are just regression tests. We need global access to
> validation suites; unfortunately most validation suites are commercial and
> their licensing prohibits even proxy public use. What about leveraging some
> other open source test suites?

LLVM has its own test-suite, which should be enough for a first step.

--
With best regards,
Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
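Anton's point is that the test matrix grows again once target platforms and CPU schedulers are included. A small sketch of that extra dimension; the triple and CPU spellings are assumptions for illustration:

```python
# Cross target triples with per-CPU scheduler models, as Anton notes:
# the same -march can still produce different code for A8 vs A9.
# Triple/CPU spellings are illustrative assumptions.
from itertools import product

TRIPLES = ["armv7-linux-gnueabi", "armv7-apple-darwin"]
CPUS = ["cortex-a8", "cortex-a9"]   # different schedulers in the backend

def scheduler_configs():
    """Yield one flag list per (triple, CPU) combination."""
    for triple, cpu in product(TRIPLES, CPUS):
        yield ["-target", triple, "-mcpu=" + cpu]

for flags in scheduler_configs():
    print(" ".join(flags))
```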
"Raja Venkateswaran" <rajav at codeaurora.org> writes:

> I think we need to think along two dimensions - Breadth of testing and depth
> of testing
>
> 1. Breadth: What the best supported ARM ISA versions in LLVM ARM? Say its
> armv6 and armv7; We need to
> - regression test ARM mode, Thumb-2 and Thumb-1 mode (armv6)
> - Performance/code-size test ARM mode, Thumb-2 and Thumb-1 modes
>
> We need to agree on an optimization level for regression as well as
> performance (such as -O3 for performance and -Os for code-size)
>
> 2. Depth:
> (a) Adding more regression tests: Every new commit comes with a set of
> tests, but these are just regression tests. We need global access to
> validation suites; unfortunately most validation suites are commercial and
> their licensing prohibits even proxy public use. What about leveraging some
> other open source test suites?
> (b) Adding more performance tests: We need to identify performance and
> code-size regressions before committing. Currently there are wrappers for
> SPEC. What other performance/code-size suites can we get? Should there be
> guidelines for performance reporting on SPEC and/or other suites? A lot of
> users depend on LLVM ARM performance/codesize remaining stable or getting
> better, so any degradation will trigger extra work for all consumers.

This seems excessive and unrealistic. We're never going to come up with a test suite that satisfies everyone's needs, and trying to do so could well be counter-productive. If no one can commit anything unless it passes every test (including performance) for every target under multiple option combinations, nothing will ever get committed. Especially if no one has access to systems to debug on.

I think it's reasonable for the LLVM community to expect that LLVM users who have such rigorous testing needs develop their own systems. Testing is an extremely costly process in terms of dollars, work hours, and equipment expenditures. There's no way the LLVM community can support such things.
We have our own test suites, with tens of thousands of tests, that get run every night. When we find a problem in LLVM, we fix it and report it upstream when feasible. If we don't report it upstream or commit the fix, we have implicitly accepted responsibility for maintaining the fix. This has worked well for us for years, and I don't see any need to push that cost and responsibility onto the community.

-Dave
On Tue, Oct 11, 2011 at 06:20:17PM -0500, David A. Greene wrote:

> This seems excessive and unrealistic. We're never going to come up with
> a testsuite that satisfies everyone's needs and doing so could well be
> counter-productive. If no one can commit anything unless it passes
> every test (including performance) for every target under multiple
> option combinations, nothing will ever get committed. Especially if no
> one has access to systems to debug on.

As I see it, there are regularly commits that introduce performance and code-size regressions, and there doesn't seem to be any formal testing in place: not for X86, not for ARM. Hunting down regressions like enable-iv-rewrite=false, which added 130 bytes to a piece of code that can only be 8KB large in total, is painful and slow. From my point of view, the only way to ensure that the compiler does a good job is providing a test infrastructure to monitor this. This is not about forcing pre-commit tests; it is about ensuring that testing is done at all, in a timely manner.

Joerg
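The kind of monitoring Joerg asks for amounts to diffing code size between a baseline build and a candidate build and failing on growth past a budget. A minimal sketch; the input format (object name to .text bytes) is an assumption, and in practice the numbers would come from something like `size` output:

```python
# Flag objects whose code size grew past a byte budget between two builds.
# The dict-of-sizes input format is an illustrative assumption.
def size_regressions(baseline, current, budget_bytes=0):
    """Return {name: growth} for every object that grew past the budget."""
    regressions = {}
    for name, new_size in current.items():
        old_size = baseline.get(name)
        if old_size is not None and new_size - old_size > budget_bytes:
            regressions[name] = new_size - old_size
    return regressions

# Joerg's scenario: 130 bytes of growth in a size-constrained image.
baseline = {"boot.o": 7900, "main.o": 4096}
current  = {"boot.o": 8030, "main.o": 4090}
print(size_regressions(baseline, current))   # flags boot.o: grew 130 bytes
```

Run nightly per configuration, a check like this turns "hunting down regressions" into reading a short report that names the offending object and the commit range.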