Matthijs Kooijman
2008-May-21 15:09 UTC
[LLVMdev] Using the test suite to benchmark patches
Hi,

just a quick email. I've been working on a patch to simplifycfg last week and want to test its performance. I've run the test-suite successfully, both with the patched and unpatched versions. However, I could find no easy way to compare both results. I see that the web pages of the nightly tester provide nice results (changes compared to the day before, together with percentages and colors, etc.). Something like that should be supported for two local test runs as well, but I couldn't find how.

I did a bit of hacking on the HTMLColDiff.pl script that I found lying around; the (rough) patch is attached. Is this script a quick hackup that got forgotten, or is it still used by people for a purpose that I don't see right now?

Any thoughts or suggestions on how to do this testing in a structured manner? In particular, it would be useful to have some means of running a test a few times and taking mean values for comparison or something...

Gr.

Matthijs

[Attachment: coldiff.diff (text/x-diff, 2950 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080521/168cc2dd/attachment.diff>]
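For illustration, the kind of side-by-side comparison asked for here could be sketched with a short script. This is only a rough outline under assumptions about the report format: it supposes each run produced a CSV-style report with a "Program" column and numeric result columns, which may not match what the test-suite actually emits.

import csv
import sys

def load(path):
    # Map each program name to its row of column -> value strings.
    with open(path) as f:
        return {row["Program"]: row for row in csv.DictReader(f)}

def compare(before_path, after_path, threshold=5.0):
    # Print every numeric column that changed by more than `threshold` percent.
    before, after = load(before_path), load(after_path)
    for prog in sorted(set(before) & set(after)):
        for col, old in before[prog].items():
            new = after[prog].get(col)
            try:
                old_v, new_v = float(old), float(new)
            except (TypeError, ValueError):
                continue  # skip non-numeric columns
            if old_v == 0:
                continue
            delta = (new_v - old_v) / old_v * 100.0
            if abs(delta) >= threshold:
                print("%-40s %-12s %8.3f -> %8.3f (%+.1f%%)"
                      % (prog, col, old_v, new_v, delta))

if __name__ == "__main__":
    # e.g.: python compare_reports.py report.unpatched.csv report.patched.csv
    compare(sys.argv[1], sys.argv[2])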
Tanya M. Lattner
2008-May-21 18:16 UTC
[LLVMdev] Using the test suite to benchmark patches
> just a quick email. I've been working on a patch to simplifycfg last week and
> want to test its performance. I've run the test-suite successfully, both with
> the patched and unpatched versions. However, I could find no easy way to
> compare both results. I see that the web pages of the nightly tester provide
> nice results (changes compared to the day before, together with percentages
> and colors, etc.). Something like that should be supported for two local test
> runs as well, but I couldn't find how.

Currently, the nightly tester scripts only compare the current day to the previous one and cannot compare two test runs that are more than one day/run apart. There is a GSoC student who will be working to improve this and add that feature. However, the nightly tester scripts are PHP based and require the results to be in a database. You could just send your results to the LLVM server to do your comparisons if you wanted. Another option is a local setup, but this is more work and really only necessary if you have results that you don't want out in public.

> I did a bit of hacking on the HTMLColDiff.pl script that I found lying around;
> the (rough) patch is attached. Is this script a quick hackup that got
> forgotten, or is it still used by people for a purpose that I don't see right
> now?

I do not know whether people are using this frequently. If it doesn't work, I am sure the answer is no. :) It was probably used before the nightly tester was around.

> Any thoughts or suggestions on how to do this testing in a structured manner?
> In particular, it would be useful to have some means of running a test a few
> times and taking mean values for comparison or something...

I would look at TEST.*.Makefile and TEST.*.report for a way to do your multiple-run testing. You still have the problem of comparing two different sets of results, though, and that would require a new script. Of course, someone else may have a better suggestion.

-Tanya
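To illustrate the "run it a few times and take mean values" part, a small helper along these lines could average one numeric column across several report files. This is only a sketch: the "Program" field and the idea that the TEST.*.report output is CSV-like are assumptions, not something the test-suite guarantees.

import csv
import sys
from collections import defaultdict

def mean_times(report_paths, column):
    # Collect one sample per run for every program, then average them.
    samples = defaultdict(list)
    for path in report_paths:
        with open(path) as f:
            for row in csv.DictReader(f):
                try:
                    samples[row["Program"]].append(float(row[column]))
                except (KeyError, ValueError):
                    pass  # row lacks the column or holds a non-numeric value
    return {prog: sum(vals) / len(vals)
            for prog, vals in samples.items() if vals}

if __name__ == "__main__":
    # e.g.: python mean_report.py <column-name> run1.csv run2.csv run3.csv
    column, paths = sys.argv[1], sys.argv[2:]
    for prog, avg in sorted(mean_times(paths, column).items()):
        print("%-40s %8.3f" % (prog, avg))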
Mike Stump
2008-May-21
[LLVMdev] Using the test suite to benchmark patches

On May 21, 2008, at 8:09 AM, Matthijs Kooijman wrote:

> Any thoughts or suggestions on how to do this testing in a
> structured manner?

I think that if what you're doing is sound, and you get the results you want, say, on compiling something like gcc with it, and others review the basic idea (hi Evan or Chris) and like it, just checking it in and watching the performance numbers for the next day seems reasonable to me. You can always revert the patch if there are unexpected downsides that make it not worthwhile. If you can run something larger like SPEC, that would also help.
On Wed, 21 May 2008, Mike Stump wrote:

> On May 21, 2008, at 8:09 AM, Matthijs Kooijman wrote:
>> Any thoughts or suggestions on how to do this testing in a
>> structured manner?
>
> I think that if what you're doing is sound, and you get the results
> you want, say, on compiling something like gcc with it, and others
> review the basic idea (hi Evan or Chris) and like it, just checking it
> in and watching the performance numbers for the next day seems
> reasonable to me. You can always revert the patch if there are
> unexpected downsides that make it not worthwhile.

It depends on the scope of the change. If it is a relatively minor change, getting the code approved, testing it for correctness, and adding a regression test is sufficient. If it is major (adding a new pass, significantly changing pass ordering, etc.), then the bar is much higher.

We don't have a great way of diffing performance runs other than the nightly tester. Devang has an experimental "opt-beta" mode that can be used for experimenting with optimization passes, and we have "llc-beta", which is great for measuring the impact of codegen changes. The usual approach is to decide that the patch is good, check it in, and then watch for unexpected fallout on the nightly testers.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
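As a rough illustration of how an llc vs. llc-beta comparison might be pulled out of a single report, the following sketch computes per-program ratios between two columns. The column names "LLC" and "LLC-BETA" are assumptions about what the report actually calls them, as is the CSV-style layout.

import csv
import sys

def beta_ratios(report_path, baseline_col="LLC", beta_col="LLC-BETA"):
    # Yield (program, beta-time / baseline-time) for every row that has
    # usable numbers in both columns.
    with open(report_path) as f:
        for row in csv.DictReader(f):
            try:
                prog = row["Program"]
                base = float(row[baseline_col])
                beta = float(row[beta_col])
            except (KeyError, ValueError):
                continue
            if base > 0:
                yield prog, beta / base

if __name__ == "__main__":
    # Ratios below 1.0 would mean the -beta configuration ran faster.
    for prog, ratio in sorted(beta_ratios(sys.argv[1]), key=lambda x: x[1]):
        print("%-40s beta/baseline = %.2f" % (prog, ratio))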