Kristof Beyls via llvm-dev
2016-Apr-22 09:17 UTC
[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization
On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
> On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:
> > > Hi Kristof,
> > >
> > > > The way we use LNT, we would run different configurations (e.g. -O3 vs -Os) as different "machines" in LNT's model.
> > >
> > > O2/O3 is indeed a bad example. We're also using different machines for Os/O3 - such parameters apply to all tests and we don't propose major changes.
> > > Elena was only extending the LNT interface a bit to ease LLVM test-suite execution with different compiler or HW flags.
> >
> > Oh I see, this boils down to extending the lnt runtest interface to be able to specify a set of configurations, rather than a single configuration, and making sure configurations get submitted under different machine names?
> > We kick off the different configuration runs through a script invoking lnt runtest multiple times.
> > I don't see a big problem with extending the lnt runtest interface to do this, assuming it doesn't break the underlying concepts assumed throughout LNT.
> > Maybe the only downside is that this will add even more command line options to lnt runtest, which already has a lot (too many?) command line options.
> >
> > > Maybe some changes are required to analyze and compare metrics between "machines": e.g. code size/performance between Os/O2/O3.
> > > Do you perform such comparisons?
> >
> > We typically do these kinds of comparisons when we test our patches pre-commit, i.e. comparing for example '-O3' with '-O3 -mllvm -enable-my-new-pass'.
> > To stick with the LNT concepts, tests enabling new passes are stored as a different "machine".
> > The only way I know to do a comparison between runs on 2 different "machine"s is to manually edit the URL for run vs run comparison and fill in the run ids of the 2 runs you want to compare.
> > For example, the following URL is a comparison of green-dragon-07-x86_64-O3-flto vs green-dragon-06-x86_64-O0-g on the public llvm.org/perf server:
> > http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
> > I had to manually look up and fill in the run ids 70644 and 70634.
> > It would be great if there was a better way to do these kinds of comparisons - i.e. not having to manually fill in run ids, but having a webui to easily find and pick the runs you want to compare.
> > (As an aside: I find it intriguing that the URL above suggests that there are quite a few cases where "-O0 -g" produces faster code than "-O3 -flto").
>
> Can you be more explicit which ones? I don't see any regression (other than compared to the baseline, or on the compile time).
>
> --
> Mehdi

D'Oh! I was misinterpreting the compile time differences as execution time differences. Indeed, there is no unexpected result in there.
Sorry for the noise!

Kristof
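The run-vs-run comparison URL quoted in this message follows a regular pattern, so the manual editing step can be scripted. The following is a minimal sketch, assuming the path layout seen in that one URL (`<server>/<db>/v4/<suite>/<run id>?compare_to=<run id>`) holds in general; the function name `run_comparison_url` and its parameters are hypothetical and not part of LNT.

```python
from urllib.parse import urlencode


def run_comparison_url(server, run_id, compare_to, db="db_default", suite="nts"):
    """Build a run-vs-run comparison URL.

    The path layout is inferred from the single URL quoted in the thread;
    it is an assumption, not a documented LNT interface.
    """
    base = f"{server.rstrip('/')}/{db}/v4/{suite}/{run_id}"
    return f"{base}?{urlencode({'compare_to': compare_to})}"


# Reproduces the URL from the thread for run ids 70644 and 70634.
print(run_comparison_url("http://llvm.org/perf", 70644, 70634))
```

This only removes the URL editing; the run ids still have to be looked up by hand, which is the part a webui picker would address.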
James Molloy via llvm-dev
2016-Apr-25 09:43 UTC
[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization
Hi Sergey, Elena,

Firstly, thanks for this RFC. It's great to see more people actively using and modifying LNT, and the test metrics support in general is rather weak currently.

Metrics
-------

I agree with Daniel and Kristof that your proposed schema changes have the potential to make many queries extremely slow. Certainly for the metrics enhancements, I don't see a reason why we need such a radical change in schema.

To add custom metrics on the fly, we need to change the schema for the Sample table. Currently this consists of a column for each metric, but actually we never ever query those metric values. We never query, for example, for "all failing tests in a run" - when we do analyses we use the ComparisonResult class, which reads *all* samples from the database for a run and performs the analysis entirely in Python.

Therefore, having a semi-structured format where some fields are first-class columns and the rest are in a JSON-encoded BLOB (as Daniel suggests) seems totally acceptable. There is certainly an argument now that we're using the wrong backend storage solution and that a key-value store might be more suitable, but that's a very invasive change and I don't think we've reached the point where we need to force a move from the simplicity of SQLite.

Adding an extra BLOB column would be easy - there would just need to be logic in testsuitedb.py for reading and writing it - the Sample model class would expose the JSON-encoded fields as normal Python fields, so the rest of LNT would be isolated from this change.

But I think this is a small detail compared to the bigger problem of how to effectively *display* all this new data. Currently every new metric gets its own separate table in the report/run views, and this does not scale well at all.

I think we need some more concepts in the metric system to make it scalable:

* What "attribute" of the test is this metric measuring? For example, both "exec_time" and "score" measure the same attribute: performance of the generated code. It's superfluous to have them displayed in separate tables. However, mem_size and compile_time measure completely different aspects of the test.
* Is this metric useful to display at the top level, or should it only be exposed when more data about a test result is requested?
  * An example of this is the pass statistics. I don't want my daily report view cluttered by the time spent in register allocation for every test! OK, this is useful information when debugging a problem, but it should be available when requested rather than by default.

An example of why we need the above is the screenshots in your Google doc. I'm looking at the last screenshot, and it's incredibly difficult to read and get useful information out of.

I'd also suggest that if we're adding many more metrics to a test, we should create a "test sample information" page that the test link goes to instead of just the graph. This page could contain all counter/metric data, historic sparklines, the full graph and profiling links.

Cheers,

James
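The semi-structured Sample storage described in this message can be illustrated with a small sketch: a few metrics stay first-class columns, everything else goes into one JSON-encoded BLOB column, and a model class exposes both as ordinary attributes. This is an illustration only, not LNT's actual testsuitedb.py; the table layout, the `FIRST_CLASS` tuple, and the helper names `open_db`, `add_sample` and `load_run` are all assumptions made for the example.

```python
import json
import sqlite3

# Assumed first-class metric columns; everything else lands in the BLOB.
FIRST_CLASS = ("compile_time", "exec_time")


class Sample:
    """Expose first-class columns and JSON-encoded extras as plain attributes."""

    def __init__(self, test, compile_time, exec_time, extra_json):
        self.test = test
        self.compile_time = compile_time
        self.exec_time = exec_time
        # Decoded custom metrics become ordinary attributes on the object.
        for name, value in json.loads(extra_json or "{}").items():
            setattr(self, name, value)


def open_db():
    """Create an in-memory database with a hypothetical sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (run_id INTEGER, test TEXT, "
        "compile_time REAL, exec_time REAL, extra BLOB)"
    )
    return conn


def add_sample(conn, run_id, test, metrics):
    """Split metrics into first-class columns and a JSON-encoded BLOB."""
    metrics = dict(metrics)
    known = [metrics.pop(name, None) for name in FIRST_CLASS]
    blob = json.dumps(metrics).encode("utf-8")
    conn.execute(
        "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
        (run_id, test, *known, blob),
    )


def load_run(conn, run_id):
    """Read *all* samples of a run, leaving the analysis to Python code."""
    rows = conn.execute(
        "SELECT test, compile_time, exec_time, extra FROM sample "
        "WHERE run_id = ? ORDER BY test",
        (run_id,),
    )
    return [Sample(*row) for row in rows]


conn = open_db()
add_sample(conn, 1, "bench/a",
           {"exec_time": 1.5, "compile_time": 0.2, "code_size": 4096})
samples = load_run(conn, 1)
print(samples[0].exec_time, samples[0].code_size)
```

Because the analysis reads every sample of a run anyway, the BLOB is decoded once per row on load and never appears in a WHERE clause, which is the access pattern this design relies on.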
Elena Lepilkina via llvm-dev
2016-Apr-25 11:33 UTC
[llvm-dev] FW: RFC: LNT/Test-suite support for custom metrics and test parameterization
From: Elena Lepilkina
Sent: Monday, April 25, 2016 2:33 PM
To: 'James Molloy' <james at jamesmolloy.co.uk>; Kristof Beyls <Kristof.Beyls at arm.com>; Mehdi Amini <mehdi.amini at apple.com>
Cc: nd <nd at arm.com>; Matthias Braun <matze at braunis.de>
Subject: RE: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

Hi everyone,

Thank you for your answers.

The BLOB format adds extra steps when working with metrics. We know that the ComparisonResult class does the analysis work, but it fetches all metrics from the database, so decoding the BLOB fields would cost additional time during analysis in ComparisonResult. Maybe it would be better to have one Sample table per test suite, as was suggested before. That should be faster, shouldn't it? Moreover, the LNT changes we want to make next will need to fetch some metrics separately, and the BLOB format will add delay to those queries. Performance is already a real problem: rendering the graph page currently takes about 3 minutes.

[Inline image: image001.png, http://lists.llvm.org/pipermail/llvm-dev/attachments/20160425/4e0aec0e/attachment-0001.png]

So maybe it would be better to start working with NoSQL databases? I made a small prototype with TestSuite, TestSuiteFields, Test, Run and Sample tables for getting time metrics. It works quickly. Using NoSQL also solves the problem of samples having different metric fields: it becomes possible to store different metrics for different test suites in one table. What do you think about this proposal? I used MongoDB, but I know that PostgreSQL offers NoSQL-style JSONB fields, which are more effective than a JSON-encoded BLOB because they can be used directly in queries and can be indexed.

About the proposal that not all metrics should be shown: this can be added as a field in the JSON of the .fields file, which describes the fields obtained from the test suite. To see the other metrics, the user would select them with checkboxes in the view options. Would this solution be suitable?

We can do as you wrote: "I'd also suggest that if we're adding many more metrics to a test, we should create a "test sample information" page that the test link goes to instead of just the graph. This page could contain all counter/metric data, historic sparklines, the full graph and profiling links." But this page would take too long to render because of the graph render time. In my opinion, some users would not want to wait that long just to see some additional metrics.

Thanks for your suggestions,
Elena.
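The two display ideas in this thread, grouping metrics by the "attribute" they measure and recording in the .fields description whether a metric is shown by default, can be combined in a short sketch. The JSON layout below is invented for illustration: the keys `attribute` and `show_by_default`, the metric name `regalloc_time`, and the function `visible_metrics` are hypothetical and do not reflect the real .fields format.

```python
import json

# Hypothetical field descriptions; the key names are assumptions.
FIELDS_JSON = """
[
  {"name": "exec_time", "attribute": "performance", "show_by_default": true},
  {"name": "score", "attribute": "performance", "show_by_default": true},
  {"name": "compile_time", "attribute": "compile time", "show_by_default": true},
  {"name": "mem_size", "attribute": "memory", "show_by_default": false},
  {"name": "regalloc_time", "attribute": "pass statistics", "show_by_default": false}
]
"""


def visible_metrics(fields, selected=()):
    """Group metric names by attribute.

    A metric is kept if it is visible by default or if the user ticked it
    (the `selected` set stands in for the checkboxes in the view options).
    """
    groups = {}
    for field in fields:
        if field["show_by_default"] or field["name"] in selected:
            groups.setdefault(field["attribute"], []).append(field["name"])
    return groups


fields = json.loads(FIELDS_JSON)
default_view = visible_metrics(fields)
debug_view = visible_metrics(fields, selected={"regalloc_time"})
print(default_view)
print(debug_view)
```

With this shape, "exec_time" and "score" land in one group and could share a table, while pass statistics stay out of the default report until requested.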