thr3ads.net - llvm dev - [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization [Apr 2016]

Elena Lepilkina via llvm-dev

2016-Apr-25 11:33 UTC

[llvm-dev] FW: RFC: LNT/Test-suite support for custom metrics and test parameterization

From: Elena Lepilkina
Sent: Monday, April 25, 2016 2:33 PM
To: 'James Molloy' <james at jamesmolloy.co.uk>; Kristof Beyls
<Kristof.Beyls at arm.com>; Mehdi Amini <mehdi.amini at apple.com>
Cc: nd <nd at arm.com>; Matthias Braun <matze at braunis.de>
Subject: RE: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test
parameterization

Hi everyone,

Thank you for your answer. The BLOB format adds extra steps when working with
metrics. We know that the ComparisonResult class does the analysis work, but it
fetches all metrics from the database on request, so handling the BLOB fields
during analysis in ComparisonResult will take additional time. Maybe it would be
better to have one Sample table per test suite, as was suggested before. That
should be faster, shouldn't it? Moreover, the LNT changes we would like to make
next will need to fetch some metrics separately, and the BLOB format will add
delay to those queries.

As we see it, performance is already a real problem: rendering the graph page
currently takes about 3 minutes.
[Attached image: image001.png]
So maybe it would be better to start working with NoSQL databases? I made a
small prototype with TestSuite, TestSuiteFields, Test, Run and Sample tables to
get time metrics, and it is fast. Using NoSQL also solves the problem of samples
having different metric fields: it becomes possible to store different metrics
for different test suites in one table.
What do you think about this proposal?
I used MongoDB, but I know that PostgreSQL offers NoSQL-style storage through
JSONB fields, which are more efficient than a JSON-encoded BLOB because they can
be used directly in queries and can be indexed.

About the proposal that not all metrics should be shown: this can be added as a
field in the JSON in the .fields file, which describes the fields obtained from
the test suite. To see the other metrics, the user would select them with
checkboxes in the view options. Would this solution be suitable?
We can do as you wrote:
“I'd also suggest that if we're adding many more metrics to a test, we
should create a "test sample information" page that the test link goes
to instead of just the graph. This page could contain all counter/metric data,
historic sparklines, the full graph and profiling links.
”
But this page would take too long to render because of the graph render time.
In my opinion, some users would not want to wait that long just to see some
additional metrics.

Thanks for your suggestions,

Elena.
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of James
Molloy via llvm-dev
Sent: Monday, April 25, 2016 12:43 PM
To: Kristof Beyls <Kristof.Beyls at arm.com>; Mehdi Amini <mehdi.amini at apple.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>;
Matthias Braun <matze at braunis.de>
Subject: Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test
parameterization

Hi Sergey, Elena,

Firstly, thanks for this RFC. It's great to see more people actively using
and modifying LNT and the test metrics support in general is rather weak
currently.

Metrics
-------

I agree with Daniel and Kristof that your proposed schema changes have the
potential to make many queries extremely slow. Certainly for the metrics
enhancements, I don't see a reason why we need such a radical change in
schema.

To add custom metrics on the fly, we need to change the schema for the Sample
table. Currently this consists of a column for each metric, but actually we
never ever query those metric values. We never query for example for "all
failing tests in a run" - when we do analyses we use the ComparisonResult
class which reads *all* samples from the database for a run and performs the
analysis entirely in Python.

Therefore, having a semi-structured format where some fields are first-class
columns and the rest are in a JSON-encoded BLOB (as Daniel suggests) seems
totally acceptable. There is certainly an argument now that we're using the
wrong backend storage solution and that a key-value store might be more
suitable, but that's a very invasive change and I don't think we've
reached the point where we need to force a move from the simplicity of SQLite.

Adding an extra BLOB column would be easy - there would just need to be logic in
testsuitedb.py for reading and writing it - the Sample model class would expose
the JSON-encoded fields as normal python fields so the rest of LNT would be
isolated from this change.
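
A minimal sketch of the semi-structured layout described above, assuming a
simplified Sample table. The column names, the `extra` BLOB column and the
`Sample` class below are hypothetical and are not LNT's actual testsuitedb.py
code; they only illustrate how JSON-encoded metrics could be exposed as normal
Python fields.

```python
import json
import sqlite3


class Sample:
    """Hypothetical model: a few first-class columns plus a JSON blob."""

    def __init__(self, run_id, test_id, compile_time=None,
                 execution_time=None, **extra):
        self.run_id = run_id
        self.test_id = test_id
        self.compile_time = compile_time
        self.execution_time = execution_time
        # Any other keyword is treated as a custom metric kept in the blob.
        self._extra = dict(extra)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so first-class columns
        # never reach this point.
        extra = self.__dict__.get("_extra", {})
        if name in extra:
            return extra[name]
        raise AttributeError(name)

    def to_row(self):
        blob = json.dumps(self._extra).encode("utf-8")
        return (self.run_id, self.test_id, self.compile_time,
                self.execution_time, blob)

    @classmethod
    def from_row(cls, row):
        run_id, test_id, compile_time, execution_time, blob = row
        extra = json.loads(blob) if blob else {}
        return cls(run_id, test_id, compile_time, execution_time, **extra)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sample (run_id INTEGER, test_id INTEGER, "
    "compile_time REAL, execution_time REAL, extra BLOB)"
)
original = Sample(1, 7, compile_time=0.5, execution_time=2.0, code_size=4096)
conn.execute("INSERT INTO Sample VALUES (?, ?, ?, ?, ?)", original.to_row())

loaded = Sample.from_row(conn.execute("SELECT * FROM Sample").fetchone())
print(loaded.execution_time, loaded.code_size)
```

Code that reads `loaded.code_size` does not need to know whether the value came
from a real column or from the blob, which is the isolation the paragraph above
is after.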

But I think this is a small detail compared to the bigger problem of how to
effectively *display* all this new data. Currently every new metric gets its own
separate table in the report/run views, and this does not scale well at all.

I think we need some more concepts in the metric system to make it scalable:

  * What "attribute" of the test is this metric measuring? For example, both
    "exec_time" and "score" measure the same attribute: performance of the
    generated code. It's superfluous to have them displayed in separate tables.
    However mem_size and compile_time both measure completely different aspects
    of the test.
  * Is this metric useful to display at the top level? Or should it only be
    exposed when more data about a test result is requested?
    * An example of this is the pass statistics. I don't want my daily report
      view cluttered by the time spent in register allocation for every test!
      OK, this is useful information when debugging a problem, but it should be
      available when requested rather than by default.
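
The two concepts in the list above could be sketched roughly as follows. This
is only an illustration: the `MetricInfo` class, the attribute names and the
`regalloc_time` metric are made up for the example and are not part of LNT.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricInfo:
    name: str
    attribute: str   # which aspect of the test the metric measures
    top_level: bool  # shown in the default report, or only on request


# Hypothetical metadata; the attribute labels are assumptions.
METRICS = [
    MetricInfo("exec_time", "performance", True),
    MetricInfo("score", "performance", True),
    MetricInfo("mem_size", "memory", True),
    MetricInfo("compile_time", "compile_time", True),
    MetricInfo("regalloc_time", "pass_statistics", False),
]


def report_tables(metrics, include_hidden=False):
    """Group metric names by attribute: one report table per attribute."""
    tables = defaultdict(list)
    for metric in metrics:
        if metric.top_level or include_hidden:
            tables[metric.attribute].append(metric.name)
    return dict(tables)


print(report_tables(METRICS))
print(report_tables(METRICS, include_hidden=True))
```

With this grouping, exec_time and score share one table, and the pass
statistics only appear when the caller explicitly asks for hidden metrics.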

An example of why we need the above is your screenshots in your google doc.
I'm looking at the last screenshot, and it's incredibly difficult to
read and get useful information out of.

I'd also suggest that if we're adding many more metrics to a test, we
should create a "test sample information" page that the test link goes
to instead of just the graph. This page could contain all counter/metric data,
historic sparklines, the full graph and profiling links.

Cheers,

James

On Fri, 22 Apr 2016 at 10:17 Kristof Beyls via llvm-dev <llvm-dev at
lists.llvm.org> wrote:

On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:

On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <llvm-dev at
lists.llvm.org> wrote:

On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at
gmail.com> wrote:

Hi Kristof,

       The way we use LNT, we would run different configuration (e.g. -O3 vs
-Os) as different "machines" in LNT's model.

O2/O3 is indeed a bad example. We're also using different machines for Os/O3 -
such parameters apply to all tests and we don't propose major changes.
Elena was only extending the LNT interface a bit to ease LLVM test-suite
execution with different compiler or HW flags.

Oh I see, this boils down to extending the lnt runtest interface to be able to
specify a set of configurations, rather than a single configuration, and making
sure configurations get submitted under different machine names? We kick off the
different configuration runs through a script invoking lnt runtest multiple
times. I don't see a big problem with extending the lnt runtest interface to
do this, assuming it doesn't break the underlying concepts assumed throughout
LNT. Maybe the only downside is that this will add even more command line
options to lnt runtest, which already has a lot (too many?) command line
options.

Maybe some changes are required to analyze and compare metrics between
"machines": e.g. code size/performance between Os/O2/O3.
Do you perform such comparisons?

We typically do these kinds of comparisons when we test our patches pre-commit,
i.e. comparing for example '-O3' with '-O3 -mllvm -enable-my-new-pass'.
To stick with the LNT concepts, tests enabling new passes are stored as a
different "machine".
The only way I know to do a comparison between runs on 2 different
"machine"s is to manually edit the URL for run vs run comparison
and fill in the run ids of the 2 runs you want to compare.
For example, the following URL is a comparison of green-dragon-07-x86_64-O3-flto
vs green-dragon-06-x86_64-O0-g on the public llvm.org/perf server:
http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
I had to manually look up and fill in the run ids 70644 and 70634.
It would be great if there was a better way to do these kinds of
comparisons, i.e. not having to manually fill in run ids, but having a web UI to
easily find and pick the runs you want to compare.
(As an aside: I find it intriguing that the URL above suggests that there are
quite a few cases where "-O0 -g" produces faster code than "-O3
-flto").
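
Until such a web UI exists, the manual URL editing described above can at least
be scripted. The helper below is a small sketch: the function name
`compare_url` is made up, and the URL layout is inferred only from the single
example URL given above, so it may not hold for other LNT servers or versions.

```python
from urllib.parse import urlencode

# Base path taken from the example URL above (db_default, v4, nts suite).
DEFAULT_BASE = "http://llvm.org/perf/db_default/v4/nts"


def compare_url(run_id, compare_to, base=DEFAULT_BASE):
    """Build a run vs run comparison URL from two run ids."""
    query = urlencode({"compare_to": compare_to})
    return f"{base}/{run_id}?{query}"


print(compare_url(70644, 70634))
```

The run ids still have to be looked up by hand; the helper only removes the
URL-editing step.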

Can you be more explicit which ones? I don't see any regression (other than
compared to the baseline, or on the compile time).

--
Mehdi

D'Oh! I was misinterpreting the compile time differences as execution time
differences. Indeed, there is no unexpected result in there.
Sorry for the noise!

Kristof

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 22350 bytes
Desc: image001.png
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160425/4e0aec0e/attachment-0001.png>

Chris Matthews via llvm-dev

2016-Apr-25 16:51 UTC

[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

Questions from this thread that I can help address:


The LNT compile suite: we use it a lot here at Apple.  It has a metric set that
is customized for the analysis of compile time regressions.  Given the recent
interest in compile time, I hope to be able to set up a public bot to collect
data in this suite sometime soon.

On the topic of encoding configurations in or across machines: both work and are
used.  LNT Compile stores all the optimization levels in the same run, and uses
part of the benchmark name to encode that: “compile.benchmark.name (opt and
flags).metric”.  This kind of flexibility is nice.  The tradeoff is that it is
harder to compare results from different machines (the web UI makes it almost
impossible, but you can do it by editing the URLs by hand.  I think James made
this a bit better recently?).
 
The ill-fated FieldChange table.  This was added to LNT ages ago, and there were
no consumers of the data.  When I went to do the regression tracking feature, I
realized there was an error in how the data was being calculated (a missing join
was mixing data from other machines).  Since sometimes people disagree with me,
and people blindly update their LNT instances, I decided the best thing to do
was not to DROP the table, but just leave it in case a rollback was needed. That
is how FieldChangeV2 came about, and that is why FieldChange still exists.
Everyone can feel free to DROP the old table any time, and now that we have not
been using it for a while and no one has complained, it is probably safe to
remove it with a migration.

The intent of the test-suites as the primary database entity in LNT is to manage
the schema of the metrics.  The test-suite abstraction adds gobs of complexity
to LNT in the backend.  I’d be happy to drop that idea in favor of a more
flexible scheme.  IMO, the complexity comes from how we dynamically create the
database schema from the test-suite definition.  I’ve tried to add things to LNT
in the past (JSON API, admin interface, better migrations, to name three), and
most of the third-party Flask modules require the DB schema to be defined up
front.  That said, the test-suite system might help out here.  We could
implement your proposed changes as a third test-suite kind.



Chris Matthews via llvm-dev

2016-Apr-25 17:06 UTC

[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

I am really torn about this.

When I implemented the regression tracking stuff recently, it really showed me
how badly we are scaling.  On our production server, run ingestion can take
well over 100s.  Time is mostly spent in FieldChange generation and regression
grouping. Both have to access a lot of recent samples. This is not the end of
the world, because it runs in a background process.  Where this really sucks is
when a regression has a lot of indicators. The web interface renders these in a
graph, and just trying to pull down 100 graphs' worth of data kills the server.
I ended up limiting those to a max of 10 datasets, and even that takes 30s to
load.

So I do think we need some improvements to the scalability.

LNT usage is spread between two groups. The first is users who set up big
servers, with Postgres and Apache/Gunicorn. For those users I think a NoSQL
database is the way to go.  However, our second (and probably more common)
group of users is the people running a little instance on their own machine to
do some local compiler benchmarking.  Their setup process needs to be dead
simple, and I think requiring a NoSQL database to be set up on their machine
first is a non-starter.  Like we do with SQLite, I think we need a transparent
fallback for people who don’t have a NoSQL database.
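
One way such a transparent fallback could look is sketched below, under the
assumption that the NoSQL backend is reached through an optional Python driver.
The function `open_store` and the choice of `pymongo` as the driver name are
illustrative only; nothing like this exists in LNT today.

```python
import importlib.util
import sqlite3


def open_store(nosql_module="pymongo", sqlite_path=":memory:"):
    """Pick a storage backend without requiring any extra setup.

    Returns a (kind, handle) pair. If the optional NoSQL driver is not
    installed, fall back to SQLite from the standard library.
    """
    if importlib.util.find_spec(nosql_module) is not None:
        # A real implementation would connect to the server here and
        # still fall back to SQLite if the connection failed.
        return ("nosql", None)
    return ("sqlite", sqlite3.connect(sqlite_path))


kind, handle = open_store(nosql_module="no_such_nosql_driver")
print(kind)
```

A local user who never installs the driver gets SQLite automatically, while a
big server deployment can opt in by installing and configuring the NoSQL
backend.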

Would it be helpful to anyone if I got a dump of the llvm.org LNT Postgres
database?  It is a good, big dataset to test with, and I assume everyone is okay
with it being public, since the LNT server already is.

> On Apr 25, 2016, at 4:33 AM, Elena Lepilkina via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>  
>  
> From: Elena Lepilkina 
> Sent: Monday, April 25, 2016 2:33 PM
> To: 'James Molloy' <james at jamesmolloy.co.uk <mailto:james
at jamesmolloy.co.uk>>; Kristof Beyls <Kristof.Beyls at arm.com
<mailto:Kristof.Beyls at arm.com>>; Mehdi Amini <mehdi.amini at
apple.com <mailto:mehdi.amini at apple.com>>
> Cc: nd <nd at arm.com <mailto:nd at arm.com>>; Matthias Braun
<matze at braunis.de <mailto:matze at braunis.de>>
> Subject: RE: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and
test parameterization
>  
> Hi everyone,
>  
> Thank you for your answer. BLOB format adds some more actions for working
with metrics. We know that ComparisonResult class makes analysis work. But it
gets all metrics by request from database, we will need additional time for work
with fields during analysis in ComparisonResult class. May be it will be better
to do one Sample table for each testsuite, as it was suggested before. It should
be more quickly, shouldn’t it? Moreover, next wished LNT changes will need
getting some metrics separately and BLOB format will add some delay in time for
queries.
>  
> As we see now problem of performance is actual, because time for rendering
graph page is about 3 minutes.
> <image001.png>
> So maybe it will be better to start working with NoSql databases? I made a
small prototype with TestSuite, TestSuiteFields, Test, Run and Sample tables for
getting time metrics. It works quickly. And using NoSQL helps solve problems
with  different fields for samples metrics fields. Then it will be possible to
store different metrics for different testsuites in one table.
> What do you think about this proposal?
> I used MongoDB, but I know that there is NoSQL extension for PostgresSQL
with JSONB fields which are more
> effective than JSON-encoded BLOB, because it can be included in queries
very simply and let use indexes.
>  
> About proposal that not all metrics should be shown. It can be added as a
field in JSON in .fields file, which describes fields getted from test-suite. To
see other metrics user should choose them with checkboxes in view options. Will
be this solution suitable?
> We can make as you wrote
> “I'd also suggest that if we're adding many more metrics to a test,
we should create a "test sample information" page that the test link
goes to instead of just the graph. This page could contain all counter/metric
data, historic sparklines, the full graph and profiling links.
> ”
> But the render time of this page will be too great because of graph render
time. In my opinion, some users wouldn’t like to wait so long for see some
additional metrics.
>  
> Thanks for your suggestions,
>  
> Elena.
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
<mailto:llvm-dev-bounces at lists.llvm.org>] On Behalf Of James Molloy via
llvm-dev
> Sent: Monday, April 25, 2016 12:43 PM
> To: Kristof Beyls <Kristof.Beyls at arm.com <mailto:Kristof.Beyls at
arm.com>>; Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini
at apple.com>>
> Cc: llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>; nd <nd at arm.com <mailto:nd at arm.com>>;
Matthias Braun <matze at braunis.de <mailto:matze at braunis.de>>
> Subject: Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and
test parameterization
>  
> Hi Sergey, Elena,
>  
> Firstly, thanks for this RFC. It's great to see more people actively
using and modifying LNT and the test metrics support in general is rather weak
currently.
>  
> Metrics
> -------
>  
> I agree with Daniel and Kristof that your proposed schema changes have the
potential to make many queries extremely slow. Certainly for the metrics
enhancements, I don't see a reason why we need such a radical change in
schema.
>  
> To add custom metrics on the fly, we need to change the schema for the
Sample table. Currently this consists of a column for each metric, but actually
we never ever query those metric values. We never query for example for
"all failing tests in a run" - when we do analyses we use the
ComparisonResult class which reads *all* samples from the database for a run and
performs the analysis entirely in Python.
>  
> Therefore, having a semi-structured format where some fields are
first-class columns and the rest are in a JSON-encoded BLOB (as Daniel suggests)
seems totally acceptable. There is certainly an argument now that we're
using the wrong backend storage solution and that a key-value store might be
more suitable, but that's a very invasive change and I don't think
we've reached the point where we need to force a move from the simplicity of
SQLite.
>  
> Adding an extra BLOB column would be easy - there would just need to be
logic in testsuitedb.py for reading and writing it - the Sample model class
would expose the JSON-encoded fields as normal python fields so the rest of LNT
would be isolated from this change.
>  
> But I think this is a small detail compared to the bigger problem of how to
effectively *display* all this new data. Currently every new metric gets its own
separate table in the report/run views, and this does not scale well at all.
>  
> I think we need some more concepts in the metric system to make it
scaleable:
>  
>   * What "attribute" of the test is this metric measuring? For
example, both "exec_time" and "score" measure the same
attribute; performance of the generated code. It's superfluous to have them
displayed in separate tables. However mem_size and compile_time both measure
completely different aspects of the test.
>   * Is this metric useful to display at the top level? or should it only be
exposed when more data about a test result is requested?
>     * An example of this is the pass statistics. I don't want my daily
report view cluttered by the time spent in register allocation for every test!
OK, this is useful information when debugging a problem, but it should be
available when requested rather than by default.
>  
> An example of why we need the above is your screenshots in your google doc.
I'm looking at the last screenshot, and it's incredibly difficult to
read and get useful information out of.
>  
> I'd also suggest that if we're adding many more metrics to a test,
we should create a "test sample information" page that the test link
goes to instead of just the graph. This page could contain all counter/metric
data, historic sparklines, the full graph and profiling links.
>  
> Cheers,
>  
> James
>  
> On Fri, 22 Apr 2016 at 10:17 Kristof Beyls via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>  
> On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com
<mailto:mehdi.amini at apple.com>> wrote:
>  
> 
> On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>  
> 
> On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at
gmail.com <mailto:sergey.yakoushkin at gmail.com>> wrote:
>  
> Hi Kristof,
>  
>        The way we use LNT, we would run different configuration (e.g. -O3
vs -Os) as different "machines" in LNT's model.
>  
> O2/O3 is indeed bad example. We're also using different machines for
Os/O3 - such parameters apply to all tests and we don't propose major
changes.
> Elena was only extending LNT interface a bit to ease LLVM-testsuite
execution with different compiler or HW flags.
>  
> Oh I see, this boils down to extending the lnt runtest interface to be able
to specify a set of configurations, rather than a single configuration and
making
> sure configurations get submitted under different machine names? We kick
off the different configuration runs through a script invoking lnt runtest
multiple
> times. I don't see a big problem with extending the lnt runtest
interface to do this, assuming it doesn't break the underlying concepts
assumed throughout
> LNT. Maybe the only downside is that this will add even more command line
options to lnt runtest, which already has a lot (too many?) command line
> options.
>  
> 
> Maybe some changes are required to analyze and compare metrics between
"machines": e.g. code size/performance between Os/O2/O3.
> Do you perform such comparisons?
>  
> We typically do these kinds of comparisons when we test our patches
pre-commit, i.e. comparing for example '-O3' with '-O3 -mllvm
-enable-my-new-pass'.
> To stick with the LNT concepts, tests enabling new passes are stored as a
different "machine".
> The only way I know to be able to do a comparison between runs on 2
different "machine"s is to manually edit the URL for run vs run
comparison
> and fill in the runids of the 2 runs you want to compare.
> For example, the following URL is a comparison of
green-dragon-07-x86_64-O3-flto vs green-dragon-06-x86_64-O0-g on the public
llvm.org/perf server:
> http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
> I had to manually look up and fill in the run ids 70644 and 70634.
> It would be great if there was a better way to do these kinds of
comparisons - i.e. not having to manually fill in run ids, but having a web UI to
easily find and pick the runs you want to compare.
> (As an aside: I find it intriguing that the URL above suggests that there
are quite a few cases where "-O0 -g" produces faster code than
"-O3 -flto").
>  
> Can you be more explicit which ones? I don't see any regression (other
than compared to the baseline, or on the compile time).
>  
> -- 
> Mehdi
>  
> D'Oh! I was misinterpreting the compile time differences as execution
time differences. Indeed, there is no unexpected result in there.
> Sorry for the noise!
>  
> Kristof
>  
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Elena Lepilkina via llvm-dev

2016-Apr-26 06:07 UTC

[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

Hi, Chris.

Thank you for your answer about compile tests. From reading the code of the
compile tests, I understand that they don't use the test-suite at all. Am I
right? The LNT documentation lacks information and examples on running compile
tests.
We understand that there are two groups of users: those who run servers and
collect a lot of data, and SQLite users, who I think would not have millions of
sample records.
I think it is clear that no single solution offers both a simple installation
process and a flexible, high-load system.
I will update the proposal and take into account your suggestion about the
third part of the test-suite.

Thanks

Elena.

From: chris.matthews at apple.com [mailto:chris.matthews at apple.com]
Sent: Monday, April 25, 2016 8:06 PM
To: Elena Lepilkina <Elena.Lepilkina at synopsys.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test
parameterization

I am really torn about this.

When I implemented the regression tracking stuff recently, it really showed me
how badly we are scaling.  On our production server, the run ingestion can take
well over 100s.  Time is mostly spent in FieldChange generation and regression
grouping. Both have to access a lot of recent samples. This is not the end of
the world, because it runs in a background process.  Where this really sucks is
when a regression has a lot of indicators. The web interface renders these in a
graph, and just trying to pull down 100 graphs' worth of data kills the server.
I ended up limiting those to a max of 10 datasets, and even that takes 30s to
load.

So I do think we need some improvements to the scalability.

LNT usage is spread between two groups. The first is users who set up big
servers with Postgres and Apache/Gunicorn. For those users I think a NoSQL
database is the way to go. However, our second (and probably more common) group
is people running a little instance on their own machine to do some local
compiler benchmarking. Their setup process needs to be dead simple, and I think
requiring a NoSQL database to be set up on their machine first is a
non-starter.  Like we do with SQLite, I think we need a transparent fallback for
people who don’t have a NoSQL database.

Would it be helpful to anyone if I got a dump of the llvm.org LNT Postgres
database?  It is a good, big dataset to test with, and I assume everyone is okay
with it being public, since the LNT server already is.


On Apr 25, 2016, at 4:33 AM, Elena Lepilkina via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

From: Elena Lepilkina
Sent: Monday, April 25, 2016 2:33 PM
To: 'James Molloy' <james at jamesmolloy.co.uk>; Kristof Beyls
<Kristof.Beyls at arm.com>; Mehdi Amini <mehdi.amini at apple.com>
Cc: nd <nd at arm.com>; Matthias Braun <matze at braunis.de>
Subject: RE: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test
parameterization

Hi everyone,

Thank you for your answer. The BLOB format adds extra steps when working with
metrics. We know that the ComparisonResult class does the analysis work, but
because it fetches all metrics from the database on request, we would need
additional time to process the fields during analysis in ComparisonResult.
Maybe it would be better to have one Sample table per testsuite, as was
suggested before. It should be faster, shouldn’t it? Moreover, the next planned
LNT changes will need to fetch some metrics separately, and the BLOB format
would add delay to those queries.

As we see it, performance is already a real problem: rendering the graph page
takes about 3 minutes.
<image001.png>
So maybe it would be better to start working with NoSQL databases? I made a
small prototype with TestSuite, TestSuiteFields, Test, Run and Sample tables for
getting time metrics. It works quickly. Using NoSQL also helps solve the problem
of samples having different metric fields: it becomes possible to store
different metrics for different testsuites in one table.
What do you think about this proposal?
I used MongoDB, but I know that PostgreSQL has NoSQL-style JSONB fields, which
are more effective than a JSON-encoded BLOB because they can be used in queries
very simply and can be indexed.

About the proposal that not all metrics should be shown: this can be added as a
field in the JSON in the .fields file, which describes the fields obtained from
the test-suite. To see other metrics, the user would select them with checkboxes
in the view options. Would this solution be suitable?
We could also do as you wrote:
“I'd also suggest that if we're adding many more metrics to a test, we
should create a "test sample information" page that the test link goes
to instead of just the graph. This page could contain all counter/metric data,
historic sparklines, the full graph and profiling links.
”
But the render time of this page would be too long because of the graph render
time. In my opinion, some users wouldn’t want to wait that long to see some
additional metrics.

Thanks for your suggestions,

Elena.
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of James
Molloy via llvm-dev
Sent: Monday, April 25, 2016 12:43 PM
To: Kristof Beyls <Kristof.Beyls at arm.com>; Mehdi Amini
<mehdi.amini at apple.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Matthias Braun
<matze at braunis.de>
Subject: Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test
parameterization

Hi Sergey, Elena,

Firstly, thanks for this RFC. It's great to see more people actively using
and modifying LNT and the test metrics support in general is rather weak
currently.

Metrics
-------

I agree with Daniel and Kristof that your proposed schema changes have the
potential to make many queries extremely slow. Certainly for the metrics
enhancements, I don't see a reason why we need such a radical change in
schema.

To add custom metrics on the fly, we need to change the schema for the Sample
table. Currently this consists of a column for each metric, but actually we
never ever query those metric values. We never query for example for "all
failing tests in a run" - when we do analyses we use the ComparisonResult
class which reads *all* samples from the database for a run and performs the
analysis entirely in Python.

Therefore, having a semi-structured format where some fields are first-class
columns and the rest are in a JSON-encoded BLOB (as Daniel suggests) seems
totally acceptable. There is certainly an argument now that we're using the
wrong backend storage solution and that a key-value store might be more
suitable, but that's a very invasive change and I don't think we've
reached the point where we need to force a move from the simplicity of SQLite.

Adding an extra BLOB column would be easy - there would just need to be logic in
testsuitedb.py for reading and writing it - the Sample model class would expose
the JSON-encoded fields as normal python fields so the rest of LNT would be
isolated from this change.
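
A minimal sketch of the semi-structured layout described above, using only the
Python standard library rather than LNT's actual testsuitedb.py. The table
layout, the `extra` column name and this `Sample` class are assumptions made
for illustration; `exec_time` stands in for a first-class column and the other
metrics live in a JSON-encoded text column.

```python
import json
import sqlite3


class Sample:
    """Hypothetical Sample model: one first-class column plus a JSON blob."""

    def __init__(self, test_id, exec_time=None, **extra_metrics):
        self.test_id = test_id
        self.exec_time = exec_time          # first-class column
        self._extra = dict(extra_metrics)   # stored JSON-encoded

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so first-class
        # fields never pass through here.
        extra = self.__dict__.get("_extra", {})
        if name in extra:
            return extra[name]
        raise AttributeError(name)

    def save(self, conn):
        conn.execute(
            "INSERT INTO sample (test_id, exec_time, extra) VALUES (?, ?, ?)",
            (self.test_id, self.exec_time, json.dumps(self._extra)),
        )

    @classmethod
    def load_all(cls, conn):
        rows = conn.execute(
            "SELECT test_id, exec_time, extra FROM sample ORDER BY test_id"
        )
        return [cls(t, e, **json.loads(x)) for t, e, x in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (test_id INTEGER, exec_time REAL, extra TEXT)")
Sample(1, exec_time=0.42, mem_size=1024, score=17.5).save(conn)
loaded = Sample.load_all(conn)[0]
print(loaded.exec_time, loaded.mem_size, loaded.score)
```

Callers read `loaded.mem_size` the same way as `loaded.exec_time`, which is the
isolation property the paragraph above asks for; the cost is that the database
itself cannot filter or index on the blob's contents.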

But I think this is a small detail compared to the bigger problem of how to
effectively *display* all this new data. Currently every new metric gets its own
separate table in the report/run views, and this does not scale well at all.

I think we need some more concepts in the metric system to make it scalable:

  * What "attribute" of the test is this metric measuring? For example, both
    "exec_time" and "score" measure the same attribute: performance of the
    generated code. It's superfluous to have them displayed in separate
    tables. However, mem_size and compile_time both measure completely
    different aspects of the test.
  * Is this metric useful to display at the top level, or should it only be
    exposed when more data about a test result is requested?
    * An example of this is the pass statistics. I don't want my daily report
      view cluttered by the time spent in register allocation for every test!
      OK, this is useful information when debugging a problem, but it should
      be available when requested rather than by default.

An example of why we need the above is your screenshots in your google doc.
I'm looking at the last screenshot, and it's incredibly difficult to
read and get useful information out of.
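
The two concepts above (which attribute a metric measures, and whether it
belongs in the top-level view) could be recorded as per-metric metadata. The
sketch below is only an illustration: `MetricInfo`, `report_tables` and the
`regalloc_time` metric are hypothetical names, not part of LNT.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricInfo:
    name: str
    attribute: str   # what the metric measures, e.g. "performance"
    top_level: bool  # show in the default report view?


# Hypothetical metadata; regalloc_time stands in for a pass statistic.
METRICS = [
    MetricInfo("exec_time", "performance", True),
    MetricInfo("score", "performance", True),
    MetricInfo("mem_size", "memory", True),
    MetricInfo("compile_time", "compile", True),
    MetricInfo("regalloc_time", "compile", False),
]


def report_tables(metrics, include_detail=False):
    """Group metric names into one table per attribute.

    Metrics that are not top-level appear only when detail is requested.
    """
    tables = defaultdict(list)
    for m in metrics:
        if m.top_level or include_detail:
            tables[m.attribute].append(m.name)
    return dict(tables)


print(report_tables(METRICS))
print(report_tables(METRICS, include_detail=True))
```

With this grouping, "exec_time" and "score" share one table, and the register
allocation time shows up only when the detailed view is requested.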

I'd also suggest that if we're adding many more metrics to a test, we
should create a "test sample information" page that the test link goes
to instead of just the graph. This page could contain all counter/metric data,
historic sparklines, the full graph and profiling links.

Cheers,

James

On Fri, 22 Apr 2016 at 10:17 Kristof Beyls via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:


On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev
<llvm-dev at lists.llvm.org> wrote:


On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at gmail.com>
wrote:

Hi Kristof,

       The way we use LNT, we would run different configuration (e.g. -O3 vs
-Os) as different "machines" in LNT's model.

O2/O3 is indeed a bad example. We're also using different machines for Os/O3 -
such parameters apply to all tests and we don't propose major changes.
Elena was only extending the LNT interface a bit to ease LLVM test-suite
execution with different compiler or HW flags.

Oh I see, this boils down to extending the lnt runtest interface to be able to
specify a set of configurations, rather than a single configuration and making
sure configurations get submitted under different machine names? We kick off the
different configuration runs through a script invoking lnt runtest multiple
times. I don't see a big problem with extending the lnt runtest interface to
do this, assuming it doesn't break the underlying concepts assumed throughout
LNT. Maybe the only downside is that this will add even more command line
options to lnt runtest, which already has a lot (too many?) command line
options.

Maybe some changes are required to analyze and compare metrics between
"machines": e.g. code size/performance between Os/O2/O3.
Do you perform such comparisons?

We typically do these kinds of comparisons when we test our patches pre-commit,
i.e. comparing for example '-O3' with '-O3 -mllvm
-enable-my-new-pass'.
To stick with the LNT concepts, tests enabling new passes are stored as a
different "machine".
The only way I know to be able to do a comparison between runs on 2 different
"machine"s is to manually edit the URL for run vs run comparison
and fill in the runids of the 2 runs you want to compare.
For example, the following URL is a comparison of green-dragon-07-x86_64-O3-flto
vs green-dragon-06-x86_64-O0-g on the public llvm.org/perf server:
http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
I had to manually look up and fill in the run ids 70644 and 70634.
It would be great if there was a better way to do these kinds of
comparisons - i.e. not having to manually fill in run ids, but having a web UI to
easily find and pick the runs you want to compare.
(As an aside: I find it intriguing that the URL above suggests that there are
quite a few cases where "-O0 -g" produces faster code than "-O3
-flto").
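
Until such a UI exists, the run-versus-run URL quoted above can at least be
built by a small helper instead of by hand. This is a sketch that assumes only
the URL shape shown in the example; `compare_url` is a hypothetical name, and
the run ids still have to be looked up separately.

```python
def compare_url(run_id, compare_to,
                base="http://llvm.org/perf/db_default/v4/nts"):
    """Build a run-vs-run comparison URL from two run ids.

    The default base is the prefix of the example URL in the thread; other
    servers or test suites would need a different base.
    """
    return f"{base}/{int(run_id)}?compare_to={int(compare_to)}"


print(compare_url(70644, 70634))
```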

Can you be more explicit which ones? I don't see any regression (other than
compared to the baseline, or on the compile time).

--
Mehdi

D'Oh! I was misinterpreting the compile time differences as execution time
differences. Indeed, there is no unexpected result in there.
Sorry for the noise!

Kristof

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Kristof Beyls via llvm-dev

2016-May-13 14:53 UTC

[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

Hi Chris,

I'm wondering if you managed to figure out exactly what the bottleneck is on
loading those 100 graphs?
Is it:
* The number of bytes that need to be sent over the network between server and
client - with a low ratio of useful info to useless bytes?
* A poor execution plan of the queries involved on the database side?
* Lots of small queries, resulting in latency of network traffic between client
and server being the bottleneck?
* Or there is just a humongous amount of useful data that needs to be sent over
the network?

I think that for each of those different causes of inefficiency, different
techniques can be used to overcome them.
In a previous life, I used to work on a product doing lots of complex SQL
querying from Python to huge databases. I've seen lots of techniques to make
big queries on relational databases go fast.
Unfortunately, I think it's not possible to use those techniques with
SQLAlchemy.
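
As one generic illustration of the "lots of small queries" case listed above,
the sketch below fetches the same graph data either with one query per test or
with a single batched query. It uses the standard sqlite3 module and a
made-up `sample` table; it is not LNT's schema, and not necessarily one of the
techniques referred to in this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (run_id INTEGER, test_id INTEGER, exec_time REAL)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(run, test, run * 0.1 + test)
     for run in range(1, 6) for test in range(1, 4)],
)


def fetch_per_test(conn, test_ids):
    """One query per graph: as many round trips as there are tests."""
    data, queries = {}, 0
    for test_id in test_ids:
        data[test_id] = conn.execute(
            "SELECT run_id, exec_time FROM sample "
            "WHERE test_id = ? ORDER BY run_id",
            (test_id,),
        ).fetchall()
        queries += 1
    return data, queries


def fetch_batched(conn, test_ids):
    """One query for all graphs, split into per-test series in Python."""
    placeholders = ",".join("?" * len(test_ids))
    rows = conn.execute(
        "SELECT test_id, run_id, exec_time FROM sample "
        f"WHERE test_id IN ({placeholders}) ORDER BY test_id, run_id",
        list(test_ids),
    ).fetchall()
    data = {test_id: [] for test_id in test_ids}
    for test_id, run_id, exec_time in rows:
        data[test_id].append((run_id, exec_time))
    return data, 1


slow, n_slow = fetch_per_test(conn, [1, 2, 3])
fast, n_fast = fetch_batched(conn, [1, 2, 3])
print(n_slow, n_fast, slow == fast)
```

Against an in-memory database the difference is negligible; it matters when
each query pays a network round trip to the database server.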

If you would happen to have a bit more insight into underlying bottlenecks
causing slowness of getting data from the DB,
I think that would go a long way in being able to make a more informed decision
into whether the right way forward is:
* Moving to a NoSQL DB
* Dropping SQLAlchemy and using SQL queries directly, with a few optimization
techniques that are not possible otherwise.

Thanks,

Kristof


