thr3ads.net - llvm dev - [llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization [Apr 2016]

Kristof Beyls via llvm-dev

2016-Apr-22 07:45 UTC

[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:

> Hi Kristof,
>
>> The way we use LNT, we would run different configurations (e.g. -O3 vs
>> -Os) as different "machines" in LNT's model.
>
> O2/O3 is indeed a bad example. We're also using different machines for Os/O3 -
> such parameters apply to all tests and we don't propose major changes.
> Elena was only extending the LNT interface a bit to ease LLVM test-suite
> execution with different compiler or HW flags.

Oh I see, so this boils down to extending the lnt runtest interface so that it
can take a set of configurations rather than a single configuration, and making
sure each configuration gets submitted under a different machine name? We kick
off the different configuration runs through a script that invokes lnt runtest
multiple times. I don't see a big problem with extending the lnt runtest
interface to do this, assuming it doesn't break the underlying concepts assumed
throughout LNT. Maybe the only downside is that this will add even more command
line options to lnt runtest, which already has a lot (too many?).
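
A minimal sketch of the "one machine name per configuration" idea, in Python. The `Configuration` class, the `plan_runs` helper and the `host-arch-label` naming scheme are assumptions made for illustration; they are not part of LNT. How each planned run is handed to lnt runtest is deliberately left out, since the exact command line depends on the LNT version in use.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Configuration:
    """One compiler configuration, reported to LNT as its own "machine"."""
    label: str             # short tag used in the machine name, e.g. "O3"
    cflags: Sequence[str]  # compiler flags for this configuration


def machine_name(host: str, arch: str, config: Configuration) -> str:
    """Derive a distinct machine name per configuration (hypothetical scheme)."""
    return f"{host}-{arch}-{config.label}"


def plan_runs(host: str, arch: str, configs: List[Configuration]) -> List[dict]:
    """Return one planned run per configuration, rejecting duplicate names."""
    runs = []
    seen = set()
    for config in configs:
        name = machine_name(host, arch, config)
        if name in seen:
            # Two configurations under one machine name would mix their results.
            raise ValueError(f"duplicate machine name: {name}")
        seen.add(name)
        runs.append({"machine": name, "cflags": list(config.cflags)})
    return runs


if __name__ == "__main__":
    configs = [Configuration("O3", ["-O3"]), Configuration("Os", ["-Os"])]
    for run in plan_runs("myhost", "x86_64", configs):
        # A driver script would invoke lnt runtest once per entry here.
        print(run["machine"], " ".join(run["cflags"]))
```

Rejecting duplicate names up front keeps two configurations from silently being reported as the same "machine".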

> Maybe some changes are required to analyze and compare metrics between
> "machines": e.g. code size/performance between Os/O2/O3.
> Do you perform such comparisons?

We typically do these kinds of comparisons when we test our patches pre-commit,
for example comparing '-O3' with '-O3 -mllvm -enable-my-new-pass'.
To stick with the LNT concepts, tests enabling new passes are stored as a
different "machine".
The only way I know to do a comparison between runs on 2 different
"machine"s is to manually edit the URL for the run vs run comparison
and fill in the run ids of the 2 runs you want to compare.
For example, the following URL is a comparison of green-dragon-07-x86_64-O3-flto
vs green-dragon-06-x86_64-O0-g on the public llvm.org/perf server:
http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
I had to manually look up and fill in the run ids 70644 and 70634.
It would be great if there was a better way to do these kinds of
comparisons - i.e. not having to manually fill in run ids, but having a web UI
to easily find and pick the runs you want to compare.
(As an aside: I find it intriguing that the URL above suggests that there are
quite a few cases where "-O0 -g" produces faster code than "-O3 -flto").
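
The manual URL editing described above can be sketched in a few lines of Python. The `compare_url` helper is hypothetical; it only mirrors the URL pattern shown in the example above and assumes nothing else about the LNT server.

```python
from urllib.parse import urlencode


def compare_url(base: str, run_id: int, compare_to: int) -> str:
    """Build a run-vs-run comparison URL from two run ids."""
    query = urlencode({"compare_to": compare_to})
    return f"{base.rstrip('/')}/{run_id}?{query}"


# Base path taken from the example URL above.
BASE = "http://llvm.org/perf/db_default/v4/nts"

if __name__ == "__main__":
    print(compare_url(BASE, 70644, 70634))
```

The two run ids still have to be looked up by hand; the sketch only removes the URL editing step.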

> "Test parameters" are different: they allow exploring multiple
> variants of the same test case. For example:
> * index of input data sets, length of input vector, size of matrix, etc.;
> * macros that affect the source code, such as changing 1) static data
>   allocation to dynamic or 2) constants to variables (unknown at compile time);
> * extra sets of internal compilation options that are relevant only for
>   a particular test case.
>
> The same parameters can apply to multiple tests with different value sets:
>
> test1: param1={v1,v2,v3}
> test2: param1={v2,v4}
> test3:
>
> Of course, the original test cases can be duplicated (copied under different
> names) - that is enough to execute the tests.
> Explicit "test parameters" allow exploring dependencies between test
> parameters and metrics.

Right. In the new cmake+lit way of driving the test-suite, some of these test
parameters are input to cmake (like macros) and others will be input to lit
(like changing inputs), I think.
We also see this when e.g. running SPEC with ref vs train vs test data sets. TBH,
I'm not quite sure how best to drive this. I guess Matthias may have better
ideas than me here.
I do think that, to comply with LNT's current conceptual model, tests
run with different parameters will have to have different test names in the LNT
view.
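
As a rough illustration of giving each parameterized variant its own test name, here is a small Python sketch. The `expand_test` helper and the `name[param=value]` naming scheme are assumptions for illustration only; neither LNT nor the test-suite defines them.

```python
from itertools import product
from typing import Dict, List


def expand_test(name: str, params: Dict[str, List[str]]) -> List[str]:
    """Expand one test into one distinct test name per parameter combination."""
    if not params:
        return [name]  # unparameterized tests keep their original name
    keys = sorted(params)  # fixed key order keeps generated names stable
    names = []
    for values in product(*(params[k] for k in keys)):
        suffix = ",".join(f"{k}={v}" for k, v in zip(keys, values))
        names.append(f"{name}[{suffix}]")
    return names


if __name__ == "__main__":
    # The example from the message above: test3 has no parameters.
    tests = {
        "test1": {"param1": ["v1", "v2", "v3"]},
        "test2": {"param1": ["v2", "v4"]},
        "test3": {},
    }
    for test, params in tests.items():
        for variant in expand_test(test, params):
            print(variant)
```

Encoding the parameter values in the name keeps each variant a separate test in the LNT view, at the cost of the parameter/metric relationship being recoverable only by parsing names.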

Thanks,

Kristof

> On Thu, Apr 21, 2016 at 4:36 PM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On 21 Apr 2016, at 15:00, Elena Lepilkina <Elena.Lepilkina at synopsys.com> wrote:

>>> Hi Kristof and Daniel,
>>>
>>> Thanks for your answers.
>>>
>>> Unfortunately I hadn't tried scaling up to a large data set before. Today I
>>> tried, and the results are quite bad.
>>> So the database schema should be rebuilt. I am now thinking about creating
>>> one sample table for each test suite, rather than cloning the whole set of
>>> tables. As far as I can see, the other tables can be the same for all test
>>> suites. That is, if a user runs tests from a new test suite, a new sample
>>> table would be created while importing data from JSON, if it doesn't already
>>> exist. Are there any problems with this solution? Maybe I am missing some
>>> details.

>> It's unfortunate to see that performance doesn't scale with the proposed
>> initial schema, but not entirely surprising. I don't really have much
>> feedback on how the schema could be adapted otherwise, as I haven't worked
>> much on that. I hope Daniel will have more insights to share here.
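
The "create the sample table on import if it doesn't exist" idea could look roughly like the following Python sketch using sqlite3. The table layout, the `<suite>_sample` naming and the JSON shape (a "samples" list of test/metric/value records) are all assumptions for illustration; they are not LNT's actual schema or report format.

```python
import json
import re
import sqlite3

# Table names cannot be bound as SQL parameters, so validate them strictly.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def sample_table(suite: str) -> str:
    """Return the per-suite sample table name (hypothetical naming scheme)."""
    if not _IDENT.fullmatch(suite):
        raise ValueError(f"unsafe test-suite name: {suite!r}")
    return f"{suite}_sample"


def import_report(conn: sqlite3.Connection, suite: str, report_json: str) -> int:
    """Create the suite's sample table if needed, then insert the samples."""
    table = sample_table(suite)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, test TEXT NOT NULL, "
        "metric TEXT NOT NULL, value REAL NOT NULL)"
    )
    samples = json.loads(report_json)["samples"]
    rows = [(s["test"], s["metric"], s["value"]) for s in samples]
    conn.executemany(
        f"INSERT INTO {table} (test, metric, value) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    report = json.dumps(
        {"samples": [{"test": "test1", "metric": "exec_time", "value": 1.5}]}
    )
    print(import_report(conn, "my_suite", report))
```

Because the table name is interpolated into the SQL text, the sketch validates the suite name before using it; the sample values themselves go through bound parameters.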

>>> Moreover, I have a question about compile tests. Are compile tests runnable?
>>> There are no compile tests on http://llvm.org/perf. Does that mean that they
>>> are deprecated for now?
>>>
>>> About test parameters: for example, we would like to be able to compare
>>> benchmark results of a test compiled with -O3 and -Os in the context of one
>>> run.

>> The way we use LNT, we would run different configurations (e.g. -O3 vs -Os)
>> as different "machines" in LNT's model. This is also explained in
>> LNT's documentation, see
>> https://github.com/llvm-mirror/lnt/blob/master/docs/concepts.rst.
>> Unfortunately, this version of the documentation hasn't found its way yet to
>> http://llvm.org/docs/lnt/contents.html.
>> Is there a reason why storing different configurations as different
>> "machines" in the LNT model doesn't work for you?
>> I assume that there are a number of places in LNT's analyses that assume
>> that different runs coming from the same "machine" are always produced
>> by the same configuration. But I'm not entirely sure about that.
>>
>> Thanks,
>>
>> Kristof

Mehdi Amini via llvm-dev

2016-Apr-22 09:14 UTC

> On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> (As an aside: I find it intriguing that the URL above suggests that there
> are quite a few cases where "-O0 -g" produces faster code than
> "-O3 -flto").

Can you be more explicit about which ones? I don't see any regression (other
than compared to the baseline, or in compile time).

-- 
Mehdi


Kristof Beyls via llvm-dev

2016-Apr-22 09:17 UTC

On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:

> On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:


>> (As an aside: I find it intriguing that the URL above suggests that there are
>> quite a few cases where "-O0 -g" produces faster code than "-O3 -flto").
>
> Can you be more explicit about which ones? I don't see any regression (other
> than compared to the baseline, or in compile time).
>
> --
> Mehdi

D'Oh! I was misinterpreting the compile time differences as execution time
differences. Indeed, there is no unexpected result in there.
Sorry for the noise!

Kristof
