Arnaud Allard de Grandmaison
2013-Nov-13 13:42 UTC
[LLVMdev] Proposal: Improvements to Performance Tracking Infrastructure.
Great summary, Kristof! I do not know how frequently new benchmarks get added, but each addition would disrupt the compile time measurement. On the other hand, we just want to see a (hopefully negative) slope, and we can ignore steps caused by new benchmarks being added.

Cheers,
--
Arnaud

On Wed, Nov 13, 2013 at 2:14 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:

> Hi,
>
> This is a summary of what was discussed at the Performance Tracking and
> Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.
> At the same time, it contains a proposal for a few next steps to improve
> the setup and use of buildbots to track performance changes in the code
> generated by LLVM.
>
> The buildbots are currently very valuable in detecting correctness
> regressions and getting the community to rectify those regressions
> quickly. However, performance regressions are hardly ever noticed, and it
> seems that, as a community, we don't keep track of them well.
>
> The goal of the BoF was to find a number of actions that could take us
> closer to the point where, as a community, we would at least notice some
> of the performance regressions and take action to fix them. Given that
> this has already been discussed quite a few times at BoF sessions at
> previous developer meetings, we thought we should aim for a small,
> incremental, but sure improvement over the current status. Ideally, we
> should initially aim to get to the point where at least some performance
> regressions are detected and acted upon.
>
> We already have a central database that stores benchmarking numbers,
> produced for two boards, see
> http://llvm.org/perf/db_default/v4/nts/recent_activity#machines.
> However, it seems no one monitors the produced results, nor is it easy to
> derive from those numbers whether a particular patch really introduced a
> significant regression.
>
> At the BoF, we identified the following issues blocking us from detecting
> significant regressions more easily:
>
> * A lot of the Execution Time and Compile Time results are very noisy,
>   because the individual programs don't run long enough and don't take
>   long enough to compile (e.g. less than 0.1 seconds).
>
> * The proposed actions to improve the execution time measurements, for
>   the programs under the Benchmarks sub-directories in the test-suite,
>   are to:
>   a) Increase the run time of each benchmark so that it runs long enough
>      to avoid noisy results. "Long enough" probably means roughly 10
>      seconds. We'd probably need a number of different settings, or a
>      parameter that can be set per program, so that the running time on
>      individual boards can be tuned; e.g. on a faster board, more
>      iterations would be run than on a slower board.
>   b) Evaluate whether the bulk of a benchmark's running time is spent in
>      the compiled code or in something else, e.g. file IO. Programs
>      dominated by file IO shouldn't be used to track performance changes
>      over time. The proposal to resolve this is to create a way to run
>      the test suite in 'benchmark mode', which includes only the subset
>      of the test suite that is useful for benchmarking. Hal Finkel
>      volunteered to start this work.
>
> * The identified action to improve the compile time measurements is to
>   add up the compilation times of all benchmarks and measure that total,
>   instead of the compile times of the individual benchmarks. It seems
>   this could be implemented by simply changing or adding a view in the
>   web interface, showing the trend of the total compilation time over
>   time, rather than trend graphs for individual programs.
>
> * Furthermore, on each individual board, the noise introduced by the
>   board itself should be minimized. Each board should have a maintainer,
>   who ensures the board doesn't produce a significant level of noise. If
>   a board starts producing a high level of noise and the maintainer
>   doesn't fix it quickly, the performance numbers coming from that board
>   will be ignored. It's not clear what the best way would be to mark a
>   board as being ignored. The suggestion was made that board maintainers
>   could get a script to run before each benchmarking run, to check
>   whether the board is in a reasonable state, e.g. by checking that the
>   load on the board is near zero, that "dd" executes as fast as expected,
>   and so on (a sketch of such a check follows after this message). It's
>   expected that the checks in the script might be somewhat dependent on
>   the operating system the board runs.
>
> * To reduce noise levels further, it would be nice if the execution time
>   of individual benchmarks could be averaged over a number (e.g. 5) of
>   consecutive runs; this is also sketched after this message. That way,
>   each individual benchmark run remains relatively fast, since each
>   program only has to run once, while the averaging should smooth out
>   some of the insignificant noise in the individual runs.
>
> I'd appreciate any feedback on the above proposals. We're also looking
> for more volunteers to implement these improvements, so if you're
> interested in working on any of the above, please let us know.
>
> Thanks,
>
> Kristof
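A minimal sketch of what such a pre-run board check could look like on a Linux board, assuming Python is available there. The threshold values, the 256 MiB dd transfer size, and the choice of checks are illustrative assumptions only and would need tuning per board and per operating system:

    #!/usr/bin/env python3
    """Hypothetical pre-run sanity check for a Linux benchmark board.

    The thresholds below are illustrative placeholders; a real script
    would tune them per board and per operating system."""
    import os
    import subprocess
    import sys
    import time

    MAX_LOAD = 0.10        # 1-minute load average should be close to zero
    MAX_DD_SECONDS = 2.0   # streaming 256 MiB through dd should stay under this


    def check_load():
        # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
        one_minute = os.getloadavg()[0]
        if one_minute > MAX_LOAD:
            sys.exit("board busy: load average %.2f exceeds %.2f"
                     % (one_minute, MAX_LOAD))


    def check_dd():
        # Time a trivial dd run; a slow result hints at a throttled or busy board.
        start = time.time()
        subprocess.check_call(
            ["dd", "if=/dev/zero", "of=/dev/null", "bs=1M", "count=256"],
            stderr=subprocess.DEVNULL)
        elapsed = time.time() - start
        if elapsed > MAX_DD_SECONDS:
            sys.exit("board slow: dd took %.1fs (limit %.1fs)"
                     % (elapsed, MAX_DD_SECONDS))


    if __name__ == "__main__":
        check_load()
        check_dd()
        print("board looks idle; OK to start the benchmark run")

A board maintainer could wire a check like this into the benchmark driver so that a non-zero exit status either aborts the run or flags its results as untrusted.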
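The run-to-run averaging from the last bullet could likewise live entirely on the reporting side. Below is a small sketch, assuming the per-run execution times of one benchmark are available as a plain Python list, oldest first; that data layout is an assumption made for illustration, not LNT's actual schema:

    from collections import deque

    WINDOW = 5  # number of consecutive runs to average over (per the proposal)


    def smoothed(samples, window=WINDOW):
        """Return a moving average over the last `window` per-run times.

        `samples` is a list of execution times in seconds, one entry per
        benchmark run, oldest first.  Each benchmark still runs only once
        per run; the smoothing is purely a reporting-side computation."""
        recent = deque(maxlen=window)
        out = []
        for t in samples:
            recent.append(t)
            out.append(sum(recent) / len(recent))
        return out


    # A noisy series with a sustained regression in the last three runs:
    raw = [1.02, 0.97, 1.05, 0.99, 1.01, 1.31, 1.29, 1.30]
    print(smoothed(raw))  # the regression is still visible, the jitter is damped

A genuine regression still shows up as a sustained shift in the averaged series, while one-off spikes are damped.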
Kristof Beyls
2013-Nov-13 16:22 UTC
[LLVMdev] Proposal: Improvements to Performance Tracking Infrastructure.
Hi Arnaud,

Yes, adding or removing benchmarks will mean we are not comparing like-for-like between different benchmark runs. My expectation is that we'll need to find some solution for this, not just for the aggregated compile time measurements but also for the execution time measurements (one possible approach is sketched after this message). We can't assume the benchmark sources will remain bit-identical for a long time, since we encourage improving (i.e. changing) them; not allowing changes to the benchmark sources would only slow down the rate at which they improve.

Kristof

From: Arnaud Allard de Grandmaison [mailto:arnaud.adegm at gmail.com]
Sent: 13 November 2013 13:43
To: Kristof Beyls
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Proposal: Improvements to Performance Tracking Infrastructure.

Great summary, Kristof! I do not know how frequently new benchmarks get added, but each addition would disrupt the compile time measurement. On the other hand, we just want to see a (hopefully negative) slope, and we can ignore steps caused by new benchmarks being added.

Cheers,
--
Arnaud
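One possible way to keep aggregated numbers comparable while benchmarks come and go is to sum only over the benchmarks present in both runs being compared. The sketch below is just an illustration of that idea, not something decided in the thread, and it does not address the separate problem of an existing benchmark's sources changing:

    def comparable_totals(run_a, run_b):
        """Aggregate a per-benchmark metric only over benchmarks in both runs.

        `run_a` and `run_b` map benchmark name -> compile (or execution)
        time in seconds.  Restricting the sum to the intersection avoids a
        step in the trend line when a benchmark is added or removed."""
        common = sorted(set(run_a) & set(run_b))
        return (common,
                sum(run_a[n] for n in common),
                sum(run_b[n] for n in common))


    # Hypothetical data: the second run gained a new benchmark.
    run_a = {"sjeng": 12.3, "bzip2": 4.1}
    run_b = {"sjeng": 12.8, "bzip2": 4.0, "new-bench": 7.5}
    names, total_a, total_b = comparable_totals(run_a, run_b)
    print(names, total_a, total_b)  # totals computed over ['bzip2', 'sjeng'] only

The same restriction-to-the-common-set approach could be applied to aggregated execution times as well.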