All,

I'm curious to know if anyone is interested in tracking performance
(compile-time and/or execution-time) from a community perspective? This is a
much loftier goal than just supporting build bots. If so, I'd be happy to
propose a BOF at the upcoming Dev Meeting.

Chad
That is a great idea! :)

> On Aug 1, 2014, at 4:04 PM, Chad Rosier <mcrosier at codeaurora.org> wrote:
>
> I'm curious to know if anyone is interested in tracking performance
> (compile-time and/or execution-time) from a community perspective?
On 2 August 2014 00:04, Chad Rosier <mcrosier at codeaurora.org> wrote:
> I'm curious to know if anyone is interested in tracking performance
> (compile-time and/or execution-time) from a community perspective? This is a
> much loftier goal than just supporting build bots. If so, I'd be happy to
> propose a BOF at the upcoming Dev Meeting.

Hi Chad,

I'm not sure I'll be at the US dev meeting this year, but we had a performance
BoF last year and I think we should have another, at least to check the
progress that has been made and to plan ahead. I'm sure Kristof, Tobias and
others will be very glad to see it, too.

If memory serves me well (it doesn't), this is the list of things we agreed to
work on, and their progress:

1. Performance-specific test-suite: a group of specific benchmarks that should
be tracked with the LNT infrastructure. Hal proposed to look at this, but other
people helped implement it. Last I heard there was some way of running it, but
I'm not sure how to do it. I'd love to have this as a buildbot, though, so we
can track its progress.

2. Statistical analysis of the LNT data. A lot of work has been put into this
and I believe it's a lot better. Anton, Yi and others have been discussing and
submitting many patches to make the LNT reporting infrastructure more stable,
less prone to noise and more useful all round. It's not perfect yet, but it's a
lot better than last year's.

Some other things happened since then that are also worth mentioning...

3. The LNT website got really unstable (Internal Server Error every other day).
This is the reason I stopped submitting results to it, since it would make my
bot fail. And because I still don't have a performance test-suite bot, I don't
really care much for the results. But with the noise reduction, it'd be really
interesting to monitor the progress, even of the full test-suite; right now,
though, I can't afford to have random failures. This seriously needs looking
into and would be a good topic for this BoF.

4. Big-endian results got in, and the infrastructure is now able to keep
"golden standard" results for both endiannesses. That's done and working
(AFAIK).

5. Renovation of the tests/benchmarks. The tests and benchmarks in the
test-suite are getting really old. One good example is the ClamAV anti-virus,
which is not just old: its results are bogus and cooked, which makes it hard
to tell signal from noise. Other benchmarks have such short run-times that
they're almost pointless. Someone needs to go through the things we
test/benchmark and make sure they're valid and meaningful. This is probably
similar to, but more extensive than, item 1.

About non-test-suite benchmarking... I have been running some closed-source
benchmarks, but since we can't share any data on them, getting historical
relative results is almost pointless. I don't think we, as a community, should
worry about keeping open scores for them. Also, since almost everyone is
running them behind closed doors and fixing the bugs with reduced test cases,
I think that's the best deal we can get.

I've also tried a few other benchmarks, like running the ImageMagick
libraries, or Phoronix, and I have to say, they're not really that great at
spotting regressions. ImageMagick will take a lot of work to turn into a
meaningful benchmark, and Phoronix is not really ready to be a compiler
benchmark (it only compiles once, with the system compiler, so you have to
heavily hack the scripts). If you're up to it, maybe you could hack those into
a nice package, but it won't be easy.

I know people have done it internally, like I did, but none of these scripts
are ready to be left out in the open, since they're either very ugly (like
mine) or contain private information...

Hope that helps...

cheers,
--renato
On 2 August 2014 00:40, Renato Golin <renato.golin at linaro.org> wrote:
> If memory serves me well (it doesn't), this is the list of things we agreed
> to work on, and their progress:
>
> 1. Performance-specific test-suite: a group of specific benchmarks that
> should be tracked with the LNT infrastructure. Hal proposed to look at this,
> but other people helped implement it. Last I heard there was some way of
> running it, but I'm not sure how to do it. I'd love to have this as a
> buildbot, though, so we can track its progress.

We have this in LNT now; it can be activated using `--benchmarking-only`. It's
about 50% faster than a full run and massively reduces the number of false
positives. Chris has also posted a patch to rerun tests which the server said
changed. I haven't tried it yet, but it looks really promising.

> 2. Statistical analysis of the LNT data. A lot of work has been put into
> this and I believe it's a lot better. Anton, Yi and others have been
> discussing and submitting many patches to make the LNT reporting
> infrastructure more stable, less prone to noise and more useful all round.
> It's not perfect yet, but it's a lot better than last year's.

There's definitely lots of room for improvement. I'm going to propose some
more once we've solved the LNT stability issues.

> 3. The LNT website got really unstable (Internal Server Error every other
> day). This is the reason I stopped submitting results to it, since it would
> make my bot fail. [...] This seriously needs looking into and would be a
> good topic for this BoF.

We are now testing PostgreSQL as the database backend on the public perf
server, replacing the SQLite database. Hopefully this will improve the
stability and the system performance. Also being discussed is moving the LNT
server to a PaaS service, which offers higher availability and saves a lot of
maintenance work. However, this will need the community to provide or fund the
hosting service.

-Yi
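For reference, here is a minimal sketch of what a nightly, benchmark-only run
could look like when driven from a small Python script. Everything in it is a
placeholder except the `--benchmarking-only` option mentioned above; the other
flags shown are the usual LNT sandbox/compiler/submission options, and the
paths and server URL are made up for illustration.

#!/usr/bin/env python
# Minimal sketch of a nightly, benchmark-only LNT run (illustrative only).
# All paths and the submission URL below are placeholders.
import subprocess
import sys

SANDBOX = "/home/perfbot/lnt-sandbox"          # scratch directory for this run
CLANG = "/home/perfbot/install/bin/clang"      # compiler under test
TEST_SUITE = "/home/perfbot/llvm-test-suite"   # checkout of the test-suite
SERVER = "http://example.org/perf/submitRun"   # central perf server (placeholder)

cmd = [
    "lnt", "runtest", "nt",
    "--sandbox", SANDBOX,
    "--cc", CLANG,
    "--test-suite", TEST_SUITE,
    "--benchmarking-only",   # the option discussed above: benchmark-focused subset
    "--submit", SERVER,
]

sys.exit(subprocess.call(cmd))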
Hi Chad,

I'm definitely interested and would have proposed such a BOF myself if you
hadn't beaten me to it :)

I think the BOF on the same topic last year was very productive in identifying
the most needed changes to enable tracking performance from a community
perspective. I think that by now most of the suggestions made at that BOF have
been implemented, and as the rest of the thread shows, we'll hopefully soon
have a few more performance tracking bots that produce useful (i.e. low-noise)
data.

I think it'll definitely be worthwhile to have a similar BOF this year.

Thanks,

Kristof
On 04/08/2014 11:28, Kristof Beyls wrote:
> I think the BOF on the same topic last year was very productive in
> identifying the most needed changes to enable tracking performance from a
> community perspective. I think that by now most of the suggestions made at
> that BOF have been implemented, and as the rest of the thread shows, we'll
> hopefully soon have a few more performance tracking bots that produce useful
> (i.e. low-noise) data.
>
> I think it'll definitely be worthwhile to have a similar BOF this year.

There is little for me to add, except that I would also be interested in such
a BoF.

Cheers,
Tobias
Kristof,

> Hi Chad,
>
> I'm definitely interested and would have proposed such a BOF myself if you
> hadn't beaten me to it :)

Given you have much more context than I, I would be very happy to work
together on this BOF. :)

> I think the BOF on the same topic last year was very productive in
> identifying the most needed changes to enable tracking performance from a
> community perspective. I think that by now most of the suggestions made at
> that BOF have been implemented, and as the rest of the thread shows, we'll
> hopefully soon have a few more performance tracking bots that produce useful
> (i.e. low-noise) data.

I'll grep through the dev/commits list to get up to speed. For everyone's
reference, here are the notes from last year:

http://llvm.org/devmtg/2013-11/slides/BenchmarkBOFNotes.html

Kristof, feel free to comment further on these, if you feel so inclined.

> I think it'll definitely be worthwhile to have a similar BOF this year.

I'll start working on some notes.

Chad
Kristof,

Unfortunately, our merge process is less than ideal. It has vastly improved
over the past few months (years, I hear), but we still have times where we
bring in days/weeks worth of commits en masse. To that end, I've set up a
nightly performance run against the community branch, but it's still an
overwhelming amount of work to track/report/bisect regressions. As you
guessed, this is what motivated my initial email.

> On 5 August 2014 10:30, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
>> The biggest problem that we were trying to solve this year was to produce
>> data without too much noise. I think with Renato hopefully setting up a
>> chromebook (Cortex-A15) soon there will finally be an ARM architecture
>> board producing useful data and pushing it into the central database.
>
> I haven't got around to finishing that work (at least not reporting to Perf
> anyway) because of the instability issues.
>
> I think getting Perf stable is priority 0 right now in the LLVM
> benchmarking field.

I agree 110%; we don't want the bots crying wolf. Otherwise, real issues will
fall on deaf ears.

>> I think this should be the main topic of the BoF this year: now that we can
>> produce useful data, what do we do with the data to actually improve LLVM?
>
> With the benchmark LNT reporting meaningful results and warning users of
> spikes, I think we have at least the base covered.

I haven't used LNT in well over a year, but I recall Daniel Dunbar and I
having many discussions on how LNT could be improved. (Forgive me if any of my
suggestions have already been addressed. I'm playing catch-up at the moment.)

> Further improvements I can think of would be to:
>
> * Allow Perf/LNT to fix a set of "golden standards" based on past releases
> * Mark the levels of those standards on every graph as coloured horizontal
>   lines
> * Add warning systems when the current values deviate from any past golden
>   standard

I agree. IIRC, there's functionality to set a baseline run to compare against.
Unfortunately, I think this is too coarse. It would be great if the golden
standard could be set on a per-benchmark basis. Thus, upward-trending
benchmarks can have their standard updated while other benchmarks remain
static.

> * Allow Perf/LNT to report on differences between two distinct bots
> * Create GCC buildbots with the same configurations/architectures and
>   compare them to LLVM's
> * Mark golden standards for GCC releases, too, as a visual aid (no warnings)
>
> * Implement trend detection (gradual decrease of performance) and historical
>   comparisons (against older releases)
> * Implement warning systems to the admin (not users) for such trends

Would it be useful to detect upward trends as well? Per my comment above, it
would be great to update the golden standard so we're always moving in the
right direction.

> * Improve spike detection to wait one or two more builds to make sure the
>   spike was an actual regression, but then email the original blame list,
>   not the current build's one.

I recall Daniel and I discussing this issue. IIRC, we considered an eager
approach where the current build would rerun the benchmark to verify the
spikes. However, I like the lazy detection approach you're suggesting. This
avoids long-running builds when there are real regressions.

> * Implement this feature on all warnings (previous runs, golden standards,
>   GCC comparisons)
>
> * Renovate the list of tests and benchmarks, extending their run times
>   dynamically instead of running them multiple times, getting the times for
>   the core functionality instead of whole-program timing, etc.

Could we create a minimal test-suite that includes only benchmarks that are
known to have little variance and run times greater than some decided-upon
threshold? With that in place we could begin the performance tracking (and
hopefully adoption) sooner.

> I agree with Kristof that, with the world of benchmarks being what it is,
> focusing on test-suite buildbots will probably give the best return on
> investment for the community.
>
> cheers,
> --renato

Kristof/All, I would be more than happy to contribute to this BOF in any way I
can.

Chad

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation
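To make the lazy approach concrete, here is a rough sketch of the idea. It is
purely illustrative (not LNT code), and the threshold, confirmation window and
sample data are arbitrary: a spike is only reported once the following builds
agree that it persists, and the report is attributed to the revision that
introduced it rather than to the latest build.

# Illustrative sketch only -- not LNT code.  Shows the "lazy" confirmation
# idea: a suspected regression is not reported until the next couple of
# builds agree, and the warning is attributed to the build that introduced
# the spike.

CONFIRM_BUILDS = 2          # how many later builds must agree before warning
REGRESSION_THRESHOLD = 0.05 # 5% slowdown relative to the previous sample

def detect_confirmed_regressions(samples):
    """samples: list of (revision, exec_time), ordered oldest to newest."""
    confirmed = []
    for i in range(1, len(samples) - CONFIRM_BUILDS):
        prev_rev, prev_time = samples[i - 1]
        rev, time = samples[i]
        if time <= prev_time * (1 + REGRESSION_THRESHOLD):
            continue  # no spike at this build
        # Spike at `rev`: only report it if the following builds stay slow,
        # i.e. it was a real regression rather than a noisy sample.
        later = [t for _, t in samples[i + 1 : i + 1 + CONFIRM_BUILDS]]
        if all(t > prev_time * (1 + REGRESSION_THRESHOLD) for t in later):
            # The email would go to the blame list of `rev`.
            confirmed.append((rev, prev_time, time))
    return confirmed

# Example: the spike at r102 persists, so it is confirmed; the spike at r105
# cannot be confirmed yet because not enough later builds exist.
history = [("r100", 10.0), ("r101", 10.1), ("r102", 11.5),
           ("r103", 11.6), ("r104", 11.5), ("r105", 13.0)]
print(detect_confirmed_regressions(history))  # -> [('r102', 10.1, 11.5)]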
Hi Chad,

> I recall Daniel and I discussing this issue. IIRC, we considered an eager
> approach where the current build would rerun the benchmark to verify the
> spikes. However, I like the lazy detection approach you're suggesting. This
> avoids long-running builds when there are real regressions.

I think the real issue behind this one is that it would change LNT from being
a passive system to an active system. Currently the LNT tests can be run in
any way one wishes, so long as a report is produced. Similarly, we can add
other benchmarks to the report, which we currently do internally to avoid
putting things like EEMBC into LNT's build system.

With an "eager" approach as you mention, LNT would have to know how to ssh
onto certain boxen, run the command and get the result back, which would be a
ton of work to do well!

Cheers,

James
On 5 August 2014 15:41, Chad Rosier <mcrosier at codeaurora.org> wrote:
> I agree. IIRC, there's functionality to set a baseline run to compare
> against. Unfortunately, I think this is too coarse. It would be great if the
> golden standard could be set on a per-benchmark basis. Thus, upward-trending
> benchmarks can have their standard updated while other benchmarks remain
> static.

Having multiple "golden standards" shown as coloured lines would give the
visual impression of mostly the highest score, no matter which release that
was. Programmatically, it'd also allow us to enquire about the "best golden
standard" and always compare against it.

I think the historical values are important to show a graph of the progress of
releases, as well as the current revision, so you know how that fluctuated in
the past few years as well as in the past few weeks.

> Would it be useful to detect upward trends as well? Per my comment above, it
> would be great to update the golden standard so we're always moving in the
> right direction.

Upward trends are nice to know about, but the "current standard" can be the
highest average of a set of N points since the last golden standard, and for
that we don't explicitly need to be tracking upward trends. If the last moving
average is NOT the current standard, then we must have detected a downward
slope since then.

> Could we create a minimal test-suite that includes only benchmarks that are
> known to have little variance and run times greater than some decided-upon
> threshold? With that in place we could begin the performance tracking (and
> hopefully adoption) sooner.

That's done. I haven't tested it yet because of the failures in Perf.

In the beginning, we could start with the set of golden/current/previous
standards for the benchmark-specific results, not the whole test-suite. As we
progress towards more stability, we can implement that for all of them, but
still allow configurations to only warn (user/admin) about the restricted set,
to avoid extra noise on noisy targets (like ARM).

cheers,
--renato
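A small sketch of that relationship, purely for illustration (not LNT code;
the window size and the scores are made up): the current standard is the best
N-point moving average seen since the golden standard was set, and if the
latest moving average falls below it, the benchmark has been sliding downward.

# Sketch of the "current standard" idea described above (illustration only).
# Scores are treated as higher-is-better.

N = 5  # window size for the moving average (arbitrary choice here)

def moving_averages(times, n=N):
    assert len(times) >= n, "need at least one full window"
    return [sum(times[i:i + n]) / n for i in range(len(times) - n + 1)]

def check_benchmark(scores, golden_standard):
    """scores: per-build results since the last golden standard was set."""
    avgs = moving_averages(scores)
    current_standard = max(avgs)   # best sustained level since the golden standard
    latest = avgs[-1]              # where we are right now
    regressed_vs_best = latest < current_standard    # downward slope detected
    regressed_vs_golden = latest < golden_standard   # below the release baseline
    return current_standard, latest, regressed_vs_best, regressed_vs_golden

# Hypothetical per-build scores for one benchmark:
scores = [100, 101, 102, 103, 102, 101, 99, 98, 97, 96]
print(check_benchmark(scores, golden_standard=100))  # -> (101.8, 98.2, True, True)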
My experience from leading BOFs at other conferences is more talk than action,
so I suggest a different setup for this topic: how about having a working
group meeting with participants who can commit time to work on it? The group
meets for some time (TBD, during the conference of course), discusses and
brainstorms the options, and, as a first immediate outcome, proposes a road
forward in a 5-10 minute report-out talk.

There might be other topics that could benefit from the working group format
as well, so we could have a separate report-out session at the conference.

Cheers
Gerolf

On Aug 1, 2014, at 4:04 PM, Chad Rosier <mcrosier at codeaurora.org> wrote:
> I'm curious to know if anyone is interested in tracking performance
> (compile-time and/or execution-time) from a community perspective? This is a
> much loftier goal than just supporting build bots. If so, I'd be happy to
> propose a BOF at the upcoming Dev Meeting.
On 20 August 2014 00:24, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
> My experience from leading BOFs at other conferences is more talk than
> action, so I suggest a different setup for this topic: how about having a
> working group meeting with participants who can commit time to work on it?

Mine too, but in this case I have to say it wasn't at all what happened.

It started with a 10-minute description of what we had and why it was bad,
followed by a 40-minute discussion on what to do and how. There were about 80
people in the room, all actively involved in defining actions and actors. In
the end we had clear goals with clear owners, and we have implemented every
single one of them to date. I have to say, I've never seen that happen before!

Furthermore, the "working group" was about the 80 people in the room anyway,
and they all helped in one way or another.

So, for any other discussion, I'd agree with you. For this one, I think we
should stick to what's working. :)

cheers,
--renato