thr3ads.net - llvm dev - [llvm-dev] [RFC] Implementing the BHive methodology in llvm-exegesis [Jan 2020]

Ondrej Sykora via llvm-dev

2020-Jan-16 17:32 UTC

[llvm-dev] [RFC] Implementing the BHive methodology in llvm-exegesis

Hi all,

In a recent IISWC paper
<http://groups.csail.mit.edu/commit/papers/19/ithemal-measurement.pdf>,
we've proposed BHive - a new methodology for benchmarking arbitrary basic
blocks that has several advantages over the one currently used in
llvm-exegesis. In particular, the new methodology:
- automatically handles memory accesses in the basic block, without the
need to manually annotate live-ins,
- maps all memory addresses accessed by the basic block to the same page,
significantly reducing the probability of cache misses during benchmarking,
- runs the benchmarked code in a separate process, reducing the risk of
compromising the memory of the monitor process,
- computes the throughput in a way that subtracts away the effects of the
scaffolding code.
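
A minimal sketch of the subtraction idea in the last point, assuming the
measured cycle count is a fixed scaffolding overhead plus a per-copy cost
times the unroll factor. The function name and the cycle counts are
hypothetical and only illustrate the arithmetic; the exact formula used in
the paper may differ.

```python
def per_block_cycles(cycles_a, unroll_a, cycles_b, unroll_b):
    """Estimate cycles per copy of the block from two unroll factors.

    Assumes measured = overhead + unroll * per_block, with the same
    overhead in both runs, so the overhead cancels in the difference.
    """
    if unroll_a == unroll_b:
        raise ValueError("unroll factors must differ")
    return (cycles_b - cycles_a) / (unroll_b - unroll_a)


# Hypothetical measurements: 40 cycles of scaffolding, 3 cycles per copy.
measured_100 = 40 + 100 * 3
measured_200 = 40 + 200 * 3

# The difference-based estimate recovers the per-copy cost.
estimate = per_block_cycles(measured_100, 100, measured_200, 200)

# Dividing a single measurement by its unroll factor keeps the overhead in.
naive = measured_100 / 100
```

With these made-up numbers the naive value is inflated by the scaffolding,
while the difference-based estimate is not.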

A possible challenge is the increased complexity of the code: BHive uses a
separate process to run the benchmarked basic block and changes the memory
mapping of that process so that all memory accesses land on the same
page. Most operating systems have the necessary APIs, but these may differ
significantly. In particular, the Windows API for memory mapping and
process creation/control is very different from the Unix world. Initially,
we might be able to support the new methodology only on Linux and Unix-like
systems.
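
To illustrate the "same page" property mentioned above, here is a small
sketch that maps one page of backing storage twice and checks that a write
through one view is visible through the other. It is not the BHive
implementation: the harness would have to place the mappings at the exact
addresses the basic block touches, which Python's mmap module cannot do.
The helper name alias_one_page is hypothetical.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def alias_one_page():
    """Map the same file page twice and show that both views share storage."""
    with tempfile.TemporaryFile() as backing:
        # One page of zero-filled backing storage.
        backing.write(b"\0" * PAGE)
        backing.flush()
        # On Unix, mmap.mmap defaults to a shared, read-write mapping.
        view_a = mmap.mmap(backing.fileno(), PAGE)
        view_b = mmap.mmap(backing.fileno(), PAGE)
        try:
            # Write through the first view, read back through the second.
            view_a[0:4] = b"\xde\xad\xbe\xef"
            return bytes(view_b[0:4])
        finally:
            view_a.close()
            view_b.close()
```

Both views sit at different virtual addresses but are backed by the same
page, so every access lands on the same physical memory.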

Before we start the implementation, we would like to collect feedback on
the proposed design:
- We're planning to implement the methodology as a new implementation of
BenchmarkRunner::FunctionExecutor that will exist alongside the current
runner. The existing functionality will be preserved, and the user will be
able to select the benchmark runner using a command-line flag.
- We're considering using the LLDB API to control the execution of the
benchmarking process in a platform-independent way.

You can find a more detailed proposal here
<https://docs.google.com/document/d/1Z6sYes0jBRwHUkUZmI2JieDZZBm-HHhAzHlKJOFLLtU/edit#>.
A stand-alone Linux implementation of the methodology used in the paper is
available on GitHub <https://github.com/ithemal/timing-harness>.

Comments and suggestions are most welcome!

Ondrej and Tom

Clement Courbet via llvm-dev

2020-Jan-17 10:46 UTC

Hi Ondrej, Tom,

This is very exciting. We're not doing a very good job on memory
instructions right now, so it would be really cool to be able to
measure them better. Comments inline.

On Thu, Jan 16, 2020 at 6:32 PM Ondrej Sykora via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> In a recent IISWC paper
> <http://groups.csail.mit.edu/commit/papers/19/ithemal-measurement.pdf>,
> we've proposed BHive - a new methodology for benchmarking arbitrary basic
> blocks that has several advantages over the one currently used in
> llvm-exegesis. In particular, the new methodology:
> - automatically handles memory accesses in the basic block, without the
> need to manually annotate live-ins,
> - maps all memory addresses accessed by the basic block to the same page,
> significantly reducing the probability of cache misses during benchmarking,
> - the benchmarked code runs in a separate process, reducing risks of
> compromising the monitor process memory,
> - computes the throughput in a way that subtracts away the effects of the
> scaffolding code.
>
I've never actually seen a case where the scaffolding code had much
influence on the results (at least on X86), especially in loop mode.
However, I can see some value in snippet mode (not generated code mode):
this allows the snippet code to exhaust all available registers and still
be measurable.

> A possible challenge is increased complexity of the code: BHive uses a
> separate process to run the benchmarked basic block and changes memory
> mapping of the process to ensure that all memory accesses lead to the same
> page. Most operating systems have the necessary APIs, but these may differ
> significantly. In particular, the Windows API for memory mapping and
> process creation/control is very different from the Unix world. Initially,
> we might be able to support the new methodology only on Linux and Unix-like
> systems.
>
Though I think it's fine to have a Linux-only initial implementation, I
think there should be a clear plan to support Windows: there are people in
the LLVM community who use llvm-exegesis on Windows (e.g. folks at
Sony). Note that you might be able to reuse some code in LLVM: compiler-rt
already has an abstraction layer in "WindowsMMap.c" on top of
MapViewOfFile.

> Before we start the implementation, we would like to collect feedback on
> the proposed design:
> - We're planning to implement the methodology as a new implementation of
> BenchmarkRunner::FunctionExecutor that will exist alongside the current
> runner. The existing functionality will be preserved, and the user will be
> able to select the benchmark runner using a command-line flag.
>
LGTM.

> - We're considering using the LLDB API to control the execution of the
> benchmarking process in a platform-independent way.
>
I think it's a great idea to avoid introducing any other external
dependencies.

Ondrej Sykora via llvm-dev

2020-Jan-27 12:21 UTC

Hi Clement,

Thanks for the feedback!

On Fri, Jan 17, 2020 at 11:47 AM Clement Courbet <courbet at google.com>
wrote:
> On Thu, Jan 16, 2020 at 6:32 PM Ondrej Sykora via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> In a recent IISWC paper
>> <http://groups.csail.mit.edu/commit/papers/19/ithemal-measurement.pdf>,
>> we've proposed BHive - a new methodology for benchmarking arbitrary basic
>> blocks that has several advantages over the one currently used in
>> llvm-exegesis. In particular, the new methodology:
>> - automatically handles memory accesses in the basic block, without the
>> need to manually annotate live-ins,
>> - maps all memory addresses accessed by the basic block to the same page,
>> significantly reducing the probability of cache misses during benchmarking,
>> - the benchmarked code runs in a separate process, reducing risks of
>> compromising the monitor process memory,
>> - computes the throughput in a way that subtracts away the effects of the
>> scaffolding code.
>>
>
> I've never actually seen a case where the scaffolding code had much
> influence on the results (at least on X86), especially in loop mode.
> However, I can see some value in snippet mode (not generated code mode):
> this allows the snippet code to exhaust all available registers and still
> be measurable.
>
Yes, our main goal is benchmarking arbitrary basic blocks, where we do not
control the register allocation.

>> A possible challenge is increased complexity of the code: BHive uses a
>> separate process to run the benchmarked basic block and changes memory
>> mapping of the process to ensure that all memory accesses lead to the same
>> page. Most operating systems have the necessary APIs, but these may differ
>> significantly. In particular, the Windows API for memory mapping and
>> process creation/control is very different from the Unix world. Initially,
>> we might be able to support the new methodology only on Linux and Unix-like
>> systems.
>>
>
> Though I think it's fine to have linux only as an initial implementation,
> I think there should be a clear plan to support windows: there are people
> in the LLVM community who are using llvm-exegesis on windows (e.g. folks at
> Sony). Note that you might be able to reuse some code in LLVM: compiler-rt
> already has an abstraction layer in "WindowsMMap.c" on top of MapViewOfFile.
>
Thanks for the pointers! That said, replacing mmap is relatively
straightforward. The difficult part is replacing munmap, which has no
direct equivalent on Windows: you need to query the system for all mapped
blocks and then unmap them one by one. This is very specific
functionality, and I'd be surprised if someone had already implemented it.
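
As a rough sketch of the "query all mapped blocks, then unmap them one by
one" step, the snippet below parses text in the format of Linux's
/proc/<pid>/maps and selects the regions that do not overlap a keep-list.
On Windows the enumeration would presumably walk the address space with
VirtualQuery and release each view with UnmapViewOfFile instead. The
helper names and the sample listing are hypothetical, and the actual
unmapping is left out.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps-style lines into (start, end, perms, path)."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions


def regions_to_unmap(regions, keep):
    """Return the regions that overlap none of the (start, end) ranges in keep."""
    def overlaps(region, kept):
        return region[0] < kept[1] and kept[0] < region[1]

    return [r for r in regions if not any(overlaps(r, k) for k in keep)]


# Made-up listing in the /proc/<pid>/maps format.
SAMPLE = """\
00400000-00401000 r-xp 00000000 08:01 1234 /bin/example
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
"""

regions = parse_maps(SAMPLE)
# Keep the code mapping; everything else would be unmapped one by one.
victims = regions_to_unmap(regions, keep=[(0x400000, 0x401000)])
```

On Linux, the same parser could be fed the contents of /proc/self/maps.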

>> Before we start the implementation, we would like to collect feedback on
>> the proposed design:
>> - We're planning to implement the methodology as a new implementation of
>> BenchmarkRunner::FunctionExecutor that will exist alongside the current
>> runner. The existing functionality will be preserved, and the user will be
>> able to select the benchmark runner using a command-line flag.
>>
>
> LGTM.
>
>
>> - We're considering using the LLDB API to control the execution of the
>> benchmarking process in a platform-independent way.
>>
>
>  I think it's a great idea to avoid introducing any other external
> dependencies.
>
-- 
Ondrej
