thr3ads.net - llvm dev - [llvm-dev] [RFC] Implementing the BHive methodology in llvm-exegesis [Apr 2021]

Mendis, Charith via llvm-dev
2021-Apr-21 14:20 UTC
[llvm-dev] [RFC] Implementing the BHive methodology in llvm-exegesis

Hi all,

This is an exciting development.

The BHive code base now supports both x86-64 and ARM timing. It would be great
if these developments were integrated into llvm-exegesis in a more portable manner.

From the perspective of the compiler research community, this would let us use
precise timing data inside LLVM IR passes when we need more precision than the
TTI (TargetTransformInfo) cost model provides. If BHive's timing methodology is
adopted, we will not have to worry about segmentation faults when timing
memory-heavy basic blocks. Further, this will pave the way to new directions in
developing performance models that are tightly coupled with the LLVM infrastructure.

Therefore, I welcome and support this contribution by Ondrej and his colleagues
to embed the BHive timing infrastructure within the LLVM ecosystem.

Best,
Charith.

On Apr 21, 2021, at 9:32 AM, Ondrej Sykora <ondrasej at google.com> wrote:



---------- Forwarded message ---------
From: Ondrej Sykora <ondrasej at google.com>
Date: Fri, Mar 19, 2021 at 10:16 AM
Subject: Re: [llvm-dev] [RFC] Implementing the BHive methodology in
llvm-exegesis
To: Clement Courbet <courbet at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>, Tom Chen <cyt046 at gmail.com>


Hi all,

I'm sorry to revive such an old thread. Due to lack of time, we did not make
progress as quickly as we had planned. We started building a prototype based on the
original
proposal<https://urldefense.com/v3/__https://docs.google.com/document/d/1Z6sYes0jBRwHUkUZmI2JieDZZBm-HHhAzHlKJOFLLtU/edit*__;Iw!!DZ3fjg!tKYTdxu_GJ9pdWz4Qm6bknG4ija8etDoHkiUtzu1cFhAElkj1SoJb30bHUiE_8J5VK0$>,
but we found a couple of blockers:
- it was very difficult to implement the interaction between the llvm-exegesis
process and the child process running the benchmarked code using the LLDB API.
- in the meantime, the MIT team continued development of the BHive algorithm,
and replaced most of the assembly with C. The new code is simpler and easier to
port to other architectures.

Based on our experience, we're considering a simpler approach than
the original
proposal<https://urldefense.com/v3/__https://docs.google.com/document/d/1Z6sYes0jBRwHUkUZmI2JieDZZBm-HHhAzHlKJOFLLtU/edit*__;Iw!!DZ3fjg!tKYTdxu_GJ9pdWz4Qm6bknG4ija8etDoHkiUtzu1cFhAElkj1SoJb30bHUiE_8J5VK0$>:
- if possible, we will use the same C-oriented design as the latest version of
the tool developed at MIT.
- we will focus on a Linux and x86-64 implementation first, with porting to
other architectures (but not operating systems) in mind. This would allow us to
depend on the stable and well-defined Linux syscall interface. To the best of our
knowledge, llvm-exegesis is already limited to Linux because of its dependence
on the Linux perf subsystem, so this does not create any new portability
restrictions.
- we would use
ptrace<https://urldefense.com/v3/__https://en.wikipedia.org/wiki/Ptrace__;!!DZ3fjg!tKYTdxu_GJ9pdWz4Qm6bknG4ija8etDoHkiUtzu1cFhAElkj1SoJb30bHUiEJ32eXN8$>
as a simpler and more powerful alternative to LLDB. It is a syscall, so it does
not introduce any new external library dependencies.
- in the same spirit, we will depend on the mmap and munmap syscalls rather than
on their abstractions.
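A minimal sketch of the process-isolation half of this plan, assuming Linux (or any POSIX system) and plain fork/waitpid. The ptrace layer described above is deliberately left out; run_isolated, faulting_block and quiet_block are hypothetical names, and faulting_block simply raises SIGSEGV to stand in for a basic block that touches unmapped memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical benchmark bodies: faulting_block stands in for a basic block
 * that touches unmapped memory; quiet_block finishes normally. */
static void faulting_block(void) { raise(SIGSEGV); }
static void quiet_block(void) {}

/* Runs body in a forked child. Returns the signal that terminated the child,
 * 0 if it exited normally, or -1 if fork/waitpid failed. The child has its
 * own address space, so a crash in body cannot corrupt the memory of the
 * calling (monitor) process. */
static int run_isolated(void (*body)(void)) {
  assert(body != NULL);
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) { /* child */
    body();
    _exit(0); /* skip atexit handlers and stdio buffers inherited from the parent */
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Here run_isolated(faulting_block) returns SIGSEGV while the parent keeps running. In a ptrace-based design the parent would instead have the child stop under tracing and inspect it at each stop, rather than only collecting its final exit status.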

Let us know what you think!

Best regards

Ondrej

On Mon, Jan 27, 2020 at 1:21 PM Ondrej Sykora <ondrasej at google.com> wrote:
Hi Clement,

thanks for the feedback!


On Fri, Jan 17, 2020 at 11:47 AM Clement Courbet <courbet at google.com> wrote:
On Thu, Jan 16, 2020 at 6:32 PM Ondrej Sykora via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
In a recent IISWC
paper<https://urldefense.com/v3/__http://groups.csail.mit.edu/commit/papers/19/ithemal-measurement.pdf__;!!DZ3fjg!tKYTdxu_GJ9pdWz4Qm6bknG4ija8etDoHkiUtzu1cFhAElkj1SoJb30bHUiEKQ0dQVk$>,
we proposed BHive, a new methodology for benchmarking arbitrary basic
blocks that has several advantages over the one currently used in llvm-exegesis.
In particular, the new methodology:
- automatically handles memory accesses in the basic block, without the need to
manually annotate live-ins,
- maps all memory addresses accessed by the basic block to the same page,
significantly reducing the probability of cache misses during benchmarking,
- runs the benchmarked code in a separate process, reducing the risk of
corrupting the memory of the monitor process,
- computes the throughput in a way that subtracts away the effects of the
scaffolding code.
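One way to subtract away the scaffolding is to time the same block at two unroll factors and divide the difference in cycles by the difference in unroll factors, so any fixed cost appears in both measurements and cancels out. The sketch below assumes that scheme (the paper is the authority on the exact procedure BHive uses); the function name and the numbers in the example are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Estimates cycles per copy of a basic block from two measurements of the
 * same block unrolled u_small and u_large times. A fixed cost (setup,
 * counter reads, loop scaffolding) is present in both measurements and
 * cancels out in the difference. */
static double throughput_from_two_unrolls(uint64_t cycles_small, unsigned u_small,
                                          uint64_t cycles_large, unsigned u_large) {
  assert(u_large > u_small);
  /* Subtract as doubles so measurement noise cannot wrap around. */
  return ((double)cycles_large - (double)cycles_small) /
         (double)(u_large - u_small);
}
```

For example, with a fixed overhead of 50 cycles and a block costing 3 cycles per copy, unroll factors of 100 and 200 give 350 and 650 cycles, and throughput_from_two_unrolls(350, 100, 650, 200) returns 3.0.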

I've never actually seen a case where the scaffolding code had much
influence on the results (at least on X86), especially in loop mode. However, I
can see some value in snippet mode (not generated code mode): this allows the
snippet code to exhaust all available registers and still be measurable.

Yes, our main goal is benchmarking arbitrary basic blocks, where we do not
control the register allocation.

A possible challenge is increased complexity of the code: BHive uses a separate
process to run the benchmarked basic block and changes memory mapping of the
process to ensure that all memory accesses lead to the same page. Most operating
systems have the necessary APIs, but these may differ significantly. In
particular, the Windows API for memory mapping and process creation/control is
very different from the Unix world. Initially, we might be able to support the
new methodology only on Linux and Unix-like systems.
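The key primitive behind mapping all memory accesses to the same page is aliasing: several virtual addresses backed by one physical page. A minimal sketch, assuming Linux and a one-page temporary file as the shared backing object (memfd_create or shm_open would also work); map_aliases is a hypothetical helper, not code from BHive. A full implementation would place such mappings at the specific addresses the benchmarked block faults on (for example with MAP_FIXED) instead of letting the kernel choose them.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the same one-page file at n different virtual addresses, so every
 * views[i] is backed by the same physical page. Returns 0 on success and
 * -1 on failure (any views already created are unmapped again). */
static int map_aliases(void **views, int n, long page) {
  assert(n > 0 && page > 0);
  FILE *f = tmpfile(); /* anonymous file, deleted automatically on close */
  if (!f) return -1;
  int fd = fileno(f);
  if (ftruncate(fd, page) != 0) { fclose(f); return -1; }
  for (int i = 0; i < n; i++) {
    /* MAP_SHARED makes all views refer to the file's single page. */
    views[i] = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (views[i] == MAP_FAILED) {
      while (i-- > 0) munmap(views[i], (size_t)page);
      fclose(f);
      return -1;
    }
  }
  fclose(f); /* the mappings remain valid after the descriptor is closed */
  return 0;
}
```

After map_aliases(v, 3, page) succeeds, a write through v[0] is visible through v[1] and v[2], because all three views share one page.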

Though I think it's fine to have Linux only as an initial implementation, I
think there should be a clear plan to support Windows: there are people in the
LLVM community who are using llvm-exegesis on Windows (e.g. folks at Sony). Note
that you might be able to reuse some code in LLVM: compiler-rt already has an
abstraction layer in "WindowsMMap.c" on top of MapViewOfFile.

Thanks for the pointers! That said, replacing mmap is relatively
straightforward. The difficult part is replacing munmap, which has no direct
equivalent on Windows: you need to query the system for all mapped blocks and
then unmap them one by one. This is very specific functionality, and I'd be
surprised if someone had already implemented it.
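On Linux, by contrast, a single munmap call removes every mapping in a range, even when the range consists of several separately created or split mappings. A sketch, assuming Linux; make_split_region and is_mapped are hypothetical helpers. The mprotect call splits one region into several kernel mappings, and msync serves as a probe because it fails with ENOMEM when part of the range is not mapped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Creates a 4-page anonymous region and makes its second page read-only,
 * which splits it into several separate mappings. Returns the base address,
 * or NULL on failure. */
static char *make_split_region(long page) {
  assert(page > 0);
  size_t len = 4 * (size_t)page;
  char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) return NULL;
  if (mprotect(base + page, (size_t)page, PROT_READ) != 0) {
    munmap(base, len);
    return NULL;
  }
  return base;
}

/* Returns 1 if [addr, addr + len) is mapped, 0 if msync reports ENOMEM
 * (some part of the range is not mapped). addr must be page-aligned. */
static int is_mapped(void *addr, size_t len) {
  return !(msync(addr, len, MS_ASYNC) != 0 && errno == ENOMEM);
}
```

After munmap(r, 4 * page) returns 0, is_mapped reports every page of the former region as unmapped, without first enumerating the individual mappings.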

Before we start the implementation, we would like to collect feedback on the
proposed design:
- We're planning to implement the methodology as a new implementation of
BenchmarkRunner::FunctionExecutor that will exist alongside the current runner.
The existing functionality will be preserved, and the user will be able to
select the benchmark runner using a command-line flag.

LGTM.

- We're considering using the LLDB API to control the execution of the
benchmarking process in a platform-independent way.

I think it's a great idea to avoid introducing any other external
dependencies.
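The first design point, a second executor living alongside the current one and chosen with a command-line flag, amounts to a strategy table. llvm-exegesis itself is C++ and uses a class hierarchy, so this C sketch only illustrates the selection logic; the executor names, the run_fn signature and the idea of a "--runner" flag are hypothetical, and both run functions are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: run a buffer of machine code, return 0 on success. */
typedef int (*run_fn)(const unsigned char *code, size_t size);

struct executor {
  const char *name; /* value accepted by a hypothetical --runner flag */
  run_fn run;
};

static int run_in_process(const unsigned char *code, size_t size) {
  assert(code != NULL || size == 0);
  return 0; /* placeholder: would call the generated function directly */
}

static int run_in_child(const unsigned char *code, size_t size) {
  assert(code != NULL || size == 0);
  return 0; /* placeholder: would fork, trace the child and remap memory */
}

static const struct executor executors[] = {
    {"inprocess", run_in_process}, /* existing behaviour remains the default */
    {"bhive", run_in_child},
};

/* Returns the executor named by flag, the default (first) executor when no
 * flag is given, or NULL for an unknown name. */
static const struct executor *select_executor(const char *flag) {
  if (flag == NULL) return &executors[0];
  for (size_t i = 0; i < sizeof executors / sizeof executors[0]; i++)
    if (strcmp(executors[i].name, flag) == 0) return &executors[i];
  return NULL;
}
```

Keeping the existing executor first in the table preserves current behaviour for users who never pass the flag.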

--
Ondrej


