Jon Chesterfield via llvm-dev
2021-Mar-24 13:58 UTC
[llvm-dev] Function specialisation pass
> Date: Tue, 23 Mar 2021 19:44:49 +0000
> From: Sjoerd Meijer via llvm-dev <llvm-dev at lists.llvm.org>
> To: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Function specialisation pass
>
> I am interested in adding a function specialisation(*) pass to LLVM ...
>
> Both previous attempts were parked at approximately the same point: the
> transformation was implemented but the cost-model to control compile-times
> and code-size was lacking ...

This sounds right. The transform is fairly mechanical: clone the function, replace uses of the argument being specialised on with the known value, and redirect the call sites that passed that value. Great to see there's already work in that direction.

I'd be delighted to contribute to the implementation effort. It may even qualify as a legitimate thing to do during work hours - it would let the amdgpu openmp runtime back off on its enthusiasm for inlining everything. Some initial thoughts below. Thanks!

Eliding call overhead and specialising on known arguments are currently bundled together as 'inlining', which also has a challenging cost model. If we can reliably specialise, call sites that are presently inlined no longer need to be. I'm sure the end point, with specialisation and inlining working in harmony, would be better than what we have now in compile time, code size, and code quality. There's a lot of work to get the heuristics and the interaction with inlining right, though.

I'd suggest an intermediate step of specialising based on user annotations, kicking that problem down the road:

  void example(int x, __attribute__((bikeshed)) int y) { ... big function ... }

where this means a call site with a compile-time-known value for y, say 42, gets specialised without reference to heuristics into a call to:

  void example.carefully-named.42(int x) { constexpr int y = 42; ... }

We still have to get the machinery right - caching previous specialisations, mapping multiple specialisations down to the same end call, care with naming, maybe teaching ThinLTO about it, and so forth. However, we postpone the problem of heuristically determining which arguments and call sites are profitable.

The big motivating examples for me are things that take function pointers (qsort style) and functions containing a large switch on one of the arguments, where that argument is often known at the call site. An in-tree example of the latter, using 'the trick' from partial evaluation to get the same end result, is in llvm/openmp/libomptarget/plugins/amdgpu/impl/msgpack.h. The entry point is:

  template <typename F>
  const unsigned char *handle_msgpack(byte_range bytes, F f);

It's going to switch on the first byte in the byte_range, but the function is too large to inline. So it is specialised, not with a convenient attribute, but by introducing something like:

  template <typename F, msgpack::type ty>
  const unsigned char *func(byte_range bytes, F f) {
    switch (ty) {}
  }

  template <typename F>
  const unsigned char *handle_msgpack(byte_range bytes, F f) {
    auto ty = bytes[0];
    switch (ty) {
    case 0: return func<F, 0>(...);
    case 1: return func<F, 1>(...);
    }
  }

This is, alas, a slightly more complicated example than `void example(int x, __attribute__((bikeshed)) int y)`, but I'm optimistic that explicitly annotating a parameter as 'when this is known at compile time, specialise on it' would still let me simplify that code. Function pointer interfaces that are only called with a couple of different pointers may be a more obvious win. Like openmp target regions.
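
To make the function pointer case a bit more concrete, here is a rough source-level sketch of what specialising on a known comparator could look like. The names (sort_ints, ascending, sort_ints_spec_ascending) are made up for illustration; the real pass would operate on IR and choose its own mangled clone names.

  #include <algorithm>
  #include <cstddef>

  // Original: the comparator arrives as a function pointer, so the calls to
  // it inside the sort are indirect and hard for the inliner to see through.
  void sort_ints(int *data, std::size_t n, bool (*cmp)(int, int)) {
    std::sort(data, data + n, cmp);
  }

  bool ascending(int a, int b) { return a < b; }

  // Conceptual result of specialising sort_ints on cmp == ascending: a clone
  // in which every use of cmp is replaced by the known callee, turning the
  // indirect calls into direct, inlinable ones.
  void sort_ints_spec_ascending(int *data, std::size_t n) {
    std::sort(data, data + n, ascending);
  }

  void caller(int *data, std::size_t n) {
    // The pass would also rewrite call sites that passed the known pointer:
    //   sort_ints(data, n, ascending);  // before
    sort_ints_spec_ascending(data, n);   // after
  }

The payoff is that the inliner then only has to look at the small comparator inside the clone, rather than being asked to inline the whole sort to get the same effect.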
Sjoerd Meijer via llvm-dev
Re: [llvm-dev] Function specialisation pass

Hello Jon,

Thanks for your reply - very interesting, and I really like your suggestion! I am still exploring and doing some more initial investigation, now also by experimenting with D93838 <https://reviews.llvm.org/D93838>, but I am starting to believe there is a credible way forward to get this in and enabled by default at some point. A first intermediate step would therefore be to get the pass in but have it off by default, as that is very convenient for testing and experimenting.

Like I said, I really like your suggestion: if we can drive function specialisation with an attribute, that sounds ideal to me and also justifies having the infrastructure to support it. We can let it be driven by the attribute first, and then work on cost-modelling and enabling it by default.

Also happy to hear you would like to contribute. I will discuss with the authors of D93838 whether they can pick it up, but otherwise I will do that myself soon. If, for example, you would like to contribute your attribute proposal or the other things you mentioned, that would be absolutely fantastic, and of course I would be happy to help out with reviews.

Cheers,
Sjoerd.