thr3ads.net - llvm dev - [llvm-dev] [RFC] Cheaper indirect calls via trampolines [Mar 2020]

Jon Chesterfield via llvm-dev

2020-Mar-03 14:04 UTC

[llvm-dev] [RFC] Cheaper indirect calls via trampolines

Taking the address of a function inhibits optimisations for that function.
Essentially any ABI changes are unavailable if we can't adjust the call
site to match. The case of interest here is when a given function is called
directly and indirectly, and we don't want the latter to impose a cost on
the former.

One approach to avoid the ABI constraint cost is to extract/outline the
body of an address taken function into a new function, then replace said
body with a direct call to the new function. This leaves us with two
functions that have the same semantic effect:
- One has its address taken and may have external visibility. It just calls
the other.
- One does not have its address taken and has internal visibility.

Direct call sites to the outer wrapper/trampoline can be optimised to
direct calls to the new internal function, leaving no net change other than
enabling other optimisations. Uses of the address of the symbol are
unchanged as the original function is still present.
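A minimal C sketch of the split, assuming a hypothetical address-taken function `scale`. The names are illustrative and not from the thread, and a compiler would do this on IR rather than on source, but the shape is the same. The body moves into an internal `scale_impl`, the original symbol becomes a one-line forwarder, and a direct caller has been retargeted to the internal copy while the function pointer still names `scale`:

```c
#include <assert.h>

/* Internal copy of the original body. Internal linkage and no address
   taken, so an optimiser would be free to change its signature later. */
static int scale_impl(int x, int factor) {
    return x * factor;
}

/* The original, address-taken symbol. Its body is now a single forwarding
   call, so its externally visible ABI is unchanged. */
int scale(int x, int factor) {
    return scale_impl(x, factor);
}

/* A direct call site after being retargeted from scale to scale_impl. */
int direct_caller(int x) {
    return scale_impl(x, 3);
}

/* An indirect call site: it only sees the pointer, which still refers to
   the original symbol, so it goes through the forwarder. */
typedef int (*scale_fn)(int, int);

int indirect_caller(scale_fn fn, int x) {
    return fn(x, 3);
}
```

Both `direct_caller(4)` and `indirect_caller(scale, 4)` compute the same value; only the path differs.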

Indirect call sites now go through this trampoline to share the code.
There's the runtime cost of undoing whatever ABI optimisations we later
choose to make to the internal function, e.g. some argument
shuffling/discarding, then either a tail call, or a normal call if the
return value also needs adjustment.
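To make that cost concrete, here is a hedged C sketch assuming two hypothetical ABI changes have already been applied to the internal function: a never-read argument (`flags`) has been dropped, as dead-argument elimination would do, and a pointer-to-struct argument has been replaced by the two fields it reads, in the spirit of argument promotion. The address-taken wrapper must undo both before forwarding. Since the result needs no adjustment, the forwarding call is in tail position:

```c
#include <assert.h>

struct rect {
    int w;
    int h;
};

/* Internal function after the (assumed) ABI changes: the dead `flags`
   argument is gone and the struct pointer was promoted to scalars. */
static int rect_area_impl(int w, int h) {
    return w * h;
}

/* Address-taken wrapper keeping the original signature. It loads the
   promoted fields, ignores the dead argument, and forwards. The call is in
   tail position, so a compiler may emit a jump instead of call/return;
   C itself does not guarantee that. */
int rect_area(const struct rect *r, unsigned flags) {
    (void)flags; /* accepted only for ABI compatibility */
    return rect_area_impl(r->w, r->h);
}

/* Indirect callers keep using the original signature. */
typedef int (*area_fn)(const struct rect *, unsigned);

int call_through_pointer(area_fn fn, const struct rect *r) {
    return fn(r, 0u);
}
```

The loads and the discarded argument are the "undoing" work the indirect path now pays for. Direct callers that already hold `w` and `h` skip it entirely.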

That is, the proposed transform has made indirect calls slightly slower
(unless we inline the new function back in to make a clone, in which case
it has made code size bigger) in exchange for re-enabling all the
optimisations that we currently lose once the address is taken. The same
reasoning applies if the function is external and must expose an
ABI-appropriate entry point for other translation units, but we'd like to
use a faster calling convention internally.
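The external-visibility case has the same shape. In the hedged C sketch below, assume a hypothetical exported `vec2_add` whose public ABI returns its result through an out-pointer. The internal version instead takes scalars and returns a small struct by value, which may let the compiler keep values in registers; whether that is actually faster depends on the target ABI. The wrapper has to store the returned value, so this is the "normal call" case: the call is not in tail position.

```c
#include <assert.h>

struct vec2 {
    float x;
    float y;
};

/* Internal entry point with a cheaper (assumed) convention: scalars in,
   small struct returned by value. Other translation units never see it. */
static struct vec2 vec2_add_impl(float ax, float ay, float bx, float by) {
    struct vec2 sum = {ax + bx, ay + by};
    return sum;
}

/* Exported entry point whose signature is fixed by a hypothetical public
   header. It adapts the arguments, makes a normal call, then adjusts the
   result by storing it through `out`. */
void vec2_add(const struct vec2 *a, const struct vec2 *b, struct vec2 *out) {
    struct vec2 sum = vec2_add_impl(a->x, a->y, b->x, b->y);
    *out = sum; /* return-value adjustment: work remains after the call */
}

/* A direct caller in this translation unit can use the internal entry
   point and skip the out-pointer round trip. */
float sum_x(struct vec2 a, struct vec2 b) {
    return vec2_add_impl(a.x, a.y, b.x, b.y).x;
}
```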

If at the end of a pipeline we didn't actually want to change the function
after all, we should be able to fold the two back together.

I think that's plausibly a win. Taking the address of a function no longer
thwarts other optimisations, in exchange for making the indirectly called
function slightly slower. Thoughts?

Jon

Michael Kruse via llvm-dev

2020-Mar-03 17:16 UTC

I associate the word "trampoline" with GCC's technique of writing a
function wrapper for a nested function to the stack:
https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html
IIUC, you are not proposing writing the outer wrapper to the
stack. Maybe we should use a different term.

@jdoerfert had already thought about this technique for
interprocedural optimizations, in particular argument promotion.

Michael


Philip Reames via llvm-dev

2020-Mar-03 20:06 UTC

Trampoline is a much more generic term than that particular 
implementation technique, JFYI.

Philip

On 3/3/20 9:16 AM, Michael Kruse via llvm-dev wrote:
> I associate the word "trampoline" with GCC's technique of writing a
> function wrapper for a nested function to the stack:
> https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html
> IIUC, you are not proposing writing the outer wrapper to the
> stack. Maybe we should use a different term.
>
> @jdoerfert had already thought about this technique for
> interprocedural optimizations, in particular argument promotion.
>
> Michael

Doerfert, Johannes via llvm-dev

2020-Mar-03 20:14 UTC

Hi Jon,

Did you see https://reviews.llvm.org/D63312 back in the day? I want to revive
that one eventually. The idea is similar to your proposal, but it's not
focused on address-taken functions per se, though it covers that use case
too.

Cheers,
  Johannes

Reid Kleckner via llvm-dev

2020-Mar-04 23:23 UTC

Yes, this is a great idea. It was something that occurred to us as well
when we were adding support for `inalloca` to LLVM. The attribute was added
for MSVC compatibility, and is bad for analysis and IPO. So, it would be
great if globalopt or some other early pass came along and fixed up direct
calls so they avoid this overhead when possible.
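`inalloca` itself is an LLVM IR attribute used for the 32-bit x86 MSVC C++ ABI, and plain C cannot express it. As a loose analogy only, the C sketch below assumes a hypothetical function that takes a large struct by value, so every call pays for a copy. The exported symbol keeps that signature for indirect and external callers. Direct callers, once rewritten by some early pass as suggested above, call an internal version that takes a const pointer instead:

```c
#include <assert.h>

/* Deliberately large so that pass-by-value implies a real copy. */
struct big {
    int values[64];
};

/* Internal version: reads its argument through a const pointer, so direct
   callers avoid the copy. It must neither modify *b nor retain the pointer,
   otherwise it would not be equivalent to the by-value original. */
static long big_total_impl(const struct big *b) {
    long total = 0;
    for (int i = 0; i < 64; ++i) {
        total += b->values[i];
    }
    return total;
}

/* Exported, possibly address-taken entry point with the original by-value
   signature. Callers through this symbol still pay for the copy. */
long big_total(struct big b) {
    return big_total_impl(&b);
}

/* A direct call site after rewriting: passes the address of the caller's
   own object instead of copying it. */
long direct_total(const struct big *b) {
    return big_total_impl(b);
}
```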
