thr3ads.net - llvm dev - [llvm-dev] Altering the return address , for a function with multiple return paths [Jul 2019]

Tsur Herman via llvm-dev

2019-Jul-21 09:06 UTC

[llvm-dev] Altering the return address , for a function with multiple return paths

I have been playing around with calling conventions, naked functions, and
epilogue/prologue code.
Is it possible, expressible, or feasible to alter the address a function
will return to?

For example, suppose a function may return an Int8 or a Float64, depending on
some external state (user input or a random variable). Instead of checking
the returned type in the calling function, is it possible to pass two
potential return addresses, one suitable for Int8 and one suitable for
Float64, and let the function return to the right place?

If it is possible, what are the implications? Does it inhibit optimization
opportunities?

Jacob Lifshay via llvm-dev

2019-Jul-21 10:17 UTC

One (non-LLVM) problem you will run into is that almost all processors
are optimized for functions that return to the instruction right after
the instruction that called them.

The most common method is to predict where the return instruction will
jump to by using a processor-internal stack of return addresses, which
is separate from the in-memory call stack. This enables the processor
to fetch, decode, and execute instructions following (in program
order) the return instruction before the processor knows for sure what
address the return instruction will branch to. If the return address
turns out to be different than the processor predicted, it has to
throw out all the instructions it started executing that it thought
came after the return, causing massive slow-downs.
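
To make the mechanism concrete, here is a toy software model of such a
return-address predictor. It is only a sketch: the names (ReturnStack,
ras_on_call, ras_on_return) and the depth of 16 are made up for
illustration, and real hardware differs in many details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 16 /* hypothetical depth; real predictors vary */

/* Toy model of a processor-internal return-address stack. */
typedef struct {
    uintptr_t entries[RAS_DEPTH];
    size_t top;      /* number of valid entries */
    unsigned hits;   /* returns that went where the predictor expected */
    unsigned misses; /* returns that went somewhere else */
} ReturnStack;

/* On a call, remember the address of the instruction after the call. */
static void ras_on_call(ReturnStack *ras, uintptr_t return_addr) {
    assert(ras != NULL);
    if (ras->top < RAS_DEPTH) {
        ras->entries[ras->top++] = return_addr;
    }
    /* On overflow this toy model simply drops the new entry. */
}

/* On a return, pop the prediction and compare it with the real target.
   Returns 1 if the prediction was right, 0 if it was wrong. */
static int ras_on_return(ReturnStack *ras, uintptr_t actual_target) {
    assert(ras != NULL);
    if (ras->top == 0) {
        ras->misses++;
        return 0;
    }
    uintptr_t predicted = ras->entries[--ras->top];
    if (predicted == actual_target) {
        ras->hits++;
        return 1;
    }
    ras->misses++;
    return 0;
}
```

A call recorded with ras_on_call(&ras, 0x1005) followed by
ras_on_return(&ras, 0x1005) counts as a hit. If the callee instead
"returns" to some other address, such as 0x1040, the model counts a miss,
which corresponds to the case where the processor has to discard its
speculative work.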

For an interesting application of changing the return address, look up
retpolines.


Joan Lluch via llvm-dev

2019-Jul-21 12:14 UTC

Hi Jay,

This trick can certainly be used by someone coding directly in assembly
language, but I do not think a compiler can do it. High-level language
functions are supposed to have a single entry point and a single return
address: the instruction immediately after the call. Virtually all high-level
languages and their compilers are designed around these semantics, and
processors are optimized for them too. Inside the callee, the compiler may
optimise the placement of the return code or duplicate code to avoid
branching, and it may perform tail call optimisations that modify the
standard return procedure. In all cases, though, the proper epilogue code is
effectively executed, with an identical return value and with execution
transferred to the same return address.

For a compiler to implement what you suggest, I think some explicit
semantics would have to be incorporated into the high-level languages being
compiled. Currently, to declare a function that returns a Float64 or an Int8
depending on external conditions, the user must use function overloads,
function templates, or closures (in languages that support them). In all
these cases, either the user explicitly declares a function for every type,
or the compiler generates a separate function for every type use case. So in
reality, the case where a single function returns multiple types does not
happen. My point is that since high-level languages offer no way to specify
multiple return types for the same function, there is no real use case where
the compiler would want to do this. Unless I misunderstood your question.

Joan


Tsur Herman via llvm-dev

2019-Jul-21 13:23 UTC

In high-level languages, a return value that may have one of several types
is usually handled with a tagged union and a switch statement in the caller.
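
For reference, the tagged-union pattern looks roughly like this in C. This
is a minimal sketch; the names (Value, compute, consume) and the constants
are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TAG_INT8, TAG_FLOAT64 } Tag;

/* The callee returns one of two types, plus a tag saying which. */
typedef struct {
    Tag tag;
    union {
        int8_t i8;
        double f64;
    } as;
} Value;

/* Hypothetical callee: the result type depends on runtime state. */
static Value compute(int want_float) {
    Value v;
    if (want_float) {
        v.tag = TAG_FLOAT64;
        v.as.f64 = 2.5;
    } else {
        v.tag = TAG_INT8;
        v.as.i8 = 7;
    }
    return v;
}

/* The caller has to branch on the tag after every such call. */
static double consume(int want_float) {
    Value v = compute(want_float);
    switch (v.tag) {
    case TAG_INT8:
        return (double)v.as.i8 + 1.0;
    case TAG_FLOAT64:
        return v.as.f64 * 2.0;
    }
    assert(0 && "unknown tag");
    return 0.0;
}
```

The switch in consume is the check in the caller that the question is about.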

My intention is to skip this by giving the callee two different addresses
to return to depending on what it did with the input.
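
The closest portable approximation of this idea that I can sketch in C is
continuation-passing style: the caller passes two function pointers, and the
callee calls the matching one in tail position. Note that this does not
actually alter the hardware return address, and C does not guarantee that
the tail calls become jumps. All names here (compute_cps, on_int8,
on_float64) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One continuation per possible result type. */
typedef double (*Int8Cont)(int8_t value);
typedef double (*Float64Cont)(double value);

/* Code the caller wants to run for an Int8 result. */
static double on_int8(int8_t value) {
    return (double)value + 1.0;
}

/* Code the caller wants to run for a Float64 result. */
static double on_float64(double value) {
    return value * 2.0;
}

/* Hypothetical callee: instead of returning a tagged value, it hands
   the result straight to the continuation for the right type. */
static double compute_cps(int want_float, Int8Cont k_int, Float64Cont k_float) {
    assert(k_int != 0 && k_float != 0);
    if (want_float) {
        return k_float(2.5); /* call in tail position */
    }
    return k_int(7);         /* call in tail position */
}
```

A call such as compute_cps(0, on_int8, on_float64) runs the Int8 path with
no tag check in the caller. Every function here still returns through the
normal path, so calls and returns stay paired.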

For high-level JIT-compiled languages, this can simplify the "type
inference" pass.

Another question on the topic: if I manage the stack myself somehow and
replace ret with an inline-assembly jmp, will the processor be able to
prefetch instructions beyond the jmp?


James Y Knight via llvm-dev

2019-Jul-21 16:29 UTC

Yes, indeed!

The SBCL Lisp compiler (not LLVM-based) used to emit functions which would
return either via ret to the usual instruction after the call, or else load
the return address from the stack and jump 2 bytes past it (which would
skip over either a nop or a short jmp at the original target location).
Which one it used depended upon whether the function was doing a
multi-valued return (in which case it used ret) or a single-valued return
(in which case it did the jmp retpc+2).

While this seems like a clever and efficient hack, it actually has an
absolutely awful effect on performance, due to the unpaired call vs return,
and the unexpected return address.

SBCL stopped doing this in 2006, a decade later than it should've -- the
Pentium1 MMX from 1997 already had a hardware return stack which made this
a really bad idea!

What it does now is have the called function set or clear the carry flag
(using STC or CLC) immediately before the return. If the caller cares, it
emits JNC as the first instruction after the call. But callers typically do
not care: most calls consume only a single value, and any extra return
values are silently ignored.
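
C cannot express the carry flag directly, so the sketch below models the
same convention with a boolean that travels alongside the return value. The
names (Returned, quot_rem, negate, first_value, second_or_default) are
invented, and which flag state means which kind of return is an arbitrary
choice here, not a statement about what SBCL does.

```c
#include <assert.h>
#include <stdbool.h>

/* 'multiple' stands in for the carry flag set just before the return. */
typedef struct {
    long first;    /* always valid */
    long second;   /* only meaningful when 'multiple' is true */
    bool multiple;
} Returned;

/* Multi-valued return: quotient and remainder. */
static Returned quot_rem(long n, long d) {
    assert(d != 0);
    Returned r = { n / d, n % d, true };
    return r;
}

/* Single-valued return. */
static Returned negate(long n) {
    Returned r = { -n, 0, false };
    return r;
}

/* Typical caller: consumes one value and never looks at the flag. */
static long first_value(Returned r) {
    return r.first;
}

/* Caller that cares: tests the flag once, right after the call,
   which plays the role of the JNC. */
static long second_or_default(Returned r, long fallback) {
    return r.multiple ? r.second : fallback;
}
```

For example, second_or_default(quot_rem(17, 5), -1) yields the remainder 2,
while second_or_default(negate(4), -1) falls back to -1. In both cases the
callee returns to the instruction after the call, so the return stays
paired with its call.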

