thr3ads.net - llvm dev - [LLVMdev] Question about implementing exceptions, especially to the VMKit team [May 2014]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das

2014-May-02 19:43 UTC

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

Hi Kevin,

To elaborate on Philip's point, depending on the state Pyston's
runtime already is in, you may have the choice of using a hybrid of a
"pending exception" word in your runtime thread structure, and an
implicit alternate ("exceptional") return address for calls into
functions that may throw.  This lets you elide the check on the
pending exception word after calls by turning them into invokes that
unwind into a landingpad containing a generic exception handler.  This
generic exception handler then checks the type of the pending
exception word and handles the exception (which may involve rethrowing
to the caller if the current frame doesn't have catch handler).

Instead of relying on libgcc to unwind when you throw you can then
parse the [call PC, generic exception handling PC] pairs from the
.eh_frame section, and when throwing to your caller, look up the
generic exception handling PC (using the call PC pushed on the stack)
and "return" to that instead.  Rethrow is similar.

This scheme has the disadvantage of "returning" through every active
frame on an exception throw, even if a particular frame never had an
exception handler and could've been skipped safely.  However, this
scheme allows you to easily switch to one of two other implementations
based on profiling data on a per-callsite basis:

 1. high exception volume -- if an invoke has seen too many exception
    throws, recompile by replacing the invoke with a call followed by
    a test of "pending exception" and branch.  The logic to generate
    the branch target should largely be the same as logic to generate
    the landing pad block.

 2. low exception volume -- keep the invoke, but put a deoptimization
    trap in the landing pad block.

We did some rough benchmarking, and using such implicit exceptions
(i.e. not explicitly checking the pending exception word) reduces
non-throwing call overhead by 20-25%.  I don't have any numbers on how
it affects the performance of exceptional control flow though.

-- Sanjoy

Kevin Modzelewski

2014-May-02 23:37 UTC

head link

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

That's definitely good confirmation to hear that the test+branch for every
call does in fact add noticeable overhead -- thanks for the datapoints.

What I'm taking away from this is that even within the space of
"unwind-based exception handling using DWARF CFI side-tables", there
is a
fair amount of room for different approaches with different tradeoffs, and
also potentially room for a custom-tailored unwinder to beat libgcc.
 That's definitely good to know, and you guys have encouraged me to peel
back the magic one more layer and try to implement my own unwinder :)

As for switching between unwind-based exceptions and checked-status-code
exceptions, I'm not quite sure I buy that that can completely be done by
the catching function, since the throwing function also needs to use the
matching mechanism.  I think if you truly want to do this, you need to
compile separate variants of whatever functions you might call (including
whatever functions they might call), one for each exception mechanism you
want to use.  I'm thinking about doing this, but only for certain built-in
functions that are expected to throw a lot.  Another option I'm thinking of
is to inline those particular functions and then create an optimization
pass that will know that py_throw always throws, and stitch up the CFG
appropriately.  Anyway, lots to chew on, thanks everyone for the responses!

Aside about Python exceptions: Python has interesting for loops, which are
always for-each loops and implement the termination condition using
exceptions:

PyObject *iterator; // what we're iterating over
while (true) {
    PyObject* i;
    try {
        i = iterator.next();
    } except (StopIteration) {
        break;
    }
    // do stuff
}

Percentage-wise, throwing the StopIteration might be rare, but I would
wager that most loops get terminated this way (as opposed to a "break"
statement) so it's certainly not never; I think this means the exception
gets thrown enough that it's better to handle the exception in-line rather
than do a deopt-on-throw.  Microbenchmarks suggest that for-loop overhead
is important enough that it's further worth trying to avoid any
exception-related unwinding entirely, but I'm not sure how true that is for
larger programs (probably somewhat true).

kmod

On Fri, May 2, 2014 at 12:43 PM, Sanjoy Das <sanjoy at azulsystems.com>
wrote:
> Hi Kevin,
>
> To elaborate on Philip's point, depending on the state Pyston's
> runtime already is in, you may have the choice of using a hybrid of a
> "pending exception" word in your runtime thread structure, and an
> implicit alternate ("exceptional") return address for calls into
> functions that may throw.  This lets you elide the check on the
> pending exception word after calls by turning them into invokes that
> unwind into a landingpad containing a generic exception handler.  This
> generic exception handler then checks the type of the pending
> exception word and handles the exception (which may involve rethrowing
> to the caller if the current frame doesn't have catch handler).
>
> Instead of relying on libgcc to unwind when you throw you can then
> parse the [call PC, generic exception handling PC] pairs from the
> .eh_frame section, and when throwing to your caller, look up the
> generic exception handling PC (using the call PC pushed on the stack)
> and "return" to that instead.  Rethrow is similar.
>
> This scheme has the disadvantage of "returning" through every
active
> frame on an exception throw, even if a particular frame never had an
> exception handler and could've been skipped safely.  However, this
> scheme allows you to easily switch to one of two other implementations
> based on profiling data on a per-callsite basis:
>
>  1. high exception volume -- if an invoke has seen too many exception
>     throws, recompile by replacing the invoke with a call followed by
>     a test of "pending exception" and branch.  The logic to
generate
>     the branch target should largely be the same as logic to generate
>     the landing pad block.
>
>  2. low exception volume -- keep the invoke, but put a deoptimization
>     trap in the landing pad block.
>
> We did some rough benchmarking, and using such implicit exceptions
> (i.e. not explicitly checking the pending exception word) reduces
> non-throwing call overhead by 20-25%.  I don't have any numbers on how
> it affects the performance of exceptional control flow though.
>
> -- Sanjoy
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140502/1f332f0d/attachment.html>

Philip Reames

2014-May-02 23:49 UTC

head link

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

On 05/02/2014 04:37 PM, Kevin Modzelewski wrote:> That's definitely good confirmation to hear that the test+branch for 
> every call does in fact add noticeable overhead -- thanks for the 
> datapoints.
>
> What I'm taking away from this is that even within the space of 
> "unwind-based exception handling using DWARF CFI side-tables",
there
> is a fair amount of room for different approaches with different 
> tradeoffs, and also potentially room for a custom-tailored unwinder to 
> beat libgcc.  That's definitely good to know, and you guys have 
> encouraged me to peel back the magic one more layer and try to 
> implement my own unwinder :)Fair warning, I have absolutely no idea if our current implementation is 
actually a good idea or not.  We need to get back to that and actually 
benchmark the various options.  :)  We've been experimenting wildly, but 
without much rigour.  We've been mainly focused on identifying the 
possible options within LLVM.
>
> As for switching between unwind-based exceptions and 
> checked-status-code exceptions, I'm not quite sure I buy that that can 
> completely be done by the catching function, since the throwing 
> function also needs to use the matching mechanism.I think what we do at the moment is *always* set the 'pending exception'
flag, even if we're going to use the unwind table based dispatching.  As 
a result, any frame can decide to use either mechanism.  I'll point out 
though that this is purely an accident of implementation.  We didn't 
purposely design it this way.  :)

> I think if you truly want to do this, you need to compile separate 
> variants of whatever functions you might call (including whatever 
> functions they might call), one for each exception mechanism you want 
> to use.  I'm thinking about doing this, but only for certain built-in 
> functions that are expected to throw a lot.  Another option I'm 
> thinking of is to inline those particular functions and then create an 
> optimization pass that will know that py_throw always throws, and 
> stitch up the CFG appropriately.  Anyway, lots to chew on, thanks 
> everyone for the responses!I'll just mention that you really really want to translate throw/catch 
pairs in the same function into a direct jump where possible.  :)  In 
fact, LLVM should be doing this for you during inlining if you structure 
your IR properly.  Are you not seeing this in practice?
>
>
>
> Aside about Python exceptions: Python has interesting for loops, which 
> are always for-each loops and implement the termination condition 
> using exceptions:
>
> PyObject *iterator; // what we're iterating over
> while (true) {
>     PyObject* i;
>     try {
>         i = iterator.next();
>     } except (StopIteration) {
>         break;
>     }
>     // do stuff
> }
>
> Percentage-wise, throwing the StopIteration might be rare, but I would 
> wager that most loops get terminated this way (as opposed to a
"break"
> statement) so it's certainly not never; I think this means the 
> exception gets thrown enough that it's better to handle the exception 
> in-line rather than do a deopt-on-throw.  Microbenchmarks suggest that 
> for-loop overhead is important enough that it's further worth trying 
> to avoid any exception-related unwinding entirely, but I'm not sure 
> how true that is for larger programs (probably somewhat true).For this case in particular, you probably want to avoid throwing 
exceptions at all.  If you inline the next() function to expose the 
throw, you should be able to convert the "throw; catch;" into a branch
to the exit block.  This will really really help your performance as 
compared to just about any other option.

Philip>
> On Fri, May 2, 2014 at 12:43 PM, Sanjoy Das <sanjoy at azulsystems.com 
> <mailto:sanjoy at azulsystems.com>> wrote:
>
>     Hi Kevin,
>
>     To elaborate on Philip's point, depending on the state Pyston's
>     runtime already is in, you may have the choice of using a hybrid of a
>     "pending exception" word in your runtime thread structure,
and an
>     implicit alternate ("exceptional") return address for calls
into
>     functions that may throw.  This lets you elide the check on the
>     pending exception word after calls by turning them into invokes that
>     unwind into a landingpad containing a generic exception handler.  This
>     generic exception handler then checks the type of the pending
>     exception word and handles the exception (which may involve rethrowing
>     to the caller if the current frame doesn't have catch handler).
>
>     Instead of relying on libgcc to unwind when you throw you can then
>     parse the [call PC, generic exception handling PC] pairs from the
>     .eh_frame section, and when throwing to your caller, look up the
>     generic exception handling PC (using the call PC pushed on the stack)
>     and "return" to that instead.  Rethrow is similar.
>
>     This scheme has the disadvantage of "returning" through every
active
>     frame on an exception throw, even if a particular frame never had an
>     exception handler and could've been skipped safely.  However, this
>     scheme allows you to easily switch to one of two other implementations
>     based on profiling data on a per-callsite basis:
>
>      1. high exception volume -- if an invoke has seen too many exception
>         throws, recompile by replacing the invoke with a call followed by
>         a test of "pending exception" and branch.  The logic to
generate
>         the branch target should largely be the same as logic to generate
>         the landing pad block.
>
>      2. low exception volume -- keep the invoke, but put a deoptimization
>         trap in the landing pad block.
>
>     We did some rough benchmarking, and using such implicit exceptions
>     (i.e. not explicitly checking the pending exception word) reduces
>     non-throwing call overhead by 20-25%.  I don't have any numbers on
how
>     it affects the performance of exceptional control flow though.
>
>     -- Sanjoy
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140502/d14f7df0/attachment.html>

Kevin Modzelewski

2014-May-27 22:07 UTC

head link

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

Just a quick follow-up: I ended up using libgcc and C++ exception handling
for the initial implementation.  I managed to do dynamic eh_frame
registration with libunwind, and got to the point that I was starting to
implement a custom exception manager.  Not insurmountable, but then I
discovered that LLVM already has libgcc dynamic registration support (via
__register_frame), so a number of problems were sidestepped by just
directly using libgcc and C++ exception handling functions (ex
__cxa_throw).  I'm sure it's not the most performant implementation --
in
order to implement Python exception semantics on top of the C++ exception
handler, every Python-level exception handler is implemented as a "catch
all" at the C++-level, which then executes the Python matching semantics,
and potentially allocates and throws a new C++-level exception to continue
unwinding.  The longer term plan is to implement a custom exception manager
on libunwind, but for now the current libgcc setup seems good enough.
 There's still the goal to be able to catch Python-thrown exceptions in
C++, which is easy when Python throws C++ exceptions, but it might require
some trickery to avoid using existing C++ exception-handling runtime
functions (ex __cxa_begin_catch).

Also, I think I figured out why dynamically-registered frames are slower to
search than static ones, at least in libunwind -- the static information
ends up getting put into a binary search table in .eh_frame_hdr, but
dynamically registered frames go into a linked list of 1-element binary
search trees.  (I haven't read through libgcc since the plan is to move
away from it eventually, but I assume it does something similar.)  I think
this is solvable in libunwind by periodically compacting the linked list
and combining the search trees, which looks like it can be done completely
through the libunwind API.

kmod


On Fri, May 2, 2014 at 4:37 PM, Kevin Modzelewski <kmod at dropbox.com>
wrote:
> That's definitely good confirmation to hear that the test+branch for
every
> call does in fact add noticeable overhead -- thanks for the datapoints.
>
> What I'm taking away from this is that even within the space of
> "unwind-based exception handling using DWARF CFI side-tables",
there is a
> fair amount of room for different approaches with different tradeoffs, and
> also potentially room for a custom-tailored unwinder to beat libgcc.
>  That's definitely good to know, and you guys have encouraged me to
peel
> back the magic one more layer and try to implement my own unwinder :)
>
> As for switching between unwind-based exceptions and checked-status-code
> exceptions, I'm not quite sure I buy that that can completely be done
by
> the catching function, since the throwing function also needs to use the
> matching mechanism.  I think if you truly want to do this, you need to
> compile separate variants of whatever functions you might call (including
> whatever functions they might call), one for each exception mechanism you
> want to use.  I'm thinking about doing this, but only for certain
built-in
> functions that are expected to throw a lot.  Another option I'm
thinking of
> is to inline those particular functions and then create an optimization
> pass that will know that py_throw always throws, and stitch up the CFG
> appropriately.  Anyway, lots to chew on, thanks everyone for the responses!
>
>
>
> Aside about Python exceptions: Python has interesting for loops, which are
> always for-each loops and implement the termination condition using
> exceptions:
>
> PyObject *iterator; // what we're iterating over
> while (true) {
>     PyObject* i;
>     try {
>         i = iterator.next();
>     } except (StopIteration) {
>         break;
>     }
>     // do stuff
> }
>
> Percentage-wise, throwing the StopIteration might be rare, but I would
> wager that most loops get terminated this way (as opposed to a
"break"
> statement) so it's certainly not never; I think this means the
exception
> gets thrown enough that it's better to handle the exception in-line
rather
> than do a deopt-on-throw.  Microbenchmarks suggest that for-loop overhead
> is important enough that it's further worth trying to avoid any
> exception-related unwinding entirely, but I'm not sure how true that is
for
> larger programs (probably somewhat true).
>
> kmod
>
>
> On Fri, May 2, 2014 at 12:43 PM, Sanjoy Das <sanjoy at
azulsystems.com>wrote:
>
>> Hi Kevin,
>>
>> To elaborate on Philip's point, depending on the state Pyston's
>> runtime already is in, you may have the choice of using a hybrid of a
>> "pending exception" word in your runtime thread structure,
and an
>> implicit alternate ("exceptional") return address for calls
into
>> functions that may throw.  This lets you elide the check on the
>> pending exception word after calls by turning them into invokes that
>> unwind into a landingpad containing a generic exception handler.  This
>> generic exception handler then checks the type of the pending
>> exception word and handles the exception (which may involve rethrowing
>> to the caller if the current frame doesn't have catch handler).
>>
>> Instead of relying on libgcc to unwind when you throw you can then
>> parse the [call PC, generic exception handling PC] pairs from the
>> .eh_frame section, and when throwing to your caller, look up the
>> generic exception handling PC (using the call PC pushed on the stack)
>> and "return" to that instead.  Rethrow is similar.
>>
>> This scheme has the disadvantage of "returning" through every
active
>> frame on an exception throw, even if a particular frame never had an
>> exception handler and could've been skipped safely.  However, this
>> scheme allows you to easily switch to one of two other implementations
>> based on profiling data on a per-callsite basis:
>>
>>  1. high exception volume -- if an invoke has seen too many exception
>>     throws, recompile by replacing the invoke with a call followed by
>>     a test of "pending exception" and branch.  The logic to
generate
>>     the branch target should largely be the same as logic to generate
>>     the landing pad block.
>>
>>  2. low exception volume -- keep the invoke, but put a deoptimization
>>     trap in the landing pad block.
>>
>> We did some rough benchmarking, and using such implicit exceptions
>> (i.e. not explicitly checking the pending exception word) reduces
>> non-throwing call overhead by 20-25%.  I don't have any numbers on
how
>> it affects the performance of exceptional control flow though.
>>
>> -- Sanjoy
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140527/790a441a/attachment.html>

llvm dev - May 2014 - [LLVMdev] Question about implementing exceptions, especially to the VMKit team

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

[LLVMdev] Question about implementing exceptions, especially to the VMKit team

[LLVMdev] Question about implementing exceptions, especially to the VMKit team