Hi everyone!
I've been silently working on the previous exception handling proposal I
published last year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-November/027311.html). However,
there were some major deficiencies in it.
After discussing this further with Jim Grosbach and John McCall, we've come
up with another proposal. As you can see, it incorporates the idea of
"regions" into LLVM's IR. It also leaves the door open for
implementing Chris's EH changes in his LLVMNotes. In fact, this can be
looked upon as a first step towards that goal.
Please enjoy and let me know if you have any comments!
-bw
                           Exception Handling in LLVM
Problem
======
The current exception handling mechanisms in LLVM are inadequate for generating
proper exception handling support for C++ and Objective-C. In particular, it
only approximates the "Itanium C++ ABI: Exception Handling"
documentation:
   http://www.codesourcery.com/public/cxx-abi/abi-eh.html
Also, it is lacking for other languages which could cause catchable exceptions
to be thrown by non-invoke instructions:
   http://nondot.org/~sabre/LLVMNotes/ExceptionHandlingChanges.txt
Thirdly, the exception handling metadata is "encoded" in the CFG,
which makes it
hard to gather all of the information needed for EH table generation.
I believe that the problem is a conceptual one. We are trying to replicate the
concept of a "region" in LLVM's IR where no such facility
currently exists.  In
order to create regions, there has to be a tight coupling of information between
the site where the exception was thrown and the place that handles the
exception. For example, DWARF exception handling needs type information at both
places in order to generate the exception handling tables.
Definitions
==========
A "region" is defined as a section of code where, if an exception is
thrown
anywhere within that section, control is passed to code that will handle the
exception (called a "landing pad"). The code which handles the
specific
exception is unique to that region. For example, "invoke A" and
"invoke B" are
within the same region (X):
Region X
..........................................
:                                        :
:    .----------.          .----------.  :
:    | invoke A |          | invoke B |  :
:    `----------'          `----------'  :
:      /      |              /      |    :
: normal      |         normal      |    :
..........................................
              |                     |
              |                     v
              |            .------------------.
              |            | B's cleanup code |
              |            `------------------'
              |                     |
              `---------------------'
                         |
                         v
             .-----------------------.
             |   A's cleanup code    |
             | dispatch for region X |
             `-----------------------'
                         |
           .-----------------------------.
           |        |          |         |
           v        v          v         v
         .----.   .----.     .----.  .--------.
         | C1 |   | C2 | ... | Cn |  | resume |
C1, C2, ..., Cn are the catches for the exception thrown. If none of the
catches' types match the type of the exception thrown, control passes to the
"resume".
Notice that even though invokes A and B are in the same region, they do not
necessarily jump to the same basic block when an exception occurs.
Proposal
=======
We want to take the concept of a region and make it explicit in the IR.
Dispatch Instruction
--------------------
We intruduce a new instruction called "dispatch." This instruction
holds most of
the information necessary for the exception handler to work.
Syntax:
   dispatch region(<value>) resume to label <resumedest>
     catches [
        <type> <val>, label <dest>
        ...
     ]
     catchall [ <type> <val>, label <dest> ]
     filters [
        <type> <val>, ...
     ]
Description:
* The "catchall", "catches", and "filters" clauses
are optional. If none are
  specified, then the landing pad is implicitly a "cleanup."
* The <resumedest> basic block is the destination to unwind to if the type
  thrown isn't matched to any of the choices.
* The "catches" clause is a list of types which the region can catch
and the
  destinations to jump to for each type.
* The "catchall" clause is the place to jump to if the exception type
doesn't
  match any of the types in the "catches" clause.
* The "region" value is an integer, similar to the
"addrspace" value, and is
  unique to each dispatch in a function. IR objects reference this value to
  indicate that they belong to the same region.
* The "filters" clause lists the types of exceptions which may be
thrown by the
  region.
Example:
  ;; With catch-all block
  dispatch region(37) resume to label %Resume [
      %struct.__fundamental_type_info_pseudo* @_ZTIi, label %bb1.lpad
      %struct.__pointer_type_info_pseudo* @_ZTIPKc, label %bb2.lpad
    ]
    catchall [i8* null, label %catchall.lpad]
    filters [i8* bitcast (%struct.__pointer_type_info_pseudo* @_ZTIPKc to i8*),
             i8* bitcast (%struct.__fundamental_type_info_pseudo* @_ZTIi to i8*]
  ;; No catch-all block
  dispatch region(927) resume to label %Resume [
      %struct.__fundamental_type_info_pseudo* @_ZTIi, label %bb1.lpad
      %struct.__pointer_type_info_pseudo* @_ZTIPKc, label %bb2.lpad
    ]
  ;; Cleanup only
  dispatch region(42) resume to label %Unwind
Invoke Instruction
------------------
The invoke instruction would be augmented to add a "personality" field
and a
"region" indicator.
Syntax:
  <result> = invoke [cconv] [ret attrs] <ptr to function ty>
               <function ptr val>(<function args>) [fn attrs]
               to label <normal label>
               unwind to label <exception label>
               region(<value>)
               personality [<type> <value>]
Description:
* The personality field indicates the personality function at that invoke call.
* The region field's value must match to a dispatch with the same region
value.
Example:
  %retval = invoke i32 @Func(i32 927)
              to label %Normal unwind label %Landing.Pad
              region(1)
              personality [i32 (...)* @__gxx_personality_v0]
Inlining
=======
Inlining regions results in a merging of dispatch instructions. Care must be
taken when inlining into the cleanup code of a landing pad. The
"resume" of the
inlinee may need to point to the "resume" of the inliner region.
Future Work
==========
This design is meant to be extensible. We don't address augmenting basic
blocks
as in Chris's note. However, this design allows for that eventuality; it
simply
doesn't require it. I.e., when that design is implemented, the
"region" keyword
would migrate from the invoke instruction to the basic block. The dispatch
instruction would remain the same.
Hi Bill,
First, I like the idea of being more expressive in IR, since the old
exception handling style didn't quite express exception handling, but
alternate paths (resume/unwind). That made DwarfException.cpp a big
hack. ;)
It also makes the front-end engineer's life a lot easier, since there
is less need to keep a long list of global state for all current
landing pads, clean up areas, terminate handlers, etc.
If I got it right, the dispatch instruction will tell the
instructions/calls to unwind to specific landing pads (cleanup areas,
terminate), but the region number will encode try/catch areas, so that
all those cleanup landing pads should ultimately end up in the catch
area for that region.
If that's so, how do you encode which which landing pad is to be
followed per region?
Consider the following code:
try {
  Foo f();
  f.run(); // can throw exception
  Bar b();
  b.run(); // can throw exception
  Baz z();
  z.run(); // can throw exception
} catch (...) {
}
The object 'f' is in a different cleanup area than 'b' which, in
turn
is in a different area than 'z'. These three regions should point to
three different landing pads (or different offsets in the same landing
pad), which (I believe) are encoded in IR by being declared after
different dispatch instructions, all of which within the same region.
So, if the dispatch instruction points to a catch area, how do you
differentiate between clean-up areas? If they point to clean-up areas,
how to you specify a catch area per group, to go after cleaning?
My second question is, why did you keep the invoke?
As far as I got it, you have a dispatch instruction inside a basic
block, and whatever throws an exception there should unwind to the
dispatch area (if it matches the filters), in order of appearance,
ending in the "resume unwinding" path if none matches.
If that's so, why do you still have the invoke call? Why should you
treat call-exceptions any differently than instruction-exceptions?
Since this is a major refactoring of the IR (new back-ends won't
understand old IRs at all), there is no point of keeping
compatibility...
> * The "catchall", "catches", and "filters"
clauses are optional. If none are
>  specified, then the landing pad is implicitly a "cleanup."
>
> * The <resumedest> basic block is the destination to unwind to if the
type
>  thrown isn't matched to any of the choices.
>
> * The "catches" clause is a list of types which the region can
catch and the
>  destinations to jump to for each type.
>
> * The "catchall" clause is the place to jump to if the exception
type doesn't
>  match any of the types in the "catches" clause.
>
> * The "region" value is an integer, similar to the
"addrspace" value, and is
>  unique to each dispatch in a function. IR objects reference this value to
>  indicate that they belong to the same region.
>
> * The "filters" clause lists the types of exceptions which may be
thrown by the
>  region.
These are major enhancements over the current model, as the back-end
doesn't have to guess anything. Both C++ and Java have those concepts
explicit in the language and (AFAIK) that was somewhat lost in the
"encoding".
best,
--renato
On Nov 24, 2010, at 2:59 AM, Renato Golin wrote:> If I got it right, the dispatch instruction will tell the > instructions/calls to unwind to specific landing pads (cleanup areas, > terminate), but the region number will encode try/catch areas, so that > all those cleanup landing pads should ultimately end up in the catch > area for that region.Caveat: I'm speaking from what I remember of our discussions, which is not necessarily what Bill is intending to propose; that said, I'm pretty confident that the design hasn't significantly changed. A dispatch instruction is part of a landing pad, not part of the normal instruction stream. A dispatch is actually 1-1 with a specific landing pad, and that pair of landing pad + dispatch instruction is basically all a region is. So the term is a bit misleading because it suggests that the landing pad is directly associated in the IR with a range of instructions, whereas in fact the current design is orthogonal from the question of how you actually reach a landing pad in the first place. For now, that's still via explicit invokes; the invoke names the region it unwinds to — Bill has it listing both the region number and the landing pad block, which I think is redundant but harmless. In my opinion, the most crucial property of the new design is that it makes the chaining of regions explicit in the IR. The "resume" edge from a dispatch instruction always leads to either another region or to a bit of code which re-enters the unwinder in some opaque way. When the inliner inlines a call in a protected region (i.e. an invoke, for now), it just forwards the outermost resume edges in the inlined function to the innermost region in the calling function, potentially making the old code unreachable. Frontends are responsible for emitting regions and associated resume code for which this preserves semantics. So every landing pad actually has a stack of regions which CodeGen has to examine to write out the unwind tables, but it's easy to figure out that stack just by chasing links. While I'm at it, there's another important property of dispatch — it's undefined behavior to leave the function between landing at a landing pad and reaching the dispatch.> If that's so, how do you encode which which landing pad is to be > followed per region? > > Consider the following code: > > try { > Foo f(); > f.run(); // can throw exception > Bar b(); > b.run(); // can throw exception > Baz z(); > z.run(); // can throw exception > } catch (...) { > }I assume you don't mean these to be function declarations. :)> The object 'f' is in a different cleanup area than 'b' which, in turn > is in a different area than 'z'. These three regions should point to > three different landing pads (or different offsets in the same landing > pad), which (I believe) are encoded in IR by being declared after > different dispatch instructions, all of which within the same region.Nope. Three regions, three landing pads, three dispatch instructions. (actually four if Foo::Foo() can throw). The Baz-destructing region chains to the Bar-destructing region which chains to the Foo-destructing region which chains to the catching region; the first three are cleanup-only.> If that's so, why do you still have the invoke call? Why should you > treat call-exceptions any differently than instruction-exceptions?One of my favorite things about this design is that it's totally independent of what exactly is allowed to throw. I'm really not sure how best to represent other throwing instructions, except that I'm pretty confident that we don't want anything as heavyweight as invoke. There's a pretty broad range of possibilities — we could make invoke-like instructions for all of them (ick...), or we could tag individual instructions with regions, or we could mark basic blocks as unwinding to particular places. But we can wrestle with that independently of deciding to adopt explicitly-chained landing pads. John.
On Nov 24, 2010, at 2:59 AM, Renato Golin wrote:> Hi Bill, > > First, I like the idea of being more expressive in IR, since the old > exception handling style didn't quite express exception handling, but > alternate paths (resume/unwind). That made DwarfException.cpp a big > hack. ;) > > It also makes the front-end engineer's life a lot easier, since there > is less need to keep a long list of global state for all current > landing pads, clean up areas, terminate handlers, etc. >That's a big goal of this rewrite. :-)> If I got it right, the dispatch instruction will tell the > instructions/calls to unwind to specific landing pads (cleanup areas, > terminate), but the region number will encode try/catch areas, so that > all those cleanup landing pads should ultimately end up in the catch > area for that region. >Yes, that's correct.> If that's so, how do you encode which which landing pad is to be > followed per region? > > Consider the following code: > > try { > Foo f(); > f.run(); // can throw exception > Bar b(); > b.run(); // can throw exception > Baz z(); > z.run(); // can throw exception > } catch (...) { > } > > The object 'f' is in a different cleanup area than 'b' which, in turn > is in a different area than 'z'. These three regions should point to > three different landing pads (or different offsets in the same landing > pad), which (I believe) are encoded in IR by being declared after > different dispatch instructions, all of which within the same region. > > So, if the dispatch instruction points to a catch area, how do you > differentiate between clean-up areas? If they point to clean-up areas, > how to you specify a catch area per group, to go after cleaning? >There are two different bits of information in the proposal that address this: the "unwind edge" from the invoke and the region number. It's the "unwind edge" which carries the bit of information that you're asking about. My mental view of this is that the region number correlates to the dispatch instruction (and, therefore, the catch blocks) and the unwind edge correlates to the cleanups. (This isn't exactly correct, but it helps to understand how all of this information will be used to generate the correct tables, etc.) So the above code would look something like this: entry: invoke @f.run() to %bb unwind to %FooCleanup region(1) bb: invoke @b.run() to %bb2 unwind to %BarCleanup region(1) bb2: invoke @z.run() to %bb3 unwind to %BazCleanup region(1) ... BazCleanup: ; do stuff br %BarCleanup BarCleanup: ; do stuff br %FooCleanup FooCleanup: ; do stuff br %Dispatch Dispatch: dispatch region(1) resume %block catches [ ... ] and so on. If we remove the invoke instructions, the "unwind to" information then goes onto the basic block itself. E.g.: bb: unwind to %FooCleanup call @f1.run() call @f2.run() call @f3.run() call @f4.run()> My second question is, why did you keep the invoke? > > As far as I got it, you have a dispatch instruction inside a basic > block, and whatever throws an exception there should unwind to the > dispatch area (if it matches the filters), in order of appearance, > ending in the "resume unwinding" path if none matches. > > If that's so, why do you still have the invoke call? Why should you > treat call-exceptions any differently than instruction-exceptions? > > Since this is a major refactoring of the IR (new back-ends won't > understand old IRs at all), there is no point of keeping > compatibility... >I kept it purely from a practical standpoint: it will be easier (and thus quicker) to do it this way. :-) Also, it could be done a bit more incrementally than completely removing the invoke. But I'm not opposed to removing the invoke.>> * The "catchall", "catches", and "filters" clauses are optional. If none are >> specified, then the landing pad is implicitly a "cleanup." >> >> * The <resumedest> basic block is the destination to unwind to if the type >> thrown isn't matched to any of the choices. >> >> * The "catches" clause is a list of types which the region can catch and the >> destinations to jump to for each type. >> >> * The "catchall" clause is the place to jump to if the exception type doesn't >> match any of the types in the "catches" clause. >> >> * The "region" value is an integer, similar to the "addrspace" value, and is >> unique to each dispatch in a function. IR objects reference this value to >> indicate that they belong to the same region. >> >> * The "filters" clause lists the types of exceptions which may be thrown by the >> region. > > These are major enhancements over the current model, as the back-end > doesn't have to guess anything. Both C++ and Java have those concepts > explicit in the language and (AFAIK) that was somewhat lost in the > "encoding". >Thanks, Renato! :-) -bw
> dispatch region(927) resume to label %Resume [ > %struct.__fundamental_type_info_pseudo* @_ZTIi, label %bb1.lpad > %struct.__pointer_type_info_pseudo* @_ZTIPKc, label %bb2.lpad > ]How do you handle two language types that map to the same LLVM type? Some languages need fancier catches. For example, being able to catch any object of class foo or a class that derives from foo. This is to be implemented by having a landing pad for foo that does the necessary checks and jumps to the actual (language level) handler? If I understand this correctly and there is indeed the need for some language specific logic at the IL, wouldn't it be simpler to have a simple catch label at the LLVM level? Cheers, Rafael