Hi guys,
I have begun a modification to the invoke/unwind instructions. The following
.ll file demonstrates the change.
define i32 @v(i32 %o) {
  %r = icmp eq i32 %o, 0
  br i1 %r, label %raise, label %ok
ok:
  %m = mul i32 %o, 2
  ret i32 %m
raise:
  %ex = inttoptr i32 255 to i8*
  ; unwind now takes an i8* "exception" pointer
  unwind i8* %ex
}
define i32 @g(i32 %o) {
entry:
  ; invoke produces a different value depending on whether it
  ; branches to the success case or the failure case.
  %s = invoke i32 @v(i32 %o) to label %ok
                 unwind %x to label %catch
ok:
  ret i32 %s
catch:
  %v = ptrtoint i8* %x to i32
  %r = icmp eq i32 %v, 255
  br i1 %r, label %bad, label %worse
bad:
  ret i32 -1
worse:
  ret i32 -2
}
With my current change, the unwind instruction is able to pass a value to
the unwind branch of the invoke instruction. I was able to coax LLVM into
generating correct code for this by using the LowerInvoke pass, which
produces expensive but correct code via setjmp/longjmp.
The unwind instruction now takes a single i8* parameter. This value is
propagated to the nearest invoke instruction that generated the call to the
function containing the unwind instruction.
The invoke instruction now generates one of two different values depending
on how the call exits. If the call exits via a return instruction, the
invoke instruction generates a return value (denoted by %s in the sample
code). If the call exits via an unwind instruction, the invoke generates an
exception value (denoted by %x in the sample code). The return value is only
valid if the invoke branches to the return branch. The exception value is
only valid if the invoke instruction branches to the unwind branch.
For sources that are not attempting to integrate into a third-party
exception handling mechanism (gcc or SEH), this would be enough to
implement exception handling. When integrating into external exception
handling mechanisms, the "exception" value generated by the invoke
instruction would replace the call to the 'eh.exception' intrinsic, and
would have the benefit of making it much easier for analysis passes
to associate this value with the invoke that generated it. For the unwind,
if all that's needed is an exception pointer, then an unwind instruction could
be used and lowered to a call into the appropriate runtime library.
To make this work, the fundamental concept that an instruction always
produces a single value needs to change. This concept was already somewhat
violated by the invoke instruction, since if it branched to the unwind block,
the return value was not actually generated. But in its existing form, it
looks like it only generates one value. As far as SSA is concerned, I don't
see any problem with an operation generating multiple values under different
circumstances, since there is still only one source for any value. As long as
the block being branched to dominates every use of the respective value, I
think it's correct, and optimizations should still behave correctly.
Unfortunately, the fact that a value and the instruction that generates it
are one and the same makes it very difficult to build a representation
where a single instruction can generate more than one value. My current
solution (which feels wrong) is to have the invoke instruction own an
additional "exception" value that represents the value that is generated
when continuing via the unwind branch. This value is quite different from
other values and therefore inherits directly from llvm::Value. When lowering
the invoke instruction, the LowerInvoke pass replaces uses of this
"exception" value with the return value of the setjmp call, after it has been
determined that the setjmp returned from a longjmp. When lowering the unwind
instruction, the LowerInvoke pass passes the argument of the unwind instruction
as the value parameter to the longjmp call.
While the lowering of this representation seemed natural, parsing it has
proven difficult. This "exception" value must be in the function's symbol
table, but in the current structure of the parser, the name of the
instruction's value is not, and cannot be, set until after the instruction
has been added to the containing basic block. The problem is that at that
point, the parser doesn't know that the instruction produces another value,
and even if it did, it has lost the information needed to properly register
the name with the symbol table. To get past this point, I put a nasty hack in
place: I gave LLParser permission to see the internals of Instruction so I
could temporarily assign the invoke instruction's parent pointer ahead of
time, so that the call to setName on the "exception" value could succeed.
Once this value is in the symbol table, there is currently no way to get it
out. The code that removes an Instruction's entry from the symbol table is
unaware of the additional value that needs to be removed. This causes a
seemingly benign assertion at shutdown about the symbol table not being empty.
Bitcode I/O is another problem; in my current build, it is broken.
There is currently no way to reference the "exception" value of the invoke
instruction. I have yet to look into this at all, as it was not needed to
get my sample code through to the code generator.
In closing, I am looking for some feedback as to whether this approach
makes sense. I would also like to know if anyone has suggestions on how
to deal with some of these issues. I have included a patch with the changes I
have made so far. It is still very rough, but I thought it might be useful.
-Nathan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exception.patch
Type: application/octet-stream
Size: 24705 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100925/8c221640/attachment.obj>
On 25 September 2010 23:46, Nathan Jeffords <blunted2night at gmail.com> wrote:
> catch:
>   %v = ptrtoint i8 * %x to i32
>   %r = icmp eq i32 %v, 255
>   br i1 %r, label %bad, label %worse
> bad:
>   ret i32 -1
> worse:
>   ret i32 -2
> }

If I understood correctly, you're trying to pass the clean-up flag
through %x directly on the invoke call, but then avoid the
eh.exception call, assuming its result was in %x.

The problem is that you're mixing two concepts: the exception
structure contains information about the object that was thrown, not
a "good" number. That's the role of the clean-up flag (in case the
catch blocks can't deal with the exception) or of the landing pads (which
should reflect the return values the user asked for in their programs).

It's the user's role to tell what's good and what's not (return values
included). The only thing you (the compiler) can do is to explode
prematurely in case you can't properly catch the error (e.g. throw
inside throw, throw inside delete, etc.).

If that's the case, your implementation will not work for DWARF
exceptions, and I wouldn't recommend having an *invoke* syntax for
each type of exception handling mechanism.

Another question: why are you passing an untyped %x? I haven't seen any
untyped variable in LLVM so far, and I think it's good to be
redundant in this case. That alone would have caught the mistake. If
you need an i32 (for your bad/worse comparison), throwing an i8* would
have hinted that you had crossed the concepts.

On a side note...

Exception handling was designed by the devil himself. Part of the flow
control is designed by the user (try/catch blocks, throw
specifications), part of it by the compiler, in exception tables
(specific unwinding instructions and types), and part by the library
writers (unwinding and personality routines). All of that, decided in
three different time frames by three different kinds of developers,
has to communicate perfectly at run time.

It'd be very difficult for the compiler to optimize automatically
without breaking run-time assumptions. All of that is controlled by
different ABIs, which make sure all three universes are talking the
same language. You can't change one without changing all the others...

To be honest, I'm still surprised that it actually works at all! ;)

--
cheers,
--renato
I may be wrong, but I think Nathan used ints for demonstration
purposes only. unwind always takes an i8* argument that ideally should be
a pointer to an exception structure; the variable %x in invoke is also typed
i8*, it's not "untyped". A more LLVM-ish syntax would probably be
  unwind i8* %x to label %catch
to show the type explicitly.
However, throwing a pointer to a structure raises questions about the
structure's ownership, so I think Nathan cheated a bit and threw an
int to make the code snippet simpler. The landing pad code then checked
the int, whereas real code would expect an exception object.
"Real" code would look more like:
define i32 @v(i32 %o) {
  %r = icmp eq i32 %o, 0
  br i1 %r, label %raise, label %ok
ok:
  %m = mul i32 %o, 2
  ret i32 %m
raise:
  %ex = call i8* @allocate_exception()
  call void @init_div_by_0_eh(i8* %ex)
  ; unwind now takes an i8* "exception" pointer
  unwind i8* %ex
}
define i32 @g(i32 %o) {
entry:
  ; invoke produces a different value depending on whether it
  ; branches to the success case or the failure case.
  %s = invoke i32 @v(i32 %o) to label %ok
                 unwind %x to label %catch
ok:
  ret i32 %s
catch:
  %type = call i32 @exception_type(i8* %x)
  %r = icmp eq i32 %type, 255 ; 255 is DivisionByZeroException type
  br i1 %r, label %bad, label %worse
bad:
  ret i32 -1
worse:
  ret i32 -2
}
Nathan -- is this approach simpler than using intrinsics @eh.throw
(assuming it's added) and @eh.exception? The latter seems more
flexible in supporting various levels of ABI (I think ideally LLVM
exception handling should follow general platform ABI and also allow
front-ends for specific languages generate code in accordance with
language specific ABI).
If we go with an invoke instruction that returns an exception pointer (which
feels right to me), maybe this is a good candidate for using a union type:
invoke can produce a single result which is either the normal return
value or an exception pointer, since only one of the two values can
actually be produced. This sounds logical but may be taking us too far
from the ABIs.
Eugene
On Sun, Sep 26, 2010 at 4:19 AM, Renato Golin <rengolin at systemcall.org> wrote:
> If I understood correctly, you're trying to pass the clean-up flag
> through %x directly on the invoke call. But later avoid the
> eh.exception call, assuming that was in %x.
>
> The problem is that you're mixing two concepts: The exception
> structure contains information about the object that was thrown and
> not a "good" number. That's the role of the clean-up flag (in case the
> catch blocks can't deal with the exception) or the landing pads (that
> should reflect the return values the user asked for in their
> programs).
>
> It's the users role to tell what's good and what's not (return values
> included). the only thing you (compiler) can do is to explode
> prematurely in case you can't properly catch the error (ie. throw
> inside throw, throw inside delete, etc).

The argument to the unwind instruction is always an i8* pointer, which in
the case of DWARF exception handling would be the allocated exception
object. I did cheat for the example, but this also demonstrates that an
arbitrary value could be passed when using an LLVM-specific exception
handling implementation, like the setjmp/longjmp version provided by the
LowerInvoke pass.

> If that's the case, your implementation will not work for dwarf
> exceptions, and I wouldn't recommend having an *invoke* syntax for
> each type of exception handling mechanism.
>
> Other question: why are you passing untyped %x? I haven't seen any
> untyped variable in LLVM, so far, and I think it's good to be
> redundant in this case. That alone would have caught the mistake. If
> you need an i32 (for your bad/worse comparison), throwing i8* would
> have hinted that you crossed the concepts.

The syntax for the invoke instruction is a little misleading. %x is a
value that is generated by the instruction, not passed to it. In that
regard it is no different from writing '%x = call @eh.exception ...'.
Since you don't specify the type in that kind of assignment, I chose not
to here either.

> On a side note...
>
> Exception handling was designed by the devil himself. Part of the flow
> control is designed by the user (try/catch blocks, throw
> specifications), part of it is designed by the compiler, in exception
> tables (specific unwinding instructions and types), and part by the
> library writers (unwinding and personality routines). All that,
> decided in three different time frames, by three different kinds of
> developers, have to communicate perfectly in run time.
>
> It'd be very difficult for the compiler to optimize automatically
> without breaking run-time assumptions. All of that is controlled by
> different ABIs, that make sure all three universes are talking the
> same language. You can't change one without changing all the others...

I agree; this change is not attempting to change how exception handling
works, just to make a small change in how it is represented in the IR so
that it is more direct, especially for users not using gcc/DWARF exception
handling (I hope to attempt an SEH implementation).

> To be honest, I'm still surprised that it actually works at all! ;)

Thanks for the feedback
-Nathan