Hi guys,

I have begun a modification to the invoke/unwind instructions. The following .ll file demonstrates the change.

    define i32 @v(i32 %o) {
      %r = icmp eq i32 %o, 0
      br i1 %r, label %raise, label %ok
    ok:
      %m = mul i32 %o, 2
      ret i32 %m
    raise:
      %ex = inttoptr i32 255 to i8 *
      ; unwind now takes an i8* "exception" pointer
      unwind i8* %ex
    }

    define i32 @g(i32 %o) {
    entry:
      ; invoke produces a different value depending on whether it
      ; branches to the success case or the failure case.
      %s = invoke i32 @v(i32 %o) to label %ok
                       unwind %x to label %catch
    ok:
      ret i32 %s
    catch:
      %v = ptrtoint i8 * %x to i32
      %r = icmp eq i32 %v, 255
      br i1 %r, label %bad, label %worse
    bad:
      ret i32 -1
    worse:
      ret i32 -2
    }

With my current change, the unwind instruction is able to pass a value to the unwind branch of the invoke instruction. I was able to coax LLVM into generating correct code for this using the LowerInvoke pass, which generates expensive but correct code via setjmp/longjmp.

The unwind instruction now takes a single i8* parameter. This value is propagated to the nearest invoke instruction that generated the call to the function containing the unwind instruction.

The invoke instruction now generates one of two different values depending on how the call exits. If the call exits via a return instruction, the invoke instruction generates a return value (denoted by %s in the sample code). If the call exits via an unwind instruction, the invoke generates an exception value (denoted by %x in the sample code). The return value is only valid if the invoke branches to the return branch. The exception value is only valid if the invoke instruction branches to the unwind branch.

For sources that are not attempting to integrate into a third-party exception handling mechanism (gcc or SEH), this would be enough to implement exception handling.
When integrating into external exception handling mechanisms, the "exception" value generated from the invoke instruction would replace the call to the 'eh.exception' intrinsic, and would have the benefit of making it much easier for analysis passes to associate this value with the invoke that generated it. For the unwind, if all that's needed is an exception pointer, then an unwind instruction could be used and lowered to the appropriate runtime library.

To make this work, the fundamental concept that an instruction always produces a single value needs to change. This concept was already somewhat violated by the invoke instruction, since if it branched to the unwind block, the return value was not actually generated. But in its existing form, it looks like it only generates one value.

As far as SSA is concerned, I don't see any problem with an operation generating multiple values under different circumstances, since there is still only one source for any value. As long as the block being branched to dominates any usage of the respective value, I think it's correct and optimizations should be able to perform correctly.

Unfortunately, the fact that a value and the instruction that generates it are one and the same makes it very difficult to generate a representation where a single instruction can generate more than one value.

My current solution (which feels wrong) is to have the invoke instruction own an additional "exception" value that represents the value that is generated when continuing via the unwind branch. This value is quite different from other values and therefore inherits directly from llvm::Value.

When lowering the invoke instruction, the LowerInvoke pass replaces usage of this "exception" value with the return value of the setjmp call after it has been determined that the setjmp returned from a longjmp. When lowering the unwind instruction, the LowerInvoke pass puts the argument to the unwind instruction as the value parameter to the longjmp call.
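The lowering described above can be pictured with a small hand-written C analogue. This is only a sketch under stated assumptions, not the code LowerInvoke actually emits: `current_handler` and `lowered_unwind` are hypothetical names, and the exception value is carried as an `int` because that is all `longjmp` can deliver (a real pointer-sized value would need some other channel on targets where pointers are wider than `int`).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical: jump buffer of the innermost active "invoke". */
static jmp_buf *current_handler = NULL;

/* Lowered "unwind i8* %ex": the operand becomes longjmp's value. */
static void lowered_unwind(int ex) {
    assert(current_handler != NULL); /* no enclosing invoke to unwind to */
    assert(ex != 0);                 /* longjmp would turn 0 into 1 */
    longjmp(*current_handler, ex);
}

/* Mirrors @v from the sample: unwinds with 255 when o == 0. */
static int v(int o) {
    if (o == 0)
        lowered_unwind(255);
    return o * 2;
}

/* Mirrors @g: setjmp's return value plays the role of %x. */
int g(int o) {
    jmp_buf buf;
    jmp_buf *saved = current_handler;
    current_handler = &buf;
    switch (setjmp(buf)) {
    case 0: {                    /* direct return from setjmp */
        int s = v(o);            /* %s: valid only on the normal path */
        current_handler = saved;
        return s;
    }
    case 255:                    /* %x == 255, i.e. label %bad */
        current_handler = saved;
        return -1;
    default:                     /* any other %x, i.e. label %worse */
        current_handler = saved;
        return -2;
    }
}
```

Under these assumptions, `g(0)` takes the unwind path and yields -1, while `g(3)` returns normally with 6. The return value and the exception value never coexist: each is only read on the branch where it was produced.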
While the lowering of this representation seemed natural, parsing it has proven difficult. This "exception" value must be in the function's symbol table, but in the current structure of the parser, the name of the instruction's value is not and cannot be set until after it has been added to the containing basic block. The problem is that at that point, the parser doesn't know that the instruction produces another value, and even if it did, it has lost the information needed to properly register the name with the symbol table.

To get past this point, I put a nasty hack in place. I gave LLParser permission to see the internals of Instruction so I could temporarily assign the invoke instruction's parent pointer ahead of time, so that the call to setName on the "exception" value could succeed.

Once this value is in the symbol table, there is currently no way to get it out. The code that removes an Instruction's entry from the symbol table is unaware of the additional value that needs to be removed. This causes a seemingly benign assertion at shutdown about the symbol table not being empty.

Bitcode I/O is another problem; in my current build, it is broken. There is currently no way to bind to the "exception" value of the invoke instruction. I have yet to look into this in any way, as it was not needed to get my sample code through to the code generator.

In closing, I am looking for some feedback as to whether this approach makes sense. I would also like to know if anyone has any suggestions on how to deal with some of the issues.

I have included a patch with the changes I have made so far. It is still very rough, but I thought it might be useful.

-Nathan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exception.patch
Type: application/octet-stream
Size: 24705 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100925/8c221640/attachment.obj>

On 25 September 2010 23:46, Nathan Jeffords <blunted2night at gmail.com> wrote:
> catch:
>   %v = ptrtoint i8 * %x to i32
>   %r = icmp eq i32 %v, 255
>   br i1 %r, label %bad, label %worse
> bad:
>   ret i32 -1
> worse:
>   ret i32 -2
> }

If I understood correctly, you're trying to pass the clean-up flag through %x directly on the invoke call, and later avoid the eh.exception call, assuming its result is in %x.

The problem is that you're mixing two concepts: the exception structure contains information about the object that was thrown, not a "good" number. That's the role of the clean-up flag (in case the catch blocks can't deal with the exception) or the landing pads (that should reflect the return values the user asked for in their programs).

It's the user's role to tell what's good and what's not (return values included). The only thing you (the compiler) can do is to explode prematurely in case you can't properly catch the error (i.e. throw inside throw, throw inside delete, etc.).

If that's the case, your implementation will not work for dwarf exceptions, and I wouldn't recommend having an *invoke* syntax for each type of exception handling mechanism.

Other question: why are you passing untyped %x? I haven't seen any untyped variable in LLVM so far, and I think it's good to be redundant in this case. That alone would have caught the mistake. If you need an i32 (for your bad/worse comparison), throwing i8* would have hinted that you crossed the concepts.

On a side note...

Exception handling was designed by the devil himself. Part of the flow control is designed by the user (try/catch blocks, throw specifications), part of it is designed by the compiler, in exception tables (specific unwinding instructions and types), and part by the library writers (unwinding and personality routines). All that, decided in three different time frames, by three different kinds of developers, has to communicate perfectly at run time.

It'd be very difficult for the compiler to optimize automatically without breaking run-time assumptions. All of that is controlled by different ABIs, which make sure all three universes are talking the same language. You can't change one without changing all the others...

To be honest, I'm still surprised that it actually works at all! ;)

--
cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

I may be wrong, but I think Nathan used ints for demonstration purposes only. unwind always takes an i8* argument that ideally should be a pointer to an exception structure, and the variable %x in invoke is also typed i8*; it's not "untyped". Probably a more llvm-ish syntax would be

    unwind i8* %x to label %catch

to show the type explicitly. However, throwing a pointer to a structure raises questions about the structure's ownership, so I think Nathan cheated a bit and threw an int to make the code snippet simpler. The landing pad code then checked the int, whereas real code would expect an exception object.

"Real" code would look more like:

    define i32 @v(i32 %o) {
      %r = icmp eq i32 %o, 0
      br i1 %r, label %raise, label %ok
    ok:
      %m = mul i32 %o, 2
      ret i32 %m
    raise:
      %ex = call i8* @allocate_exception()
      call void @init_div_by_0_eh(i8* %ex)
      ; unwind now takes an i8* "exception" pointer
      unwind i8* %ex
    }

    define i32 @g(i32 %o) {
    entry:
      ; invoke produces a different value depending on whether it
      ; branches to the success case or the failure case.
      %s = invoke i32 @v(i32 %o) to label %ok
                       unwind %x to label %catch
    ok:
      ret i32 %s
    catch:
      %type = call i32 @exception_type(i8* %x)
      %r = icmp eq i32 %type, 255 ; 255 is DivisionByZeroException type
      br i1 %r, label %bad, label %worse
    bad:
      ret i32 -1
    worse:
      ret i32 -2
    }

Nathan -- is this approach simpler than using intrinsics @eh.throw (assuming it's added) and @eh.exception? The latter seems more flexible in supporting various levels of ABI (I think ideally LLVM exception handling should follow the general platform ABI and also allow front-ends for specific languages to generate code in accordance with a language-specific ABI).

If going with an invoke instruction that returns an exception pointer (which feels right to me), maybe this is a good candidate for using a union type: invoke can produce a single result which is either the normal return value or an exception pointer, since only one of the two values can actually be produced.
This sounds logical but may be taking us too far from ABIs.

Eugene

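The union idea raised in Eugene's message can be illustrated with a tagged union in C. This is a loose analogy only, assuming hypothetical names throughout (`invoke_result`, `v_result`, `g_union`, `div_by_zero`); nothing here corresponds to an existing LLVM type or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single result of an "invoke": one member is live at a time. */
struct invoke_result {
    bool unwound;            /* true: exception is live; false: ret is live */
    union {
        int ret;             /* normal return value (%s in the sample) */
        void *exception;     /* exception pointer (%x in the sample) */
    } u;
};

/* Hypothetical object standing in for a division-by-zero exception. */
static char div_by_zero;

/* Accessors that enforce "only valid on the matching branch". */
static int result_value(struct invoke_result r) {
    assert(!r.unwound);
    return r.u.ret;
}

static void *result_exception(struct invoke_result r) {
    assert(r.unwound);
    return r.u.exception;
}

/* Mirrors @v: either returns o * 2 or "unwinds" with an exception. */
static struct invoke_result v_result(int o) {
    struct invoke_result r;
    if (o == 0) {
        r.unwound = true;
        r.u.exception = &div_by_zero;
    } else {
        r.unwound = false;
        r.u.ret = o * 2;
    }
    return r;
}

/* Mirrors @g: branch on the tag, then read only the live member. */
int g_union(int o) {
    struct invoke_result r = v_result(o);
    if (!r.unwound)
        return result_value(r);
    return result_exception(r) == (void *)&div_by_zero ? -1 : -2;
}
```

The tag makes explicit what the two-value invoke leaves implicit in the control flow: which of the two results exists. The cost is that every use has to check the tag first, whereas the branch-based form gets that guarantee from dominance.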
On Sun, Sep 26, 2010 at 4:19 AM, Renato Golin <rengolin at systemcall.org> wrote:
> On 25 September 2010 23:46, Nathan Jeffords <blunted2night at gmail.com> wrote:
> > catch:
> >   %v = ptrtoint i8 * %x to i32
> >   %r = icmp eq i32 %v, 255
> >   br i1 %r, label %bad, label %worse
> > bad:
> >   ret i32 -1
> > worse:
> >   ret i32 -2
> > }
>
> If I understood correctly, you're trying to pass the clean-up flag
> through %x directly on the invoke call. But later avoid the
> eh.exception call, assuming that was in %x.
>
> The problem is that you're mixing two concepts: The exception
> structure contains information about the object that was thrown and
> not a "good" number. That's the role of the clean-up flag (in case the
> catch blocks can't deal with the exception) or the landing pads (that
> should reflect the return values the user asked for in their
> programs).
>
> It's the users role to tell what's good and what's not (return values
> included). the only thing you (compiler) can do is to explode
> prematurely in case you can't properly catch the error (ie. throw
> inside throw, throw inside delete, etc).

The argument to the unwind instruction is always an i8* pointer, which in the case of dwarf exception handling would be the allocated exception object. I did cheat for the example, but this also demonstrates that an arbitrary value could be passed if using an LLVM-specific exception handling implementation like the setjmp/longjmp version provided by the LowerInvoke pass.

> If that's the case, your implementation will not work for dwarf
> exceptions, and I wouldn't recommend having an *invoke* syntax for
> each type of exception handling mechanism.
>
> Other question: why are you passing untyped %x? I haven't seen any
> untyped variable in LLVM, so far, and I think it's good to be
> redundant in this case. That alone would have caught the mistake. If
> you need an i32 (for your bad/worse comparison), throwing i8* would
> have hinted that you crossed the concepts.

The syntax for the invoke instruction is a little misleading. %x is a value that is being generated by the instruction, not passed to it. In that regard it is no different from saying '%x = call @eh.exception ...'. Since you don't specify the type in that kind of assignment, I chose not to here either.

> On a side note...
>
> Exception handling was designed by the devil himself. Part of the flow
> control is designed by the user (try/catch blocks, throw
> specifications), part of it is designed by the compiler, in exception
> tables (specific unwinding instructions and types), and part by the
> library writers (unwinding and personality routines). All that,
> decided in three different time frames, by three different kinds of
> developers, have to communicate perfectly in run time.
>
> It'd be very difficult for the compiler to optimize automatically
> without breaking run-time assumptions. All of that is controlled by
> different ABIs, that make sure all three universes are talking the
> same language. You can't change one without changing all the others...

I agree; this change is not attempting to change how exception handling works, just to provide a small change in how it is represented in the IR to make it more direct, especially for users not using gcc/dwarf exception handling (I hope to attempt an SEH implementation).

> To be honest, I'm still surprised that it actually works at all! ;)

Thanks for the feedback,

-Nathan