Hi, I want to support the unwind instruction on x86. Specifically I want to: * Provide an efficient runtime implementation that does not depend on reading the DWARF EH information. * It should be self hosted, meaning the runtime is static linked in. I want to use it kernel mode. * Unwinding should be a read-only operation regarding the stack, so I can create a stack dump in the landing pad. * Provide a pass that raises C++ exception handling to just unwind instructions and thread-local data. My idea on handling the DWARF EH actions is to compile it to machine instructions. Fx. given an Instruction Pointer, unwinding a call frame might be described as: add esp, 12 ; Remove the current call frame. jmp unwind ; Continue the unwinding process. Other call frames might be more complex to handle. It depends on the moves needed to restore the registers of the previous call frame (the caller) and to remove the current frame. I want to tag all calls and invokes in a manner that can be easily recognized by a runtime. I can tolerate a small overhead on calls. The idea is to do something like this: call some_function jmp continue jmp case_of_unwinding continue: The case-of-unwinding will either unwind the current call frame or jump to the landing pad (if it was an invoke instruction). I think the overhead will in fact be quite small because of branch prediction on modern processors. The size overhead by tagging all calls, compared to only tagging unwind areas and actions in DWARF, does not matter much to me. On x86 a call stores the next Instruction Pointer address on the stack. The runtime would be something like this (assuming the first jmp instruction after the call is always short): unwind: ; We are on a call frame boundary. Assuming we can ; always destroy the content of EAX (true for C calling ; convention), we get the next Instruction Pointer (the ; instruction after the call): pop eax ; We advance our virtual Instruction Pointer to after ; the first jmp. Here we assume it is a two-byte ; encoded short jmp, but in practice we can support ; both short and long jmps: inc eax inc eax ; Then we invoke the local unwind handling. It will ; in turn either unwind another call frame or jump to ; the landing pad: jmp eax The pass to raise C++ exceptions to use unwind is another discussion. The most valuable thing for me is to support unwind. What do you think about the idea? Can it be achieved or did I overlook something important? I guess it should be implemented as a few passes. I think the one that inserts code after each call machine instruction is the most difficult to implement. Is it even possible to insert jmp instructions without them being optimized away? I also need to access information about the call frame. My last question: Is it possible to implement a flag for the back-end that selects which kind of exception handling to use? Ie. my idea versus DWARF + libgcc. Cheers, Bjarke Walling
Hello, Bjarke> * Provide a pass that raises C++ exception handling to just > unwind instructions and thread-local data.Are you familiar with C++ EH? How would you handle catches? Cleanups?> Other call frames might be more complex to handle. It depends on the > moves needed to restore the registers of the previous call frame (the > caller) and to remove the current frame.The devil is in details. How would you restore callee-saved registers? In general, you need to know what they are in the unwind point and their exact position on stack.> What do you think about the idea?The idea looks sane. But only for 'basic' unwind. Otherwise you'll end with rewriting libgcc unwind runtime. Also, how> My last question: Is it possible to implement a flag for the back-end > that selects which kind of exception handling to use? Ie. my idea > versus DWARF + libgcc.EH is complex. For C++ EH you will need to have a frontend support. Actually, calls to libgcc unwinding runtime is inserted by frontend. Backend is only responsible for emission of codegen-time information. So, if you're planning to hack on C++ EH somehow, you will need to hack on frontend as well. Have you looked into libunwind? Maybe it will be easy to hook it into llvm somehow? -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University
Hi Bjarke,> * Provide an efficient runtime implementation that does not > depend on reading the DWARF EH information.why? The DWARF EH info encodes two things: (1) how to restore registers; and (2) matching rules for exception objects, and what to do with them. You will need something along the lines of (1) if you unwind out of the middle of functions. As for (2), if you don't do any matching of exceptions against types, this is an extremely minimal amount of info. In any case, it is entirely up to the personality function what happens here - you can always write your own. Check out the C personality function (yes, C not C++!) in gcc/unwind-c.c to get an idea of what a small personality function looks like.> * It should be self hosted, meaning the runtime is static > linked in. I want to use it kernel mode.You can link statically with libgcc.> * Unwinding should be a read-only operation regarding the > stack, so I can create a stack dump in the landing pad.You can get stack dumps with gcc dwarf eh. The Ada front-end does this for example - very convenient.> * Provide a pass that raises C++ exception handling to just > unwind instructions and thread-local data.Take a look at libunwind (http://www.hpl.hp.com/research/linux/libunwind/). Another possibility, very close you yours and currently used by the vmkit project, is to modify all functions so they return two values, the usual return value and an additional boolean value indicating whether an exception was thrown during the call or not. Callers then branch to an appropriate place based on this value. Thus there is no special stack unwinding, it is just functions returning. This adds some distributed overhead, but unwinding is fast. You can always return something more complicated than a boolean of course.> My idea on handling the DWARF EH actions is to compile it to machine > instructions. Fx. given an Instruction Pointer, unwinding a call frame > might be described as: > > add esp, 12 ; Remove the current call frame. > jmp unwind ; Continue the unwinding process.This is a lot like what you get using the "function returns an extra boolean" method.> Other call frames might be more complex to handle. It depends on the > moves needed to restore the registers of the previous call frame (the > caller) and to remove the current frame.If you really plan to unwind out of the middle of functions you will have to do magic to restore registers. Do you plan to use the dwarf frame moves info for this? If you use the "functions return an extra boolean" method then you don't have to do anything, since functions return and have registers restored in the usual way.> I want to tag all calls and invokes in a manner that can be easily > recognized by a runtime. I can tolerate a small overhead on calls. The > idea is to do something like this: > > call some_function > jmp continue > jmp case_of_unwinding > continue:This again looks a lot like the "have functions return an extra boolean" method.> The case-of-unwinding will either unwind the current call frame or > jump to the landing pad (if it was an invoke instruction). I think the > overhead will in fact be quite small because of branch prediction on > modern processors. The size overhead by tagging all calls, compared to > only tagging unwind areas and actions in DWARF, does not matter much > to me.The vmkit experience is that unwinding using their method is a lot faster than using the dwarf unwinder. I don't know if the distributed overhead is noticeable or not. Ciao, Duncan.
Hi Duncan, Hi Bjarke, Duncan Sands wrote:> Take a look at libunwind (http://www.hpl.hp.com/research/linux/libunwind/). > Another possibility, very close you yours and currently used by the vmkit > project, is to modify all functions so they return two values, the usual > return value and an additional boolean value indicating whether an exception > was thrown during the call or not. Callers then branch to an appropriate > place based on this value. Thus there is no special stack unwinding, it > is just functions returning. This adds some distributed overhead, but > unwinding is fast. You can always return something more complicated than > a boolean of course. >That's correct. Although, to make things clear, I'm not using the multiple return value features of LLVM. The signature of functions only return one value. If an exception happens, the boolean value is stored in a thread local specific variable. But conceptually, this pretty much is the same than returning two values.> The vmkit experience is that unwinding using their method is a lot faster > than using the dwarf unwinder. I don't know if the distributed overhead > is noticeable or not. >It's *hardly* noticeable. If your language/runtime throws a lot of exceptions, that's the way to go currently (there are other sophisticated techniques, but much more complex). Dwarf is great if your language/runtime basically never throws exceptions. I can't give you an accurate measure of the real cost of Dwarf on Java applications because, unfortunately, libgcc did not optimize its dwarf info lookup on dynamically registered frames. It's doing a linear search of the info, and that takes *a lot* of time. Nicolas
Hello Anton, On Tue, Mar 3, 2009 at 10:23 AM, Anton Korobeynikov <anton at korobeynikov.info> wrote:> Are you familiar with C++ EH? How would you handle catches? Cleanups?I guess I'm not familiar with C++ EH in details. Maybe I'm begin naive, but I think exceptions doesn't have to be that complex. When compiling C++ the front-end inserts a llvm.eh.selector intrinsic in the landing pad. It references the types you want to catch and a personality routine. If you use this intrinsic the type information is added to the DWARF .eh_frame section. The unwind runtime knows how to handle the actions described in this section. It is sort of a virtual machine, if I understand it correctly. The instructions and their semantics are described in the DWARF specification. The unwind runtime execute these instructions, unwinding frames and call the personality routine when it reaches a catch block to ask whether the catch block can handle this exception. I read the LLVM documentation on exceptions and regarding C++ cleanups it sounds a bit more complicated. I would have to find out how exactly libgcc, the DWARF actions and the personality routine plays together. I know DWARF was made to provide a platform-independant way to add debugging information. That is why they made a virtual machine for stack unwinding. If you have a compiled C++ program you have DWARF actions and a personality routine that is not going to change. If you don't care about being ABI compliant, then you could compile this virtual machine and all its actions to native machine code.> The idea looks sane. But only for 'basic' unwind. Otherwise you'll end > with rewriting libgcc unwind runtime. > Also, howMaybe so. What is 'basic' unwind? Maybe I shouldn't focus on C++ for now, but just about supporting invoke/unwind. If I don't use the llvm.eh.exception and llvm.eh.selector intrinsics, then the landing pad of an invoke becomes a catch-all. The landing pad code can check whether it accepts the exception stored as a pointer in a thread-local store. If it doesn't support the exception it unwind's again. I still need to handle stack unwinding, but the information is already there. It is written in .eh_frame and .gcc_except_table. What I want to do is translating this information, the actions, to native machine code for great efficiency.> EH is complex. For C++ EH you will need to have a frontend support. > Actually, calls to libgcc unwinding runtime is inserted by frontend. > Backend is only responsible for emission of codegen-time information. > So, if you're planning to hack on C++ EH somehow, you will need to > hack on frontend as well.I think you're right I shouldn't focus on C++. It has quite complicated and featureful exception handling. I guess the real pain is multiple inheritance.> Have you looked into libunwind? Maybe it will be easy to hook it into > llvm somehow?I will look at that, thank you. I haven't timed the unwinding proces yet, but I want to do: How many clock cycles does it take to handle various types of try/catch + throw. I don't think libgcc is very efficient, but I can't be sure before I measure it. Also adding 50KB to your executable is a lot in my opinion. It is about 13000 machine instructions (I measured that). Bjarke Walling
Hi Duncan, On Tue, Mar 3, 2009 at 10:26 AM, Duncan Sands <baldrick at free.fr> wrote:> why? The DWARF EH info encodes two things: (1) how to restore > registers; and (2) matching rules for exception objects, and > what to do with them. You will need something along the lines > of (1) if you unwind out of the middle of functions. As for (2), > if you don't do any matching of exceptions against types, this is > an extremely minimal amount of info. In any case, it is entirely > up to the personality function what happens here - you can always > write your own. Check out the C personality function (yes, C not > C++!) in gcc/unwind-c.c to get an idea of what a small personality > function looks like.Yes, I need (1) to restore registers. I don't see why the type checking can't be done in the landing pad. Yes, it is an overhead, but not more than interpreting DWARF gives me. I will look at the C personality function, thank you for that.> You can link statically with libgcc.Yes, I know, but I think 50KB is a lot. It's not a microkernel I'm writing anymore. Also I don't get the benefits of invoke/unwind. LLVM handles function inlining with invoke/unwind quite nicely. I'm not sure it can do that to the same degree with calls to libgcc?>> * Unwinding should be a read-only operation regarding the >> stack, so I can create a stack dump in the landing pad. > > You can get stack dumps with gcc dwarf eh. The Ada front-end > does this for example - very convenient.Very convenient. Does libgcc provide that too? I like the features of DWARF, just not the time and space overhead.> Take a look at libunwind (http://www.hpl.hp.com/research/linux/libunwind/).I will, thank you.> Another possibility, very close you yours and currently used by the vmkit > project, is to modify all functions so they return two values, the usual > return value and an additional boolean value indicating whether an exception > was thrown during the call or not. Callers then branch to an appropriate > place based on this value. Thus there is no special stack unwinding, it > is just functions returning. This adds some distributed overhead, but > unwinding is fast. You can always return something more complicated than > a boolean of course.Maybe that's an option. I think I know how to do that already by writing a LLVM pass. I guess it would be translated to something like this in machine code: call some_function test ebx, ebx ; Check second return value jz handle_unwind ; If nessescary handle unwinding That is a fairly small overhead.>> My idea on handling the DWARF EH actions is to compile it to machine >> instructions. Fx. given an Instruction Pointer, unwinding a call frame >> might be described as [...] > > This is a lot like what you get using the "function returns an extra > boolean" method.Yes, you're right and it's conceptually easier to handle.>> Other call frames might be more complex to handle. It depends on the >> moves needed to restore the registers of the previous call frame (the >> caller) and to remove the current frame. > > If you really plan to unwind out of the middle of functions you will > have to do magic to restore registers. Do you plan to use the dwarf > frame moves info for this?Yes. The information is available. It is stored as a DWARF virtual machine (CFA instructions). I "just" need to translate this to machine instructions. I thought that it would be somewhat easy to hook into LLVM to get the DWARF instructions it would write and then emit machine instructions instead.> If you use the "functions return an > extra boolean" method then you don't have to do anything, since > functions return and have registers restored in the usual way.Yes, I see.>> I want to tag all calls and invokes in a manner that can be easily >> recognized by a runtime. I can tolerate a small overhead on calls. The >> idea is to do something like this [...] > > This again looks a lot like the "have functions return an extra boolean" > method.Except there's a less overhead on a jmp instruction than test+jz instructions. Maybe it's worth it. It is definitely easier to implement than my first idea.> The vmkit experience is that unwinding using their method is a lot faster > than using the dwarf unwinder. I don't know if the distributed overhead > is noticeable or not.It can be measured. Maybe I should look at the vmkit code as well. Bjarke Walling