Ten Tzen via llvm-dev
2020-Apr-15 20:51 UTC
[llvm-dev] [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling
Hi,

This is a spin-off of the previous Windows SEH RFC below. This RFC focuses only on supporting HW exception handling. A detailed implementation can be seen here:
https://github.com/tentzen/llvm-project/commit/8a2421c274b683051e456cbe12c177e3b934fb5e
It passes the entire MSVC SEH suite (excluding the tests that jump out of a _finally, i.e., _Local_Unwind).

Thanks,
--Ten

**** The rules for C code: ****

For C code, one way (the MSVC approach) to achieve the SEH -EHa semantics is to follow three rules. First, no exception can move into or out of a _try region, i.e., no potentially faulting instruction can be moved across a _try boundary. Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to instructions in callees). Finally, global states (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in registers) before the subsequent exception occurs.

**** The impact to C++ code: ****

Although SEH is a feature for C code, -EHa does have a profound effect on the C++ side. When a C++ function (in the same compilation unit, compiled with -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try handler or by a C++ catch(...). As such, when that happens in the middle of an object's lifetime scope, the dtor must be invoked during unwinding the same way as for a C++ synchronous exception.

**** Design and Implementation: ****

A natural way to achieve the rules above in LLVM today is to allow an EH edge to be added on memory/computation instructions (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph. However, tracking every single memory instruction and potentially faulting instruction can create many invokes, complicate the flow graph, and possibly have a negative performance impact on downstream optimization and code generation. Making all optimizations aware of the new semantics is also a substantial effort.
This design does not intend to model the exception path at the instruction level. Instead, it tracks and reports EH state at BLOCK level, to reduce the complexity of the flow graph and minimize the performance impact on C++ code under the -EHa option. The detailed implementation is described below.

-- Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and the EHStack is pushed, so it must be an invoke, not a call. This also guarantees that an EH cleanup pad is created regardless of whether there is a call in this scope. _scope_end() is added before dtor(). These two intrinsics make it possible to compute block states in a downstream codegen pass, even in the presence of ctor/dtor inlining.
-- Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark the _try boundary and to prevent exceptions from being moved across it.
-- All memory instructions inside a _try are treated as 'volatile' to ensure the 2nd and 3rd rules for C code above. This is somewhat suboptimal, but it is acceptable because the amount of code directly under a _try is very small.
-- For both C++ and C code, the state of each block is computed in the BE at the same place (the WinEHPrepare pass) where all other EH tables/maps are calculated. In addition to _scope_begin and _scope_end, the block-state computation also relies on the existing state-tracking code (UnwindMap and InvokeStateMap).
-- For both C++ and C code, the state of each block containing a potentially trapping instruction is marked and reported in the DAG instruction selection pass, the same place where the state for -EHsc (synchronous exceptions) is handled.
-- If the first instruction in a reported block scope can trap, a nop is injected before this instruction. This nop is needed to accommodate the LLVM Windows EH implementation, in which the address in the IPToState table is offset by +1.
   (The purpose of that offset is to ensure that the return address of a call falls in the same scope as the call instruction itself.)
-- The handler for catch(...) under -EHa must handle HW exceptions, so its 'adjectives' flag is reset (it cannot be IsStdDotDot (0x40), which only catches C++ exceptions).

From: Ten Tzen <tentzen at microsoft.com>
Sent: Friday, April 3, 2020 9:43 PM
To: rnk at google.com
Cc: llvm-dev at lists.llvm.org; Aaron Smith <aaron.smith at microsoft.com>
Subject: RE: [EXTERNAL] Re: [llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)

Hi, Reid,

Nice to finally meet you😊. Thank you for reading through the doc and providing insightful feedback. Yes, I can definitely separate these two features if that's more convenient for everyone. For now, the local_unwind-specific changes can be reviewed separately as the diff between these two commits:

  git diff 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde 06c81a4b6262445432a4166627b87bf595f5291b

and the -EHa changes can be read with:

  git diff e943329ba00772f96fbc1fe5dec836cfd0707a38 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde

My replies are inline below, in [Ten] lines.

--Ten

From: Reid Kleckner <rnk at google.com>
Sent: Friday, April 3, 2020 3:36 PM
To: Ten Tzen <tentzen at microsoft.com>
Cc: llvm-dev at lists.llvm.org; Aaron Smith <aaron.smith at microsoft.com>
Subject: [EXTERNAL] Re: [llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)

Hi Ten,

Thanks for the writeup and implementation, nice to meet you.

I wonder if it would be best to try to discuss the features separately. My view is that catching hardware exceptions (/EHa) is critical functionality, but it's not clear to me if local unwind is truly worth implementing.
Having looked at the code briefly, it seemed like a large portion of the complexity comes from local unwind. Today, clang crashes on this small example that jumps out of a __finally block, but the intention was to reject the code and avoid implementing the functionality. Clang does, in fact, emit a warning:

  $ clang -c t.cpp
  t.cpp:7:7: warning: jump out of __finally block has undefined behavior [-Wjump-seh-finally]
        goto lu1;
        ^

Local unwind, in my view, is the user saying, "I wrote __finally, but actually I decided I wanted to catch the exception, so let's transfer to normal control flow now." It seems to me that the user already has a way to express this: __except. I know the mapping isn't trivial and it's not exactly the same, but it seems feasible to rewrite most uses of local unwind this way.

[Ten] Right, I agree that to some degree a local_unwind can be viewed as another type of _except handler in the middle of unwinding. And it's true that some usage patterns can be worked around by rewriting the SEH hierarchy. But I believe that work can be substantial and risky, especially in an OS kernel. Furthermore, to broaden the interpretation, a local_unwind can also serve as a _filter (or even a rethrow-like handler in C++ EH), with the target block as the final handler. See the multi-local-unwind example in the doc.

Can you estimate the prevalence of local unwind? What percent of __finally blocks in your experience use non-local control flow? I see a lot of value in supporting catching hardware exceptions, but if we can avoid carrying over the complexity of this local unwind feature, it seems to me that future generations of compiler engineers will thank us.

[Ten] I don't have this data at hand. But what I know is that local_unwind is an essential feature for building the Windows kernel. One of the most important SEH tests (the infamous xcpt4u.c) is composed of 88 tests; among them there are 25 jumping-out-of-finally occurrences.
Of course this does not translate to a percentage of local_unwind, but it does show the significance of this feature to Windows. FYI, passing xcpt4u.c is the very first fundamental requirement before building the Windows kernel.

---

Regarding trap / non-call / hardware exception handling, I guess I am a bit more blasé about precisely modeling the control flow. As Eli mentions, we already have setjmp, and we already don't model it. Users file bugs about problems with setjmp, and we essentially close them as "wontfix" and tell them to put more "volatile" on the problem until it stops hurting.

One thing that I am very concerned about is the implications for basic block layout. Right now, machine basic block layout is very free-handed. Today, CodeGen puts labels around every potentially-throwing call, does block layout without considering try regions, and then collapses adjacent label regions with the same landingpad during AsmPrinting. For MSVC C++ EH, state number stores and the ip2state table achieve the same goal.

[Ten] Yes, I saw that (a pretty nice implementation, actually). This design and implementation completely inherits the current mechanism, except that it now allows reporting EH state ranges that contain only memory/computation instructions (for obvious reasons). I'm not sure which part of that concerns you.

I think we need rules about how LLVM is allowed to transform the following code:

  int cond(void);  // assumed to be defined elsewhere

  void foo(volatile int *pv) {
    __try {
      if (cond()) {
        ++*pv;
        __builtin_unreachable();
      }
    } __except(1) { }
    __try {
      if (cond()) {
        ++*pv;
        __builtin_unreachable();
      }
    } __except(1) { }
  }

In this case, the *pv operation may throw, but I believe it would be semantics-preserving to merge the two identical if-then blocks. The call.setup proposal I sent not long ago runs into the same issue.
I have written a patch to tail-merge such similar blocks, but I have not landed it: https://reviews.llvm.org/D29428
Even though it's not yet landed, I think we need to know if the transform is valid. If it is, then we need to do more than volatilize the try region to make EHa work.

[Ten] The merging should not happen. Per the C standard, a volatile must be read (or written) ONCE and only once (as long as it's naturally aligned and can be accessed in one operation by the HW). So merging two volatiles violates the standard. I'm sure this is well protected in LLVM today.

For a long time I've wanted regions of some kind in LLVM IR, and this use case has made me want to pick it up again. However, assuming that you want to land support for hardware exceptions without some kind of generalized region support in the IR, I think we do need to do something about these blocks ending in unreachable in __try regions. The simplest thing that could possibly work is to make clang end the try region before unreachable. This would mean ending the block and adding `invoke void @seh_try_end` after every unreachable. It would be redundant for noreturn calls, since those will already have an unwind edge, ensuring they remain in the try region.

[Ten] It's interesting that you mentioned "blocks ending in unreachable in __try regions" here. With these two features supported, the two remaining bugs on my to-do list are one setjmp() bug and one nested EH throw bug. The second one seems to be caused by a _try block ending with an unreachable. Yes, this is on my list. I will discuss it with you further when I look into it.

---

Another interesting aspect of /EHa is how it affects C++ destructor cleanups.
I am personally comfortable with the requirement that LLVM avoid moving around volatile instructions in __try blocks. LLVM is already required to leave volatile operations in order. But I *am* concerned about C++ destructor scopes, which are much more frequent than __try. As you have described it, clang would invoke eha_scope_begin() / eha_scope_end() around the object lifetime, but are you proposing to volatilize all memory operations in the region? If not, I see nothing that would prevent LLVM from moving potentially faulting operations in or out of this scope. We cannot require passes to look for non-local EH regions before doing code motion. Would that be acceptable behavior? It could lead to some strange behavior, where a load is sunk to the point of use outside the cleanup region, but maybe users don't care about this in practice.

[Ten] No, memory operations in C++ need not be volatilized. The order of exceptions in C++ code does not matter for -EHa. Potentially trapping instructions are free to move into or out of any EH region. The only criterion is that when a HW exception is caught and handled, live local objects must be destroyed gracefully, in the same manner as for a C++ synchronous exception. By reporting the EH state of those trapping instructions, this is done automatically in LLVM today.

---

To summarize, my feedback would be:
1. Focus on __try and hardware exceptions first; the value proposition is clear and large. In particular, something has to be done about unreachable. Clang should already thread other abnormal control flow through the region exit.
2. Please gather some data on the prevalence of local unwind to motivate the feature.
3. Please elaborate on the design for /EHa C++ destructor cleanups and code motion.

I hope that helps, and I'm sorry if I'm slow to respond; this is a tricky problem, and it's not my first priority.
Reid

On Wed, Apr 1, 2020 at 8:22 AM Ten Tzen via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi, all,

The intent of this thread is to complete the support for Windows SEH. Currently there are two major missing features: jumping out of a _finally, and hardware exception handling. The document below is my proposed design and implementation to fully support SEH on LLVM. I have completely implemented this design on a branch in this repo: https://github.com/tentzen/llvm-project. It now passes MSVC's in-house SEH suite.

Sorry for this long write-up. For better readability, please read it on https://github.com/tentzen/llvm-project/wiki

Special thanks to Joseph Tremoulet for his earlier comments and suggestions.

Note: I just subscribed to llvm-dev and am probably not on the list yet, so please reply with my email address (tentzen at microsoft.com) explicitly in the To list.

Thanks,
--Ten
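To make the block-state and IPToState bullets in the RFC more concrete, here is a deliberately simplified C model. Everything in it is hypothetical: the Block and IpToStateEntry types, the Marker enum standing in for eha_scope_begin()/eha_scope_end(), the straight-line block layout, and the addresses are invented for illustration, and the stack-based state numbering is not LLVM's actual UnwindMap/InvokeStateMap computation in WinEHPrepare. It assumes, as the RFC states, that each IPToState entry is emitted at its block label plus one, and it further assumes (not stated in the thread) that the runtime picks the last entry whose address is less than or equal to the faulting or return address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of block-level EH states and an
 * IPToState table. Not LLVM's or MSVC's real data structures. */

enum { NO_STATE = -1, MAX_DEPTH = 16 };

/* Marker at the start of a block (stand-ins for the RFC's intrinsics). */
typedef enum { MARK_NONE, MARK_SCOPE_BEGIN, MARK_SCOPE_END } Marker;

typedef struct {
    Marker marker;  /* eha_scope_begin()/eha_scope_end() stand-in */
    unsigned start; /* address of the block's first instruction   */
} Block;

typedef struct {
    unsigned ip; /* first address covered by this entry */
    int state;   /* EH state for addresses >= ip         */
} IpToStateEntry;

/* Assign each block (in layout order) the state of the innermost live
 * object scope; states are numbered in the order their scopes begin. */
static void compute_block_states(const Block *blocks, size_t n, int *states) {
    int stack[MAX_DEPTH];
    int depth = 0, next_state = 0;
    for (size_t i = 0; i < n; ++i) {
        if (blocks[i].marker == MARK_SCOPE_BEGIN) {
            assert(depth < MAX_DEPTH);
            stack[depth++] = next_state++;
        } else if (blocks[i].marker == MARK_SCOPE_END) {
            assert(depth > 0);
            --depth; /* the dtor after scope_end runs in the parent state */
        }
        states[i] = depth > 0 ? stack[depth - 1] : NO_STATE;
    }
}

/* Emit one entry per state change, at the block label plus one. */
static size_t build_ip_to_state(const Block *blocks, const int *states,
                                size_t n, IpToStateEntry *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        if (m == 0 || out[m - 1].state != states[i]) {
            out[m].ip = blocks[i].start + 1;
            out[m].state = states[i];
            ++m;
        }
    }
    return m;
}

/* Assumed lookup rule: the state of the last entry whose ip <= pc. */
static int lookup_state(const IpToStateEntry *table, size_t m, unsigned pc) {
    int state = NO_STATE;
    for (size_t i = 0; i < m && table[i].ip <= pc; ++i)
        state = table[i].state;
    return state;
}
```

With blocks at 0x10 (no live object), 0x20 (after eha_scope_begin) and 0x30 (after eha_scope_end), a call ending the 0x10 block returns to 0x20, which still maps to the outer state; that is what the +1 offset buys. The same rule, however, maps a trapping instruction placed first in the 0x20 block to the outer state too, and a 1-byte nop moves it to 0x21, where it maps to state 0, which mirrors the nop-injection bullet.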
Eli Friedman via llvm-dev
2020-Apr-15 23:29 UTC
[llvm-dev] [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling
I still have basically the same concerns. I'll try to give more concrete examples of what I'm concerned about. Suppose I have something like the following:

  typedef struct C { int x[2]; } C;
  void threw_exception();
  void z();
  C f() {
    __try {
      z();
      return *(C*)0;
    } __except(1) {
      threw_exception();
    }
    C c = {0};
    return c;
  }

Currently, under your proposal, this won't call threw_exception() if optimization is enabled, as far as I can tell. I have no idea if this is intentional: your proposal and your patch don't contain or point to any documentation, and I can't find any documentation that describes this on Microsoft's website. (I don't really care what the answer is here; I care that there's some documented answer to this question, and other questions like it.)

Constructing a testcase for the register allocation issues I mentioned before is hard because it's sort of "random" based on the register allocation heuristics, but see https://reviews.llvm.org/D77767 for the sort of issues that come up.

Note that we mark setjmp returns_twice, which turns off certain optimizations. I don't really like extending the usage of this sort of construct further, but if we are going to, we should at least mark the new intrinsics returns_twice, so they get the same protection as setjmp.

-Eli

From: Ten Tzen <tentzen at microsoft.com>
Sent: Wednesday, April 15, 2020 1:51 PM
To: llvm-dev at lists.llvm.org
Cc: rnk at google.com; Eli Friedman <efriedma at quicinc.com>; aaron.smith at microsoft.com; Joseph Tremoulet <jotrem at microsoft.com>
Subject: [EXT] [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling
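Reid and Eli both point to setjmp as the existing precedent, where the compiler-side remedy is returns_twice and the user-side remedy is "volatile". As a loose, portable analogy to the third rule for C code (state that is read after the exception must already be in memory), the sketch below uses only standard C setjmp/longjmp. It is not SEH: faulting_operation() is a hypothetical stand-in that longjmps instead of actually faulting, and the function names are invented for illustration. The C standard leaves a non-volatile automatic variable that was modified between setjmp and longjmp with an indeterminate value, which is why steps is declared volatile here.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical stand-in for a faulting instruction: instead of raising
 * a hardware exception, it jumps straight to the "handler" path. */
static void faulting_operation(void) {
    longjmp(env, 1);
}

/* Returns how many steps completed before the simulated fault, as seen
 * on the "handler" path. Because `steps` is volatile, its value after
 * longjmp is well defined; without volatile, C would leave it
 * indeterminate, since it is modified after setjmp. */
static int steps_before_fault(void) {
    volatile int steps = 0;
    if (setjmp(env) == 0) {   /* "_try" path */
        steps = 1;
        steps = 2;
        faulting_operation();
        steps = 3;            /* not reached */
    }
    return steps;             /* "_except" path after longjmp */
}
```

Calling steps_before_fault() returns 2: the two stores made before the simulated fault are visible on the handler path, and the third store never runs.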
Ten Tzen via llvm-dev
2020-Apr-16 04:14 UTC
[llvm-dev] [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling
Hi, Eli,

Why are you under the impression that threw_exception() will not be called if optimizations are enabled? I don’t know if the -EHa spec is clearly described on Microsoft's websites. At least this proposal has described the rules for both C and C++ code. The very first rule clearly says that “no exception can move in or out of a _try region, i.e., no potentially faulty instruction can be moved across the _try boundary”. As such, the dereference in the statement return *(C*)0 must be kept in the _try scope, and the access-violation fault will be caught in the _except handler, where threw_exception() will be called.

I don’t see why register allocation plays a part in this topic. I do see a serious problem in LLVM SJLJ today (all tests in MSVC’s setjmp suite fail with -O2, which I will look into soon). But I fail to see how HW exceptions are correlated with setjmp/longjmp. These are two totally different features, and the approaches employed are also totally different. It would be helpful if you could give one example of why this proposal needs to care about how registers are allocated.

Again, what we intend to do in this feature is to achieve the two points below. Please take a moment to read through them. Let me know if there is anything unclear.

Thanks
--Ten

**** The rules for C code: ****

For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules. First, no exception can move in or out of a _try region, i.e., no potentially faulty instruction can be moved across the _try boundary. Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to those in callees). Finally, global states (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in registers) before the subsequent exception occurs.

**** The impact to C++ code: ****

Although SEH is a feature for C code, -EHa does have a profound effect on the C++ side.
When a C++ function (in the same compilation unit, compiled with -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try handler or a C++ catch(...). As such, when that happens in the middle of an object's lifetime scope, the dtor must be invoked the same way as for a C++ synchronous exception during the unwinding process.

From: Eli Friedman <efriedma at quicinc.com>
Sent: Wednesday, April 15, 2020 4:29 PM
To: Ten Tzen <tentzen at microsoft.com>; llvm-dev at lists.llvm.org
Cc: rnk at google.com; Aaron Smith <aaron.smith at microsoft.com>; Joseph Tremoulet <jotrem at microsoft.com>
Subject: [EXTERNAL] RE: [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling

I still have basically the same concerns. I’ll try to give more concrete examples of what I’m concerned about. Suppose I have something like the following:

    typedef struct C { int x[2]; } C;
    void threw_exception();
    void z();
    C f() {
      __try {
        z();
        return *(C*)0;
      } __except(1) {
        threw_exception();
      }
      C c = {0};
      return c;
    }

Currently, under your proposal, this won’t call threw_exception() if optimization is enabled, as far as I can tell. I have no idea if this is intentional: your proposal and your patch don’t contain or point to any documentation, and I can’t find any documentation that describes this on Microsoft’s website. (I don’t really care what the answer is here; I care that there’s some documented answer to this question, and other questions like it.)
Constructing a testcase for the register allocation issues I mentioned before is hard because it’s sort of “random” based on the register allocation heuristics, but see https://reviews.llvm.org/D77767 for the sort of issues that come up. Note that we mark setjmp returns_twice, which turns off certain optimizations. I don’t really like extending the usage of this sort of construct further, but if we are going to, we should at least mark the new intrinsics returns_twice, so they get the same protection as setjmp.

-Eli

From: Ten Tzen <tentzen at microsoft.com>
Sent: Wednesday, April 15, 2020 1:51 PM
To: llvm-dev at lists.llvm.org
Cc: rnk at google.com; Eli Friedman <efriedma at quicinc.com>; aaron.smith at microsoft.com; Joseph Tremoulet <jotrem at microsoft.com>
Subject: [EXT] [RFC] [Windows SEH][-EHa] Support Hardware Exception Handling

Hi,

This is a spin-off of the previous Windows SEH RFC below. This RFC only focuses on supporting HW exception handling.
A detailed implementation can be seen here: https://github.com/tentzen/llvm-project/commit/8a2421c274b683051e456cbe12c177e3b934fb5e

It passes the whole MSVC SEH suite (excluding the tests with “Jumping out of _finally” (_Local_Unwind)).

Thanks,
--Ten

**** The rules for C code: ****

For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules. First, no exception can move in or out of a _try region, i.e., no potentially faulty instruction can be moved across the _try boundary. Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to those in callees). Finally, global states (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in registers) before the subsequent exception occurs.

**** The impact to C++ code: ****

Although SEH is a feature for C code, -EHa does have a profound effect on the C++ side. When a C++ function (in the same compilation unit, compiled with -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try handler or a C++ catch(...). As such, when that happens in the middle of an object's lifetime scope, the dtor must be invoked the same way as for a C++ synchronous exception during the unwinding process.

**** Design and Implementation: ****

A natural way to achieve the rules above in LLVM today is to allow an EH edge to be added on memory/computation instructions (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph.
However, tracking every single memory instruction and potentially faulty instruction can create many invokes, complicate the flow graph, and possibly have a negative performance impact on downstream optimization and code generation. Making all optimizations aware of the new semantics would also be a substantial effort.

This design does not intend to model the exception path at the instruction level. Instead, the proposed design tracks and reports EH state at BLOCK level to reduce the complexity of the flow graph and minimize the performance impact on C++ code under the -EHa option. The detailed implementation is described below.

-- Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after the ctor() is called and the EHStack is pushed, so it must be an invoke, not a call. With that, it's also guaranteed that an EH cleanup pad is created regardless of whether there is a call in this scope. _scope_end is added before the dtor(). These two intrinsics make the computation of block state possible in the downstream code-gen pass, even in the presence of ctor/dtor inlining.

-- Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark the _try boundary and to prevent exceptions from being moved across it.

-- All memory instructions inside a _try are considered 'volatile' to ensure the 2nd and 3rd rules for C code above. This is slightly suboptimal, but it's acceptable as the amount of code directly under a _try is very small.

-- For both C++ and C code, the state of each block is computed in the same place in the back end (the WinEHPrepare pass) where all other EH tables/maps are calculated. In addition to _scope_begin and _scope_end, the computation of block state also relies on the existing state-tracking code (UnwindMap and InvokeStateMap).
-- For both C++ and C code, the state of each block with a potential trap instruction is marked and reported in the DAG instruction selection pass, the same place where the state for -EHsc (synchronous exceptions) is reported.

-- If the first instruction in a reported block scope can trap, a nop is injected before this instruction. This nop is needed to accommodate the LLVM Windows EH implementation, in which the address in the IPToState table is offset by +1. (Note: the purpose of that offset is to ensure the return address of a call is in the same scope as the call address.)

-- The handler for catch(...) under -EHa must handle HW exceptions, so its 'adjectives' flag is reset (it cannot be IsStdDotDot (0x40), which only catches C++ exceptions).

From: Ten Tzen <tentzen at microsoft.com>
Sent: Friday, April 3, 2020 9:43 PM
To: rnk at google.com
Cc: llvm-dev at lists.llvm.org; Aaron Smith <aaron.smith at microsoft.com>
Subject: RE: [EXTERNAL] Re: [llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)

Hi, Reid,

Nice to finally meet you😊. Thank you for reading through the doc and providing insightful feedback. Yes, I can definitely separate these two features if it’s more convenient for everyone. For now, the local_unwind-specific changes can be separated and reviewed between these two commits:

    git diff 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde 06c81a4b6262445432a4166627b87bf595f5291b

The -EHa changes can be read with:

    git diff e943329ba00772f96fbc1fe5dec836cfd0707a38 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde

My reply is inline below in [Ten] lines.
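The block-level state computation in the design above (scope markers plus the existing unwind map, computed in the back end alongside the other EH tables) can be illustrated with a small, self-contained C model. This is a hedged sketch, not code from the patch: every name in it (Block, Marker, compute_block_states, and the parent array standing in for the unwind map) is hypothetical, and it assumes each scope marker ends its block and takes effect for that block's successors, loosely mirroring the rule that eha_scope_begin() is an invoke that terminates its block.

```c
#include <assert.h>

#define MAX_BLOCKS 64
#define MAX_SUCCS 2
#define STATE_UNKNOWN (-2) /* block not reached yet */
#define STATE_NONE (-1)    /* outside every EH scope */

/* Hypothetical marker that terminates a block. */
typedef enum { MARK_NONE, MARK_SCOPE_BEGIN, MARK_SCOPE_END } Marker;

typedef struct {
    Marker marker;        /* scope marker at the end of this block, if any */
    int new_state;        /* state entered by MARK_SCOPE_BEGIN */
    int nsuccs;           /* number of successor blocks */
    int succs[MAX_SUCCS]; /* successor block indices */
} Block;

/* State that successors inherit when control leaves a block.
   parent[s] is the enclosing state of s (a stand-in for the unwind map). */
static int exit_state(const Block *b, int entry_state, const int *parent) {
    switch (b->marker) {
    case MARK_SCOPE_BEGIN:
        return b->new_state;
    case MARK_SCOPE_END:
        assert(entry_state >= 0); /* cannot end a scope that was never begun */
        return parent[entry_state];
    default:
        return entry_state;
    }
}

/* Worklist propagation from the entry block. Each block is queued at most
   once, when its state is first assigned. Returns 0 on success, or -1 if two
   paths reach the same block in different states (ill-formed scopes). */
int compute_block_states(const Block *blocks, int n, int entry,
                         const int *parent, int *state) {
    int worklist[MAX_BLOCKS];
    int top = 0;

    assert(n <= MAX_BLOCKS);
    for (int i = 0; i < n; i++)
        state[i] = STATE_UNKNOWN;
    state[entry] = STATE_NONE;
    worklist[top++] = entry;

    while (top > 0) {
        int b = worklist[--top];
        int out = exit_state(&blocks[b], state[b], parent);
        for (int i = 0; i < blocks[b].nsuccs; i++) {
            int s = blocks[b].succs[i];
            if (state[s] == STATE_UNKNOWN) {
                state[s] = out;
                worklist[top++] = s;
            } else if (state[s] != out) {
                return -1;
            }
        }
    }
    return 0;
}
```

For two nested object scopes laid out as a straight line of five blocks (begin 0, begin 1, end, end, exit) with parent = {-1, 0}, the blocks get states -1, 0, 1, 0, -1. A real implementation works on the IR's CFG together with UnwindMap and InvokeStateMap; the sketch only shows why explicit begin/end markers give every block a well-defined state without relying on calls being present in the scope.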
--Ten

From: Reid Kleckner <rnk at google.com>
Sent: Friday, April 3, 2020 3:36 PM
To: Ten Tzen <tentzen at microsoft.com>
Cc: llvm-dev at lists.llvm.org; Aaron Smith <aaron.smith at microsoft.com>
Subject: [EXTERNAL] Re: [llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)

Hi Ten,

Thanks for the writeup and implementation, nice to meet you.

I wonder if it would be best to try to discuss the features separately. My view is that catching hardware exceptions (/EHa) is critical functionality, but it's not clear to me if local unwind is truly worth implementing. Having looked at the code briefly, it seemed like a large portion of the complexity comes from local unwind. Today, clang crashes on this small example that jumps out of a __finally block, but the intention was to reject the code and avoid implementing the functionality. Clang does, in fact, emit a warning:

    $ clang -c t.cpp
    t.cpp:7:7: warning: jump out of __finally block has undefined behavior [-Wjump-seh-finally]
          goto lu1;
          ^

Local unwind, in my view, is the user saying, "I wrote __finally, but actually I decided I wanted to catch the exception, so let's transfer to normal control flow now." It seems to me that the user already has a way to express this: __except. I know the mapping isn't trivial and it's not exactly the same, but it seems feasible to rewrite most uses of local unwind this way.

[Ten] Right, I agree that to some degree a local_unwind can be viewed as another type of _except handler in the middle of unwinding. And it's true that some usage patterns can be worked around by rewriting the SEH hierarchy. But I believe the work can be substantial and risky, especially in an OS kernel.
Furthermore, to broaden the interpretation, local_unwind can also serve as a _filter (or even a rethrow-like handler in C++ EH), with the target block as the final handler. See the multi-local-unwind example in the doc.

Can you estimate the prevalence of local unwind? What percent of __finally blocks in your experience use non-local control flow? I see a lot of value in supporting catching hardware exceptions, but if we can avoid carrying over the complexity of this local unwind feature, it seems to me that future generations of compiler engineers will thank us.

[Ten] I don’t have this data in hand. But what I know is that local_unwind is an essential feature for building the Windows kernel. One of the most important SEH tests (the infamous xcpt4u.c) is composed of 88 tests; among them there are 25 jumping-out-of-finally occurrences. Of course this does not translate to a percentage of local_unwind, but it does show the significance of this feature to Windows. FYI, passing xcpt4u.c is the very first fundamental requirement before building the Windows kernel.

---

Regarding trap / non-call / hardware exception handling, I guess I am a bit more blasé about precisely modeling the control flow. As Eli mentions, we already have setjmp, and we already don't model it. Users file bugs about problems with setjmp, and we essentially close them as "wontfix" and tell them to put more "volatile" on the problem until it stops hurting.

One thing that I am very concerned about is the implications for basic block layout. Right now, machine basic block layout is very free-handed. Today, CodeGen puts labels around every potentially-throwing call, does block layout without considering try regions, and then collapses adjacent label regions with the same landingpad during AsmPrinting. For MSVC C++ EH, state number stores and the ip2state table achieve the same goal.

[Ten] Yes, I saw that (pretty nice implementation, actually).
This design and implementation completely inherits the current mechanism, except that now it’s allowed to report EH state ranges that only contain memory/computation instructions (for obvious reasons). I’m not sure which part of that concerns you.

I think we need rules about how LLVM is allowed to transform the following code:

    void foo(volatile int *pv) {
      __try {
        if (cond()) {
          ++*pv;
          __builtin_unreachable();
        }
      } __except(1) {
      }
      __try {
        if (cond()) {
          ++*pv;
          __builtin_unreachable();
        }
      } __except(1) {
      }
    }

In this case, the *pv operation may throw, but I believe it would be semantics-preserving to merge the two identical if-then blocks. The call.setup proposal I sent not long ago runs into the same issue. I have written a patch to tail merge such similar blocks, but I have not landed it: https://reviews.llvm.org/D29428

Even though it's not yet landed, I think we need to know if the transform is valid. If it is, then we need to do more than volatilize the try region to make EHa work.

[Ten] The merging should not happen. Per the C standard, a volatile must be read (or written) ONCE and only once (as long as it’s naturally aligned and can be accessed in one operation by HW). So merging two volatiles violates the standard. I’m sure it’s well protected in LLVM today.

For a long time I've wanted regions of some kind in LLVM IR, and this use case has made me want to pick it up again. However, assuming that you want to land support for hardware exceptions without some kind of generalized region support in the IR, I think we do need to do something about these blocks ending in unreachable in __try regions.
The simplest thing that could possibly work is to make clang end the try region before unreachable. This would mean ending the block and adding `invoke void @seh_try_end` after every unreachable. It would be redundant for noreturn calls, since those will already have an unwind edge, ensuring they remain in the try region.

[Ten] It’s interesting that you mentioned “blocks ending in unreachable in __try regions” here. With these two features supported, the two remaining bugs on my to-do list are one with setjmp() and one with a nested EH throw. The second one seems to be caused by a _try block ending with an unreachable. Yes, this is on my list. I will discuss it with you further when I look into it.

---

Another interesting aspect of /EHa is how it affects C++ destructor cleanups. I am personally comfortable with the requirement that LLVM avoid moving around volatile instructions in __try blocks. LLVM is already required to leave volatile operations in order. But I *am* concerned about C++ destructor scopes, which are much more frequent than __try. As you have described it, clang would invoke eha_scope_begin() / eha_scope_end() around the object lifetime, but are you proposing to volatilize all memory operations in the region? If not, I see nothing that would prevent LLVM from moving potentially faulting operations in or out of this scope. We cannot require passes to look for non-local EH regions before doing code motion. Would that be acceptable behavior? It could lead to some strange behavior, where a load is sunk to the point of use outside the cleanup region, but maybe users don't care about this in practice.

[Ten] No, memory operations in C++ need not be volatilized. The order of exceptions in C++ code does not matter for -EHa. Potential trap instructions are free to move in/out of any EH region. The only criterion is that when a HW exception is caught and handled, local live objects must be destroyed gracefully, in the same manner as for a C++ synchronous exception.
By reporting the EH state of those trap instructions, this is automatically done in LLVM today.

---

To summarize, my feedback would be:

1. Focus on __try and hardware exceptions first; the value proposition is clear and large. In particular, something has to be done about unreachable. Clang should already thread other abnormal control flow through the region exit.
2. Please gather some data on the prevalence of local unwind to motivate the feature.
3. Please elaborate on the design for /EHa C++ destructor cleanups and code motion.

I hope that helps, and I'm sorry if I'm slow to respond; this is a tricky problem, and it's not my first priority.

Reid

On Wed, Apr 1, 2020 at 8:22 AM Ten Tzen via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi, all,

The intent of this thread is to complete the support for Windows SEH. Currently there are two major missing features: jumping out of a _finally and hardware exception handling. The document below is my proposed design and implementation to fully support SEH on LLVM. I have completely implemented this design on a branch in this repo: https://github.com/tentzen/llvm-project. It now passes MSVC’s in-house SEH suite. Sorry for this long write-up.
For better readability, please read it on https://github.com/tentzen/llvm-project/wiki

Special thanks to Joseph Tremoulet for his earlier comments and suggestions.

Note: I just subscribed to llvm-dev and am probably not on the list yet. So please reply with my email address (tentzen at microsoft.com) explicitly in the To-list.

Thanks,
--Ten
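The IP-to-state mechanics discussed in this thread (labels around potentially trapping ranges, collapsing adjacent ranges with the same state, the +1 offset on table entries, and the nop injected before a trapping first instruction) can be illustrated with a small C model. This is a hedged sketch under stated assumptions, not LLVM's or the MSVC runtime's code: the types and functions (StateRange, IpStateEntry, build_ip2state, lookup_state) are hypothetical, code addresses are plain integers, input ranges are assumed to be contiguous and sorted, and the lookup assumes the runtime picks the last entry whose address is less than or equal to the IP (the return address for a call, the faulting instruction for a hardware exception).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open code range [begin, end) reported with one EH state. */
typedef struct {
    unsigned begin;
    unsigned end;
    int state; /* -1 means "no EH state" */
} StateRange;

/* Hypothetical IP-to-state row: IPs from start_plus_one up to the next
   row's start map to `state`. */
typedef struct {
    unsigned start_plus_one; /* range start label offset by +1 */
    int state;
} IpStateEntry;

/* Collapse adjacent ranges that carry the same state and emit one row per
   state change, with each start address offset by +1. `out` must have room
   for n rows. Returns the number of rows written. */
size_t build_ip2state(const StateRange *ranges, size_t n, IpStateEntry *out) {
    size_t rows = 0;
    for (size_t i = 0; i < n; i++) {
        /* This model requires contiguous, sorted ranges. */
        assert(i == 0 || ranges[i].begin == ranges[i - 1].end);
        if (rows > 0 && out[rows - 1].state == ranges[i].state)
            continue; /* same state as the previous row: extend it */
        out[rows].start_plus_one = ranges[i].begin + 1;
        out[rows].state = ranges[i].state;
        rows++;
    }
    return rows;
}

/* Return the state of the last row whose address is <= ip, or -1 if ip
   precedes every row. */
int lookup_state(const IpStateEntry *table, size_t rows, unsigned ip) {
    int state = -1;
    for (size_t i = 0; i < rows; i++) {
        if (table[i].start_plus_one > ip)
            break;
        state = table[i].state;
    }
    return state;
}
```

For example, take a 5-byte call occupying [0x10, 0x15) in state 0, followed by a range starting at 0x15 in state 1. Because of the +1 offset, the call's return address 0x15 still maps to state 0, which is the purpose the thread gives for the offset. A trapping instruction placed at 0x15 would also map to state 0, which is wrong; with a 1-byte nop injected at 0x15, the trapping instruction moves to 0x16 and maps to state 1.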