Ten Tzen via llvm-dev
2020-Apr-01  04:12 UTC
[llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)
Hi, all,
The intent of this thread is to complete the support for Windows SEH.
Currently there are two major missing features: jumping out of a _finally and
hardware exception handling.
The document below is my proposed design and implementation to fully support SEH
on LLVM.
I have completely implemented this design on a branch in this repo:
https://github.com/tentzen/llvm-project
It now passes MSVC's in-house SEH suite.
Sorry for this long write-up.  For better readability, please read it on
https://github.com/tentzen/llvm-project/wiki
Special thanks to Joseph Tremoulet for his earlier comments and suggestions.
Note: I just subscribed to llvm-dev and am probably not on the list yet.  So
please reply with my email address (tentzen at microsoft.com) explicitly in the
To-list.
Thanks,
--Ten
Windows SEH Support in LLVM
INTRODUCTION
An exception is an event that occurs during the execution of a program. It
requires the execution of code outside the normal flow of control. There are two
kinds of exceptions: hardware exceptions and software exceptions. Hardware
exceptions are initiated by the CPU, such as division by zero or an attempt to
access an invalid memory address. Software exceptions are initiated explicitly
by applications or the operating system. Windows SEH (Structured exception
handling) is a mechanism for handling both hardware and software exceptions.
Windows C++ Exception Handling is almost fully supported in LLVM. The detailed
design and the new FuncletPad IR are described in
https://llvm.org/docs/ExceptionHandling.html#exception-handling-using-the-windows-runtime.
However, for SEH, LLVM today is missing two major features. This project intends
to extend the current model to add them:
  1.  Local Unwind (AKA: Jumping out of _finally)
  2.  Hardware Exception Handling (AKA: MSVC++ option -EHa)
LOCAL UNWIND
In Windows SEH, when a goto statement (or any other statement that changes
control flow, such as break/continue/leave/return) in a _finally targets a
label outside of the _finally clause, a "local-unwind" must be triggered to
properly invoke the _finally clauses along the path from the goto statement to
the target label. Since a _finally clause can be executed in either the
"normal execution path" or the "exception path", the _local_unwind can take
place in both paths too.
Let's demonstrate all possible paths in the following example.
try {
  try {
    try {
      /* set counter = 1 */
      Counter += 1;
      if (ex)
        RtlRaiseException(&ExceptionRecord);
    }  finally {
      Counter += 1;
      if (abnormal_termination()) {
        printf(" inner finally: exception path \n\r");
      }
      else {
        printf(" inner finally: normal path \n\r");
      }
      if (lu) {
        printf(" inner finally: local unwind \n\r");
        goto t10;
      }
      printf(" inner finally: normal return \n\r");
    }
  } finally {
    Counter += 1;
    printf(" outer finally: \n\r");
  }
}
except(Counter) {
  /* set counter = 3 */
  printf(" except handler: \n\r");
  Counter += 1;
}
printf(" after outer try_except: \n\r");
t10:;
  *   Normal execution (ex is false), normal return (lu is false): both
_finallys are executed normally, but _except_handler should not be executed.
Output is:
inner finally: normal path:
inner finally: normal return:
outer finally:
after outer try_except:
  *   Normal execution (ex is false), local-unwind (lu is true): both _finallys
are executed. Due to the local-unwind, control jumps to $t10, so "after outer
try_except" is not printed.
inner finally: normal path:
inner finally: local unwind:
outer finally:
  *   Exception execution (ex is true), normal return (lu is false): the Windows
runtime finds the handler. It invokes the inner _finally and the outer _finally,
then the except-handler, and jumps to the continue address at the end of the
outer try.
inner finally: exception path:
inner finally: normal return:
outer finally:
except handler:
after outer try_except:
  *   Exception execution (ex is true), local-unwind (lu is true): the Windows
runtime finds the handler. It invokes the inner _finally, where _local_unwind is
kicked off. It unwinds to the outer _finally, then jumps to the target label,
$t10. Again, "after outer try_except" is not printed.
inner finally: exception path:
inner finally: local unwind:
outer finally:
To perform local unwind, Windows provides a _local_unwind() runtime function
that requires two input parameters: the target label address and the stack
frame. Note that the 2nd parameter is the 'Establisher' stack pointer, not a
frame-pointer/base-pointer. With that, all we need is to turn a goto statement
into a _local_unwind() invoke. Since the target label is beyond the function
(_funclet) boundary, the target label must also be declared as a static global
label (an MCSymbol in LLVM) that needs to be fixed up by the linker.
IR modeling for Optimizer:
While transforming a goto statement into a runtime function call/invoke is
straightforward, a more complicated issue is how to model _local_unwind in IR
so that the Optimizer can see its control flow. In case #2 of the above example,
the control flow from the normal-execution inner _finally, passing through the
outer _finally, and landing at $t10 cannot be represented by LLVM IR today.
Similarly, in case #4, the control flow starting from RtlRaiseException(),
passing through both _finally funclets, then landing at $t10 is not visible.
To precisely represent _local_unwind flow, our proposed solution is:
  *   Add one more catchpad/catchret pair that forwards control to the
local_unwind target, i.e., this extra catchpad is the re-entry point for the
_local_unwind() runtime.
  *   The address of this catchpad is passed to the _local_unwind() runtime,
instead of the original goto target address.
  *   The local_unwind catchpad is handled the same way as an _except handler;
it does not become a funclet, and is instead demoted to a normal label in the
parent function.
  *   During the LLVM BE code-gen and code layout pass, the catchpad
(local_unwind dispatching) block must be assigned the same EH state as the
original goto target so that the local unwind lands in the right EH scope.
For example, the IR of above example today is briefly listed below.
________________________________
define dso_local i32 @main() #0 personality i8* bitcast (i32 (...)*
@__C_specific_handler
..
%28 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to
to label %29 unwind label %35
; <label>:29: ; preds = %27
br label %30
; <label>:30: ; preds = %29, %15
%31 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 0, i8* %31) #7
to label %32 unwind label %39
; <label>:32: ; preds = %30
%33 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 0, i8* %33) #7
to label %34 unwind label %43
; <label>:34: ; preds = %32
br label %53
; <label>:35: ; preds = %27
%36 = cleanuppad within none []
%37 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 1, i8* %37) #7 [ "funclet"(token %36) ]
to label %38 unwind label %39
; <label>:38: ; preds = %35
cleanupret from %36 unwind label %39
; <label>:39: ; preds = %38, %35, %30
%40 = cleanuppad within none []
%41 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 1, i8* %41) #7 [ "funclet"(token %40) ]
to label %42 unwind label %43
; <label>:42: ; preds = %39
cleanupret from %40 unwind label %43
; <label>:43: ; preds = %42, %39, %32
%44 = catchswitch within none [label %45] unwind to caller
; <label>:45: ; preds = %43
%46 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
catchret from %46 to label %47
; <label>:47: ; preds = %45
// except handler block
..
br label %53, !dbg !155
; <label>:53: ; preds = %47, %34
// after outer _try block
..
br label %56, !dbg !156
; <label>:t10:
.. ..
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
%2 = blockaddress($main, $t10)
call void @"?local_unwind@@"(i8* %1, i8* %2)
________________________________
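The new IR is illustrated below. The changes are the extra handler label %60 in
the catchswitch, the new catchpad/catchret block at label %60, and the
blockaddress operand passed to local_unwind: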
________________________________
define dso_local i32 @main() #0 personality i8* bitcast (i32 (...)*
@__C_specific_handler
..
%28 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to
to label %29 unwind label %35
; <label>:29: ; preds = %27
br label %30
; <label>:30: ; preds = %29, %15
%31 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 0, i8* %31) #7
to label %32 unwind label %39
; <label>:32: ; preds = %30
%33 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 0, i8* %33) #7
to label %34 unwind label %43
; <label>:34: ; preds = %32
br label %53
; <label>:35: ; preds = %27
%36 = cleanuppad within none []
%37 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 1, i8* %37) #7 [ "funclet"(token %36) ]
to label %38 unwind label %39
; <label>:38: ; preds = %35
cleanupret from %36 unwind label %39
; <label>:39: ; preds = %38, %35, %30
%40 = cleanuppad within none []
%41 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 1, i8* %41) #7 [ "funclet"(token %40) ]
to label %42 unwind label %43
; <label>:42: ; preds = %39
cleanupret from %40 unwind label %43
; <label>:43: ; preds = %42, %39, %32
%44 = catchswitch within none [label %45, label %60] unwind to caller
; <label>:45: ; preds = %43
%46 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
catchret from %46 to label %47
; <label>:60: ; preds = %43
%61 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?IsLocalUnwind@0@main@@" to i8*)]
catchret from %61 to label %t10
; <label>:47: ; preds = %45
// except handler block
..
br label %53
; <label>:53: ; preds = %47, %34
// after outer _try block
..
br label %56
; <label>:t10:
.. ..
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
%2 = blockaddress($main, %60)
call void @"?local_unwind@@"(i8* %1, i8* %2)
________________________________
Note that @"?IsLocalUnwind@0@main@@" is a funclet, similar to
@"?filt$0@0@main@@" of the _except handler. The difference is that
"?IsLocalUnwind@0@main@@" is a dummy that is never called or checked by any
runtime. It is there to make the IR more readable and consistent with the
existing model. However, unlike ?filt$0@0@main@@, which is referenced by the EH
table (for the 1st pass, virtual unwind), "?IsLocalUnwind@0@main@@" is discarded
by the BE. In the end, no funclet is generated for it in the output object file.
Dispatch on Try-Finally
When the outermost _try is a _finally, not an _except construct, a pseudo
_try/_except is added to dispatch _local_unwind. This try-except has one
constant filter, EXCEPTION_CONTINUE_SEARCH, so from a functional perspective
it is virtually a NOP _try. Its only purpose is to model the _local_unwind()
exception path.
Multiple Local-Unwinds
If there are two or more local_unwind targets, one catchpad/catchret pair is
injected for each target. The catchpad/catchret must be added at the same _try
scope as its corresponding target label. For example,
try {
  try {
    try { /* inner try */
      if (ex)
        RtlRaiseException(&ExceptionRecord);
    } finally {
      if (lu)
        goto t10;
      else if (lu2)
        goto t20;
      else if (lu3)
        goto t30;
      printf(" inner finally: normal return \n\r");
    }
  } except(Counter) {
    /* inner handler */
  }
  // after inner handler
  t10:
  ...
  t20:
  ..
} except(1) {
  /* outer handler */
}
// after outer try
t30:
// after t30
The corresponding IR is listed below. The local unwinds to t10 and t20 must be
dispatched by the 2nd _try (the one with except(Counter)), while the local
unwind to t30 is dispatched by the outermost _try.
________________________________
%12 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to i32
to label %13 unwind label %16
; <label>:13: ; preds = %0
%14 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 0, i8* %14) #7
to label %15 unwind label %20
; <label>:15: ; preds = %13
br label %31
; <label>:16: ; preds = %0
%17 = cleanuppad within none []
%18 = call i8* @llvm.localaddress()
invoke void @"?fin@0@main@@"(i8 1, i8* %18) #7 [ "funclet"(token %17) ]
to label %19 unwind label %20
; <label>:19: ; preds = %16
cleanupret from %17 unwind label %20
; <label>:20: ; preds = %19, %16, %13
%21 = catchswitch within none [label %22, label %110, label %120] unwind label %34
; <label>:22: ; preds = %20
%23 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
catchret from %23 to label %24
; <label>:110: ; preds = %20
%111 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt10@0@main@@" to i8*)]
catchret from %111 to label %t10
; <label>:120: ; preds = %20
%121 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt20@0@main@@" to i8*)]
catchret from %121 to label %t20
; <label>:24: ; preds = %22
// inner handler
%27 = invoke i32 (i8*, ...) @printf(i8* getelementptr inbounds ([31 x i8], [31 x i8]*
to label %28 unwind label %34
; <label>:28: ; preds = %24
.. ..
br label %31
; <label>:31: ; preds = %28, %15
// after inner handler
%33 = invoke i32 (i8*, ...) @printf(i8* ..
to label %49 unwind label %34
; <label>:34: ; preds = %31, %24, %20
%35 = catchswitch within none [label %36, label %130] unwind to caller
; <label>:36: ; preds = %34
%37 = catchpad within %35 [i8* null]
catchret from %37 to label %38
; <label>:130: ; preds = %34
%131 = catchpad within %35 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt30@0@main@@" to i8*)]
catchret from %131 to label %t30
; <label>:38: ; preds = %36
// outer handler
%41 = call i32 (i8*, ...) @printf(i8*..
br label %42
; <label>:42: ; preds = %38, %53
// after outer try
%44 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([22 x i8], [22 x i8]*
br label %t30
; <label>:t30: ; preds = %42
// after t30
ret
; <label>:49: ; preds = %31
br label %t10
; <label>:t10: ; preds = %49
%51 = load i32, i32* %3, align 4
%52 = add nsw i32 %51, 10
store i32 %52, i32* %3, align 4
br label %t20
; <label>:t20: ; preds = %t10
%54 = load i32, i32* %3, align 4
%55 = add nsw i32 %54, 20
store i32 %55, i32* %3, align 4
br label %42, !dbg !143
}
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
%2 = blockaddress($main, %110)
call void @"?local_unwind@@"(i8* %1, i8* %2)
..
%2 = blockaddress($main, %120)
call void @"?local_unwind@@"(i8* %1, i8* %2)
..
%2 = blockaddress($main, %130)
call void @"?local_unwind@@"(i8* %1, i8* %2)
..
..
________________________________
Implementation:
The first primary task of the design above is to determine the right place to
add this pseudo 'catchswitch' construct in order to dispatch a local unwind to
its target. One straightforward way is to add this new level immediately on top
of the outermost _try that encloses the local-unwind statement and is located in
the same EH scope as the unwind target.
Since semantic analysis and scope information are well constructed and performed
in Clang's Parser/Semantic-analyzer, the implementation just needs to slightly
extend existing code to identify local unwind statements and record LU targets
in the outermost SEHTryStmt during the Parser/Semantic phase.
For Break/Continue/Leave/Return local unwind, please see Sema::ActOnBreakStmt(),
Sema::ActOnContinueStmt() and Parser::ParseSEHTryBlock(). Goto local unwind is
more complicated, as it could be a forward reference. Our code utilizes
JumpDiagnostics.cpp, where a Goto out of _finally is detected and reported.
Please see the change in JumpScopeChecker::CheckJump().
The second task is in FE CodeGen. Before entering the Try, an extra EHCatchScope
level is pushed onto EHStack. Based on the LU information recorded on SEHTryStmt
by the earlier Parser & Semantic phases, a handler (catchpad) is created to
dispatch local-unwind for each target associated with this Try statement. This
handler block is used as the target address for MSVC's _local_unwind() runtime.
See CodeGenFunction::pushSEHLocalUnwind() and
CodeGenFunction::popSEHLocalUnwind().
Finally, in LLVM calculateSEHStateNumbers() (see the change in
WinEHPrepare.cpp), all _IsLocalUnwind**() filters in pseudo CatchSwitches are
discarded and each LU dispatch handler is assigned its parent scope's EH state.
HARDWARE EXCEPTION HANDLING (-EHa)
The rules for C code:
For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to
follow three rules. First, no exception can move into or out of a _try region,
i.e., no potentially faulting instruction can be moved across a _try boundary.
Second, the order of exceptions for instructions 'directly' under a _try must be
preserved (this does not apply to those in callees). Finally, global state
(local/global/heap variables) that can be read outside of the _try region must
be updated in memory (not just in a register) before the subsequent exception
occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on the
C++ side. When a C++ function (in the same compilation unit with option -EHa) is
called by a SEH C function, a hardware exception that occurs in the C++ code can
also be handled properly by an upstream SEH _try-handler or a C++ catch(...). As
such, when that happens in the middle of an object's lifetime, the dtor must be
invoked during unwinding the same way as for a C++ synchronous exception.
Design and Implementation:
A natural way to achieve the rules above in LLVM today is to allow an EH edge to
be added on memory/computation instructions (the previous iload/istore idea) so
that the exception path is modeled precisely in the flow graph. However,
tracking every single memory instruction and potentially faulting instruction
can create many invokes, complicate the flow graph, and possibly hurt the
performance of downstream optimization and code generation. Making all
optimizations aware of the new semantics is also a substantial effort.
This design does not intend to model the exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK level to
reduce the complexity of the flow graph and minimize the performance impact on
CPP code under the -EHa option. The detailed implementation is described below.
-- Two intrinsics are created to track CPP object scopes: eha_scope_begin() and
eha_scope_end(). _scope_begin() is added immediately after ctor() is called and
EHStack is pushed, so it must be an invoke, not a call. With that, it is also
guaranteed that an EH-cleanup-pad is created regardless of whether there is a
call in this scope. _scope_end is added before dtor(). These two intrinsics make
the computation of Block-State possible in the downstream code gen pass, even in
the presence of ctor/dtor inlining.
-- Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to
mark the _try boundary and to prevent exceptions from being moved across the
_try boundary.
-- All memory instructions inside a _try are considered 'volatile' to assure
the 2nd and 3rd rules for C code above. This is slightly sub-optimal, but it is
acceptable as the amount of code directly under a _try is very small.
-- For both C++ & C code, the state of each block is computed at the same place
in the BE (WinEHPrepare pass) where all other EH tables/maps are calculated. In
addition to _scope_begin & _scope_end, the computation of block state also
relies on the existing state tracking code (UnwindMap and InvokeStateMap).
-- For both C++ & C code, the state of each block with a potentially trapping
instruction is marked and reported in the DAG Instruction Selection pass, the
same place where the state for -EHsc (synchronous exceptions) is done.
-- If the first instruction in a reported block scope can trap, a nop is
injected before this instruction. This nop is needed to accommodate the LLVM
Windows EH implementation, in which the address in the IPToState table is offset
by +1. (Note that the purpose of the offset is to ensure the return address of a
call is in the same scope as the call address.)
-- The handler for catch(...) for -EHa must handle HW exceptions, so its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
Reid Kleckner via llvm-dev
2020-Apr-03  22:36 UTC
[llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)
Hi Ten,
Thanks for the writeup and implementation, nice to meet you.
I wonder if it would be best to try to discuss the features separately. My
view is that catching hardware exceptions (/EHa) is critical functionality,
but it's not clear to me if local unwind is truly worth implementing.
Having looked at the code briefly, it seemed like a large portion of the
complexity comes from local unwind. Today, clang crashes on this small
example that jumps out of a __finally block, but the intention was to
reject the code and avoid implementing the functionality. Clang does, in
fact, emit a warning:
$ clang -c t.cpp
t.cpp:7:7: warning: jump out of __finally block has undefined behavior
[-Wjump-seh-finally]
      goto lu1;
      ^
Local unwind, in my view, is the user saying, "I wrote __finally, but
actually I decided I wanted to catch the exception, so let's transfer to
normal control flow now." It seems to me that the user already has a way to
express this: __except. I know the mapping isn't trivial and it's not
exactly the same, but it seems feasible to rewrite most uses of local
unwind this way.
Can you estimate the prevalence of local unwind? What percent of __finally
blocks in your experience use non-local control flow? I see a lot of value
in supporting catching hardware exceptions, but if we can avoid carrying
over the complexity of this local unwind feature, it seems to me that
future generations of compiler engineers will thank us.
---
Regarding trap / non-call / hardware exception handling, I guess I am a bit
more blase about precisely modeling the control flow. As Eli mentions, we
already have setjmp, and we already don't model it. Users file bugs about
problems with setjmp, and we essentially close them as "wontfix" and tell
them to put more "volatile" on the problem until it stops hurting.
One thing that I am very concerned about is the implications for basic
block layout. Right now, machine basic block layout is very free-handed.
Today, CodeGen puts labels around every potentially-throwing call, does
block layout without considering try regions, and then collapses adjacent
label regions with the same landingpad during AsmPrinting. For MSVC C++ EH,
state number stores and the ip2state table achieve the same goal.
I think we need rules about how LLVM is allowed to transform the following
code:
void foo(volatile int *pv) {
  __try {
    if (cond()) {
      ++*pv;
      __builtin_unreachable();
    }
  } __except(1) { }
  __try {
    if (cond()) {
      ++*pv;
      __builtin_unreachable();
    }
  } __except(1) { }
}
In this case, the *pv operation may throw, but I believe it would be
semantics preserving to merge the two identical if-then blocks. The
call.setup proposal I sent not long ago runs into the same issue. I have
written a patch to tail merge such similar blocks, but I have not landed it:
https://reviews.llvm.org/D29428
Even though it's not yet landed, I think we need to know if the transform
is valid. If it is, then we need to do more than volatilize the try region
to make EHa work.
For a long time I've wanted regions of some kind in LLVM IR, and this use
case has made me want to pick it up again. However, assuming that you want
to land support for hardware exceptions without some kind of generalized
region support in the IR, I think we do need to do something about these
blocks ending in unreachable in __try regions. The simplest thing that
could possibly work is to make clang end the try region before unreachable.
This would mean ending the block and adding `invoke void @seh_try_end`
after every unreachable. It would be redundant for noreturn calls, since
those will already have an unwind edge, ensuring they remain in the try
region.
---
Another interesting aspect of /EHa is how it affects C++ destructor
cleanups. I am personally comfortable with the requirement that LLVM avoid
moving around volatile instructions in __try blocks. LLVM is already
required to leave volatile operations in order. But I *am* concerned about
C++ destructor scopes, which are much more frequent than __try. As you have
described it, clang would invoke eha_scope_begin() / eha_scope_end() around
the object lifetime, but are you proposing to volatilize all memory
operations in the region? If not, I see nothing that would prevent LLVM
from moving potentially faulting operations in or out of this scope. We
cannot require passes to look for non-local EH regions before doing code
motion. Would that be acceptable behavior? It could lead to some strange
behavior, where a load is sunk to the point of use outside the cleanup
region, but maybe users don't care about this in practice.
---
To summarize, my feedback would be:
1. Focus on __try and hardware exceptions first, the value proposition is
clear and large. In particular, something has to be done about unreachable.
Clang should already thread other abnormal control flow through the region
exit.
2. Please gather some data on prevalence of local unwind to motivate the
feature
3. Please elaborate on the design for /EHa C++ destructor cleanups and code
motion
I hope that helps, and I'm sorry if I'm slow to respond. This is a tricky
problem, and it's not my first priority.
Reid
Ten Tzen via llvm-dev
2020-Apr-04  04:43 UTC
[llvm-dev] [EXTERNAL] Re: [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)
Hi, Reid,
Nice to finally meet you😊.
Thank you for reading through the doc and providing insightful feedback.
Yes, I can definitely separate these two features if it’s more convenient for
everyone.
For now, the local_unwind-specific changes can be reviewed between these two
commits:
      git diff 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde 06c81a4b6262445432a4166627b87bf595f5291b
The -EHa changes can be read with:
      git diff e943329ba00772f96fbc1fe5dec836cfd0707a38 9b48ea90f4c9ae7ef030719d6c0b49b00861cdde
My replies are inline below, in the lines marked [Ten].
--Ten
From: Reid Kleckner <rnk at google.com>
Sent: Friday, April 3, 2020 3:36 PM
To: Ten Tzen <tentzen at microsoft.com>
Cc: llvm-dev at lists.llvm.org; Aaron Smith <aaron.smith at microsoft.com>
Subject: [EXTERNAL] Re: [llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out
of a _finally) and -EHa (Hardware Exception Handling)
Hi Ten,
Thanks for the writeup and implementation, nice to meet you.
I wonder if it would be best to try to discuss the features separately. My view
is that catching hardware exceptions (/EHa) is critical functionality, but
it's not clear to me if local unwind is truly worth implementing. Having
looked at the code briefly, it seemed like a large portion of the complexity
comes from local unwind. Today, clang crashes on this small example that jumps
out of a __finally block, but the intention was to reject the code and avoid
implementing the functionality. Clang does, in fact, emit a warning:
$ clang -c t.cpp
t.cpp:7:7: warning: jump out of __finally block has undefined behavior
[-Wjump-seh-finally]
      goto lu1;
      ^
Local unwind, in my view, is the user saying, "I wrote __finally, but
actually I decided I wanted to catch the exception, so let's transfer to
normal control flow now." It seems to me that the user already has a way to
express this: __except. I know the mapping isn't trivial and it's not
exactly the same, but it seems feasible to rewrite most uses of local unwind
this way.
[Ten] Right, I agree that to some degree a local_unwind can be viewed as another
type of _except handler in the middle of unwinding. And it is true that some
usage patterns can be worked around by rewriting the SEH hierarchy. But I
believe the work can be substantial and risky, especially in an OS kernel.
Furthermore, to broaden the interpretation, local_unwind can also serve as a
_filter (or even a rethrow-like handler in C++ EH), with the target block as the
final handler. See the multi-local-unwind example in the doc.
Can you estimate the prevalence of local unwind? What percent of __finally
blocks in your experience use non-local control flow? I see a lot of value in
supporting catching hardware exceptions, but if we can avoid carrying over the
complexity of this local unwind feature, it seems to me that future generations
of compiler engineers will thank us.
[Ten] I don’t have this data in hand. But what I know is that local_unwind is an
essential feature for building the Windows Kernel. One of the most important SEH
tests (the infamous xcpt4u.c) is composed of 88 tests; among them there are 25
jumping-out-of-finally occurrences. Of course this does not translate to a
percentage of local_unwind, but it does show the significance of this feature
to Windows. FYI, passing xcpt4u.c is the very first fundamental requirement
before building the Windows Kernel.
---
Regarding trap / non-call / hardware exception handling, I guess I am a bit more
blase about precisely modeling the control flow. As Eli mentions, we already
have setjmp, and we already don't model it. Users file bugs about problems
with setjmp, and we essentially close them as "wontfix" and tell them
to put more "volatile" on the problem until it stops hurting.
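As a concrete picture of the "more volatile" advice for setjmp, here is a
minimal portable sketch. It is not code from this thread, and the names
recovery_point, fail_midway, and count_steps_before_failure are made up for
illustration. A local that is modified between setjmp and longjmp has an
indeterminate value after the jump unless it is volatile-qualified, which is
the workaround users are usually pointed to.

```cpp
#include <csetjmp>

// Hypothetical names throughout; this only illustrates the setjmp/volatile rule.
static std::jmp_buf recovery_point;

// Stands in for code that bails out to the recovery point.
static void fail_midway() {
  std::longjmp(recovery_point, 1);
}

int count_steps_before_failure() {
  // volatile: forces every store to memory, so the last value written
  // before the longjmp is the one observed after setjmp returns again.
  volatile int steps = 0;

  if (setjmp(recovery_point) == 0) {
    steps = 1;
    steps = 2;
    fail_midway();  // jumps back to setjmp, which then returns 1
    steps = 3;      // never executed
  }
  return steps;
}
```

With the qualifier in place the function returns 2. Without it, the compiler
is free to keep steps in a register, and the value read after the jump is
indeterminate.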
One thing that I am very concerned about is the implications for basic block
layout. Right now, machine basic block layout is very free-handed. Today,
CodeGen puts labels around every potentially-throwing call, does block layout
without considering try regions, and then collapses adjacent label regions with
the same landingpad during AsmPrinting. For MSVC C++ EH, state number stores and
the ip2state table achieve the same goal.
[Ten] Yes, I saw that (pretty nice implementation actually). This design and
implementation completely inherits the current mechanism, except that it is now
allowed to report EH state ranges that contain only memory/computation
instructions (for obvious reasons). I’m not sure which part of that concerns
you.
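The label-collapsing step mentioned above can be pictured with a toy model.
This is only a sketch under simplifying assumptions, not LLVM's AsmPrinter
code: CallSiteRange and collapse_adjacent are hypothetical names, ranges are
half-open offsets given in final layout order, and two ranges are merged only
when they touch and share a landing pad.

```cpp
#include <vector>

// Hypothetical model of one labeled region around a potentially-throwing
// instruction: the half-open offset range [begin, end) and its landing pad id.
struct CallSiteRange {
  int begin;
  int end;
  int landing_pad;
};

// Walk the ranges in layout order and fold each one into its predecessor
// when both unwind to the same landing pad and there is no gap between them.
std::vector<CallSiteRange> collapse_adjacent(
    const std::vector<CallSiteRange>& in_layout_order) {
  std::vector<CallSiteRange> out;
  for (const CallSiteRange& r : in_layout_order) {
    if (!out.empty() && out.back().landing_pad == r.landing_pad &&
        out.back().end == r.begin) {
      out.back().end = r.end;  // extend the previous table entry
    } else {
      out.push_back(r);        // start a new table entry
    }
  }
  return out;
}
```

Because the merge happens only after block layout, a region that layout has
split apart simply produces several table entries for the same landing pad.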
I think we need rules about how LLVM is allowed to transform the following code:
void foo(volatile int *pv) {
  __try {
    if (cond()) {
      ++*pv;
      __builtin_unreachable();
    }
  } __except(1) { }
  __try {
    if (cond()) {
      ++*pv;
      __builtin_unreachable();
    }
  } __except(1) { }
}
In this case, the *pv operation may throw, but I believe it would be semantics
preserving to merge the two identical if-then blocks. The call.setup proposal I
sent not long ago runs into the same issue. I have written a patch to tail merge
such similar blocks, but I have not landed it:
https://reviews.llvm.org/D29428
Even though it's not yet landed, I think we need to know if the transform is
valid. If it is, then we need to do more than volatilize the try region to make
EHa work.
[Ten] The merging should not happen. Per the C standard, a volatile must be read
(or written) once and only once (as long as it’s naturally aligned and can be
accessed in one operation by the hardware). So merging two volatile accesses
violates the standard. I’m sure this is well protected in LLVM today.
For a long time I've wanted regions of some kind in LLVM IR, and this use
case has made me want to pick it up again. However, assuming that you want to
land support for hardware exceptions without some kind of generalized region
support in the IR, I think we do need to do something about these blocks ending
in unreachable in __try regions. The simplest thing that could possibly work is
to make clang end the try region before unreachable. This would mean ending the
block and adding `invoke void @seh_try_end` after every unreachable. It would be
redundant for noreturn calls, since those will already have an unwind edge,
ensuring they remain in the try region.
[Ten] It’s interesting that you mention "blocks ending in unreachable in __try
regions" here. With these two features supported, the two remaining bugs on my
to-do list are one in setjmp() and one in a nested EH throw. The second one
seems to be caused by a _try block that ends with an unreachable. Yes, this is
on my list. I will discuss it with you further when I look into it.
---
Another interesting aspect of /EHa is how it affects C++ destructor cleanups. I
am personally comfortable with the requirement that LLVM avoid moving around
volatile instructions in __try blocks. LLVM is already required to leave
volatile operations in order. But I *am* concerned about C++ destructor scopes,
which are much more frequent than __try. As you have described it, clang would
invoke eha_scope_begin() / eha_scope_end() around the object lifetime, but are
you proposing to volatilize all memory operations in the region? If not, I see
nothing that would prevent LLVM from moving potentially faulting operations in
or out of this scope. We cannot require passes to look for non-local EH regions
before doing code motion. Would that be acceptable behavior? It could lead to
some strange behavior, where a load is sunk to the point of use outside the
cleanup region, but maybe users don't care about this in practice.
[Ten] No, memory operations in C++ need not be volatilized. The order of
exceptions in C++ code does not matter for -EHa. Potentially trapping
instructions are free to move in or out of any EH region. The only criterion is
that when a HW exception is caught and handled, live local objects must be
destroyed gracefully, in the same manner as for a C++ synchronous exception. By
reporting the EH state of those trap instructions, this is automatically done in
LLVM today.
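The cleanup requirement above is the one that already holds for synchronous
C++ exceptions. The sketch below is portable C++ in which a throw stands in
for the hardware fault (an assumption made so the example runs anywhere; under
-EHa the fault itself would start the unwind). Tracked and unwind_trace are
hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records its own destruction in a shared log.
struct Tracked {
  std::vector<std::string>& log;
  std::string name;
  Tracked(std::vector<std::string>& l, std::string n)
      : log(l), name(std::move(n)) {}
  ~Tracked() { log.push_back("~" + name); }
};

// Returns the order of events when an exception leaves two nested scopes.
std::vector<std::string> unwind_trace() {
  std::vector<std::string> log;
  try {
    Tracked outer(log, "outer");
    {
      Tracked inner(log, "inner");
      // Stand-in for a trapping instruction such as a faulting load.
      throw std::runtime_error("stand-in for a hardware fault");
    }
  } catch (const std::runtime_error&) {
    log.push_back("handled");
  }
  return log;
}
```

The returned trace is ~inner, ~outer, handled: live locals are destroyed
innermost first, and only then does the handler body run.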
---
To summarize, my feedback would be:
1. Focus on __try and hardware exceptions first; the value proposition is clear
and large. In particular, something has to be done about unreachable. Clang
should already thread other abnormal control flow through the region exit.
2. Please gather some data on the prevalence of local unwind to motivate the
feature.
3. Please elaborate on the design for /EHa C++ destructor cleanups and code
motion.
I hope that helps. I'm sorry if I'm slow to respond; this is a tricky problem,
and it's not my first priority.
Reid
On Wed, Apr 1, 2020 at 8:22 AM Ten Tzen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
[...]