thr3ads.net - llvm dev - [LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR


Bob Wilson

2014-Nov-18 19:19 UTC

[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

> On Nov 18, 2014, at 11:07 AM, Reid Kleckner <rnk at google.com> wrote:
> 
> On Tue, Nov 18, 2014 at 10:50 AM, Bob Wilson <bob.wilson at apple.com> wrote:
> 
>> On Nov 17, 2014, at 5:50 PM, Reid Kleckner <rnk at google.com> wrote:
>> 
>> On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>> I don’t know much about SEH and haven’t had time to really dig into
>> this, but the idea of outlining functions that need to know about the frame
>> layout sounds a bit scary. Is it really necessary?
>> 
>> I’m wondering if you can treat the cleanups and filter functions as
>> portions of the same function, instead of outlining them to separate
>> functions. Can you arrange to set up the base pointer on entry to one of
>> those segments of code to have the same value as when running the normal
>> part of the function? If so, from the code-gen point of view, doesn’t it
>> just behave as if there is a large dynamic alloca on the stack at that point
>> (because the stack pointer is not where it was when the function was
>> previously running)? Are there other constraints that prevent that from
>> working?
>> 
>> The "big dynamic alloca" approach does work, at least conceptually.
>> It's more or less what MSVC does. They emit the normal code, then the
>> epilogue, then a special prologue that resets ebp/rbp, and then continue
>> with normal emission. Any local variables declared in the __except block
>> are allocated in the parent frame and are accessed via ebp. Any calls
>> create new stack adjustments to allocate new argument memory.
>> 
>> This approach sounds far scarier to me, personally, and will
>> significantly complicate a part of LLVM that is already poorly understood
>> and hard to hack on. I think adding a pair of intrinsics that can't be
>> inlined will be far less disruptive for the rest of LLVM. This is actually
>> already the status quo for SjLj exceptions, which introduce a number of
>> uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).
>> 
>> The way I see it, it's just a question of how much frame layout
>> information you want to teach CodeGen to save. If we add the
>> set_capture_block / get_capture_block intrinsics, then we only need to save
>> the frame offset of *one* alloca. This is easy, we can throw it into a side
>> table on MachineModuleInfo. If we don't go this way, we need to save just
>> the right amount of CodeGen state to get stack offsets in some other
>> function.
> 
> This is the only part that concerns me. Who keeps track of the layout of
> the data inside that capture block? How do you know what local variables
> need to be in the capture block? If the front-end needs to decide that, is
> that something that fits easily into how clang works?
> 
> The capture block would be a boring old LLVM struct with a type created
> during CodeGenPrepare.
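Because the capture block is an ordinary struct, its layout follows the usual rules for a non-packed struct: each field starts at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest field alignment. The sketch below is a simplified model of that computation (in LLVM proper, DataLayout's `StructLayout` answers these questions); `LocalSlot` and `layoutCaptureBlock` are hypothetical names, and fields are simply placed in input order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One local variable that must live in the capture block.
struct LocalSlot {
  std::string name;
  std::uint64_t size;   // bytes
  std::uint64_t align;  // bytes, must be a power of two
};

struct CaptureLayout {
  std::vector<std::uint64_t> offsets;  // offset of each field, in input order
  std::uint64_t size = 0;              // total size, padded to `align`
  std::uint64_t align = 1;             // alignment of the whole struct
};

// Round `value` up to the next multiple of `align` (a power of two).
static std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Lay the slots out in order, as for a non-packed struct.
CaptureLayout layoutCaptureBlock(const std::vector<LocalSlot> &slots) {
  CaptureLayout layout;
  std::uint64_t offset = 0;
  for (const LocalSlot &slot : slots) {
    assert(slot.align != 0 && (slot.align & (slot.align - 1)) == 0);
    offset = alignTo(offset, slot.align);
    layout.offsets.push_back(offset);
    offset += slot.size;
    if (slot.align > layout.align) layout.align = slot.align;
  }
  layout.size = alignTo(offset, layout.align);
  return layout;
}
```

For example, slots of 1, 4, and 8 bytes with matching alignments land at offsets 0, 4, and 8, giving a 16-byte block with 8-byte alignment.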
> 
> I'm imagining a pass similar to SjLjEHPrepare that:
> - Identifies all bbs reachable from landing pads
> - Identifies all SSA values live in those bbs
> - Demotes all non-alloca SSA values to allocas (DemoteRegToMem, like SjLj)
> - Combines all allocas used in landing pad bbs into a single LLVM alloca
>   with a new combined struct type
> - Outlines code from landing pads into cleanup handlers, filters, catch
>   handlers, etc.
> - In the parent function entry block, calls @llvm.eh.seh.set_capture_block
>   on the combined alloca
> - In the outlined entry blocks, calls
>   @llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer
>   to the capture block, and casts it to a pointer to the right type
> - Finally, RAUWs all alloca references with GEPs into the capture block
> The downside is that this approach probably hurts register allocation and
> stack coloring, but I think it's a reasonable tradeoff.
> 
> Thanks for prompting me on this, it helps to write things down like this. :)

No problem. Now that I see the details of what you have in mind, I can’t think
of any reason why that wouldn’t work, and I like the way it isolates most of the
impact of SEH into one new pass. Also, if the performance impact turns out to be
worse than expected, I don’t see anything here that would prevent moving to the
“big dynamic alloca” approach later.
>  
> For DWARF EH and SjLj, the backend is responsible for handling most of the
> EH work. It seems like it would be a more consistent design for SEH to do the
> same.
> 
> Yep. I guess the question is, is CodeGenPrep the backend or not? 
Yes, CGP is definitely backend. I thought you were going to say that the
front-end needed to decide what goes in the capture block.
>> Having a single combined MachineFunction also means that MI passes will
>> have to learn more about SEH. For example, we need to preserve the ordering
>> of basic blocks so that we don't end up with discontiguous regions of code.
> 
> Yes, you would probably need to do that. It doesn’t seem like that would be
> fundamentally difficult, but I haven’t thought through the details and I can
> imagine that it would take a fair bit of work.
> 
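The ordering requirement can be phrased as an invariant on the final block layout: the blocks of one handler region must occupy a single unbroken run. The toy check below models the layout as a vector of block IDs; `regionIsContiguous` is a hypothetical helper, not an existing MI pass.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Returns true if each block in `region` appears exactly once in `layout`
// and those blocks form one unbroken run in layout order.
bool regionIsContiguous(const std::vector<int> &layout, const std::set<int> &region) {
  std::size_t first = layout.size(), last = 0, count = 0;
  for (std::size_t i = 0; i < layout.size(); ++i) {
    if (region.count(layout[i]) != 0) {
      if (first == layout.size()) first = i;
      last = i;
      ++count;
    }
  }
  if (count != region.size()) return false;  // a block is missing or duplicated
  if (count == 0) return true;               // an empty region is trivially contiguous
  return last - first + 1 == count;          // no foreign block inside the run
}
```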
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141118/600d31e4/attachment.html>

Kaylor, Andrew

2014-Nov-19 01:52 UTC


[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

> For DWARF EH and SjLj, the backend is responsible for handling most of the
> EH work. It seems like it would be a more consistent design for SEH to do the
> same.

Looking beyond SEH to C++ exception handling for a moment, it seems to me that
clang may be handling more than it should there.  For instance, calls like
“__cxa_allocate_exception” and “__cxa_throw” are baked into the clang IR output,
which seems to assume that the backend is going to be using libc++abi for its
implementation.  Yet it has enough awareness that this won’t always be true that
it coughs up an ErrorUnsupported failure for “isWindowsMSVCEnvironment” targets
when asked to emit code for “try” or “throw”.

Should this be generalized with intrinsics?

Also, I’m starting to dig into the outlining implementation and there are some
things there that worry me.  I haven’t compared any existing code that might be
doing similar things, so maybe these issues will become clear as I get further
into it, but it seemed worth bringing it up now to smooth the progress.  I’m
trying to put together a general algorithm that starts at the landing pad
instruction and groups the subsequent instructions as cleanup code or parts of
catch handlers.  This is easy enough to do as a human reading the code, but the
way that I’m doing so seems to rely fairly heavily on the names of symbols and
labels.

For instance, following the landingpad instruction I expect to find an extract
and store of “exn.slot” and “ehselector.slot”, and then everything between that
and wherever the catch dispatch begins must be (I think) cleanup code.  I’m
identifying the catch handlers as sequences that start with a load of “exn.slot”
and a call to __cxa_begin_catch and continue until they reach a call to
__cxa_end_catch.
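The bookend heuristic amounts to a linear scan that opens a handler at a call to `__cxa_begin_catch` and closes it at the next call to `__cxa_end_catch`. The sketch below works on a flat list of call names, which is a deliberate simplification (real handlers span several blocks and may reach `__cxa_end_catch` along more than one path); `ToyInst` and `findCatchHandlers` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: only the callee name matters here ("" for non-calls).
struct ToyInst {
  std::string callee;
};

// Pairs each call to __cxa_begin_catch with the next call to
// __cxa_end_catch and returns the [begin, end] instruction indices.
// A begin_catch seen while a handler is already open, or one that is
// never closed, is ignored; nesting is out of scope for this sketch.
std::vector<std::pair<std::size_t, std::size_t>>
findCatchHandlers(const std::vector<ToyInst> &insts) {
  std::vector<std::pair<std::size_t, std::size_t>> handlers;
  bool open = false;
  std::size_t begin = 0;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (!open && insts[i].callee == "__cxa_begin_catch") {
      open = true;
      begin = i;
    } else if (open && insts[i].callee == "__cxa_end_catch") {
      handlers.emplace_back(begin, i);
      open = false;
    }
  }
  return handlers;
}
```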

The calls to begin/end catch are pretty convenient bookends, but identifying the
catch dispatch code and pairing catch handlers with the clauses they represent
seems to depend on recognizing the pattern of loading the ehselector, getting a
typeid then comparing and branching.  I suppose that will work, but it feels a
bit brittle.  Then there’s the cleanup code, which I’m not yet convinced has a
consistent location relative to the catch dispatching and I fear may be moved
around by various optimizations before the outlining and will potentially be
partially shared with cleanup for other landing pads.
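A toy version of the dispatch matcher shows where the brittleness comes from. It assumes a flattened instruction list and looks only for the exact sequence "load selector, compute type ID, compare, branch"; `typeid_for` stands in for a call to `llvm.eh.typeid.for`, and `DispatchInst` and `matchCatchDispatch` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: result name (may be empty), opcode, operand names.
struct DispatchInst {
  std::string result;
  std::string op;
  std::vector<std::string> args;
};

// Recognizes the shape:
//   %sel = load ehselector.slot
//   %tid = typeid_for <Type>
//   %cmp = icmp_eq %sel, %tid
//          br %cmp, <Handler>, <Next>
// and returns (Type, Handler) pairs in the order the branches appear.
std::vector<std::pair<std::string, std::string>>
matchCatchDispatch(const std::vector<DispatchInst> &insts) {
  std::vector<std::pair<std::string, std::string>> clauses;
  std::string selector;                         // value holding the loaded selector
  std::map<std::string, std::string> typeIds;   // %tid -> Type
  std::map<std::string, std::string> compares;  // %cmp -> Type
  for (const DispatchInst &inst : insts) {
    const std::vector<std::string> &a = inst.args;
    if (inst.op == "load" && a.size() == 1 && a[0] == "ehselector.slot") {
      selector = inst.result;
    } else if (inst.op == "typeid_for" && a.size() == 1) {
      typeIds[inst.result] = a[0];
    } else if (inst.op == "icmp_eq" && a.size() == 2 && !selector.empty() &&
               a[0] == selector && typeIds.count(a[1]) != 0) {
      compares[inst.result] = typeIds[a[1]];
    } else if (inst.op == "br" && a.size() == 3 && compares.count(a[0]) != 0) {
      clauses.emplace_back(compares[a[0]], a[1]);
    }
  }
  return clauses;
}
```

Swapping the compare operands, or having an optimization merge two comparisons, is already enough to make this matcher miss a clause.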

Then there’s the matter of what all of this will look like with SEH, but I
haven’t given that much thought yet.

For now I’ll just happily push ahead in the hopes that this will all either
resolve itself or turn out not to be much of a problem, but it seemed worth
talking about now at least.

-Andy


From: Bob Wilson [mailto:bob.wilson at apple.com]
Sent: Tuesday, November 18, 2014 11:19 AM
To: Reid Kleckner
Cc: Kaylor, Andrew; LLVM Developers Mailing List
Subject: Re: [LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR


Reid Kleckner

2014-Nov-19 02:43 UTC


[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

On Tue, Nov 18, 2014 at 5:52 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> > For DWARF EH and SjLj, the backend is responsible for handling most of
> > the EH work. It seems like it would be a more consistent design for SEH to
> > do the same.
>
> Looking beyond SEH to C++ exception handling for a moment, it seems to me
> that clang may be handling more than it should there.  For instance, calls
> like “__cxa_allocate_exception” and “__cxa_throw” are baked into the clang
> IR output, which seems to assume that the backend is going to be using
> libc++abi for its implementation.  Yet it has enough awareness that this
> won’t always be true that it coughs up an ErrorUnsupported failure for
> “isWindowsMSVCEnvironment” targets when asked to emit code for “try” or
> “throw”.
>
> Should this be generalized with intrinsics?
>
We should just teach Clang to emit calls to the appropriate runtime
functions. This isn't needed for SEH because you don't "throw", you just
crash.

> Also, I’m starting to dig into the outlining implementation and there are
> some things there that worry me.  I haven’t compared any existing code that
> might be doing similar things, so maybe these issues will become clear as I
> get further into it, but it seemed worth bringing it up now to smooth the
> progress.  I’m trying to put together a general algorithm that starts at
> the landing pad instruction and groups the subsequent instructions as
> cleanup code or parts of catch handlers.  This is easy enough to do as a
> human reading the code, but the way that I’m doing so seems to rely fairly
> heavily on the names of symbols and labels.
>
Look at lib/Transforms/Utils/CloneFunction.cpp. Most of that code should be
factored appropriately and reused. It uses a value map (ValueToValueMapTy)
that we should be able to apply to the landing pad instruction to map the
ehselector.slot to a constant, and then propagate that constant through.
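The effect Reid is after can be modeled in isolation: once the selector is mapped to a known constant, every comparison in the dispatch chain folds, and only one outgoing edge survives. The sketch below models just that folding step on a toy chain; it is not code from `CloneFunction.cpp`, and `DispatchStep` and `resolveHandler` are hypothetical names.

```cpp
#include <string>
#include <vector>

// One link of a toy catch-dispatch chain: "if (selector == typeId) goto handler".
struct DispatchStep {
  int typeId;
  std::string handler;
};

// Models mapping the selector to a constant: each comparison folds to true
// or false, so only one outgoing edge survives pruning. Returns the
// surviving handler, or an empty string if every comparison folds to false
// and control falls through to the resume path.
std::string resolveHandler(const std::vector<DispatchStep> &chain, int constantSelector) {
  for (const DispatchStep &step : chain) {
    if (step.typeId == constantSelector)
      return step.handler;  // folds to true: keep the edge to this handler
    // folds to false: the edge to step.handler is pruned
  }
  return "";
}
```

On real IR, the hope expressed in this thread is that pruning during cloning achieves the same result without a hand-written matcher.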

> For instance, following the landingpad instruction I expect to find an
> extract and store of “exn.slot” and “ehselector.slot” then everything
> between that and wherever the catch dispatch begins must be (I think)
> cleanup code.  The catch handlers I’m identifying as a sequence that starts
> with a load of “exn.slot” and a call to __cxa_begin_catch and continues
> until it reaches a call to __cxa_end_catch.
>
I think we'll have to intrinsic-ify __cxa_end_catch when targeting
*-windows-msvc to get this right. If we don't, exception rethrows will
probably not work. We don't really need an equivalent of __cxa_begin_catch
because there's no thread-local EH state to update; it's already managed by
the caller of the catch handler.

> The calls to begin/end catch are pretty convenient bookends, but
> identifying the catch dispatch code and pairing catch handlers with the
> clauses they represent seems to depend on recognizing the pattern of
> loading the ehselector, getting a typeid then comparing and branching.  I
> suppose that will work, but it feels a bit brittle.  Then there’s the
> cleanup code, which I’m not yet convinced has a consistent location
> relative to the catch dispatching and I fear may be moved around by various
> optimizations before the outlining and will potentially be partially shared
> with cleanup for other landing pads.
>
We either have to pattern match the selector == typeid pattern in the EH
preparation pass, or come up with a new representation. I'm hesitant to add
a new EH representation that only MSVC-compatible EH uses, because it will
probably trip up existing optimizations. I was hoping that something like
the pruning logic in "llvm::CloneAndPruneFunctionInto" would allow us to
prune the selector comparison branches reliably.