Bob Wilson
2014-Nov-18 18:50 UTC
[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR
> On Nov 17, 2014, at 5:50 PM, Reid Kleckner <rnk at google.com> wrote:
>
> On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.wilson at apple.com> wrote:
> I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?
>
> I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?
>
> The "big dynamic alloca" approach does work, at least conceptually. It's more or less what MSVC does. They emit the normal code, then the epilogue, then a special prologue that resets ebp/rbp, and then continue with normal emission. Any local variables declared in the __except block are allocated in the parent frame and are accessed via ebp. Any calls create new stack adjustments to allocate new argument memory.
>
> This approach sounds far scarier to me, personally, and will significantly complicate a part of LLVM that is already poorly understood and hard to hack on. I think adding a pair of intrinsics that can't be inlined will be far less disruptive for the rest of LLVM. This is actually already the status quo for SjLj exceptions, which introduce a number of uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).
>
> The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.

This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

> Having a single combined MachineFunction also means that MI passes will have to learn more about SEH. For example, we need to preserve the ordering of basic blocks so that we don't end up with discontiguous regions of code.

Yes, you would probably need to do that. It doesn’t seem like that would be fundamentally difficult, but I haven’t thought through the details and I can imagine that it would take a fair bit of work.
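The basic-block ordering constraint mentioned above (no discontiguous regions of code) can be illustrated with a small check. This is only a sketch under assumed names: the region ids, the `layout` vector, and `regionsAreContiguous` are hypothetical and not part of LLVM. Each entry stands for the region (0 for normal code, 1 and up for handler regions) that owns the block at that position in the final layout.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model, not an LLVM API: layout[i] is the id of the code
// region that owns the basic block placed at position i.
// Returns true when every region forms one uninterrupted run of blocks.
bool regionsAreContiguous(const std::vector<int> &layout) {
  std::set<int> closed; // regions whose run has already ended
  for (std::size_t i = 0; i < layout.size(); ++i) {
    if (i > 0 && layout[i] != layout[i - 1])
      closed.insert(layout[i - 1]); // the previous region's run just ended
    if (closed.count(layout[i]))
      return false; // this region resumes after another one intervened
  }
  return true;
}
```

A block placement pass that moved a handler block into the middle of the normal code would turn a layout such as `{0, 0, 1, 1}` into `{0, 1, 0, 1}`, which this check rejects.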
Reid Kleckner
2014-Nov-18 19:07 UTC
[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR
On Tue, Nov 18, 2014 at 10:50 AM, Bob Wilson <bob.wilson at apple.com> wrote:

> On Nov 17, 2014, at 5:50 PM, Reid Kleckner <rnk at google.com> wrote:
>
> On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>> I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?
>>
>> I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?
>
> The "big dynamic alloca" approach does work, at least conceptually. It's more or less what MSVC does. They emit the normal code, then the epilogue, then a special prologue that resets ebp/rbp, and then continue with normal emission. Any local variables declared in the __except block are allocated in the parent frame and are accessed via ebp. Any calls create new stack adjustments to allocate new argument memory.
>
> This approach sounds far scarier to me, personally, and will significantly complicate a part of LLVM that is already poorly understood and hard to hack on. I think adding a pair of intrinsics that can't be inlined will be far less disruptive for the rest of LLVM. This is actually already the status quo for SjLj exceptions, which introduce a number of uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).
>
> The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.
>
> This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

The capture block would be a boring old LLVM struct with a type created during CodeGenPrepare.

I'm imagining a pass similar to SjLjEHPrepare that:
- Identifies all bbs reachable from landing pads
- Identifies all SSA values live in those bbs
- Demotes all non-alloca SSA values to allocas (DemoteRegToMem, like sjlj)
- Combines all allocas used in landing pad bbs into a single LLVM alloca with a new combined struct type
- Outlines code from landing pads into cleanup handlers, filters, catch handlers, etc
- In the parent function entry block, calls @llvm.eh.seh.set_capture_block on the combined alloca
- In the outlined entry blocks, calls @llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer to the capture block, and casts it to a pointer to the right type
- Finally, replaces (RAUW) all alloca references with GEPs into the capture block

The downside is that this approach probably hurts register allocation and stack coloring, but I think it's a reasonable tradeoff.

Thanks for prompting me on this, it helps to write things down like this. :)

> For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

Yep. I guess the question is, is CodeGenPrep the backend or not?

> Having a single combined MachineFunction also means that MI passes will have to learn more about SEH. For example, we need to preserve the ordering of basic blocks so that we don't end up with discontiguous regions of code.
>
> Yes, you would probably need to do that. It doesn’t seem like that would be fundamentally difficult, but I haven’t thought through the details and I can imagine that it would take a fair bit of work.
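The set_capture_block / get_capture_block pairing described above can be modeled in plain C++ to show why only one frame offset needs to be recorded. This is a rough sketch, not the proposed implementation: `captureBlockOffsets` stands in for the side table on MachineModuleInfo, `ParentFrame` fakes a stack frame, and every name here (`setCaptureBlock`, `getCaptureBlock`, `outlinedFilter`, `parentFn`) is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the side table: one frame offset per parent.
std::map<std::string, std::ptrdiff_t> captureBlockOffsets;

// Models the combined alloca: every local used by handler code is a field.
struct CaptureBlock {
  int status;
  int filterRuns;
};

// Fake parent frame: unrelated locals first, then the capture block.
struct ParentFrame {
  char otherLocals[16];
  CaptureBlock captures;
};

// Models set_capture_block: record where the block sits in the frame.
void setCaptureBlock(const std::string &fn, char *frameBase, void *block) {
  captureBlockOffsets[fn] = static_cast<char *>(block) - frameBase;
}

// Models get_capture_block: parent frame pointer plus the saved offset.
void *getCaptureBlock(const std::string &fn, char *frameBase) {
  return frameBase + captureBlockOffsets.at(fn);
}

// Models an outlined filter: it only receives the parent's frame pointer.
int outlinedFilter(char *parentFrameBase) {
  auto *captures = static_cast<CaptureBlock *>(
      getCaptureBlock("parent_fn", parentFrameBase));
  captures->filterRuns += 1;             // write to a parent local
  return captures->status == 42 ? 1 : 0; // read a parent local
}

int parentFn() {
  ParentFrame frame{};
  char *frameBase = reinterpret_cast<char *>(&frame);
  setCaptureBlock("parent_fn", frameBase, &frame.captures);
  frame.captures.status = 42;
  int handled = outlinedFilter(frameBase);
  return handled + frame.captures.filterRuns;
}
```

The outlined code never needs to know where `otherLocals` lives, which is the point of the design: everything it touches is reached through the single recorded offset.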
Bob Wilson
2014-Nov-18 19:19 UTC
[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR
> On Nov 18, 2014, at 11:07 AM, Reid Kleckner <rnk at google.com> wrote:
>
> On Tue, Nov 18, 2014 at 10:50 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>> On Nov 17, 2014, at 5:50 PM, Reid Kleckner <rnk at google.com> wrote:
>>
>> On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>> I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?
>>
>> I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?
>>
>> The "big dynamic alloca" approach does work, at least conceptually. It's more or less what MSVC does. They emit the normal code, then the epilogue, then a special prologue that resets ebp/rbp, and then continue with normal emission. Any local variables declared in the __except block are allocated in the parent frame and are accessed via ebp. Any calls create new stack adjustments to allocate new argument memory.
>>
>> This approach sounds far scarier to me, personally, and will significantly complicate a part of LLVM that is already poorly understood and hard to hack on. I think adding a pair of intrinsics that can't be inlined will be far less disruptive for the rest of LLVM. This is actually already the status quo for SjLj exceptions, which introduce a number of uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).
>>
>> The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.
>
> This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?
>
> The capture block would be a boring old LLVM struct with a type created during CodeGenPrepare.
>
> I'm imagining a pass similar to SjLjEHPrepare that:
> - Identifies all bbs reachable from landing pads
> - Identifies all SSA values live in those bbs
> - Demote all non-alloca SSA values to allocas (DemoteRegToMem, like sjlj)
> - Combine all allocas used in landing pad bbs into a single LLVM alloca with a new combined struct type
> - Outline code from landing pads into cleanup handlers, filters, catch handlers, etc
> - In the parent function entry block, call @llvm.eh.seh.set_capture_block on the combined alloca
> - In the outlined entry blocks, call @llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer to the capture block. Cast it to a pointer to the right type.
> - Finally, RAUW all alloca references with GEPs into the capture block
>
> The downside is that this approach probably hurts register allocation and stack coloring, but I think it's a reasonable tradeoff.
>
> Thanks for prompting me on this, it helps to write things down like this. :)

No problem. Now that I see the details of what you have in mind, I can’t think of any reason why that wouldn’t work, and I like the way it isolates most of the impact of SEH into one new pass. Also, if the performance impact turns out to be worse than expected, I don’t see anything here that would prevent moving to the “big dynamic alloca” approach later.

> For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.
>
> Yep. I guess the question is, is CodeGenPrep the backend or not?

Yes, CGP is definitely backend. I thought you were going to say that the front-end needed to decide what goes in the capture block.

>> Having a single combined MachineFunction also means that MI passes will have to learn more about SEH. For example, we need to preserve the ordering of basic blocks so that we don't end up with discontiguous regions of code.
>
> Yes, you would probably need to do that. It doesn’t seem like that would be fundamentally difficult, but I haven’t thought through the details and I can imagine that it would take a fair bit of work.
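The "combine all allocas into a single struct" and "RAUW with GEPs" steps quoted above amount to assigning each former alloca a fixed field in the capture block. A minimal sketch of that bookkeeping follows, assuming simple natural-alignment layout; in LLVM itself the struct type's layout would come from the target's data layout, and `Slot`, `Field`, and `layoutCaptureBlock` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one alloca that handler code uses.
struct Slot {
  std::string name;
  std::size_t size;  // bytes
  std::size_t align; // required alignment, a power of two
};

// Where that alloca ends up inside the combined capture block.
struct Field {
  std::string name;
  std::size_t offset;
};

// Assigns each slot an aligned offset, in order, and reports the padded
// total size of the combined block through totalSize.
std::vector<Field> layoutCaptureBlock(const std::vector<Slot> &slots,
                                      std::size_t &totalSize) {
  std::vector<Field> fields;
  std::size_t offset = 0;
  std::size_t maxAlign = 1;
  for (const Slot &s : slots) {
    offset = (offset + s.align - 1) / s.align * s.align; // round up
    fields.push_back({s.name, offset});
    offset += s.size;
    if (s.align > maxAlign)
      maxAlign = s.align;
  }
  totalSize = (offset + maxAlign - 1) / maxAlign * maxAlign;
  return fields;
}
```

Each returned offset plays the role of the GEP that replaces a use of the original alloca, in both the parent function and the outlined handlers. Because every captured local now has a fixed slot in one object, it is plausible that stack coloring has less freedom to overlap them, which matches the tradeoff acknowledged in the thread.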