Hamza Sood via llvm-dev
2019-Jun-25 11:08 UTC
[llvm-dev] Potential missed optimisation with SEH funclets
I’ve been experimenting with SEH handling in LLVM, and it seems like the unwind funclets generated by LLVM are much larger than those generated by Microsoft’s CL compiler.

I used the following code as a test:

    void test() {
        MyClass x;
        externalFunction();
    }

Compiling with CL, the unwind funclet that destroys ‘x’ is just two lines of asm:

    lea rcx, QWORD PTR x$[rdx]
    jmp ??1MyClass@@QEAA@XZ

However, when compiling with clang-cl, it seems like it sets up an entire function frame just for the destructor call:

    mov qword ptr [rsp + 16], rdx
    push rbp
    .seh_pushreg 5
    sub rsp, 32
    .seh_stackalloc 32
    lea rbp, [rdx + 48]
    .seh_endprologue
    lea rcx, [rbp - 16]
    call "??1MyClass@@QEAA@XZ"
    nop
    add rsp, 32
    pop rbp
    ret

Both were compiled with “/c /O2 /MD /EHsc”.

Is LLVM missing a major optimisation here?
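For readers who want to try this, here is a minimal, self-contained sketch of what the cleanup path has to do. The post does not show MyClass or externalFunction, so the definitions below (and the g_destroyed counter and run_demo helper) are assumptions made only so the example runs. To inspect the funclet itself, the destructor and externalFunction would presumably need to be declared but defined in another translation unit, so that neither gets inlined.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins: the original post only shows test().
static int g_destroyed = 0;

struct MyClass {
    // Non-trivial destructor: this is the call the unwind funclet must make.
    ~MyClass() { ++g_destroyed; }
};

// Assumed to throw here so that the unwind path is actually taken.
void externalFunction() {
    throw std::runtime_error("unwinding through test()");
}

void test() {
    MyClass x;
    externalFunction();  // if this throws, x is destroyed during unwinding
}

// Runs test() once and reports how many MyClass objects were destroyed.
int run_demo() {
    g_destroyed = 0;
    try {
        test();
    } catch (const std::runtime_error &) {
        // By the time the handler runs, the cleanup for x has executed.
    }
    return g_destroyed;
}
```

Calling run_demo() should report one destruction, which is the work that both the two-instruction CL funclet and the longer clang-cl funclet perform.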
Reid Kleckner via llvm-dev
2019-Jun-26 20:17 UTC
[llvm-dev] Potential missed optimisation with SEH funclets
Yes, not much effort has been applied to optimizing Windows exception handling. We were primarily concerned with making it correct, and improving it hasn't been a priority. You can follow the code path through X86FrameLowering::emitPrologue with IsFunclet=true and see that it mechanically emits all the extra instructions mentioned above without any logic to skip such steps when not necessary.

However, while the mid-level representation we chose makes it hard to write these types of micro-level code quality optimizations, it allows the optimizers to do a variety of fancy things like heap-to-stack promotion on unique_ptr in the presence of exceptional control flow.
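As an illustration of the kind of code the unique_ptr remark refers to, here is a small sketch. The function names (mayThrow, useBox, tryUseBox) are hypothetical, and whether a given compiler version actually removes the heap allocation is not something this example demonstrates; it only shows an owning pointer that is live across a call with an unwind edge.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical callee that may unwind.
void mayThrow(int v) {
    if (v < 0)
        throw std::invalid_argument("negative value");
}

int useBox(int v) {
    // Heap allocation that never escapes this function.
    auto box = std::make_unique<int>(v);
    // Call with an unwind edge: the cleanup path must delete the allocation.
    mayThrow(*box);
    return *box + 1;
}

// Wrapper so both the normal and the exceptional path can be exercised.
int tryUseBox(int v) {
    try {
        return useBox(v);
    } catch (const std::invalid_argument &) {
        return -1;
    }
}
```

Both the normal return and the unwind path release the allocation, so an optimizer that can see the whole picture is free to replace it with a stack slot.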
David Chisnall via llvm-dev
2019-Jun-27 12:04 UTC
[llvm-dev] Potential missed optimisation with SEH funclets
From a quick skim of this code, it looks as if we are explicitly disabling frame pointer elimination for funclets in the back end. It looks as if this is done because FP-elim sometimes breaks funclets. If anyone has a test case for this, that would probably help in tracking it down.

David
Hamza Sood via llvm-dev
2019-Jun-27 18:39 UTC
[llvm-dev] Potential missed optimisation with SEH funclets
I’d like to work on improving this, and I’ve got a few ideas thanks to your pointers. However, there’s one issue that I can’t seem to work out.

The funclets are treated as save and restore blocks for the associated function, which means that they’ll push/pop every callee-saved register that the associated function uses, even if the funclets themselves don’t use them. I tried fixing this with some custom logic in X86FrameLowering::[spill/restore]CalleeSavedRegisters, but I couldn’t find a good way to determine which registers the funclet’s blocks actually use (without iterating over each instruction).

Is there a better way to approach this?
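To make the question concrete, here is a toy model of the per-instruction scan described above. It deliberately does not use LLVM's real data structures: Instr, Block, usedCalleeSaved and demoFuncletCSRs are hypothetical names, registers are plain strings, and sub-register aliasing is ignored. It only shows the computation being asked about, not a proposed implementation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, NOT LLVM's MachineInstr/MachineBasicBlock API.
struct Instr {
    std::string opcode;
    std::vector<std::string> defs;  // registers this instruction writes
};

using Block = std::vector<Instr>;

// Returns the subset of the parent's callee-saved registers that some
// instruction in the funclet's blocks writes. Only these would need to be
// saved and restored by the funclet in this simplified model.
std::set<std::string> usedCalleeSaved(const std::vector<Block> &funcletBlocks,
                                      const std::set<std::string> &calleeSaved) {
    std::set<std::string> used;
    for (const Block &block : funcletBlocks)
        for (const Instr &instr : block)
            for (const std::string &reg : instr.defs)
                if (calleeSaved.count(reg) != 0)
                    used.insert(reg);
    return used;
}

// Example loosely modelled on the clang-cl funclet body from the first post.
// The parent's callee-saved set is an assumption made for illustration.
std::set<std::string> demoFuncletCSRs() {
    const std::set<std::string> parentCSRs = {"rbx", "rsi", "rdi", "rbp"};

    Block cleanup;
    cleanup.push_back({"lea", {"rbp"}});  // writes a callee-saved register
    cleanup.push_back({"lea", {"rcx"}});  // rcx is caller-saved, so ignored
    cleanup.push_back({"call", {}});      // callee preserves callee-saved regs

    std::vector<Block> funclet;
    funclet.push_back(cleanup);
    return usedCalleeSaved(funclet, parentCSRs);
}
```

In this model the funclet would only need to preserve rbp rather than all four registers the parent saves. The cost is a full walk over the funclet's instructions, which is the step the question hopes to avoid.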