thr3ads.net - llvm dev - [llvm-dev] Potential missed optimisation with SEH funclets [Jun 2019]

Hamza Sood via llvm-dev

2019-Jun-25 11:08 UTC

[llvm-dev] Potential missed optimisation with SEH funclets

I’ve been experimenting with SEH handling in LLVM, and it seems like the unwind
funclets generated by LLVM are much larger than those generated by Microsoft’s
CL compiler.

I used the following code as a test:

void test() {
  MyClass x;
  externalFunction();
}

Compiling with CL, the unwind funclet that destroys ‘x’ is just two lines of
asm:

lea rcx, QWORD PTR x$[rdx]
jmp ??1MyClass@@QEAA@XZ

However when compiling with clang-cl, it seems like it sets up an entire
function frame just for the destructor call:

mov qword ptr [rsp + 16], rdx
push rbp
.seh_pushreg 5
sub rsp, 32
.seh_stackalloc 32
lea rbp, [rdx + 48]
.seh_endprologue
lea rcx, [rbp - 16]
call "??1MyClass@@QEAA@XZ"
nop
add rsp, 32
pop rbp
ret

Both were compiled with “/c /O2 /MD /EHsc”

Is LLVM missing a major optimisation here?
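For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch of the test case. The definition of MyClass, the throwing body of externalFunction, the g_destroyed counter and the destructorRunsDuringUnwind helper are all assumptions added for illustration; none of them appear in the original message. To inspect the funclet itself, externalFunction would need to live in a separate translation unit so the compiler cannot see through it.

```cpp
#include <stdexcept>

// Hypothetical counter so the effect of the cleanup code can be observed.
static int g_destroyed = 0;

// Assumed shape of MyClass: any type with a non-trivial destructor
// forces the compiler to emit cleanup code for the unwind path.
struct MyClass {
  MyClass() {}
  ~MyClass() { ++g_destroyed; }
};

// Stand-in for the external callee. In the original test it is only
// declared, so the compiler must assume that it may throw.
void externalFunction() { throw std::runtime_error("unwind"); }

// The function from the message: if externalFunction throws, the
// unwinder runs the cleanup that destroys 'x'.
void test() {
  MyClass x;
  externalFunction();
}

// Hypothetical helper: returns true if 'x' was destroyed exactly once
// while the exception propagated out of test().
bool destructorRunsDuringUnwind() {
  g_destroyed = 0;
  try {
    test();
  } catch (const std::runtime_error&) {
    return g_destroyed == 1;
  }
  return false;
}
```

With the flags quoted above, the cleanup that calls ~MyClass (mangled as ??1MyClass@@QEAA@XZ) is the funclet being compared.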

Reid Kleckner via llvm-dev

2019-Jun-26 20:17 UTC


Yes, not much effort has been applied to optimizing Windows exception
handling. We were primarily concerned with making it correct, and improving
it hasn't been a priority. You can follow the code path through
X86FrameLowering::emitPrologue with IsFunclet=true and see that it
mechanically emits all the extra instructions mentioned above without any
logic to skip such steps when not necessary.

However, while the mid-level representation we chose makes it hard to write
these types of micro-level code quality optimizations, it allows the
optimizers to do a variety of fancy things like heap to stack promotion on
unique_ptr in the presence of exceptional control flow.
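As a rough illustration of the kind of code that remark refers to, the sketch below allocates through unique_ptr and then calls something that may throw, so the deallocation appears on both the normal and the exceptional path. The names mayThrow and compute are hypothetical, and whether a given compiler version actually removes the heap allocation here is not guaranteed; this only shows the source pattern.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical callee that throws on request, so that the cleanup
// path of compute() is reachable.
void mayThrow(bool doThrow) {
  if (doThrow) throw std::runtime_error("boom");
}

// The unique_ptr never escapes this function. Its delete appears both
// on the normal return path and in the cleanup for the throwing call,
// so an optimizer has to reason about both paths before it can replace
// the heap allocation with a stack slot.
int compute(bool doThrow) {
  std::unique_ptr<int> p = std::make_unique<int>(42);
  mayThrow(doThrow);
  return *p;
}
```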


David Chisnall via llvm-dev

2019-Jun-27 12:04 UTC


From a quick skim of this code, it looks as if we are explicitly disabling
frame pointer elimination for funclets in the back end. This appears to be
done because FP-elim sometimes breaks funclets; if anyone has a test case
for this, it would probably help with tracking it down.

David


Hamza Sood via llvm-dev

2019-Jun-27 18:39 UTC


I’d like to work on improving this, and I’ve got a few ideas thanks to your
pointers. However there’s one issue that I can’t seem to work out.

The funclets are treated as save and restore blocks for the associated function,
which means that they’ll push/pop every callee saved register that the
associated function uses, even if the funclets themselves don’t use them. I
tried fixing this with some custom logic in
X86FrameLowering::[spill/restore]CalleeSavedRegisters, but I couldn’t find a
good way to determine which registers the funclet’s blocks actually use
(without iterating over each instruction).

Is there a better way to approach this?
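For reference, the brute-force approach mentioned above (iterating over each instruction) can be sketched with a toy model. ToyInstr, ToyBlock and clobberedCalleeSaved are hypothetical stand-ins invented for this illustration and are not LLVM types or APIs; registers are plain strings to keep the example self-contained.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: only the registers it
// writes are modelled. NOT an LLVM type.
struct ToyInstr {
  std::vector<std::string> defs;
};

// Toy stand-in for a basic block belonging to one funclet.
struct ToyBlock {
  std::vector<ToyInstr> instrs;
};

// Walks every instruction in the funclet's blocks and collects the
// callee-saved registers that are actually written. Only these would
// need to be saved and restored by the funclet itself.
std::set<std::string> clobberedCalleeSaved(
    const std::vector<ToyBlock>& funcletBlocks,
    const std::set<std::string>& calleeSaved) {
  std::set<std::string> used;
  for (const ToyBlock& block : funcletBlocks)
    for (const ToyInstr& instr : block.instrs)
      for (const std::string& reg : instr.defs)
        if (calleeSaved.count(reg))
          used.insert(reg);
  return used;
}
```

The cost is linear in the number of instructions per funclet, which is the scan the question is hoping to avoid.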
