Gao, Yunzhong
2013-Dec-19 19:31 UTC
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
Hi all,

I would like to find out whether anyone would find it useful to add an x86-specific calling convention for reducing the emission of vzeroupper instructions.

Current implementation:
vzeroupper is inserted into any function that uses AVX instructions. The insertion points are:
1) before a call instruction;
2) before a return instruction.

Background:
vzeroupper is an AVX instruction; it is inserted to avoid a performance penalty when transitioning between x86 AVX mode and legacy SSE mode, e.g., when an AVX function calls an SSE function. However, vzeroupper is a slow instruction; it adds to register pressure and hurts performance for AVX-to-AVX calls.

My proposal:
1) (LLVM part) Add an x86-specific calling convention to the LLVM IR which specifies that an external function will be compiled with AVX support and that its function definition does not use any legacy SSE instructions, e.g.,

    declare x86_avxcc i32 @foo()

2) (Clang part) Add a function attribute to the clang front-end which specifies this calling convention, e.g.,

    extern int foo() __attribute__((avx));

Function definitions in a translation unit compiled with -mavx will implicitly have this attribute.

Benefits:
No vzeroupper is needed before calling a function with this avx attribute, e.g.,

    extern int foo() __attribute__((avx));
    void bar() {
      ... // some AVX instruction
      ... // no vzeroupper is needed before the call instruction
      foo();
      ... // still needs a vzeroupper before the return instruction
    }

Reference:
A few months ago, I submitted a proposal for improving the vzeroupper optimization strategy by changing the default code-emission strategy. The proposal was rejected on the grounds that it would cause problems for existing operating systems.
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-September/065720.html
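The decision rule described in this post can be sketched as a small toy model in C. This is only an illustration of the policy as stated, not LLVM's actual pass; the names `site_kind`, `site`, `vzeroupper_current`, and `vzeroupper_proposed` are hypothetical and introduced here for the example.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical and are not LLVM APIs. */
enum site_kind { SITE_CALL, SITE_RETURN };

struct site {
    enum site_kind kind;
    bool callee_marked_avx; /* hypothetical: callee carries the proposed
                               avx marking; ignored for returns */
};

/* Current behaviour as described in the post: every call and return in a
   function that uses AVX gets a vzeroupper. */
static bool vzeroupper_current(bool fn_uses_avx, struct site s)
{
    (void)s; /* the kind of site does not matter today */
    return fn_uses_avx;
}

/* Proposed behaviour: skip the vzeroupper before a call whose callee is
   marked as AVX-only; returns are unchanged. */
static bool vzeroupper_proposed(bool fn_uses_avx, struct site s)
{
    if (!fn_uses_avx)
        return false;
    if (s.kind == SITE_CALL && s.callee_marked_avx)
        return false;
    return true;
}
```

Under this model the two policies differ only for calls to marked callees from a function that uses AVX; every other site keeps its vzeroupper, which matches the `bar()` example in the post.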
Rafael Espíndola
2013-Dec-19 20:14 UTC
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On 19 December 2013 14:31, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote:
> My proposal:
> 1) (LLVM part) Add an x86-specific calling convention to the LLVM IR which
> specifies that an external function will be compiled with AVX support and its
> function definition does not use any legacy SSE instructions, e.g.,
> declare x86_avxcc i32 @foo()

I would suggest using metadata instead. The reasons are:

* It could be applied to functions with different calling conventions. For example, on Windows we would probably want to do this for thiscall (methods) too.
* If the metadata is dropped, we would just produce slower but still correct code (the vzeroupper is still emitted).

Cheers,
Rafael
Reid Kleckner
2013-Dec-19 20:36 UTC
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Thu, Dec 19, 2013 at 12:14 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> I would suggest using metadata instead. The reasons are:
>
> * It could be applied to functions with different calling conventions.
> For example, on Windows we would probably want to do this for thiscall
> (methods) too.
> * If the metadata is dropped, we would just produce slower but still
> correct code (the vzeroupper is still emitted).

Maybe a target-specific attribute instead? It would still apply to all CCs, but would never be dropped.
Aaron Ballman
2013-Dec-21 17:56 UTC
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Thu, Dec 19, 2013 at 2:31 PM, Gao, Yunzhong <yunzhong_gao at playstation.sony.com> wrote:
> My proposal:
> 1) (LLVM part) Add an x86-specific calling convention to the LLVM IR which
> specifies that an external function will be compiled with AVX support and its
> function definition does not use any legacy SSE instructions, e.g.,
> declare x86_avxcc i32 @foo()
>
> 2) (Clang part) Add a function attribute to the clang front-end which specifies
> this calling convention, e.g.,
> extern int foo() __attribute__((avx));

In general, I'm not too keen on adding more calling conventions unless there's a really powerful need for one from an ABI perspective. This sounds more like an optimization than an ABI need. What's more, I worry (a little bit) about confusion with the __vectorcall calling convention (which we do not currently support, but will need to at some point for MSVC compatibility).

What should happen with this code?

    int foo() __attribute__((avx));

    void bar(int (*fp)()) {
      int i = fp();
    }

    void baz(void) {
      bar(foo);
    }

Based on your description, this code is valid, but not as performant as it could be. The vzeroupper would be inserted before fp() is called, but there's no incompatibility happening. So I guess this feels more like a regular function attribute than a calling convention.

> Function definitions in a translation unit compiled with -mavx architecture will
> implicitly have this attribute.

Can you safely do that? What about code that uses inline assembly to execute legacy SSE instructions in a TU compiled with -mavx, for instance?

~Aaron
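The function-pointer case raised here can be sketched with a toy model in C. It assumes the marking lives on the declaration rather than on the function type, and that the caller uses AVX; the types `fn_decl` and `call_site` and the function `call_needs_vzeroupper` are hypothetical names for this example, not Clang or LLVM data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not Clang or LLVM data structures. */
struct fn_decl {
    const char *name;
    bool marked_avx; /* hypothetical: declaration carries the avx attribute */
};

struct call_site {
    const struct fn_decl *direct_callee; /* NULL when calling through a pointer */
};

/* For a caller that uses AVX: is a vzeroupper needed before this call? */
static bool call_needs_vzeroupper(struct call_site cs)
{
    if (cs.direct_callee == NULL)
        return true; /* callee unknown: stay conservative */
    return !cs.direct_callee->marked_avx;
}
```

In this model a direct call to a marked `foo` skips the vzeroupper, while the call through `fp` keeps it because the declaration is not visible at the call site. The result is slower but still correct, which is why a missing marking is a lost optimization rather than an ABI mismatch.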
Rafael Espíndola
2013-Dec-24 12:50 UTC
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> In general, I'm not too keen on adding more calling conventions unless
> there's a really powerful need for one from an ABI perspective. This
> sounds more like an optimization than an ABI need.

I think that is the case.

> What's more, I
> worry (a little bit) about confusion that could be caused with the
> __vectorcall calling convention (which we do not currently support,
> but will need to at some point for MSVC compatibility).

What does __vectorcall do?

> What should happen with this code?
>
> int foo() __attribute__((avx));
>
> void bar(int (*fp)()) {
>   int i = fp();
> }
>
> void baz(void) {
>   bar(foo);
> }
>
> Based on your description, this code is valid, but not as performant
> as it could be. The vzeroupper would be inserted before fp() is
> called, but there's no incompatibility happening. So I guess this
> feels more like a regular function attribute than a calling
> convention.

It is not a calling convention. The question is rather whether it is a type attribute or a decl attribute. Given that putting the attribute on function decls is the simplest option and should cover most cases, I think we can probably start with that and revisit if we still see too many vzeroupper instructions being inserted. What do you think?

>> Function definitions in a translation unit compiled with -mavx architecture
>> will implicitly have this attribute.
>
> Can you safely do that? What about code that uses inline assembly
> to use legacy SSE instructions in a TU compiled with -mavx, for
> instance?

I think it would take a performance penalty, but I don't expect that to be common.

Cheers,
Rafael