Rafael Espíndola
2013-Dec-24 12:50 UTC
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> In general, I'm not too keen on adding more calling conventions unless
> there's a really powerful need for one from an ABI perspective. This
> sounds more like an optimization than an ABI need.

I think that is the case.

> What's more, I
> worry (a little bit) about confusion that could be caused with the
> __vectorcall calling convention (which we do not currently support,
> but will need to at some point for MSVC compatibility).

What does __vectorcall do?

> What should happen with this code?
>
> int foo() __attribute__((avx));
>
> void bar(int (*fp)()) {
>   int i = fp();
> }
>
> void baz(void) {
>   bar(foo);
> }
>
> Based on your description, this code is valid, but not as performant
> as it could be. The vzeroupper would be inserted before fp() is
> called, but there's no incompatibility happening. So I guess this
> feels more like a regular function attribute than a calling
> convention.

It is not a calling convention. The issue is more whether it is a type or a
decl attribute. Given that putting the attributes on the function decls is
the simplest and should cover most of the cases, I think we can probably
start with that and revisit if we still see too many vzeroupper being
inserted. What do you think?

>> Function definitions in a translation unit compiled with -mavx architecture
>> will implicitly have this attribute.
>
> Can you safely do that? What about code that uses inline assembly
> to use legacy SSE instructions in a TU compiled with -mavx, for
> instance?

I think it would take a performance penalty, but I don't expect that to be
common.

Cheers,
Rafael
Aaron Ballman
2013-Dec-24 15:02 UTC
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
On Tue, Dec 24, 2013 at 7:50 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> In general, I'm not too keen on adding more calling conventions unless
>> there's a really powerful need for one from an ABI perspective. This
>> sounds more like an optimization than an ABI need.
>
> I think that is the case.
>
>> What's more, I
>> worry (a little bit) about confusion that could be caused with the
>> __vectorcall calling convention (which we do not currently support,
>> but will need to at some point for MSVC compatibility).
>
> What does __vectorcall do?

http://msdn.microsoft.com/en-us/library/dn375768.aspx

It's different than the proposed attribute, but still relates to SIMD
instruction optimizations.

>> What should happen with this code?
>>
>> int foo() __attribute__((avx));
>>
>> void bar(int (*fp)()) {
>>   int i = fp();
>> }
>>
>> void baz(void) {
>>   bar(foo);
>> }
>>
>> Based on your description, this code is valid, but not as performant
>> as it could be. The vzeroupper would be inserted before fp() is
>> called, but there's no incompatibility happening. So I guess this
>> feels more like a regular function attribute than a calling
>> convention.
>
> It is not a calling convention. The issue is more whether it is a type or a
> decl attribute. Given that putting the attributes on the function
> decls is the simplest and should cover most of the cases, I think we
> can probably start with that and revisit if we still see too many
> vzeroupper being inserted. What do you think?

That seems reasonable to me.

>>> Function definitions in a translation unit compiled with -mavx architecture
>>> will implicitly have this attribute.
>>
>> Can you safely do that? What about code that uses inline assembly
>> to use legacy SSE instructions in a TU compiled with -mavx, for
>> instance?
>
> I think it would take a performance penalty, but I don't expect that
> to be common.

Hmm, I was worried about the situation where:

extern int foo(); // compiled without -mavx

void bar() { // compiled in a TU with -mavx
  ...
  // no vzeroupper is inserted before the call instruction because it is
  // implicit due to -mavx
  foo();
  ...
}

I'm not certain whether this sort of pattern could cause problems or not. If
there's no way for it to be problematic, then implicitly attaching the
attribute is reasonable enough. It does mean we're straying farther from the
as-written attributes for the function, but that's just an unfortunate
situation we're already in today and wouldn't block this feature.

~Aaron
Gao, Yunzhong
2014-Jan-08 00:14 UTC
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
> -----Original Message-----
> From: aaron.ballman at gmail.com [mailto:aaron.ballman at gmail.com] On
> Behalf Of Aaron Ballman
> Sent: Tuesday, December 24, 2013 7:02 AM
> To: Rafael Espíndola
> Cc: Gao, Yunzhong; cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu);
> LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
> Subject: Re: [cfe-dev] [LLVMdev] [Proposal] function attribute to reduce
> emission of vzeroupper instructions
>
> On Tue, Dec 24, 2013 at 7:50 AM, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
> >> In general, I'm not too keen on adding more calling conventions
> >> unless there's a really powerful need for one from an ABI
> >> perspective. This sounds more like an optimization than an ABI need.
> >
> > I think that is the case.
> >
> >> What's more, I
> >> worry (a little bit) about confusion that could be caused with the
> >> __vectorcall calling convention (which we do not currently support,
> >> but will need to at some point for MSVC compatibility).
> >
> > What does __vectorcall do?
>
> http://msdn.microsoft.com/en-us/library/dn375768.aspx
>
> It's different than the proposed attribute, but still relates to SIMD
> instruction optimizations.
>
> >> What should happen with this code?
> >>
> >> int foo() __attribute__((avx));
> >>
> >> void bar(int (*fp)()) {
> >>   int i = fp();
> >> }
> >>
> >> void baz(void) {
> >>   bar(foo);
> >> }
> >>
> >> Based on your description, this code is valid, but not as performant
> >> as it could be. The vzeroupper would be inserted before fp() is
> >> called, but there's no incompatibility happening. So I guess this
> >> feels more like a regular function attribute than a calling
> >> convention.
> >
> > It is not a calling convention. The issue is more whether it is a type
> > or a decl attribute. Given that putting the attributes on the function
> > decls is the simplest and should cover most of the cases, I think we
> > can probably start with that and revisit if we still see too many
> > vzeroupper being inserted. What do you think?
>
> That seems reasonable to me.
>
> >>> Function definitions in a translation unit compiled with -mavx
> >>> architecture will implicitly have this attribute.
> >>
> >> Can you safely do that? What about code that uses inline
> >> assembly to use legacy SSE instructions in a TU compiled with -mavx,
> >> for instance?
> >
> > I think it would take a performance penalty, but I don't expect that
> > to be common.
>
> Hmm, I was worried about the situation where:
>
> extern int foo(); // compiled without -mavx
>
> void bar() { // compiled in a TU with -mavx
>   ...
>   // no vzeroupper is inserted before the call instruction because it is
>   // implicit due to -mavx
>   foo();
>   ...
> }
>
> I'm not certain whether this sort of pattern could cause problems or not. If
> there's no way for it to be problematic, then implicitly attaching the attribute
> is reasonable enough. It does mean we're straying farther from the as-written
> attributes for the function, but that's just an unfortunate situation
> we're already in today and wouldn't block this feature.
>
> ~Aaron

Hi Aaron,

Many thanks for your feedback!

I do not have any opinion right now on how this attribute should interact
with the __vectorcall calling convention. I will need to revisit it later.

Regarding the implicit attachment of this attribute, my intention is to only
imply the avx attribute on function definitions. Since the backend can see
what instructions are being generated in the callee, it should be able to
make smart decisions on whether to emit a vzeroupper before the call
instruction. In the above example, foo() would not implicitly carry the avx
attribute because the compiler sees only its declaration.

- Gao
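
The rule discussed in this thread can be written down as a small decision function. The following is only a sketch of one reading of the proposal, not compiler code: the names call_kind, callee, callee_has_avx_attr and needs_vzeroupper are hypothetical and do not exist in Clang or LLVM. It assumes that the attribute is a decl attribute (so it is not visible through a function pointer) and that it is implied only for function definitions in a TU compiled with -mavx.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; none of these names exist in
 * Clang or LLVM. */
enum call_kind { CALL_DIRECT, CALL_INDIRECT };

struct callee {
    bool explicit_avx_attr;  /* declaration carries __attribute__((avx)) */
    bool definition_visible; /* function body is in the current TU */
};

/* Assumed rule: the attribute is either written explicitly, or implied
 * for a definition in a TU compiled with -mavx. A bare declaration
 * never gets it implicitly. */
bool callee_has_avx_attr(const struct callee *c, bool tu_has_mavx)
{
    return c->explicit_avx_attr || (c->definition_visible && tu_has_mavx);
}

/* Returns true when, under this model, a vzeroupper would be emitted
 * before the call. For CALL_INDIRECT the callee is unknown and c may
 * be NULL. */
bool needs_vzeroupper(enum call_kind kind, const struct callee *c,
                      bool tu_has_mavx, bool upper_halves_dirty)
{
    if (!upper_halves_dirty)
        return false; /* upper YMM halves are clean, nothing to clear */
    if (kind == CALL_INDIRECT || c == NULL)
        return true;  /* a decl attribute is not visible through a pointer */
    return !callee_has_avx_attr(c, tu_has_mavx);
}
```

Under these assumptions, the call to extern foo() in Aaron's example still gets a vzeroupper, because only a declaration is visible; the indirect call fp() in the earlier example gets one as well, even though foo carries the attribute. Only a direct call to a function that is explicitly marked, or defined in the same -mavx TU, skips it.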