Jakob Stoklund Olesen
2012-Jan-09 23:13 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:

> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to the DEFS definition).
>>    YMMs are not in the set, so the caller does not take care of them.
>
> This is not how the register allocator works. It saves the registers holding values; it doesn't care which alias is clobbered.
>
> Are you saying that only the xmm part of a ymm register gets spilled before a call?
>
>> 2. The callee preserves XMMs but works with YMMs, clobbering them.
>> 3. So after the call, the upper part of the YMM is gone.
>
> Are you on Windows? As Bruno said, all xmm and ymm registers are call-clobbered on non-Windows platforms.

This thread has lots of interesting information:
http://software.intel.com/en-us/forums/showthread.php?t=59291

I wasn't able to find a formal Win64 ABI spec, but according to http://www.agner.org/optimize/calling_conventions.pdf, xmm6-xmm15 are callee-saved on Win64, while the high bits of ymm6-ymm15 are not.

That is not currently modelled correctly in LLVM. To fix it, create a pseudo-register YMMHI_CLOBBER that aliases ymm6-ymm15, then add YMMHI_CLOBBER to the registers clobbered by WINCALL64*.

/jakob
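(For illustration only: a minimal TableGen sketch of that suggestion. The name YMMHI_CLOBBER and the WINCALL64* instructions come from the message above; the exact field layout and where the code would live are assumptions, not code from any tree.)

  // Pseudo-register that is never allocated or encoded; it exists only so
  // the register allocator sees the upper halves of the Win64 callee-saved
  // xmm registers as clobbered by calls.
  def YMMHI_CLOBBER : Register<"ymmhi_clobber"> {
    let Aliases = [YMM6,  YMM7,  YMM8,  YMM9,  YMM10,
                   YMM11, YMM12, YMM13, YMM14, YMM15];
  }

  // The WINCALL64* call instructions would then list YMMHI_CLOBBER in their
  // Defs (clobber) lists, in addition to whatever they already clobber.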
Demikhovsky, Elena
2012-Jan-10 07:15 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
We support Win64, that's right. We defined the upper part of the YMM registers like this:

// XMM Registers, used by the various SSE instruction set extensions.
// These are actually only needed for implementing the Win64 CC with AVX.
def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;

// X86-64 only
def XMM8b: Register<"xmm8b">, DwarfRegNum<[25, -2, -2]>;
def XMM9b: Register<"xmm9b">, DwarfRegNum<[26, -2, -2]>;
def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

// YMM Registers, used by AVX instructions
let SubRegIndices = [sub_xmm, sub_xmmb] in {
def YMM0: RegisterWithSubRegs<"ymm0", [XMM0, XMM0b]>, DwarfRegNum<[17, 21, 21]>;
def YMM1: RegisterWithSubRegs<"ymm1", [XMM1, XMM1b]>, DwarfRegNum<[18, 22, 22]>;
def YMM2: RegisterWithSubRegs<"ymm2", [XMM2, XMM2b]>, DwarfRegNum<[19, 23, 23]>;
def YMM3: RegisterWithSubRegs<"ymm3", [XMM3, XMM3b]>, DwarfRegNum<[20, 24, 24]>;
def YMM4: RegisterWithSubRegs<"ymm4", [XMM4, XMM4b]>, DwarfRegNum<[21, 25, 25]>;
def YMM5: RegisterWithSubRegs<"ymm5", [XMM5, XMM5b]>, DwarfRegNum<[22, 26, 26]>;
def YMM6: RegisterWithSubRegs<"ymm6", [XMM6, XMM6b]>, DwarfRegNum<[23, 27, 27]>;
def YMM7: RegisterWithSubRegs<"ymm7", [XMM7, XMM7b]>, DwarfRegNum<[24, 28, 28]>;
def YMM8: RegisterWithSubRegs<"ymm8", [XMM8, XMM8b]>, DwarfRegNum<[25, -2, -2]>;
def YMM9: RegisterWithSubRegs<"ymm9", [XMM9, XMM9b]>, DwarfRegNum<[26, -2, -2]>;
def YMM10: RegisterWithSubRegs<"ymm10", [XMM10, XMM10b]>, DwarfRegNum<[27, -2, -2]>;
def YMM11: RegisterWithSubRegs<"ymm11", [XMM11, XMM11b]>, DwarfRegNum<[28, -2, -2]>;
def YMM12: RegisterWithSubRegs<"ymm12", [XMM12, XMM12b]>, DwarfRegNum<[29, -2, -2]>;
def YMM13: RegisterWithSubRegs<"ymm13", [XMM13, XMM13b]>, DwarfRegNum<[30, -2, -2]>;
def YMM14: RegisterWithSubRegs<"ymm14", [XMM14, XMM14b]>, DwarfRegNum<[31, -2, -2]>;
def YMM15: RegisterWithSubRegs<"ymm15", [XMM15, XMM15b]>, DwarfRegNum<[32, -2, -2]>;
}

- Elena
Demikhovsky, Elena
2012-Jan-10 11:55 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:

declare <16 x float> @foo(<16 x float>)

define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}

./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll

        .def    test;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  test
        .align  16, 0x90
test:                                   # @test
# BB#0:                                 # %entry
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $64, %rsp
        vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
        vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
        vmovaps %ymm3, %ymm6
        vmovaps %ymm2, %ymm7
        vaddps  %ymm7, %ymm0, %ymm0
        vaddps  %ymm6, %ymm1, %ymm1
        callq   foo
        vsubps  %ymm7, %ymm0, %ymm0
        vsubps  %ymm6, %ymm1, %ymm1
        vmovaps -16(%rbp), %xmm6        # 16-byte Reload
        vmovaps -32(%rbp), %xmm7        # 16-byte Reload
        addq    $64, %rsp
        popq    %rbp
        ret

ymm6 and ymm7 are not saved across the call.

I have a fix; I can send it for review.

- Elena
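(Note, for illustration: per the Win64 convention foo only preserves the xmm halves of xmm6-xmm15, so any 256-bit value that test keeps in ymm6/ymm7 loses its upper 128 bits across the call. Once the high halves are modelled as call-clobbered, values live across the call would instead be spilled and reloaded as full 32-byte ymm values, roughly as in the hand-written sketch below; the stack offsets and register choices are illustrative only, not compiler output.)

        vmovups %ymm2, 32(%rsp)         # spill lanes 0-7 of %y (full 32-byte ymm2)
        vmovups %ymm3, 64(%rsp)         # spill lanes 8-15 of %y (full 32-byte ymm3)
        vaddps  %ymm2, %ymm0, %ymm0
        vaddps  %ymm3, %ymm1, %ymm1
        callq   foo
        vmovups 32(%rsp), %ymm2         # reload after the call
        vmovups 64(%rsp), %ymm3
        vsubps  %ymm2, %ymm0, %ymm0
        vsubps  %ymm3, %ymm1, %ymm1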
Jakob Stoklund Olesen
2012-Jan-10 16:15 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 10, 2012, at 3:55 AM, Demikhovsky, Elena wrote:

> This is the wrong code:
>
> declare <16 x float> @foo(<16 x float>)
>
> define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
> entry:
>   %x1 = fadd <16 x float> %x, %y
>   %call = call <16 x float> @foo(<16 x float> %x1) nounwind
>   %y1 = fsub <16 x float> %call, %y
>   ret <16 x float> %y1
> }

Thanks.

> ./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll
> test:                                   # @test
> # BB#0:                                 # %entry
>         pushq   %rbp
>         movq    %rsp, %rbp
>         subq    $64, %rsp
>         vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
>         vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
>         vmovaps %ymm3, %ymm6
>         vmovaps %ymm2, %ymm7
>         vaddps  %ymm7, %ymm0, %ymm0
>         vaddps  %ymm6, %ymm1, %ymm1
>         callq   foo
>         vsubps  %ymm7, %ymm0, %ymm0
>         vsubps  %ymm6, %ymm1, %ymm1
>         vmovaps -16(%rbp), %xmm6        # 16-byte Reload
>         vmovaps -32(%rbp), %xmm7        # 16-byte Reload
>         addq    $64, %rsp
>         popq    %rbp
>         ret
>
> ymm6 and ymm7 are not saved across the call.

The xmm spills and reloads are correct; that is prolog and epilog code preserving the xmm registers. However, you are correct that ymm6 and ymm7 can't be used as callee-saved registers.

> We support Win64, that's right.
> We defined the upper part of YMM like this:
>
> // XMM Registers, used by the various SSE instruction set extensions.
> // These are actually only needed for implementing the Win64 CC with AVX.
> def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
> def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
> def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
> def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
> def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
> def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
> def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
> def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;
>
> // X86-64 only
> def XMM8b: Register<"xmm8b">, DwarfRegNum<[25, -2, -2]>;
> def XMM9b: Register<"xmm9b">, DwarfRegNum<[26, -2, -2]>;
> def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
> def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
> def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
> def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
> def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
> def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

There is no need to define all these fake registers. One is enough:

def YMM_UPPER : Register<"ymmupper"> {
  let Aliases = [ YMM0, YMM1, ..., YMM15 ];
};

It doesn't need to be a sub-register either. Aliasing is good enough.

/jakob
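(In other words, with a single alias-only pseudo-register, the xmmNb definitions and the sub_xmmb index above become unnecessary; each ymm register only needs its real xmm sub-register. A sketch, reusing the Dwarf numbers already shown in this thread; it is illustrative, not a patch:)

  let SubRegIndices = [sub_xmm] in {
    def YMM6 : RegisterWithSubRegs<"ymm6", [XMM6]>, DwarfRegNum<[23, 27, 27]>;
    def YMM7 : RegisterWithSubRegs<"ymm7", [XMM7]>, DwarfRegNum<[24, 28, 28]>;
    // ...and likewise for the remaining ymm registers...
  }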