Jakob Stoklund Olesen
2012-Jan-09 23:13 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:

> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to the DEFS definition).
>>    YMMs are not in the set, so the caller does not take care of them.
>
> This is not how the register allocator works. It saves the registers holding values; it doesn't care which alias is clobbered.
>
> Are you saying that only the xmm part of a ymm register gets spilled before a call?
>
>> 2. The callee preserves XMMs but works with YMMs, clobbering them.
>> 3. So after the call, the upper part of the YMM is gone.
>
> Are you on Windows? As Bruno said, all xmm and ymm registers are call-clobbered on non-Windows platforms.

This thread has lots of interesting information:
http://software.intel.com/en-us/forums/showthread.php?t=59291

I wasn't able to find a formal Win64 ABI spec, but according to http://www.agner.org/optimize/calling_conventions.pdf, xmm6-xmm15 are callee-saved on Win64, while the high bits of ymm6-ymm15 are not.

That is not currently modelled correctly in LLVM. To fix it, create a pseudo-register YMMHI_CLOBBER that aliases ymm6-ymm15, then add YMMHI_CLOBBER to the registers clobbered by WINCALL64*.

/jakob
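(For illustration only: a minimal TableGen sketch of that suggestion. The name YMMHI_CLOBBER and the WINCALL64* instructions come from the message above; the exact field layout and where the code would live are assumptions, not code from any tree.)

  // Pseudo-register that is never allocated or encoded; it exists only so
  // the register allocator sees the upper halves of the Win64 callee-saved
  // xmm registers as clobbered by calls.
  def YMMHI_CLOBBER : Register<"ymmhi_clobber"> {
    let Aliases = [YMM6,  YMM7,  YMM8,  YMM9,  YMM10,
                   YMM11, YMM12, YMM13, YMM14, YMM15];
  }

  // The WINCALL64* call instructions would then list YMMHI_CLOBBER in their
  // Defs (clobber) lists, in addition to whatever they already clobber.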
Demikhovsky, Elena
2012-Jan-10 07:15 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
We support Win64, that's right. We defined the upper part of the YMM registers like this:

// XMM Registers, used by the various SSE instruction set extensions.
// These are actually only needed for implementing the Win64 CC with AVX.
def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;

// X86-64 only
def XMM8b: Register<"xmm8b">, DwarfRegNum<[25, -2, -2]>;
def XMM9b: Register<"xmm9b">, DwarfRegNum<[26, -2, -2]>;
def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

// YMM Registers, used by AVX instructions
let SubRegIndices = [sub_xmm, sub_xmmb] in {
def YMM0: RegisterWithSubRegs<"ymm0", [XMM0, XMM0b]>, DwarfRegNum<[17, 21, 21]>;
def YMM1: RegisterWithSubRegs<"ymm1", [XMM1, XMM1b]>, DwarfRegNum<[18, 22, 22]>;
def YMM2: RegisterWithSubRegs<"ymm2", [XMM2, XMM2b]>, DwarfRegNum<[19, 23, 23]>;
def YMM3: RegisterWithSubRegs<"ymm3", [XMM3, XMM3b]>, DwarfRegNum<[20, 24, 24]>;
def YMM4: RegisterWithSubRegs<"ymm4", [XMM4, XMM4b]>, DwarfRegNum<[21, 25, 25]>;
def YMM5: RegisterWithSubRegs<"ymm5", [XMM5, XMM5b]>, DwarfRegNum<[22, 26, 26]>;
def YMM6: RegisterWithSubRegs<"ymm6", [XMM6, XMM6b]>, DwarfRegNum<[23, 27, 27]>;
def YMM7: RegisterWithSubRegs<"ymm7", [XMM7, XMM7b]>, DwarfRegNum<[24, 28, 28]>;
def YMM8: RegisterWithSubRegs<"ymm8", [XMM8, XMM8b]>, DwarfRegNum<[25, -2, -2]>;
def YMM9: RegisterWithSubRegs<"ymm9", [XMM9, XMM9b]>, DwarfRegNum<[26, -2, -2]>;
def YMM10: RegisterWithSubRegs<"ymm10", [XMM10, XMM10b]>, DwarfRegNum<[27, -2, -2]>;
def YMM11: RegisterWithSubRegs<"ymm11", [XMM11, XMM11b]>, DwarfRegNum<[28, -2, -2]>;
def YMM12: RegisterWithSubRegs<"ymm12", [XMM12, XMM12b]>, DwarfRegNum<[29, -2, -2]>;
def YMM13: RegisterWithSubRegs<"ymm13", [XMM13, XMM13b]>, DwarfRegNum<[30, -2, -2]>;
def YMM14: RegisterWithSubRegs<"ymm14", [XMM14, XMM14b]>, DwarfRegNum<[31, -2, -2]>;
def YMM15: RegisterWithSubRegs<"ymm15", [XMM15, XMM15b]>, DwarfRegNum<[32, -2, -2]>;
}

- Elena
Demikhovsky, Elena
2012-Jan-10 11:55 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:

declare <16 x float> @foo(<16 x float>)

define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}

./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll

        .def    test;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  test
        .align  16, 0x90
test:                                   # @test
# BB#0:                                 # %entry
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $64, %rsp
        vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
        vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
        vmovaps %ymm3, %ymm6
        vmovaps %ymm2, %ymm7
        vaddps  %ymm7, %ymm0, %ymm0
        vaddps  %ymm6, %ymm1, %ymm1
        callq   foo
        vsubps  %ymm7, %ymm0, %ymm0
        vsubps  %ymm6, %ymm1, %ymm1
        vmovaps -16(%rbp), %xmm6        # 16-byte Reload
        vmovaps -32(%rbp), %xmm7        # 16-byte Reload
        addq    $64, %rsp
        popq    %rbp
        ret

ymm6 and ymm7 are not saved across the call.

I have a fix; I can send it for review.

- Elena
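(Note, for illustration: per the Win64 convention foo only preserves the xmm halves of xmm6-xmm15, so any 256-bit value that test keeps in ymm6/ymm7 loses its upper 128 bits across the call. Once the high halves are modelled as call-clobbered, values live across the call would instead be spilled and reloaded as full 32-byte ymm values, roughly as in the hand-written sketch below; the stack offsets and register choices are illustrative only, not compiler output.)

        vmovups %ymm2, 32(%rsp)         # spill lanes 0-7 of %y (full 32-byte ymm2)
        vmovups %ymm3, 64(%rsp)         # spill lanes 8-15 of %y (full 32-byte ymm3)
        vaddps  %ymm2, %ymm0, %ymm0
        vaddps  %ymm3, %ymm1, %ymm1
        callq   foo
        vmovups 32(%rsp), %ymm2         # reload after the call
        vmovups 64(%rsp), %ymm3
        vsubps  %ymm2, %ymm0, %ymm0
        vsubps  %ymm3, %ymm1, %ymm1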
Jakob Stoklund Olesen
2012-Jan-10 16:15 UTC
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 10, 2012, at 3:55 AM, Demikhovsky, Elena wrote:

> This is the wrong code:
>
> declare <16 x float> @foo(<16 x float>)
>
> define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
> entry:
>   %x1 = fadd <16 x float> %x, %y
>   %call = call <16 x float> @foo(<16 x float> %x1) nounwind
>   %y1 = fsub <16 x float> %call, %y
>   ret <16 x float> %y1
> }

Thanks.

> ./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll
> test:                                   # @test
> # BB#0:                                 # %entry
>         pushq   %rbp
>         movq    %rsp, %rbp
>         subq    $64, %rsp
>         vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
>         vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
>         vmovaps %ymm3, %ymm6
>         vmovaps %ymm2, %ymm7
>         vaddps  %ymm7, %ymm0, %ymm0
>         vaddps  %ymm6, %ymm1, %ymm1
>         callq   foo
>         vsubps  %ymm7, %ymm0, %ymm0
>         vsubps  %ymm6, %ymm1, %ymm1
>         vmovaps -16(%rbp), %xmm6        # 16-byte Reload
>         vmovaps -32(%rbp), %xmm7        # 16-byte Reload
>         addq    $64, %rsp
>         popq    %rbp
>         ret
>
> ymm6 and ymm7 are not saved across the call.

The xmm spills and reloads are correct; that is prolog and epilog code preserving the xmm registers. However, you are correct that ymm6 and ymm7 can't be used as callee-saved registers.

> We support Win64, that's right.
> We defined the upper part of YMM like this:
>
> // XMM Registers, used by the various SSE instruction set extensions.
> // These are actually only needed for implementing the Win64 CC with AVX.
> def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
> def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
> def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
> def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
> def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
> def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
> def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
> def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;
>
> // X86-64 only
> def XMM8b: Register<"xmm8b">, DwarfRegNum<[25, -2, -2]>;
> def XMM9b: Register<"xmm9b">, DwarfRegNum<[26, -2, -2]>;
> def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
> def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
> def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
> def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
> def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
> def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

There is no need to define all these fake registers. One is enough:

def YMM_UPPER : Register<"ymmupper"> {
  let Aliases = [ YMM0, YMM1, ..., YMM15 ];
};

It doesn't need to be a sub-register either. Aliasing is good enough.

/jakob
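(In other words, with a single alias-only pseudo-register, the xmmNb definitions and the sub_xmmb index above become unnecessary; each ymm register only needs its real xmm sub-register. A sketch, reusing the Dwarf numbers already shown in this thread; it is illustrative, not a patch:)

  let SubRegIndices = [sub_xmm] in {
    def YMM6 : RegisterWithSubRegs<"ymm6", [XMM6]>, DwarfRegNum<[23, 27, 27]>;
    def YMM7 : RegisterWithSubRegs<"ymm7", [XMM7]>, DwarfRegNum<[24, 28, 28]>;
    // ...and likewise for the remaining ymm registers...
  }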