Hi! I have discovered a problem with LLVM's interpretation of the Win64 calling convention w.r.t. passing of aggregates as arguments. The following code is part of my host application that is compiled with Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of four floats as its first and only argument, which is - in accordance with the specs of the Win64 calling convention - passed by pointer. --- snip --- struct float4 { float x, y, z, w; } float noise4(float4 v) { 0000000140067AE0 mov qword ptr [rsp+8],rcx 0000000140067AE5 push rdi 0000000140067AE6 sub rsp,10h 0000000140067AEA mov rdi,rsp 0000000140067AED mov rcx,4 0000000140067AF7 mov eax,0CCCCCCCCh 0000000140067AFC rep stos dword ptr [rdi] 0000000140067AFE mov rcx,qword ptr [rsp+20h] return v.x + v.y; 0000000140067B03 mov rax,qword ptr [v] 0000000140067B08 mov rcx,qword ptr [v] 0000000140067B0D movss xmm0,dword ptr [rax] 0000000140067B11 addss xmm0,dword ptr [rcx+4] 0000000140067B16 add rsp,10h 0000000140067B1A pop rdi 0000000140067B1B ret } --- snip --- noise4 is supposed to be called by jitted LLVM code, just like in the following example. --- snip --- target datalayout "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-n8:16:32:64" target triple = "x86_64-pc-win32" %0 = type opaque %float4 = type { float, float, float, float } define void @main(%float4* noalias nocapture, %0* noalias nocapture) nounwind { %3 = call float @"noise$float4"(%float4 { float 1.000000e+000, float 2.000000e+000, float 3.000000e+000, float 4.000000e+000 }) ; <float> [#uses=4] %4 = insertvalue %float4 undef, float %3, 0 ; <%float4> [#uses=1] %5 = insertvalue %float4 %4, float %3, 1 ; <%float4> [#uses=1] %6 = insertvalue %float4 %5, float %3, 2 ; <%float4> [#uses=1] %7 = insertvalue %float4 %6, float %3, 3 ; <%float4> [#uses=1] store %float4 %7, %float4* %0 ret void } declare float @"noise$float4"(%float4) nounwind readnone --- snip --- When compiling this module with llc (Intel assembler syntax) I get the following code. As you can see, the float4 argument is not passed to the noise-function by pointer. Instead, noise is treated as if it expected four individual floats as arguments, which are passed in the registers XMM0-XMM3. --- snip --- .data ALIGN 4 $CPI1_0: ; constant float dd 1065353216 ; float 1.000000e+000 $CPI1_1: ; constant float dd 1073741824 ; float 2.000000e+000 $CPI1_2: ; constant float dd 1077936128 ; float 3.000000e+000 $CPI1_3: ; constant float dd 1082130432 ; float 4.000000e+000 .text ALIGN 16 .globl _main _main: ; @main ; BB#0: sub RSP, 40 mov QWORD PTR [RSP + 32], RSI ; Spill mov RSI, RCX movss XMM0, DWORD PTR [RIP + ($CPI1_0)] movss XMM1, DWORD PTR [RIP + ($CPI1_1)] movss XMM2, DWORD PTR [RIP + ($CPI1_2)] movss XMM3, DWORD PTR [RIP + ($CPI1_3)] call _noise$float4 movss DWORD PTR [RSI + 12], XMM0 movss DWORD PTR [RSI + 8], XMM0 movss DWORD PTR [RSI + 4], XMM0 movss DWORD PTR [RSI], XMM0 mov RSI, QWORD PTR [RSP + 32] ; Reload add RSP, 40 ret --- snip --- This clearly doesn't work and I'd be glad if someone could look into this issue. Other than that I'm pleased to say that my experiences with 64-bit code generation on Windows have been very positive. Great job! Best regards, Stephan
Hello> When compiling this module with llc (Intel assembler syntax) I get the > following code. As you can see, the float4 argument is not passed to > the noise-function by pointer. Instead, noise is treated as if it > expected four individual floats as arguments, which are passed in the > registers XMM0-XMM3.That's correct behaviour. ABI under question is C/C++ ABI and it is a frontend responsibility to lower stuff like struct-by-value into ABI-compliant IR. So, in short - you need to pass pointer to the struct in your IR. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University
Hi!>> When compiling this module with llc (Intel assembler syntax) I get the >> following code. As you can see, the float4 argument is not passed to >> the noise-function by pointer. Instead, noise is treated as if it >> expected four individual floats as arguments, which are passed in the >> registers XMM0-XMM3. > That's correct behaviour. ABI under question is C/C++ ABI and it is a frontend > responsibility to lower stuff like struct-by-value into ABI-compliant IR. > > So, in short - you need to pass pointer to the struct in your IR.I don't know. I feel reluctant to generate different IRs for Win32 and for Win64. Since the C calling convention is the default for LLVM functions, I thought that it would map to *the* Win64 calling convention (since cdecl, fastcall and stdcall are all the same) when a 64-bit build was used. This is quite confusing ...
Hello> I don't know. I feel reluctant to generate different IRs for Win32 and > for Win64.Unfortunately, you should. Think about differences and between _Complex type and struct {double, double}.>From LLVM's point of view these are same types, however many ABIs havespecial rules for passing / returning _Complex, this is possible to handle in frontend only.> Since the C calling convention is the default for LLVM functions, I > thought that it would map to *the* Win64 calling convention (since > cdecl, fastcall and stdcall are all the same) when a 64-bit build was > used. This is quite confusing ...Yes, default calling convention is C. But you're not using an ABI-compliant C compiler to generate the IR. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University
Thanks, Anton! I didn't know about exceptions like _Complex that you mentioned. The only way to support them is to place the burden of correct parameter passing on the front-end, I understand that now. So, today I created a new transformation pass that makes sure that LLVM IR, which works alright with the default Win32 calling conventions, also plays nice with Win64 code within the limited scope of my language (that requires packing of aggregates of sizes 8, 16, 32, and 64-bits into an integer argument, and passing of larger aggregates by reference). Best regards, Stephan 2009/12/3 Anton Korobeynikov <anton at korobeynikov.info>:> Hello > >> I don't know. I feel reluctant to generate different IRs for Win32 and >> for Win64. > Unfortunately, you should. Think about differences and between > _Complex type and struct {double, double}. > From LLVM's point of view these are same types, however many ABIs have > special rules for passing / returning _Complex, > this is possible to handle in frontend only. > >> Since the C calling convention is the default for LLVM functions, I >> thought that it would map to *the* Win64 calling convention (since >> cdecl, fastcall and stdcall are all the same) when a 64-bit build was >> used. This is quite confusing ... > Yes, default calling convention is C. But you're not using an > ABI-compliant C compiler to generate the IR. > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University >