I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually.

I tracked this down to the following. The call site looks like

    vmovdqa 24064(%rsp), %ymm0
    vmovdqa %ymm0, (%rsp)
    vzeroupper
    callq   __Z14convert_char16Dv16_s

which passes the argument on the stack. The callee, however, begins with

    __Z14convert_char16Dv16_s:    ## @_Z14convert_char16Dv16_s
        .cfi_startproc
    ## BB#0:                      ## %entry
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        vextractf128 $1, %ymm0, %xmm1

which expects the argument in %ymm0. However, the vzeroupper in the caller just destroyed part of %ymm0...

My question is:

What decides this calling convention? I know that standard x86-64 should pass arguments in %xmm0, not %ymm0. Are there e.g. command line options, CPU attributes, or target triples that would modify this? Or should this be filed as a bug report? However, this may also be a bug in pocl, as I haven't been able to reproduce this without pocl.

-erik

-- 
Erik Schnetter <schnetter at gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

My email is as private as my paper mail. I therefore support encrypting and signing email messages. Get my PGP key from http://pgp.mit.edu/.
On Thu, Sep 5, 2013 at 1:23 PM, Erik Schnetter <schnetter at gmail.com> wrote:

> I am tracking down an x86-64 code generation problem that has to do with
> AVX instructions. The symptom is: a function is called, and the upper half
> of the function argument (which is short16) is zero. This happens only when
> I compile code with pocl, but not when I use clang and/or llc manually.
>
> I tracked this down to the following. The call site looks like
>
>     vmovdqa 24064(%rsp), %ymm0
>     vmovdqa %ymm0, (%rsp)
>     vzeroupper
>     callq   __Z14convert_char16Dv16_s
>
> which passes the argument on the stack. The callee, however, begins with
>
>     __Z14convert_char16Dv16_s:    ## @_Z14convert_char16Dv16_s
>         .cfi_startproc
>     ## BB#0:                      ## %entry
>         pushq   %rbp
>     Ltmp2:
>         .cfi_def_cfa_offset 16
>     Ltmp3:
>         .cfi_offset %rbp, -16
>         movq    %rsp, %rbp
>     Ltmp4:
>         .cfi_def_cfa_register %rbp
>         vextractf128 $1, %ymm0, %xmm1
>
> which expects the argument in %ymm0. However, the vzeroupper in the caller
> just destroyed part of %ymm0...
>
> My question is:
>
> What decides this calling convention? I know that standard x86-64 should
> pass arguments in %xmm0, not %ymm0. Are there e.g. command line options,
> CPU attributes, or target triples that would modify this? Or should this
> be filed as a bug report? However, this may also be a bug in pocl, as I
> haven't been able to reproduce this without pocl.

The calling convention should be clear from the LLVM IR. Make sure the caller and callee use the same calling convention markings.

You might get strange results if one translation unit has AVX and/or AVX2 enabled and the other has it disabled: the CPU features modify the calling convention for AVX/AVX2 vectors.

-Eli
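To make the point about CPU features concrete, here is a minimal sketch in C. It assumes the GCC/Clang vector extensions; the typedefs, the body of convert_char16, and the helper upper_half_sum are illustrative stand-ins, not pocl's actual kernel library code. As far as I understand the x86-64 System V convention, a 256-bit vector argument such as short16 travels in %ymm0 when AVX is enabled and in memory when it is not, so a caller built without -mavx and a callee built with -mavx would disagree in the way described above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OpenCL's short16: 16 lanes of 16 bits = 256 bits (assumption). */
typedef int16_t short16 __attribute__((vector_size(32)));
/* Stand-in for OpenCL's char16: 16 lanes of 8 bits = 128 bits (assumption). */
typedef int8_t char16 __attribute__((vector_size(16)));

static_assert(sizeof(short16) == 32, "short16 must be a 256-bit vector");

/* Hypothetical callee: truncates every 16-bit lane to 8 bits.
 * How `v` arrives here (register or memory) depends on whether this
 * translation unit is compiled with AVX enabled. */
char16 convert_char16(short16 v)
{
    char16 r;
    for (int i = 0; i < 16; ++i)
        r[i] = (int8_t)v[i];
    return r;
}

/* Hypothetical check: sums the upper eight lanes of the converted vector.
 * If caller and callee disagreed about the argument's location, these are
 * the lanes that would come out as zero. */
int upper_half_sum(short16 v)
{
    char16 r = convert_char16(v);
    int sum = 0;
    for (int i = 8; i < 16; ++i)
        sum += r[i];
    return sum;
}
```

To try reproducing the mismatch, one could move convert_char16 into its own file and compile the two files with different settings (for example one with -mavx and one with -mno-avx). Within a single translation unit, as written here, caller and callee always agree.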