Search results similar to "[LLVMdev] Stack alignment on X86 AVX seems incorrect" (showing 20 of an estimated 4000 matches)
2012 Jan 10 - 0 replies - [LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx
2012 Jan 09 - 3 replies - [LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator
2012 Mar 01 - 0 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
Cameron,
Aligning the stack to 32 bytes when there are auto AVX vector variables
present shouldn't necessarily break the x86-64 ABI, as long as smaller
auto variables remain properly aligned. A similar approach was taken
for i386 in GCC in order to support SSE vectors.
Perhaps you could elaborate where the ABI was violated when your patch
is applied.
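(For illustration, a minimal IR sketch of the case being discussed; names are hypothetical. The 32-byte alloca is what obliges the backend to realign the frame when the incoming stack guarantee is smaller.)

declare void @use(<8 x float>*)

define void @has_avx_auto(<8 x float> %x) nounwind {
entry:
  %v = alloca <8 x float>, align 32   ; 32-byte auto AVX vector variable
  store <8 x float> %x, <8 x float>* %v, align 32
  call void @use(<8 x float>* %v)
  ret void
}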
HTH
--
Evandro Menezes
2012 Mar 01 - 2 replies - [LLVMdev] Stack alignment in kernel
I'm running in AVX mode, but the stack before the call to the kernel is aligned to 16 bytes.
Could you please tell me where this should be specified?
Thank you.
- Elena
2012 Mar 01 - 0 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
vmovaps -208(%rbp), %ymm9
vmovaps -272(%rbp), %ymm7
vmovaps -304(%rbp), %ymm0
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm1
vmovaps -112(%rbp), %ymm0
2012 May 24 - 0 replies - [LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote:
> Very likely AVX is not enabled in your llc. This feature was enabled
> just recently (in late April).
I forgot to mention that I am using a recent LLVM 3.1 and that, in principle, my
llc knows about AVX, as I have shown in the second example. But AVX does
not seem to be used by default.
On Thu, 24 May 2012, Henning Thielemann wrote:
> $ llc -o - -mattr
2012 May 24 - 4 replies - [LLVMdev] use AVX automatically if present
I wonder why AVX is not used automatically if available on the host
machine. In contrast, SSE4.1 instructions (like pmulld) are used
automatically if the host machine supports SSE4.1.
E.g.
$ cat avx.ll
define void @_fun1(<8 x float>*, <8 x float>*) {
_L1:
  %x = load <8 x float>* %0
  %y = load <8 x float>* %1
  %z = fadd <8 x float> %x, %y
  store
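(The IR snippet above is cut off by the archive.) A hedged aside: llc does not probe the host CPU by default, so AVX has to be requested explicitly, for example with the same flag used elsewhere in this thread:

$ llc -mattr=+avx -o - avx.ll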
2012 Mar 01 - 0 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally
<cameron.mcinally at nyu.edu> wrote:
> Aligning the stack to 32 bytes when there are auto AVX vector variables
>> present shouldn't necessarily break the x86-64 ABI, as long as smaller auto
>> variables remain properly aligned. A similar approach was taken for i386
>> in GCC in order to support SSE vectors.
>>
2016 May 06 - 3 replies - Unnecessary spill/fill issue
Hi, I am using MCJIT in LLVM 3.6 to JIT kernels for x86 AVX2. I've noticed
some inefficient use of the stack around constant vectors. In one example,
I have code that computes a series of constant vectors at compile time.
Each vector has a single use. In the final asm, I see a series of spills at
the top of the function of all the constant vectors immediately to stack,
then each use references
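(The description is cut off by the archive. For context, a hypothetical reduction of the pattern described, not the poster's actual kernel: compile-time constant vectors, each with a single use, that end up spilled to the stack and reloaded at their single use.)

define <8 x float> @kernel(<8 x float> %x) {
entry:
  ; each constant vector below is known at compile time and used exactly once
  %a = fmul <8 x float> %x, <float 2.0, float 2.0, float 2.0, float 2.0, float 2.0, float 2.0, float 2.0, float 2.0>
  %b = fadd <8 x float> %a, <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>
  ret <8 x float> %b
}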
2012 Mar 01 - 4 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 1, 2012 at 4:29 PM, Evandro Menezes <emenezes at codeaurora.org> wrote:
...
> Aligning the stack to 32 bytes when there are auto AVX vector variables
> present shouldn't necessarily break the x86-64 ABI, as long as smaller auto
> variables remain properly aligned. A similar approach was taken for i386
> in GCC in order to support SSE vectors.
>
> Perhaps
2012 Mar 01 - 3 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
Even if you explicitly specify -stack-alignment=16, the aligned movs are still generated.
It is not an issue related to ABI.
See my original mail:
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
- Elena
2013 Aug 28 - 3 replies - [PATCH] x86: AVX instruction emulation fixes
- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M
one as the second prefix byte
- early decoding normalized vex.reg, thus corrupting it for the main
consumer (copy_REX_VEX()), resulting in #UD on the two-operand
instructions we emulate
Also add respective test cases to the testing utility plus
- fix get_fpu() (the fall-through order was inverted)
- add cpu_has_avx2,
2020 Sep 01 - 2 replies - Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));
void fct6(v4f32 *x)
{
#pragma clang loop vectorize(enable)
    for (int i = 0; i < 256; ++i)
        x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
  -Rpass=loop-vectorize,slp-vectorize \
  -Rpass-missed=loop-vectorize,slp-vectorize
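(A hedged sketch of the shape one hopes the loop-vectorizer produces for the body above: a wide load, a wide multiply by a splat of 7, and a wide store. The function name and vector width are illustrative, not actual compiler output.)

define void @fct6_vector_body(<8 x float>* %p) {
entry:
  %v = load <8 x float>, <8 x float>* %p, align 16
  %m = fmul <8 x float> %v, <float 7.0, float 7.0, float 7.0, float 7.0, float 7.0, float 7.0, float 7.0, float 7.0>
  store <8 x float> %m, <8 x float>* %p, align 16
  ret void
}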
2013 Dec 12 - 0 replies - [LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try "clang -mavx" or "clang -march=corei7-avx" for Ivy Bridge, and "clang -march=core-avx2" or "clang -mavx2" for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
2012 Mar 01 - 2 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote:
> vmovaps should not access stack if it is not aligned to 32
I'm not completely sure I understand your problem. Are you saying that
the generated code assumes 256-bit alignment, your default stack
alignment is 128-bit, and LLVM doesn't adjust it automatically?
Joerg
2012 Mar 01 - 0 replies - [LLVMdev] Stack alignment on X86 AVX seems incorrect
When the stack is unaligned, LLVM should generate vmovups instead of vmovaps.
- Elena
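(For reference, a minimal sketch of how the alignment attribute on an IR load is meant to drive that choice; the same logic applies to the spill slots the backend creates. Function names are hypothetical.)

define <8 x float> @ld_aligned(<8 x float>* %p) nounwind {
  %v = load <8 x float>* %p, align 32   ; 32-byte aligned: vmovaps is safe
  ret <8 x float> %v
}

define <8 x float> @ld_unaligned(<8 x float>* %p) nounwind {
  %v = load <8 x float>* %p, align 1    ; unaligned: must lower to vmovups
  ret <8 x float> %v
}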
2013 Dec 11 - 2 replies - [LLVMdev] AVX code gen
Hello -
I found this post on the LLVM blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang/llvm is capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2, for example) but I cannot get clang/llvm to generate such
2018 Jun 29 - 2 replies - [RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i = 0; i < N; i++){
    a[i] = sin(i);
  }
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is the 8-element SVML sin() called with an 8-element argument. On the surface,
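(The message is cut off above. In IR terms, the call it describes looks roughly like this sketch; the declaration is inferred from the signature quoted in the message, and the wrapper function name is hypothetical.)

declare <8 x double> @__svml_sin8(<8 x double>)

define <8 x double> @vectorized_body(<8 x double> %idx) {
  %r = call <8 x double> @__svml_sin8(<8 x double> %idx)
  ret <8 x double> %r
}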
2013 Jul 10 - 4 replies - [LLVMdev] unaligned AVX store gets split into two instructions
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
on AVX.
3.3 splits up an unaligned vector load, while 3.2 emitted it as a single
instruction (details below).
In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
which seems to be due to this.
Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x
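(The IR is cut off by the archive; what follows is a hypothetical reconstruction of such a test case, not necessarily the poster's exact code.)

define <4 x double> @vstore(<4 x double>* %p, <4 x double> %v) {
  store <4 x double> %v, <4 x double>* %p, align 8   ; unaligned 256-bit store
  ret <4 x double> %v
}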
2013 Dec 23 - 2 replies - [LLVMdev] Commutability of X86 FMA3 instructions.
Hi Elena,
Thank you very much for looking in to that.
I'll go ahead and remove the isCommutable flag from those
instructions, since it sounds like that's the right thing to do. I
would still like to change the default from the 231 variant to 213
too, as this will reduce code-size for accumulator-style loops. I have
at least one benchmark that shows significant speedups when this
change
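(The message is cut off above. For context, a minimal IR sketch of the accumulator-style pattern at issue: on an FMA3 target the call below lowers to one of the vfmadd forms, and the 213-versus-231 default decides which source operand the result overwrites.)

declare <4 x double> @llvm.fma.v4f64(<4 x double>, <4 x double>, <4 x double>)

define <4 x double> @mac(<4 x double> %a, <4 x double> %b, <4 x double> %acc) {
  ; acc = a * b + acc
  %r = call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> %b, <4 x double> %acc)
  ret <4 x double> %r
}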