similar to: Aligned vector spills and variably sized stack frames

Displaying 20 results from an estimated 10000 matches similar to: "Aligned vector spills and variably sized stack frames"

2015 Aug 28
2
Aligned vector spills and variably sized stack frames
----- Original Message ----- > From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org> > To: "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Friday, August 28, 2015 6:21:00 PM > Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames > > On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote: > > I've run
2016 May 06
3
Unnecessary spill/fill issue
Hi, I am using mcjit in llvm 3.6 to jit kernels to x86 avx2. I've noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, I see a series of spills at the top of the function of all the constant vectors immediately to stack, then each use references
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM,
2020 Sep 01
2
Vector evolution?
Hi, Please consider the following loop: using v4f32 = float __attribute__((__vector_size__(16))); void fct6(v4f32 *x) { #pragma clang loop vectorize(enable) for (int i = 0; i < 256; ++i) x[i] = 7 * x[i]; } After compiling it with: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing
2015 Aug 31
3
MCRegisterClass mandatory vs preferred alignment?
Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is? Additionally, where are these alignments actually defined? I
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,
2012 Mar 01
2
[LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > vmovaps should not access stack if it is not aligned to 32 I'm not completely sure I understand your problem. Are you saying that the generated code assumes 256bit alignment, your default stack alignment is 128bit and LLVM doesn't adjust it automatically? Joerg
2015 Aug 31
2
MCRegisterClass mandatory vs preferred alignment?
On 08/31/2015 03:59 PM, Matthias Braun wrote: > Looks to me like the alignment is specified in tablegen. From Target.td: > > class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8,
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
Hi, While investigating a performance issue with an internal codebase I came across what looks to be poor register allocation. I have constructed a small(ish) reproducible which demonstrates the issue (see test.ll attached). I have spent some time going through the register allocator to understand what is happening. I have also experimented with some small changes to try and improve the
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh, Thanks for the repy. Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop
2013 Nov 16
1
[LLVMdev] Limit loop vectorizer to SSE
The vectorizer will now emit = load <8 x i32>, align #TargetAlignmentOfScalari32 where before it would emit = load <8 x i32> (which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction). When the backend generates code for the former it will emit an unaligned move:
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
When stack is unaligned, LLVM should generate vmovups instead of vmovaps. - Elena -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger Sent: Thursday, March 01, 2012 20:31 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect On Thu, Mar 01, 2012 at 06:16:46PM +0000,
2012 Mar 01
4
[LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 1, 2012 at 4:29 PM, Evandro Menezes <emenezes at codeaurora.org>wrote: ... > Aligning the stack to 32 bytes when there are auto AVX vector variables > present shouldn't necessarily break the x86-64 ABI, as long as smaller auto > variables remain properly aligned. A similar approach was taken for i386 > in GCC in order to support SSE vectors. > > Perhaps
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are
2013 Nov 16
0
[LLVMdev] Limit loop vectorizer to SSE
I confirm that r194876 fixes the issue, i.e. segfault not caused. My program still passed 16 byte aligned pointers to the function which the loop vectorizer processes successfully: LV: Vector loop of width 8 costs: 1. LV: Selecting VF = : 8. LV: Found a vectorizable loop (8) in func_orig.ll LV: Unroll Factor is 1 Since the program runs fine, it seems to be allowed for the CPU to issue a vector
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote: > >   > > >It may not be a full solution for the problems you're trying to solve > >   > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >