thr3ads.net - similar to: "Aligned vector spills and variably sized stack frames"

Displaying 20 results from an estimated 10000 matches similar to: "Aligned vector spills and variably sized stack frames"

Aligned vector spills and variably sized stack frames

2015 Aug 28

Aligned vector spills and variably sized stack frames

----- Original Message ----- > From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org> > To: "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Friday, August 28, 2015 6:21:00 PM > Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames > > On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote: > > I've run

Unnecessary spill/fill issue

2016 May 06

Unnecessary spill/fill issue

Hi, I am using mcjit in llvm 3.6 to jit kernels to x86 avx2. I've noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, I see a series of spills at the top of the function of all the constant vectors immediately to stack, then each use references

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM,

Vector evolution?

2020 Sep 01

Vector evolution?

Hi, Please consider the following loop: using v4f32 = float __attribute__((__vector_size__(16))); void fct6(v4f32 *x) { #pragma clang loop vectorize(enable) for (int i = 0; i < 256; ++i) x[i] = 7 * x[i]; } After compiling it with: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is? Additionally, where are these alignments actually defined? I

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > vmovaps should not access stack if it is not aligned to 32 I'm not completely sure I understand your problem. Are you saying that the generated code assumes 256bit alignment, your default stack alignment is 128bit and LLVM doesn't adjust it automatically? Joerg

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

On 08/31/2015 03:59 PM, Matthias Braun wrote: > Looks to me like the alignment is specified in tablegen. From Target.td: > > class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8,

[LLVMdev] Poor register allocation (constants causing spilling)

2015 Jul 14

[LLVMdev] Poor register allocation (constants causing spilling)

Hi, While investigating a performance issue with an internal codebase I came across what looks to be poor register allocation. I have constructed a small(ish) reproducible which demonstrates the issue (see test.ll attached). I have spent some time going through the register allocator to understand what is happening. I have also experimented with some small changes to try and improve the

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the repy. Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 16

[LLVMdev] Limit loop vectorizer to SSE

The vectorizer will now emit = load <8 x i32>, align #TargetAlignmentOfScalari32 where before it would emit = load <8 x i32> (which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction). When the backend generates code for the former it will emit an unaligned move:

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

When stack is unaligned, LLVM should generate vmovups instead of vmovaps. - Elena -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger Sent: Thursday, March 01, 2012 20:31 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect On Thu, Mar 01, 2012 at 06:16:46PM +0000,

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

On Thu, Mar 1, 2012 at 4:29 PM, Evandro Menezes <emenezes at codeaurora.org>wrote: ... > Aligning the stack to 32 bytes when there are auto AVX vector variables > present shouldn't necessarily break the x86-64 ABI, as long as smaller auto > variables remain properly aligned. A similar approach was taken for i386 > in GCC in order to support SSE vectors. > > Perhaps

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 16

[LLVMdev] Limit loop vectorizer to SSE

I confirm that r194876 fixes the issue, i.e. segfault not caused. My program still passed 16 byte aligned pointers to the function which the loop vectorizer processes successfully: LV: Vector loop of width 8 costs: 1. LV: Selecting VF = : 8. LV: Found a vectorizable loop (8) in func_orig.ll LV: Unroll Factor is 1 Since the program runs fine, it seems to be allowed for the CPU to issue a vector

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

On 07/02/2018 04:33 PM, Saito, Hideki wrote: > > > > >It may not be a full solution for the problems you're trying to solve > > > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >

similar to: Aligned vector spills and variably sized stack frames