thr3ads.net - similar to: "Unnecessary spill/fill issue"

Displaying 20 results from an estimated 1000 matches similar to: "Unnecessary spill/fill issue"

2020 Sep 01

Vector evolution?

Hi, Please consider the following loop: using v4f32 = float __attribute__((__vector_size__(16))); void fct6(v4f32 *x) { #pragma clang loop vectorize(enable) for (int i = 0; i < 256; ++i) x[i] = 7 * x[i]; } After compiling it with: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize

[PATCH] x86: AVX instruction emulation fixes

2013 Aug 28

[PATCH] x86: AVX instruction emulation fixes

- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M one as the second prefix byte - early decoding normalized vex.reg, thus corrupting it for the main consumer (copy_REX_VEX()), resulting in #UD on the two-operand instructions we emulate Also add respective test cases to the testing utility plus - fix get_fpu() (the fall-through order was inverted) - add cpu_has_avx2,

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 09

[LLVMdev] Calling conventions for YMM registers on AVX

On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 10

[LLVMdev] Calling conventions for YMM registers on AVX

This is the wrong code: declare <16 x float> @foo(<16 x float>) define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind { entry: %x1 = fadd <16 x float> %x, %y %call = call <16 x float> @foo(<16 x float> %x1) nounwind %y1 = fsub <16 x float> %call, %y ret <16 x float> %y1 } ./llc -mattr=+avx

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,

[LLVMdev] Stack alignment in kernel

2012 Mar 01

[LLVMdev] Stack alignment in kernel

I'm running in AVX mode, but the stack before call to kernel is aligned to 16 bit. Could you, please, tell me where it should be specified? Thank you. - Elena --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or

[LLVMdev] Stack alignment on X86 AVX seems incorrect

2012 Mar 01

[LLVMdev] Stack alignment on X86 AVX seems incorrect

Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM,

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the repy. Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

On 07/02/2018 04:33 PM, Saito, Hideki wrote: > > > > >It may not be a full solution for the problems you're trying to solve > > > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >

unable to emit vectorized code in LLVM IR

2017 Aug 17

unable to emit vectorized code in LLVM IR

I assume compiler knows that your only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration. So the compiler easily removed the a and b arrays. Same with 'c', it's not read outside the loop so it doesn't need to exist. So the compiler turned your

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 17

Vector trunc code generation difference between llvm-3.9 and 4.0

Correction in the C snippet: typedef signed short v8i16_t __attribute__((ext_vector_type(8))); v8i16_t foo (v8i16_t a, int n) { return a >> n; } Best regards Saurabh On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote: > Hello, > > We are investigating a difference in code generation for vector splat > instructions between llvm-3.9

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 18

Vector trunc code generation difference between llvm-3.9 and 4.0

Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed? Best regards Saurabh On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote: > I think this is caused by a front-end change (cc'ing clang-dev) because >

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Mar 08

Vector trunc code generation difference between llvm-3.9 and 4.0

The regression for the reported case should be avoided after: https://reviews.llvm.org/rL297232 https://reviews.llvm.org/rL297242 https://reviews.llvm.org/rL297280 It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited. On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Yes, there is an

[LLVMdev] use AVX automatically if present

2012 May 24

[LLVMdev] use AVX automatically if present

I wonder why AVX is not used automatically if available at the host machine. In contrast to that, SSE41 instructions (like pmulld) are automatically used if the host machine supports SSE41. E.g. $ cat avx.ll define void @_fun1(<8 x float>*, <8 x float>*) { _L1: %x = load <8 x float>* %0 %y = load <8 x float>* %1 %z = fadd <8 x float> %x, %y store

[LLVMdev] use AVX automatically if present

2012 May 24

[LLVMdev] use AVX automatically if present

On Thu, 24 May 2012, Pan, Wei wrote: > Very likely AVX is not enabled in your llc. This feature was enabled > just recently (late of April). I forgot to mention that I am using recent LLVM-3.1 and in principle my llc knows about avx as I have shown in the second example. But avx does not seem to be used by default. On Thu, 24 May 2012, Henning Thielemann wrote: > $ llc -o - -mattr

[LLVMdev] AVX code gen

2013 Dec 11

[LLVMdev] AVX code gen

Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing

similar to: Unnecessary spill/fill issue