thr3ads.net - similar to: "Vector trunc code generation difference between llvm-3.9 and 4.0"

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 18

2

Vector trunc code generation difference between llvm-3.9 and 4.0

Thanks Sanjay. Interestingly for me, -disable-llvm-optzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed? Best regards Saurabh On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote: > I think this is caused by a front-end change (cc'ing clang-dev) because >

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Mar 08

2

Vector trunc code generation difference between llvm-3.9 and 4.0

The regression for the reported case should be avoided after: https://reviews.llvm.org/rL297232 https://reviews.llvm.org/rL297242 https://reviews.llvm.org/rL297280 It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited. On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Yes, there is an

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 09

3

[LLVMdev] Calling conventions for YMM registers on AVX

On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator

Vector evolution?

2020 Sep 01

2

Vector evolution?

Hi, Please consider the following loop:

    using v4f32 = float __attribute__((__vector_size__(16)));

    void fct6(v4f32 *x) {
    #pragma clang loop vectorize(enable)
      for (int i = 0; i < 256; ++i)
        x[i] = 7 * x[i];
    }

After compiling it with:

    clang++ -O3 -march=native -mtune=native \
      -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize
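For experimenting with the loop quoted above, a self-contained C variant may be handy. This is only a sketch: the function name scale_by_7 and the length parameter n are assumptions introduced here (the post hard-codes 256 and uses C++), and the clang loop pragma is left out so the file also builds with GCC.

```c
/* GCC/Clang vector extension: four packed floats in 16 bytes. */
typedef float v4f32 __attribute__((vector_size(16)));

/* Hypothetical C variant of the loop from the post. The trip count is a
 * parameter here (an assumption; the original uses a fixed 256). */
void scale_by_7(v4f32 *x, int n)
{
    for (int i = 0; i < n; ++i)
        x[i] = 7.0f * x[i];   /* the scalar is applied to all four lanes */
}
```

Each iteration already operates on a four-float vector, so the question in the thread is whether the loop vectorizer can widen such a loop further; the remarks printed by -Rpass and -Rpass-missed are the place to look for its decision.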

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 10

0

[LLVMdev] Calling conventions for YMM registers on AVX

This is the wrong code:

    declare <16 x float> @foo(<16 x float>)

    define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
    entry:
      %x1 = fadd <16 x float> %x, %y
      %call = call <16 x float> @foo(<16 x float> %x1) nounwind
      %y1 = fsub <16 x float> %call, %y
      ret <16 x float> %y1
    }

    ./llc -mattr=+avx

[LLVMdev] AVX code gen

2013 Dec 11

2

[LLVMdev] AVX code gen

Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such

[PATCH] x86: AVX instruction emulation fixes

2013 Aug 28

3

[PATCH] x86: AVX instruction emulation fixes

- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M one as the second prefix byte - early decoding normalized vex.reg, thus corrupting it for the main consumer (copy_REX_VEX()), resulting in #UD on the two-operand instructions we emulate Also add respective test cases to the testing utility plus - fix get_fpu() (the fall-through order was inverted) - add cpu_has_avx2,

unable to emit vectorized code in LLVM IR

2017 Aug 17

4

unable to emit vectorized code in LLVM IR

I assume the compiler knows that you only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration. So the compiler easily removed the a and b arrays. Same with 'c', it's not read outside the loop so it doesn't need to exist. So the compiler turned your
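A minimal sketch of the point made in this reply, assuming a hypothetical kernel named add_arrays (it is not the poster's code): when the inputs arrive through pointers and the result is written to memory the caller can read, the loop has observable effects, so the optimizer cannot simply delete the arrays and fold the work away.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the thread. The inputs come in
 * through pointers and the sums are stored where the caller can read them,
 * so the loop cannot be removed as dead code. restrict promises that the
 * three arrays do not overlap. */
void add_arrays(const float *restrict a, const float *restrict b,
                float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Compiling a function like this with optimization and asking for IR (for example with clang -O2 -S -emit-llvm) is one way to check whether vector types show up; whether they do depends on the target and the compiler version.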

[LLVMdev] AVX code gen

2013 Dec 12

0

[LLVMdev] AVX code gen

It probably does not pick the right processor architecture. You could try "clang -mavx" or "clang -march=corei7-avx" for Ivy Bridge and "clang -march=core-avx2" or "clang -mavx2" for Haswell.

    $ clang -march=core-avx2 -O3 -S -o - test.c
            .section __TEXT,__text,regular,pure_instructions
            .globl _f
            .align 4, 0x90
    _f:                                     ## @f

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. This makes it possible to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018 May 23

33

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v3: - Update on message to describe longer term PIE goal. - Minor change on ftrace if condition. - Changed code using xchgq. - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start of the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

2

[RFC][VECLIB] how should we legalize VECLIB calls?

Illustrative Example:

    clang -fveclib=SVML -O3 svml.c -mavx

    #include <math.h>
    void foo(double *a, int N){
      int i;
    #pragma clang loop vectorize_width(8)
      for (i=0;i<N;i++){
        a[i] = sin(i);
      }
    }

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

2

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the reply. A related earlier discussion of this topic appears in the review of the SVML patch (@mmasten). Adding a few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

Unnecessary spill/fill issue

2016 May 06

3

Unnecessary spill/fill issue

Hi, I am using mcjit in llvm 3.6 to jit kernels to x86 avx2. I've noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, I see a series of spills at the top of the function of all the constant vectors immediately to stack, then each use references

DBG_VALUE insertion for spills breaks bundles

2017 Dec 19

3

DBG_VALUE insertion for spills breaks bundles

Hi, The insertion of DBG_VALUE instructions for spills does not seem to be handling insert locations inside bundles well. If the spill instruction is part of a bundle, the new DBG_VALUE is inserted after it, but does not have the bundling flags set. This essentially means that if we start with a set of bundled instructions: MI1 [BundledSucc=true, BundledPred=false] MI2 [BundledSucc=false,
