Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] AVX code gen"
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try “clang -mavx” or “clang -march=corei7-avx” for Ivy Bridge and “clang -march=core-avx2” or “clang -mavx2” for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
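One way to confirm that a given -m or -march flag really enabled AVX code generation is to look at the compiler's predefined macros. The following is a minimal sketch in C, assuming clang or GCC (both define __AVX__ and __AVX2__ when the corresponding feature is enabled). The helper names avx_level, avx_level_name and report_avx_level are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the AVX level the compiler was asked to
   target, decided at compile time from predefined feature macros. */
int avx_level(void)
{
#if defined(__AVX2__)
    return 2;   /* e.g. built with -mavx2 or -march=core-avx2 */
#elif defined(__AVX__)
    return 1;   /* e.g. built with -mavx or -march=corei7-avx */
#else
    return 0;   /* AVX code generation is not enabled */
#endif
}

/* Hypothetical helper: maps the level from avx_level() to a label. */
const char *avx_level_name(int level)
{
    switch (level) {
    case 2:  return "AVX2";
    case 1:  return "AVX";
    default: return "none";
    }
}

/* Prints the level this translation unit was compiled for. */
void report_avx_level(void)
{
    int level = avx_level();
    assert(level >= 0 && level <= 2);   /* sanity check on the helper */
    printf("AVX level enabled at compile time: %s\n", avx_level_name(level));
}
```

Calling report_avx_level() from main in a file built with "clang -march=core-avx2" should report AVX2, while a build for a generic x86-64 target would normally report none. This only reflects what the compiler was told to target, not what the CPU running the binary supports.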
2020 Sep 01
2
Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));
void fct6(v4f32 *x)
{
#pragma clang loop vectorize(enable)
  for (int i = 0; i < 256; ++i)
    x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize,slp-vectorize \
-Rpass-missed=loop-vectorize,slp-vectorize
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Hi Frank,
What does --debug-only=vectorize say?
You may try to get the datalayout and the triple on the IR header,
just to make sure you got everything right. LLVM will honour those,
and front-ends should create them correctly.
--renato
On 1 July 2015 at 19:06, Frank Winter <fwinter at jlab.org> wrote:
> I realized that the function parameters had no alignment attributes on them.
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello,
On the appended reasonably simple test case that has an fmul/fadd
sequence on <8 x float> vector types, I don't see the x86-64 code
generator (with cpu set to haswell or later types) turning it into an
AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Frank,
It sounds like the SLP vectorizer thinks that it is more profitable to use 128-bit wide operations (because 256-bit operations are double-pumped on Sandy Bridge). Did you see a different result on Haswell?
Thanks,
Nadav
> On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote:
>
> I realized that the function parameters had no alignment attributes on them.
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
I seem to have a problem getting the SLP vectorizer to make use of the full
8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The
function is attached, the CPU flags are:
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
2013 Jul 10
4
[LLVMdev] unaligned AVX store gets split into two instructions
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
on AVX.
3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
a single instruction (details below).
In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
which seems to be due to this.
Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x
2015 Nov 04
2
Vectorizing structure reads, writes, etc on X86-64 AVX
Hi Jay -
I see the slow, small accesses using an older clang [Apple LLVM version
7.0.0 (clang-700.1.76)], but this looks fixed on trunk. I made a change
that comes into play if you don't specify a particular CPU:
http://llvm.org/viewvc/llvm-project?view=revision&revision=245950
$ ./clang -O1 -mavx copy.c -S -o -
...
movslq %edi, %rax
movq _spr_dynamic@GOTPCREL(%rip),
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32-bytes for AVX and
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32-bytes when AVX spills are
present; it does break the x86-64 ABI though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i=0;i<N;i++){
    a[i] = sin(i);
  }
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called with 8-element argument. On the surface,
2013 Apr 09
1
[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics
Hello,
LLVM generates two additional instructions for 128->256 bit typecasts
(e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of YMM register corresponding to source XMM register.
vxorps xmm2,xmm2,xmm2
vinsertf128 ymm0,ymm2,xmm0,0x0
Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't
generate any extra moves
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume the compiler knows that you only have 2 input values that you just
added together 1000 times.
Despite the fact that you stored to a[i] and b[i] here, nothing reads them
other than the addition in the same loop iteration. So the compiler easily
removed the a and b arrays. Same with 'c', it's not read outside the loop
so it doesn't need to exist. So the compiler turned your
2015 Nov 03
2
Vectorizing structure reads, writes, etc on X86-64 AVX
----- Original Message -----
> From: "Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Jay McCarthy" <jay.mccarthy at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Tuesday, November 3, 2015 12:30:51 PM
> Subject: Re: [llvm-dev] Vectorizing structure reads, writes, etc on X86-64 AVX
>
> If the
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh,
Thanks for the reply.
A related earlier topic appears in the review of the SVML patch (@mmasten). Adding a few names from there.
https://reviews.llvm.org/D19544
There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now
in the trunk is "not legal enough". I'll work on the patch to stop
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, we are also interested in making LLVM
generate vector math library calls that are available with glibc (version >
2.22).
reference: https://sourceware.org/glibc/wiki/libmvec
Using the example case given in the reference, we found there are 2 vector
versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx)
version and _ZGVdN4v_sin
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
> >It may not be a full solution for the problems you're trying to solve
>
> If we are inventing a new solution, I’d like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme
> works for mathlib only, I’m happy to take that and discuss OpenMP
>
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but
I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be organized,
especially so we're not repeating everything for each data type as we do
right now.
So yes, I think that would allow us to remove the VecLib mappings because
we are
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi!
When compiling the attached module with the JIT engine on an Intel KNL I
see wrong code getting emitted. I attach a complete exploit program
which shows the bug in LLVM 3.8. It loads and JIT compiles the module
and prints the assembler. I stumbled on this since the result of an
actual calculation was wrong. So, it's not only the text version of the
assembler also the machine