Displaying 20 results from an estimated 1000 matches similar to: "Unnecessary spill/fill issue"
2020 Sep 01
2
Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));
void fct6(v4f32 *x)
{
#pragma clang loop vectorize(enable)
for (int i = 0; i < 256; ++i)
x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize,slp-vectorize
-Rpass-missed=loop-vectorize,slp-vectorize
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M
one as the second prefix byte
- early decoding normalized vex.reg, thus corrupting it for the main
consumer (copy_REX_VEX()), resulting in #UD on the two-operand
instructions we emulate
Also add respective test cases to the testing utility plus
- fix get_fpu() (the fall-through order was inverted)
- add cpu_has_avx2,
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
%x1 = fadd <16 x float> %x, %y
%call = call <16 x float> @foo(<16 x float> %x1) nounwind
%y1 = fsub <16 x float> %call, %y
ret <16 x float> %y1
}
./llc -mattr=+avx
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N){
int i;
#pragma clang loop vectorize_width(8)
for (i=0;i<N;i++){
a[i] = sin(i);
}
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called with 8-element argument. On the surface,
2012 Mar 01
2
[LLVMdev] Stack alignment in kernel
I'm running in AVX mode, but the stack before call to kernel is aligned to 16 bit.
Could you, please, tell me where it should be specified?
Thank you.
- Elena
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32-bytes for AVX and
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32-bytes when AVX spills are
present; it does break the x86-64 ABI though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh,
Thanks for the repy.
Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
https://reviews.llvm.org/D19544
There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now
in the trunk is "not legal enough". I'll work on the patch to stop
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, We are also interested in making LLVM
generate vector math library calls that are available with glibc (version >
2.22).
reference: https://sourceware.org/glibc/wiki/libmvec
Using the example case given in the reference, we found there are 2 vector
versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx)
version and _ZGVdN4v_sin
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but
I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be organized,
especially so we're not repeating everything for each data type as we do
right now.
So yes, I think that would allow us to remove the VecLib mappings because
we are
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
>
>
> >It may not be a full solution for the problems you're trying to solve
>
>
>
> If we are inventing a new solution, I’d like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme
>
> works for mathlib only, I’m happy to take that and discuss OpenMP
>
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume compiler knows that your only have 2 input values that you just
added together 1000 times.
Despite the fact that you stored to a[i] and b[i] here, nothing reads them
other than the addition in the same loop iteration. So the compiler easily
removed the a and b arrays. Same with 'c', it's not read outside the loop
so it doesn't need to exist. So the compiler turned your
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 =
[8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are
indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from
these locations. and zmm2 contains constant 4000. so,
vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000,
as for array b the stride is 4000.
zmm14=
2017 Feb 17
2
Vector trunc code generation difference between llvm-3.9 and 4.0
Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n)
{
return a >> n;
}
Best regards
Saurabh
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com>
wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9
2017 Feb 18
2
Vector trunc code generation difference between llvm-3.9 and 4.0
Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a
difference in the way the shift was handled. Does the initial IR generated
for you show this difference when the option is passed?
Best regards
Saurabh
On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote:
> I think this is caused by a front-end change (cc'ing clang-dev) because
>
2017 Mar 08
2
Vector trunc code generation difference between llvm-3.9 and 4.0
The regression for the reported case should be avoided after:
https://reviews.llvm.org/rL297232
https://reviews.llvm.org/rL297242
https://reviews.llvm.org/rL297280
It would still be good to understand if the clang change was intentional or
if that was a side effect that can be limited.
On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> Yes, there is an
2012 May 24
4
[LLVMdev] use AVX automatically if present
I wonder why AVX is not used automatically if available at the host
machine. In contrast to that, SSE41 instructions (like pmulld) are
automatically used if the host machine supports SSE41.
E.g.
$ cat avx.ll
define void @_fun1(<8 x float>*, <8 x float>*) {
_L1:
%x = load <8 x float>* %0
%y = load <8 x float>* %1
%z = fadd <8 x float> %x, %y
store
2012 May 24
0
[LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote:
> Very likely AVX is not enabled in your llc. This feature was enabled
> just recently (late of April).
I forgot to mention that I am using recent LLVM-3.1 and in principle my
llc knows about avx as I have shown in the second example. But avx does
not seem to be used by default.
On Thu, 24 May 2012, Henning Thielemann wrote:
> $ llc -o - -mattr
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello -
I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
On 07/24/2015 03:42 AM, Benjamin Kramer wrote:
>> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>>
>> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing