similar to: [LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?"

2011 Mar 18
0
[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?
On Fri, Mar 18, 2011 at 4:23 PM, Nicolas Ojeda Bar <nojb at math.harvard.edu> wrote: > Hi Eli, > > I'm using 2.8. Ouch; too late to fix 2.8 :(. If you can, use trunk or the 2.9 branch instead. Otherwise, passing -mattr=-avx to llc should do the trick. -Eli > On Mar 18, 2011, at 6:50 PM, Eli Friedman wrote: > >> On Fri, Mar 18, 2011 at 2:56 PM, Nicolas Ojeda Bar
2014 Mar 12
2
[LLVMdev] Autovectorization questions
Hi, I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. ( http://llvm.org/docs/Vectorizers.html#scatter-gather) int foo(int *A, int *B, int n, int k) { for (int i = 0; i < n; ++i) A[i*7] += B[i*k]; } I
2012 Jul 26
1
[LLVMdev] X86 FMA4
Hey Jan and Dave, It's not obvious, but there is a significant scalar performance issue following the GCC intrinsics. Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer... vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647 vmovaps %xmm3, 18560(%rsp)
2013 Jul 11
1
[LLVMdev] Bikeshedding a name for new directive: CHECK-LABEL vs. CHECK-BOUNDARY vs. something else.
Hi, I would like to add a new directive to FileCheck called CHECK-FOO (where FOO is a name under discussion right now) which is used to improve error messages. The idea is that you would use CHECK-FOO on any line that contains a unique identifier (typically labels, function definitions, etc.) that is guaranteed to occur only once in the file; FileCheck will then conceptually break the
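
A hypothetical sketch of how such a directive is used, written here with the CHECK-LABEL spelling from the subject line even though the name was still under discussion in the thread. Each labelled line is unique in the compiler output, so FileCheck can split its matching into per-function blocks and report a failure against the right function instead of drifting across the whole file. This is illustrative, not a test from the tree:

    // RUN: %clang_cc1 -O1 -emit-llvm %s -o - | FileCheck %s

    // CHECK-LABEL: define {{.*}}@add(
    // CHECK: add nsw i32
    int add(int a, int b) { return a + b; }

    // CHECK-LABEL: define {{.*}}@sub(
    // CHECK: sub nsw i32
    int sub(int a, int b) { return a - b; }
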
2012 Jul 26
0
[LLVMdev] X86 FMA4
Ah, bad example. This is a general problem for all (maybe most) SSE and AVX SS/SD patterns though, which is why I mentioned Sandybridge. You can swap out VFMADDSD in my example for VADDSD or whatever you like. I have a lion's share of such a change implemented already and performance is greatly affected. If the community is interested in this change, I would be happy to prepare a patch.
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load µop, a latency of 3, and a reciprocal throughput of 0.5. vmovaps - store takes 1 store µop plus 1 load µop for address calculation, a latency of 3, and a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a
2014 Mar 12
4
[LLVMdev] Autovectorization questions
In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so and so LLVM doesn’t vectorize this loop even if we try to force it. The following loop will be vectorized if we force it: int foo(int * A, int * B, int n, int k) { for (int i = 0; i < 1024; ++i) A[i] += B[i*k]; } So will this loop: int foo(int * restrict A, int
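
A hedged sketch of the second pattern the reply starts to describe (the excerpt is cut off after "int foo(int * restrict A, int"). Assuming it continues in the style of the doc's examples, restrict-qualified pointers tell LLVM the accesses don't alias, and the constant trip count plus unit-stride store avoids the wrap-around problem mentioned above. Illustrative completion, not the literal code from the post:

    /* restrict promises A and B don't alias, so the only remaining
     * obstacle is the strided load B[i*k], which the vectorizer can
     * handle (or scalarize) once aliasing and wrapping are settled. */
    int foo(int *restrict A, int *restrict B, int n, int k) {
        for (int i = 0; i < 1024; ++i)
            A[i] += B[i * k];
        return 0;
    }
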
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)
2014 Mar 26
3
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
On 03/26/2014 11:36 AM, Geoffrey Irving wrote: > I am trying to compute conservative lower and upper bounds for the > square of a double. I have set the rounding mode to FE_UPWARDS > elsewhere, so the code is > > struct Interval { > double nlo, hi; > }; > > Interval inspect_singleton_sqr(const double x) { > Interval s; > s.nlo = x * -x; > s.hi = x *
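
A minimal sketch of the technique the post describes, under two assumptions: that the truncated line computes s.hi = x * x, and that nlo holds the negated lower bound (consistent with s.nlo = x * -x). The standard <fenv.h> name for the rounding mode is FE_UPWARD. Rounding-mode-dependent code like this also relies on the compiler not constant-folding or reordering the multiplies. Names are illustrative, not the thread's exact code:

    /* With the rounding mode set to FE_UPWARD every multiply rounds toward
     * +infinity, so:
     *   x * -x rounds up to an upper bound of -(x*x), i.e. -s.nlo is a
     *          lower bound of x*x
     *   x *  x rounds up to an upper bound of x*x, i.e. s.hi is an upper
     *          bound of x*x
     */
    #include <fenv.h>

    struct Interval {
        double nlo, hi;            /* nlo holds the NEGATED lower bound */
    };

    struct Interval conservative_sqr(const double x) {
        struct Interval s;
        s.nlo = x * -x;            /* rounds up: -(lower bound of x*x) */
        s.hi  = x * x;             /* rounds up:   upper bound of x*x  */
        return s;
    }

    /* Caller is assumed to have done: fesetround(FE_UPWARD); */
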
2014 Oct 07
4
[LLVMdev] Strange behavior in fp arithmetics on x86 (bug possibly)
Hello everyone. I'm not an expert in llvm, in x86, or in the IEEE standard for floating point numbers, so any of my following assumptions may be wrong. If so, I would be grateful if you could clarify what goes wrong. But if my guesses are correct, we possibly have a bug in fp arithmetic on x86. I have the following IR: @g = constant i64 1 define i32 @main() { %gval = load
2014 Sep 17
2
[LLVMdev] VEX prefixes for JIT in llvm 3.5
Hi guys, I just upgraded our JIT system to use llvm 3.5 and noticed one big change in our generated code: we don't see any non-destructive VEX prefix instructions being emitted any more (vmulsd xmm0, xmm1, blah) etc. It's long been on my list of things to investigate anyway as I noticed llvm didn't emit VZEROUPPER calls either, so I supposed it might not be a bad thing to disable
2015 Oct 02
2
Register Spill Caused by the Reassociation pass
This conflict exists with many optimizations, including copy propagation, coalescing, hoisting, etc. Each could increase register pressure, with similar impact. Attempts to control register pressure locally (within an optimization pass) tend to become hard to tune and maintain. Would a better way be to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure like
2012 Jul 25
6
[LLVMdev] X86 FMA4
We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns. Why is VFMADDSD4 defined with vector types? Is this simply because the gcc intrinsic uses vector types? It's quite unnatural if you have a compiler that generates FMAs as opposed to requiring user intrinsics. -Dave
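
To illustrate the "compiler that generates FMAs" case the question contrasts with user intrinsics: a plain scalar multiply-add like the one below can be contracted by the backend into a single fused multiply-add (e.g. vfmaddsd on an FMA4 target) when floating-point contraction is permitted, with no vector types or intrinsics anywhere in the source. This is an illustrative sketch, not code from the thread:

    /* Scalar code the compiler itself can turn into an FMA when FP
     * contraction is allowed; no intrinsics, no vector types. */
    double muladd(double a, double b, double c) {
        return a * b + c;
    }
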
2014 Sep 17
3
[LLVMdev] VEX prefixes for JIT in llvm 3.5
Hi Jim, Thanks for a very quick reply! That indeed does the trick! Presumably the default has changed in 3.5 to be a "generic" CPU instead of the native one? If that's the case I wonder why: especially when JITting it really only makes sense to target the actual CPU - unless I'm missing something? :) Thanks again, Matt On Wed, Sep 17, 2014 at 2:16 PM, Jim Grosbach
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M one as the second prefix byte - early decoding normalized vex.reg, thus corrupting it for the main consumer (copy_REX_VEX()), resulting in #UD on the two-operand instructions we emulate Also add respective test cases to the testing utility plus - fix get_fpu() (the fall-through order was inverted) - add cpu_has_avx2,
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include
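
A hypothetical reduction of the kind of conditional compile the excerpt is cut off in: extract a 32-bit integer lane either with the SSE4.1 extract intrinsic (pextrd) or, without SSE4.1, with a shuffle followed by a move to a general-purpose register. Illustrative only, not the original test case from the post:

    #include <emmintrin.h>     /* SSE2: _mm_shuffle_epi32, _mm_cvtsi128_si32 */
    #if __SSE4_1__ != 0
    #include <smmintrin.h>     /* SSE4.1: _mm_extract_epi32 (pextrd) */
    #endif

    /* Extract lane 1 of a 4 x i32 vector. */
    static inline int extract_lane1(__m128i v)
    {
    #if __SSE4_1__ != 0
        return _mm_extract_epi32(v, 1);
    #else
        return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1)));
    #endif
    }
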
2011 Apr 07
2
[LLVMdev] opt + fastcc bug?
Hi, Is this correct behaviour? test.ll: declare {} @__ex__print_int(i64) define i32 @main() { entry: %0 = call i64 @f.1() %1 = call {} @__ex__print_int(i64 %0) ret i32 0 } define internal fastcc i64 @f.1() { entry: ret i64 7 } > opt -std-compile-opts test.ll -S ; ModuleID = 'test.ll' define i32 @main() noreturn nounwind { entry: tail call void @llvm.trap()
2012 Jul 26
0
[LLVMdev] X86 FMA4
Because the intrinsics use vector types (same as gcc). - Jan ----- Original Message ----- > From: "dag at cray.com" <dag at cray.com> > To: llvmdev at cs.uiuc.edu > Cc: > Sent: Wednesday, July 25, 2012 3:26 PM > Subject: [LLVMdev] X86 FMA4 > > We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns. > > Why is VFMADDSD4
2011 Aug 15
3
[LLVMdev] structured types as function arguments
Hi, When calling a function, does the llvm code generator support passing structured types (arrays, structs, etc.) by _value_? I wrote some small examples, and it seemed to work, but I was wondering if anything can go wrong if the structured types are very large... Thanks, N
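
An illustrative example of the situation being asked about (not from the thread): a large aggregate passed by value at the C level. A front end such as clang typically does not pass this as a first-class IR aggregate; depending on the target ABI it is split into scalars or lowered to a pointer argument marked byval, which is one reason very large by-value aggregates written directly in IR are worth testing carefully.

    /* A deliberately large struct passed by value. */
    struct Big {
        double data[64];
    };

    double sum_big(struct Big b) {    /* by-value aggregate parameter */
        double s = 0.0;
        for (int i = 0; i < 64; ++i)
            s += b.data[i];
        return s;
    }
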
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
RFC: A proposal for vectorizing loops with calls to math functions using SVML (short vector math library). ========= Overview ========= Very simply, SVML (Intel short vector math library) functions are vector variants of scalar math functions that take vector arguments, apply an operation to each element, and store the result in a vector register. These vector variants can be generated by the
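
A sketch of the kind of loop the RFC targets (illustrative, not taken from the proposal): a loop whose body calls a scalar libm function. With SVML mappings available to the loop vectorizer (later exposed in clang via -fveclib=SVML), the scalar call can be replaced by a call to a vector variant such as __svml_sinf8; the flag and the exact vector-function name are stated here as assumptions, not quoted from the RFC:

    #include <math.h>

    /* Each iteration calls scalar sinf(); with an SVML-style vector math
     * library, the vectorizer can emit one call per vector of lanes
     * (e.g. an 8-wide __svml_sinf8 on AVX) instead of n scalar calls. */
    void apply_sin(float *restrict dst, const float *restrict src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = sinf(src[i]);
    }
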