similar to: RFC: A proposal for vectorizing loops with calls to math functions using SVML

Displaying 20 results from an estimated 3000 matches similar to: "RFC: A proposal for vectorizing loops with calls to math functions using SVML"

2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay
2016 Mar 02
4
Proposal for function vectorization and loop vectorization with function calls
Proposal for function vectorization and loop vectorization with function calls ============================================================================== Intel Corporation (3/2/2016) This is a proposal for an initial work towards Clang and LLVM implementation of vectorizing a function annotated with OpenMP 4.5's "#pragma omp declare simd" (named SIMD-enabled function) and its
2016 Mar 02
2
Proposal for function vectorization and loop vectorization with function calls
Hi Michael. Thank for your feedback and questions/comments. See below. >>>>>I think it should be possible to vectorize such loop even without openmp clauses. We just need to gather a vector value from several scalar calls, and vectorizer already knows how to do that, we just need not to bail out early. Dealing with calls is tricky, but in this case we have the pragma, so we can
2020 Aug 31
2
Vectorization of math function failed?
Hi, After reading https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls I decided to write the following C++ program: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { v4f32 y; y[0] = std::sin(x[0]); y[1] = std::sin(x[1]); y[2] = std::sin(x[2]); y[3] = std::sin(x[3]); return y; } v4f32 fct2(v4f32 x) { v4f32 y;
2010 Jan 27
1
returning a list of functions
Hi interested readers, I have a function that creates several functions within a loop and I would like them to be returned for further use as follows: Main.Function(df,...){ # df is a multivariate data funcList<-list(NULL) for (i in 1:ncol(df)){ temp<-logspline(df[,i],...) # logspline density estimate funcList[[i]]<-function(x){expression(temp,x)} } return(funcList) } I have tried
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2
2016 Sep 27
3
RFC: SIMD math-function library
I should keep quiet and leave well enough alone, but playing devil's advocate for a moment - I see you didn't bundle this with compiler-rt, which I guess is good? In the end what was the reasoning for that? Do you see this being sufficiently independent and running a different development track that it made sense? 1) Why rename C files to C++ (consistency?) 2) It seems your
2008 Feb 13
3
[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build
Hola LLVMers, I'm debugging through some strangeness that I'm seeing on X64 on windows with LLVM2.2. I had to change the code so that it would engage the x64 target machine on windows builds, but I've otherwise left LLVM 2.2 alone. The basic idea is that I've got a function bar which is compiled by VStudio and I'm creating another function foo via LLVM JIT which is going
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include
2008 Feb 15
0
[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build
On Feb 12, 2008, at 5:26 PM, Chuck Rose III wrote: > Hola LLVMers, > > I’m debugging through some strangeness that I’m seeing on X64 on > windows with LLVM2.2. I had to change the code so that it would > engage the x64 target machine on windows builds, but I’ve otherwise > left LLVM 2.2 alone. The basic idea is that I’ve got a function bar > which is compiled by
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>
2008 Jul 12
2
[LLVMdev] Shuffle regression
Hi all, I think I found a regression in the shuffle instruction. I've attached a replacement of fibonacci.cpp to reproduce the issue. It runs fine on release 2.3 but revision 52648 fails, and I suspect that the issue is still present. 2.3 generates the following x86 code: 03A10010 push ebp 03A10011 mov ebp,esp 03A10013 and esp,0FFFFFFF0h 03A10019
2008 Jul 12
0
[LLVMdev] Shuffle regression
I have fixed a related bug: 52740. Can you check if that fixes this problem? Evan On Jul 11, 2008, at 6:43 PM, Nicolas Capens wrote: > Hi all, > > I think I found a regression in the shuffle instruction. I’ve > attached a replacement of fibonacci.cpp to reproduce the issue. It > runs fine on release 2.3 but revision 52648 fails, and I suspect > that the issue is still
2013 Aug 19
2
[LLVMdev] Duplicate loading of double constants
Hi, I found that in some cases llvm generates duplicate loads of double constants, e.g. $ cat t.c double f(double* p, int n) { double s = 0; if (n) s += *p; return s; } $ clang -S -O3 t.c -o - ... f: # @f .cfi_startproc # BB#0: xorps %xmm0, %xmm0 testl %esi, %esi je .LBB0_2 # BB#1: xorps
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic
2008 Feb 15
1
[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build
Hey Evan, At the point of the instructions you suggested I step through, X86ISelLowering has this state: - this 0x00000000005fe728 {VarArgsFrameIndex=-842150451 RegSaveFrameIndex=-842150451 VarArgsGPOffset=3452816845 ...} llvm::X86TargetLowering * const + llvm::TargetLowering {TM={...} TD=0x00000000008edac0
2016 Jul 27
5
RFC: SIMD math-function library
Hi everyone, I think that everyone is on the same page. We'll put together a patch for review. One remaining question: There seem two potential homes for this library: parallel_libs and compiler-rt. Opinions on where the vectorized math functions should live? My inclination is to target it for the new parallel_libs project, in part because I feel like compiler-rt has too many things grouped
2018 Nov 17
2
error: couldn't allocate input reg for constraint '{xmm0}'
Here is some zig code: pub fn setXmm0(comptime T: type, value: T) void { comptime assert(builtin.arch == builtin.Arch.x86_64); const aligned_value: T align(16) = value; asm volatile ( \\movaps (%[ptr]), %%xmm0 : : [ptr] "r" (&aligned_value) : "xmm0" ); } I want to improve this and integrate more tightly with LLVM IR,
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,
2007 Oct 18
3
[LLVMdev] movaps being generated despite alignment 1 being specified
Hello LLVMers, High order bit: Presence of a called function is causing a store on an unrelated vector to generate an aligned store rather an unaligned one despite unaligned store being indicated in the associated StoreInst. Details: I pulled down the latest source, so this is something I'm finding with the current LLVM. I'm hoping you'll have an idea what's