thr3ads.net - similar to: "RFC: A proposal for vectorizing loops with calls to math functions using SVML"

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 04

Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay
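As a minimal sketch of the kind of loop discussed in this message: a loop that calls a math function and is marked with a loop pragma. The function name `apply_sin` is hypothetical and not from the thread. Whether the call is actually replaced by a vector-library routine depends on the compiler and build flags (for instance, a Clang build using `-fveclib=SVML`), which is an assumption here; without such support the loop simply runs as scalar code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): applies sin element-wise.
// The pragma asks Clang's loop vectorizer to vectorize the loop. Other
// compilers ignore the pragma, and the code stays correct either way.
void apply_sin(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();
#pragma clang loop vectorize(enable)
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::sin(src[i]);  // candidate for a vector math-library call
    }
}
```

The raw pointers are taken outside the loop only to keep the loop body simple for the vectorizer; the same pattern applies to cos, exp, log, and pow.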

Proposal for function vectorization and loop vectorization with function calls

2016 Mar 02

Intel Corporation (3/2/2016) This is a proposal for initial work towards a Clang and LLVM implementation of vectorizing a function annotated with OpenMP 4.5's "#pragma omp declare simd" (named SIMD-enabled function) and its
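A minimal sketch of what a SIMD-enabled function and a calling loop can look like with the OpenMP pragmas named above. The names `scale_add` and `scale_add_all` are hypothetical and do not come from the proposal. The pragmas only take effect when the compiler is built and invoked with OpenMP SIMD support (an assumption); otherwise they are ignored and the code runs as ordinary scalar C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD-enabled function: the pragma asks the compiler to also
// generate vector variants that process several (x, y) pairs per call.
#pragma omp declare simd
float scale_add(float x, float y) {
    return 2.0f * x + y;
}

// Hypothetical caller: the loop is marked for SIMD execution, so the call
// to scale_add may be replaced by one of its vector variants.
void scale_add_all(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& out) {
    assert(a.size() == b.size() && b.size() == out.size());
    const std::size_t n = out.size();
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scale_add(a[i], b[i]);
    }
}
```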

Proposal for function vectorization and loop vectorization with function calls

2016 Mar 02

Hi Michael. Thanks for your feedback and questions/comments. See below. >>>>>I think it should be possible to vectorize such a loop even without OpenMP clauses. We just need to gather a vector value from several scalar calls, and the vectorizer already knows how to do that, we just need not to bail out early. Dealing with calls is tricky, but in this case we have the pragma, so we can

Vectorization of math function failed?

2020 Aug 31

Hi, After reading https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls I decided to write the following C++ program: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { v4f32 y; y[0] = std::sin(x[0]); y[1] = std::sin(x[1]); y[2] = std::sin(x[2]); y[3] = std::sin(x[3]); return y; } v4f32 fct2(v4f32 x) { v4f32 y;

returning a list of functions

2010 Jan 27

Hi interested readers, I have a function that creates several functions within a loop and I would like them to be returned for further use as follows: Main.Function(df,...){ # df is a multivariate data funcList<-list(NULL) for (i in 1:ncol(df)){ temp<-logspline(df[,i],...) # logspline density estimate funcList[[i]]<-function(x){expression(temp,x)} } return(funcList) } I have tried

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

2015 Jul 29

When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2

RFC: SIMD math-function library

2016 Sep 27

I should keep quiet and leave well enough alone, but playing devil's advocate for a moment - I see you didn't bundle this with compiler-rt, which I guess is good? In the end what was the reasoning for that? Do you see this being sufficiently independent and running a different development track that it made sense? 1) Why rename C files to C++ (consistency?) 2) It seems your

[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build

2008 Feb 13

Hola LLVMers, I'm debugging through some strangeness that I'm seeing on X64 on windows with LLVM2.2. I had to change the code so that it would engage the x64 target machine on windows builds, but I've otherwise left LLVM 2.2 alone. The basic idea is that I've got a function bar which is compiled by VStudio and I'm creating another function foo via LLVM JIT which is going

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build

2008 Feb 15

On Feb 12, 2008, at 5:26 PM, Chuck Rose III wrote: > Hola LLVMers, > > I’m debugging through some strangeness that I’m seeing on X64 on > windows with LLVM2.2. I had to change the code so that it would > engage the x64 target machine on windows builds, but I’ve otherwise > left LLVM 2.2 alone. The basic idea is that I’ve got a function bar > which is compiled by

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>

[LLVMdev] Shuffle regression

2008 Jul 12

Hi all, I think I found a regression in the shuffle instruction. I've attached a replacement of fibonacci.cpp to reproduce the issue. It runs fine on release 2.3 but revision 52648 fails, and I suspect that the issue is still present. 2.3 generates the following x86 code: 03A10010 push ebp 03A10011 mov ebp,esp 03A10013 and esp,0FFFFFFF0h 03A10019

[LLVMdev] Shuffle regression

2008 Jul 12

I have fixed a related bug: 52740. Can you check if that fixes this problem? Evan On Jul 11, 2008, at 6:43 PM, Nicolas Capens wrote: > Hi all, > > I think I found a regression in the shuffle instruction. I’ve > attached a replacement of fibonacci.cpp to reproduce the issue. It > runs fine on release 2.3 but revision 52648 fails, and I suspect > that the issue is still

[LLVMdev] Duplicate loading of double constants

2013 Aug 19

Hi, I found that in some cases llvm generates duplicate loads of double constants, e.g. $ cat t.c double f(double* p, int n) { double s = 0; if (n) s += *p; return s; } $ clang -S -O3 t.c -o - ... f: # @f .cfi_startproc # BB#0: xorps %xmm0, %xmm0 testl %esi, %esi je .LBB0_2 # BB#1: xorps

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 19

Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic

[LLVMdev] LLVM2.2 x64 JIT trouble on VStudio build

2008 Feb 15

Hey Evan, At the point of the instructions you suggested I step through, X86ISelLowering has this state: - this 0x00000000005fe728 {VarArgsFrameIndex=-842150451 RegSaveFrameIndex=-842150451 VarArgsGPOffset=3452816845 ...} llvm::X86TargetLowering * const + llvm::TargetLowering {TM={...} TD=0x00000000008edac0

RFC: SIMD math-function library

2016 Jul 27

Hi everyone, I think that everyone is on the same page. We'll put together a patch for review. One remaining question: There seem to be two potential homes for this library: parallel_libs and compiler-rt. Opinions on where the vectorized math functions should live? My inclination is to target it for the new parallel_libs project, in part because I feel like compiler-rt has too many things grouped

error: couldn't allocate input reg for constraint '{xmm0}'

2018 Nov 17

Here is some zig code: pub fn setXmm0(comptime T: type, value: T) void { comptime assert(builtin.arch == builtin.Arch.x86_64); const aligned_value: T align(16) = value; asm volatile ( \\movaps (%[ptr]), %%xmm0 : : [ptr] "r" (&aligned_value) : "xmm0" ); } I want to improve this and integrate more tightly with LLVM IR,

[LLVMdev] How does SSEDomainFix work?

2010 May 11

Hello. This is my first post. I have tried the SSE execution domain fixup pass, but I am not able to see any improvements. I expect the example below to use MOVDQA, PAND, etc. (On Nehalem, ANDPS is much slower than PAND.) Please tell me if I am doing something wrong. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,

[LLVMdev] movaps being generated despite alignment 1 being specified

2007 Oct 18

Hello LLVMers, High order bit: Presence of a called function is causing a store on an unrelated vector to generate an aligned store rather than an unaligned one, despite an unaligned store being indicated in the associated StoreInst. Details: I pulled down the latest source, so this is something I'm finding with the current LLVM. I'm hoping you'll have an idea what's
