thr3ads.net - search: "vec

Displaying 14 results from an estimated 14 matches for "vec_add".

__restirct ignored when including headers like <cmath>

2020 Jun 28

Hi, I am observing a strange behaviour in which Clang ignores __restrict when I include some standard headers. For example, this code: void vec_add(int* __restrict a, int* __restrict b, int n) { #pragma unroll 4 for(int i=0; i<n; ++i) { a[i] += b[i]; } } results in: ; Function Attrs: nofree norecurse nounwind define dso_local void @_Z7vec_addPiS_i(i32* noalias nocapture %a, i32* noalias nocapture readonl...
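A minimal C sketch of the function quoted in the snippet above, for readers who want to try it. It keeps the `__restrict` spelling from the post, which GCC and Clang accept. The `const` on `b`, the `assert`, and the omission of the Clang-specific `#pragma unroll 4` are choices made for this sketch, not part of the original message.

```c
#include <assert.h>

/* Adds b[i] into a[i] for i in [0, n).
   __restrict is the caller's promise that a and b do not overlap. */
void vec_add(int *__restrict a, const int *__restrict b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        a[i] += b[i];
    }
}
```

The no-overlap promise is what allows Clang to mark both parameters `noalias` in the IR quoted in the post. Calling the function with overlapping ranges is undefined behaviour.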

[PATCH] Make SSE Run Time option.

2004 Aug 06

...or float a1 = vec_ld( 15, a ); vector float b0 = vec_ld( 0, b ); vector float b1 = vec_ld( 15, b ); a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) ); b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) ); a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ; a0 = vec_add( a0, vec_sld( a0, a0, 8 ) ); a0 = vec_add( a0, vec_sld( a0, a0, 4 ) ); vec_ste( a0, 0, &sum ); return sum; Please note that dot products of simple vector floats are usually faster in the scalar units. The add across and transfer to scalar is just too expensive. Its gen...
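The two `vec_add`/`vec_sld` steps in the snippet above sum the four lanes of a vector. A portable C sketch of that reduction follows, assuming four float lanes. The `vec4f` type and the helpers `rotate_lanes`, `add_lanes`, and `dot4` are hypothetical names that emulate the AltiVec operations in scalar code; they are not AltiVec intrinsics.

```c
#include <assert.h>

/* Four float lanes, standing in for an AltiVec `vector float`. */
typedef struct { float lane[4]; } vec4f;

/* Rotate lanes left by n positions. For a float vector this mirrors
   vec_sld(v, v, 4 * n) when both inputs are the same register. */
vec4f rotate_lanes(vec4f v, int n)
{
    vec4f r;
    assert(n >= 0 && n < 4);
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = v.lane[(i + n) % 4];
    }
    return r;
}

/* Lane-wise addition, standing in for vec_add. */
vec4f add_lanes(vec4f x, vec4f y)
{
    vec4f r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = x.lane[i] + y.lane[i];
    }
    return r;
}

/* Dot product of two 4-element float arrays using the
   multiply, rotate-by-2, rotate-by-1 pattern from the snippet. */
float dot4(const float *a, const float *b)
{
    vec4f p;
    for (int i = 0; i < 4; ++i) {
        p.lane[i] = a[i] * b[i];           /* lane-wise products */
    }
    p = add_lanes(p, rotate_lanes(p, 2));  /* each lane: sum of two lanes */
    p = add_lanes(p, rotate_lanes(p, 1));  /* each lane: sum of all four */
    return p.lane[0];                      /* any lane holds the total */
}
```

After the first step each lane holds the sum of two lanes, and after the second every lane holds the full sum, so any single lane can be stored. On real hardware this cross-lane traffic plus the transfer to a scalar is the cost the post describes as too expensive for short vectors.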

FW: Restrict qualifier on class members

2020 Jun 24

Hi Jeroen, Sorry, I missed that. I tried the patch, and this program: #include <stdint.h> #define __remote __attribute__((address_space(1))) __remote int* A; __remote int* B; void vec_add(__remote int* __restrict a, __remote int* __restrict b, int n) { #pragma unroll 4 for(int i=0; i<n; ++i) { a[i] += b[i]; } } int main(int argc, char** argv) { __remote int* __restrict a = A; __remote int* __restrict b = B; #pragma unroll 4 for(int i=...

[PATCH] Make SSE Run Time option.

2004 Aug 06

...a += 4; MSQa = vec_ld(0, a); vec_a = vec_perm(LSQa, MSQa, maska); vec_b = vec_ld(0, b); b += 4; vec_result = vec_madd(vec_a, vec_b, vec_result); } } vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 8)); vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 4)); vec_ste(vec_result, 0, &sum); return sum; ...

[LLVMdev] Altivec vs the type legalizer

2009 Nov 10

...y, they do. (And I see no other way to fix it except to break the vector into scalars, which produces horrendous code.) typedef vector unsigned char vuint8_t; vuint8_t baz; void foo(vuint8_t x) { vuint8_t temp = (vuint8_t)(22,21,20, 3, 25,24,23, 3, 28,27,26, 3, 31,30,29, 3); baz = vec_add(x, temp); }

run time assembler patch for altivec, sse + bug fixes

2005 Dec 02

...= vec_madd(vec_a, vec_b, vec_result); a += 4; MSQa = vec_ld(0, a); vec_a = vec_perm(LSQa, MSQa, maska); vec_b = vec_ld(0, b); b += 4; vec_result = vec_madd(vec_a, vec_b, vec_result); } } vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 8)); vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 4)); vec_ste(vec_result, 0, &sum); return sum; } #endif -------------- next part -------------- A non-text attachment was scrubbed... Name: ltp_altivec.h Type: applica...

Restrict qualifier on class members

2020 Jun 22

Hi Jeroen, That's great! I was trying to use the patch, what's the latest version of the project we could apply it on? Hi Neil, That seems like what I can do as well! Do you happen to have some examples lying around? Maybe a pointer to the planned presentation, if that's okay? Thank you, Bandhav On Mon, Jun 22, 2020 at 1:55 AM Neil Henning <neil.henning at unity3d.com>

A couple of points about flac 1.1.1 on ppc/linux/altivec

2005 Jan 29

On Thu, 27 Jan 2005, John Steele Scott wrote: > That looks fine to me as well. However, the best solution is something which > Luca suggested a few months ago, which is to use the functions defined in > altivec.h. These are C functions which map directly to Altivec machine > instructions. I am willing to help out, but I don't find the current lpc_asm.s > very easy to follow, and

[LLVMdev] Altivec vs the type legalizer

2009 Nov 10

...ix it except to break the vector into scalars, which produces > horrendous code.) > > > typedef vector unsigned char vuint8_t; > vuint8_t baz; > void foo(vuint8_t x) { > vuint8_t temp = (vuint8_t)(22,21,20, 3, 25,24,23, 3, 28,27,26, 3, > 31,30,29, 3); > baz = vec_add(x, temp); > } Earlier this year we ran into a similar problem for NEON and ended up modifying the type rules for BUILD_VECTOR so that the operand types do not need to match the element types. It is possible that what you are seeing is fall-out from that change. With the "new" BUILD_V...

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

Jean-Marc, >I'm still not sure I get it. On an Athlon XP, I can do something like >"mulps xmm0, xmm1", which means that the xmm registers are indeed >supported. Besides, without the xmm registers, you can't use much of >SSE. On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction Error. In addition,

Optimisations

2000 Nov 15

Looking through the archives I have seen talk of making CPU-specific optimisations for Vorbis, a la MMX/3DNow!/SSE. The feeling I gather is to wait until something is working well in C before committing to any kind of specific optimisation. What if oft-used and needed DSP functions were identified and standardised DSP functionality were written for Vorbis? This would separate the basically

Proposal for function vectorization and loop vectorization with function calls

2016 Mar 02

...LoopVectorizer will vectorize the generated %t loop, expected to produce the following vectorized code eliminating the loop (pseudo code): define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 { vec_load xmm1, %a[k: VL] xmm2 = call __svml_sinf(xmm1) xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f] store %a[k:VL], xmm0 return xmm0; } [[Note: Vectorizer support for the Short Vector Math Library (SVML) functions will be a separate proposal. ]] 3. The LLVM LoopVectorizer is enhanced to a) identify loops with calls that have been ann...

Proposal for function vectorization and loop vectorization with function calls

2016 Mar 02

...the generated %t loop, expected > to produce the following vectorized code eliminating the loop (pseudo code): > > define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 > { > vec_load xmm1, %a[k: VL] > xmm2 = call __svml_sinf(xmm1) > xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f] > store %a[k:VL], xmm0 > return xmm0; > } > > [[Note: Vectorizer support for the Short Vector Math Library (SVML) > functions will be a separate proposal. ]] Loop Vectorizer already supports math functions and math functions l...

RFC: Implementing the Swift calling convention in LLVM and Clang

2016 Mar 02

> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote: > > On 2 March 2016 at 01:14, John McCall via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Hi, all. >> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For
