Displaying 14 results from an estimated 14 matches for "vec_add".

2020 Jun 28
2
__restrict ignored when including headers like <cmath>
Hi, I am observing a strange behaviour in which Clang ignores __restrict when I include some standard headers. For example, this code: void vec_add(int* __restrict a, int* __restrict b, int n) { #pragma unroll 4 for(int i=0; i<n; ++i) { a[i] += b[i]; } } results in: ; Function Attrs: nofree norecurse nounwind define dso_local void @_Z7vec_addPiS_i(i32* noalias nocapture %a, i32* noalias nocapture readonl...
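For readers who want to reproduce the observation, the function quoted in the snippet above can be laid out as a small C translation unit. This is only a sketch under assumptions: the name `vec_add_restrict`, the `assert` precondition, and the `__clang__` guard around the pragma are additions for illustration and are not part of the original post. `__restrict` is the GCC/Clang extension spelling the poster used.

```c
#include <assert.h>

/* Adds b[i] into a[i] for every i in [0, n).
 * __restrict tells the compiler that a and b never overlap, which is
 * what allows both parameters to be marked `noalias` in the IR. */
void vec_add_restrict(int *__restrict a, int *__restrict b, int n)
{
    assert(n >= 0); /* illustrative precondition, not in the original post */
#if defined(__clang__)
#pragma unroll 4 /* Clang loop hint from the post; guarded for other compilers */
#endif
    for (int i = 0; i < n; ++i) {
        a[i] += b[i];
    }
}
```

Compiling this with and without an extra standard header and comparing the emitted IR is one way to check whether the `noalias` attributes survive.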
2004 Aug 06
0
[PATCH] Make SSE Run Time option.
...or float a1 = vec_ld( 15, a ); vector float b0 = vec_ld( 0, b ); vector float b1 = vec_ld( 15, b ); a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) ); b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) ); a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ; a0 = vec_add( a0, vec_sld( a0, a0, 8 ) ); a0 = vec_add( a0, vec_sld( a0, a0, 4 ) ); vec_ste( a0, 0, &sum ); return sum; Please note that dot products of simple vector floats are usually faster in the scalar units. The add across and transfer to scalar is just too expensive. Its gen...
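The "add across" step in the snippet above (two `vec_add`/`vec_sld` pairs followed by `vec_ste`) can be illustrated without AltiVec hardware. The following is a portable sketch, not the patch's code: `rotate_lanes` and `dot_lanes` are hypothetical names, plain `float[4]` arrays stand in for `vector float`, and the input length is assumed to be a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Emulates vec_sld(v, v, 4 * shift) on four float lanes:
 * out[i] = v[(i + shift) % LANES], i.e. a rotate of the whole register. */
static void rotate_lanes(const float v[LANES], int shift, float out[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        out[i] = v[(i + shift) % LANES];
    }
}

/* Dot product of a and b; n must be a multiple of LANES (assumption). */
float dot_lanes(const float *a, const float *b, size_t n)
{
    float acc[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    float rot[LANES];

    assert(n % LANES == 0);

    /* Per-lane multiply-accumulate, the role vec_madd plays in the post. */
    for (size_t i = 0; i < n; i += LANES) {
        for (int l = 0; l < LANES; ++l) {
            acc[l] += a[i + l] * b[i + l];
        }
    }

    /* Add across: rotate by 8 bytes (2 lanes) and add, then by 4 bytes (1 lane). */
    rotate_lanes(acc, 2, rot);
    for (int l = 0; l < LANES; ++l) {
        acc[l] += rot[l];
    }
    rotate_lanes(acc, 1, rot);
    for (int l = 0; l < LANES; ++l) {
        acc[l] += rot[l];
    }

    /* Every lane now holds the full sum; vec_ste would store one of them. */
    return acc[0];
}
```

The two rotate-and-add rounds are the part the poster calls expensive: they do no useful multiplication and exist only to move the partial sums into a single lane.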
2020 Jun 24
2
FW: Restrict qualifier on class members
Hi Jeroen, Sorry, I missed that. I tried the patch, and this program: #include <stdint.h> #define __remote __attribute__((address_space(1))) __remote int* A; __remote int* B; void vec_add(__remote int* __restrict a, __remote int* __restrict b, int n) { #pragma unroll 4 for(int i=0; i<n; ++i) { a[i] += b[i]; } } int main(int argc, char** argv) { __remote int* __restrict a = A; __remote int* __restrict b = B; #pragma unroll 4 for(int i=...
2004 Aug 06
6
[PATCH] Make SSE Run Time option.
...a += 4; MSQa = vec_ld(0, a); vec_a = vec_perm(LSQa, MSQa, maska); vec_b = vec_ld(0, b); b += 4; vec_result = vec_madd(vec_a, vec_b, vec_result); } } vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 8)); vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 4)); vec_ste(vec_result, 0, &sum); return sum; --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.x...
2009 Nov 10
4
[LLVMdev] Altivec vs the type legalizer
...y, they do. (And I see no other way to fix it except to break the vector into scalars, which produces horrendous code.) typedef vector unsigned char vuint8_t; vuint8_t baz; void foo(vuint8_t x) { vuint8_t temp = (vuint8_t)(22,21,20, 3, 25,24,23, 3, 28,27,26, 3, 31,30,29, 3); baz = vec_add(x, temp); }
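As a point of reference for what `foo` in the snippet above is expected to compute, here is a scalar sketch in portable C. It assumes that `vec_add` on `vector unsigned char` is a lane-wise add that wraps modulo 256; `vu8_add_emul` and `foo_emul` are hypothetical names, and plain 16-byte arrays stand in for the AltiVec vector type.

```c
#include <assert.h>
#include <stdint.h>

#define VU8_LANES 16

/* Lane-wise modular add over 16 bytes: each lane wraps modulo 256. */
void vu8_add_emul(const uint8_t x[VU8_LANES], const uint8_t y[VU8_LANES],
                  uint8_t out[VU8_LANES])
{
    assert(x && y && out);
    for (int i = 0; i < VU8_LANES; ++i) {
        out[i] = (uint8_t)(x[i] + y[i]);
    }
}

/* Scalar counterpart of foo() from the post: adds the constant vector
 * to x and writes the result to baz. */
void foo_emul(const uint8_t x[VU8_LANES], uint8_t baz[VU8_LANES])
{
    static const uint8_t temp[VU8_LANES] = {22, 21, 20, 3, 25, 24, 23, 3,
                                            28, 27, 26, 3, 31, 30, 29, 3};
    vu8_add_emul(x, temp, baz);
}
```

A reference like this can be useful for checking the output of a vectorized build lane by lane.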
2005 Dec 02
0
run time assembler patch for altivec, sse + bug fixes
...= vec_madd(vec_a, vec_b, vec_result); a += 4; MSQa = vec_ld(0, a); vec_a = vec_perm(LSQa, MSQa, maska); vec_b = vec_ld(0, b); b += 4; vec_result = vec_madd(vec_a, vec_b, vec_result); } } vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 8)); vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 4)); vec_ste(vec_result, 0, &sum); return sum; } #endif -------------- next part -------------- A non-text attachment was scrubbed... Name: ltp_altivec.h Type: applica...
2020 Jun 22
2
Restrict qualifier on class members
Hi Jeroen, That's great! I was trying to use the patch, what's the latest version of the project we could apply it on? Hi Neil, That seems like what I can do as well! Do you happen to have some examples lying around? Maybe a pointer to the planned presentation, if that's okay? Thank you, Bandhav On Mon, Jun 22, 2020 at 1:55 AM Neil Henning <neil.henning at unity3d.com>
2005 Jan 29
4
A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote: > That looks fine to me as well. However, the best solution is something which > Luca suggested a few months ago, which is to use the functions defined in > altivec.h. These are C functions which map directly to Altivec machine > instructions. I am willing to help out, but I don't find the current lpc_asm.s > very easy to follow, and
2009 Nov 10
0
[LLVMdev] Altivec vs the type legalizer
...ix it except to break the vector into scalars, which produces > horrendous code.) > > > typedef vector unsigned char vuint8_t; > vuint8_t baz; > void foo(vuint8_t x) { > vuint8_t temp = (vuint8_t)(22,21,20, 3, 25,24,23, 3, 28,27,26, 3, > 31,30,29, 3); > baz = vec_add(x, temp); > } Earlier this year we ran into a similar problem for NEON and ended up modifying the type rules for BUILD_VECTOR so that the operand types do not need to match the element types. It is possible that what you are seeing is fall-out from that change. With the "new" BUILD_V...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, >I'm still not sure I get it. On an Athlon XP, I can do something like >"mulps xmm0, xmm1", which means that the xmm registers are indeed >supported. Besides, without the xmm registers, you can't use much of >SSE. On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction Error. In addition,
2000 Nov 15
8
Optimisations
Looking through the archives I have seen talk of making CPU specific optimisations for Vorbis, a la MMX/3DNow!/SSE. The feeling I gather is to wait until something is working well in C before committing to any kind of specific optimisation. What if oft used and needed DSP functions were identified and standardised DSP functionality were written for Vorbis? This would separate the basically
2016 Mar 02
4
Proposal for function vectorization and loop vectorization with function calls
...LoopVectorizer will vectorize the generated %t loop, expected to produce the following vectorized code eliminating the loop (pseudo code): define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 { vec_load xmm1, %a[k: VL] xmm2 = call __svml_sinf(xmm1) xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f] store %a[k:VL], xmm0 return xmm0; } [[Note: Vectorizer support for the Short Vector Math Library (SVML) functions will be a separate proposal. ]] 3. The LLVM LoopVectorizer is enhanced to a) identify loops with calls that have been ann...
2016 Mar 02
2
Proposal for function vectorization and loop vectorization with function calls
...the generated %t loop, expected > to produce the following vectorized code eliminating the loop (pseudo code): > > define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 > { > vec_load xmm1, %a[k: VL] > xmm2 = call __svml_sinf(xmm1) > xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f] > store %a[k:VL], xmm0 > return xmm0; > } > > [[Note: Vectorizer support for the Short Vector Math Library (SVML) > functions will be a separate proposal. ]] Loop Vectorizer already supports math functions and math functions l...
2016 Mar 02
5
RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote: > > On 2 March 2016 at 01:14, John McCall via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Hi, all. >> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For