thr3ads.net - search: "_mm_move

Displaying 3 results from an estimated 3 matches for "_mm_move_ss".

2004 Aug 06

[PATCH] Make SSE Run Time option.

...n[10], 0, 0);
for (i=0;i<N;i++) {
    __m128 xx;
    __m128 yy;
    /* Compute next filter result */
    xx = _mm_load_ps1(x+i);
    yy = _mm_add_ss(xx, mem[0]);
    _mm_store_ss(y+i, yy);
    yy = _mm_shuffle_ps(yy, yy, 0);
    /* Update memory */
    mem[0] = _mm_move_ss(mem[0], mem[1]);
    mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
    mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
    mem[0] = _mm_sub_ps(mem[0], _mm_mul_ps(yy, den[0]));
    mem[1] = _mm_move_ss(mem[1], mem[2]);
    mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
    mem[1...
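The "Update memory" step in the snippet pairs `_mm_move_ss` with `_mm_shuffle_ps(..., 0x39)`. A minimal sketch of what that pair does to a four-float register follows. The helper names `shift_in` and `shift_in_to_array` are hypothetical and not part of the patch; only the two intrinsics come from the quoted code.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: shift a 4-float delay line down by one lane and
   bring the lowest lane of `next` in at the top. */
static __m128 shift_in(__m128 cur, __m128 next)
{
    /* _mm_move_ss: lane 0 is taken from `next`, lanes 1..3 stay from `cur`. */
    __m128 t = _mm_move_ss(cur, next);
    /* 0x39 selects source lanes (1, 2, 3, 0), i.e. a rotate toward lane 0. */
    return _mm_shuffle_ps(t, t, 0x39);
}

/* Hypothetical wrapper so the result can be inspected as a plain array. */
static void shift_in_to_array(const float cur[4], const float next[4],
                              float out[4])
{
    _mm_storeu_ps(out, shift_in(_mm_loadu_ps(cur), _mm_loadu_ps(next)));
}
```

With `cur = {1, 2, 3, 4}` and `next = {5, 6, 7, 8}`, the result is `{2, 3, 4, 5}`: the old lane 0 is dropped and the first element of the next register is appended, which is how the filter memory appears to slide across `mem[0]`, `mem[1]`, and so on.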

2004 Aug 06

[PATCH] Make SSE Run Time option.

> Personally, I don't think much of PNI. The complex arithmetic stuff they
> added sets you up for a lot of permute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,

Actually, the new instructions make it possible to do complex multiplies without the need to permute and separate the add and subtract. The really useful
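To make the complex-multiply point concrete, here is a portable scalar sketch that models three SSE3 (PNI) operations, MOVSLDUP, MOVSHDUP, and ADDSUBPS, on plain arrays. It is an illustration under assumptions, not code from the thread: all function names are hypothetical, and this particular formulation still swaps the real and imaginary parts of one operand.

```c
/* Scalar model of MOVSLDUP: duplicate the even lanes. */
static void moveldup(const float a[4], float r[4])
{
    r[0] = a[0]; r[1] = a[0]; r[2] = a[2]; r[3] = a[2];
}

/* Scalar model of MOVSHDUP: duplicate the odd lanes. */
static void movehdup(const float a[4], float r[4])
{
    r[0] = a[1]; r[1] = a[1]; r[2] = a[3]; r[3] = a[3];
}

/* Scalar model of ADDSUBPS: subtract in even lanes, add in odd lanes. */
static void addsub(const float a[4], const float b[4], float r[4])
{
    r[0] = a[0] - b[0]; r[1] = a[1] + b[1];
    r[2] = a[2] - b[2]; r[3] = a[3] + b[3];
}

/* Multiply two pairs of interleaved complex numbers (re, im, re, im). */
static void complex_mul2(const float a[4], const float b[4], float out[4])
{
    float are[4], aim[4], bswap[4], p[4], q[4];
    int i;

    moveldup(a, are);   /* (ar, ar, ...) */
    movehdup(a, aim);   /* (ai, ai, ...) */

    /* Swap re/im within each complex number of b: (bi, br, ...). */
    bswap[0] = b[1]; bswap[1] = b[0];
    bswap[2] = b[3]; bswap[3] = b[2];

    for (i = 0; i < 4; i++) {
        p[i] = are[i] * b[i];      /* (ar*br, ar*bi, ...) */
        q[i] = aim[i] * bswap[i];  /* (ai*bi, ai*br, ...) */
    }

    /* One combined step gives (ar*br - ai*bi, ar*bi + ai*br, ...). */
    addsub(p, q, out);
}
```

The final subtract-and-add happens in a single combined operation, so the real and imaginary results do not have to be computed separately and then interleaved.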

2004 Aug 06

Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

...dword ptr [edx],xmm0
262: yy = _mm_shuffle_ps(yy, yy, 0);
004134BF movaps xmm0,xmmword ptr [yy]
004134C6 movaps xmm1,xmmword ptr [yy]
004134CD shufps xmm1,xmm0,0
004134D1 movaps xmmword ptr [yy],xmm1
263:
264: /* Update memory */
265: mem[0] = _mm_move_ss(mem[0], mem[1]);
004134D8 movaps xmm0,xmmword ptr [ebp-50h]
004134DC movaps xmm1,xmmword ptr [ebp-60h]
004134E0 movss xmm1,xmm0
004134E4 movaps xmmword ptr [ebp-60h],xmm1
266: mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
004134E8 movaps xmm0,xmmword ptr...
