search for: _mm_sub_ps

Displaying 3 results from an estimated 3 matches for "_mm_sub_ps".

2004 Aug 06
3
[PATCH] Make SSE Run Time option.
...y = _mm_add_ss(xx, mem[0]);
_mm_store_ss(y+i, yy);
yy = _mm_shuffle_ps(yy, yy, 0);
/* Update memory */
mem[0] = _mm_move_ss(mem[0], mem[1]);
mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
mem[0] = _mm_sub_ps(mem[0], _mm_mul_ps(yy, den[0]));
mem[1] = _mm_move_ss(mem[1], mem[2]);
mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
mem[1] = _mm_sub_ps(mem[1], _mm_mul_ps(yy, den[1]));
mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0x...
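The excerpt above is cut off at both ends, so the following is only one reading of it: the visible statements look like the state update of a transposed direct form II IIR filter, with the filter memory shifted down by one slot per sample. A scalar sketch of that reading, where the function name `iir_tdf2_scalar`, the flat `float` arrays, and the assumption that the zero-delay coefficients are both 1 are illustrative and not taken from the patch:

```c
#include <stddef.h>

/* Scalar model of the update pattern visible in the excerpt.
 * Assumptions (not confirmed by the truncated source):
 *   - num[k] and den[k] are the coefficients for delay k+1,
 *   - the zero-delay coefficients are both 1,
 *   - mem holds `ord` state values, with ord >= 1. */
void iir_tdf2_scalar(const float *x, float *y, size_t n,
                     const float *num, const float *den,
                     float *mem, size_t ord)
{
    for (size_t i = 0; i < n; i++) {
        float xi = x[i];
        float yi = xi + mem[0];              /* yy = xx + mem[0] */

        /* Shift the state down one slot and add the new terms,
         * mirroring the move_ss/shuffle followed by add_ps/sub_ps. */
        for (size_t k = 0; k + 1 < ord; k++)
            mem[k] = mem[k + 1] + num[k] * xi - den[k] * yi;

        /* Last slot has nothing to shift in. */
        mem[ord - 1] = num[ord - 1] * xi - den[ord - 1] * yi;

        y[i] = yi;
    }
}
```

In the SSE version each `mem[j]` register would hold four consecutive state values, so the `0x39` shuffle plays the role of the `mem[k + 1]` read in this scalar loop.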
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
> Personally, I don't think much of PNI. The complex arithmetic stuff they
> added sets you up for a lot of permute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,

Actually, the new instructions make it possible to do complex multiplies without the need to permute and separate the add and subtract. The really useful
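To illustrate the kind of complex multiply being discussed, here is a minimal sketch using the SSE3 (PNI) intrinsics `_mm_moveldup_ps`, `_mm_movehdup_ps`, and `_mm_addsub_ps`. It is not code from the thread. The function name `cmul2_sse3` and the interleaved `[re0, im0, re1, im1]` layout are assumptions, and the `target("sse3")` attribute is a GCC/Clang extension. Note that this particular sketch still keeps one shuffle to swap the real and imaginary parts of `b`; the duplicating moves and the fused add/subtract replace the rest.

```c
#include <pmmintrin.h>  /* SSE3: _mm_moveldup_ps, _mm_movehdup_ps, _mm_addsub_ps */

/* Multiply two pairs of complex numbers stored as [re0, im0, re1, im1].
 * Hypothetical helper; the attribute enables SSE3 for this function only. */
__attribute__((target("sse3")))
void cmul2_sse3(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    __m128 a_re = _mm_moveldup_ps(va);  /* [ar0, ar0, ar1, ar1] */
    __m128 a_im = _mm_movehdup_ps(va);  /* [ai0, ai0, ai1, ai1] */

    /* Swap re/im within each complex number: [bi0, br0, bi1, br1] */
    __m128 b_sw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));

    __m128 t1 = _mm_mul_ps(a_re, vb);   /* [ar*br, ar*bi, ...] */
    __m128 t2 = _mm_mul_ps(a_im, b_sw); /* [ai*bi, ai*br, ...] */

    /* addsub subtracts in even lanes and adds in odd lanes:
     * [ar*br - ai*bi, ar*bi + ai*br, ...] */
    _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
}
```

Without `_mm_addsub_ps`, the last step would need a separate add and subtract plus a blend or sign-flip mask to combine the two halves.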
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...with functions of this size. Regards, Anthony

////////////////////// CODE /////////////////////////////////////////
#include <xmmintrin.h>
#define __INLINE static inline __attribute__((always_inline))
#define LOAD _mm_load_ps
#define STORE _mm_store_ps
#define ADD _mm_add_ps
#define SUB _mm_sub_ps
#define MULT _mm_mul_ps
#define STREAM _mm_stream_ps
#define SHUF _mm_shuffle_ps
#define VLIT4(a,b,c,d) _mm_set_ps(a,b,c,d)
#define SWAP(d) SHUF(d,d,_MM_SHUFFLE(2,3,0,1))
#define UNPACK2LO(a,b) SHUF(a,b,_MM_SHUFFLE(1,0,1,0))
#define UNPACK2HI(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,3,2))
#define HALFBLEND(a,...
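For readers unfamiliar with the shuffle macros in the excerpt above, the following sketch shows which lanes `SWAP`, `UNPACK2LO`, and `UNPACK2HI` select. The macro definitions are copied from the excerpt; the `*_demo` wrapper functions and the use of unaligned loads and stores are hypothetical additions for illustration only.

```c
#include <xmmintrin.h>

/* Macros as they appear in the quoted message. */
#define SHUF _mm_shuffle_ps
#define SWAP(d) SHUF(d,d,_MM_SHUFFLE(2,3,0,1))
#define UNPACK2LO(a,b) SHUF(a,b,_MM_SHUFFLE(1,0,1,0))
#define UNPACK2HI(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,3,2))

/* Hypothetical wrappers that expose the lane order through plain arrays. */

/* [d0, d1, d2, d3] -> [d1, d0, d3, d2] */
void swap_demo(const float *d, float *out)
{
    __m128 v = _mm_loadu_ps(d);
    _mm_storeu_ps(out, SWAP(v));
}

/* low halves of a and b: [a0, a1, b0, b1] */
void unpack2lo_demo(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, UNPACK2LO(va, vb));
}

/* high halves of a and b: [a2, a3, b2, b3] */
void unpack2hi_demo(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, UNPACK2HI(va, vb));
}
```

If each register holds two interleaved complex values, `SWAP` exchanges the real and imaginary parts of each, while the two unpack macros gather one complex value from each input.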