thr3ads.net - search: "_mm_sub

Displaying 3 results from an estimated 3 matches for "_mm_sub_pd".

2004 Aug 06

[PATCH] Make SSE Run Time option.

...// Cr = Ar * Br - Ai * Bi
// Ci = Ai * Br + Ar * Bi

__m128d real = _mm_mul_pd( Ar, Br );
__m128d imag = _mm_mul_pd( Ai, Br );

Ai = _mm_mul_pd( Ai, Bi );
Ar = _mm_mul_pd( Ar, Bi );

real = _mm_sub_pd( real, Ai );
imag = _mm_add_pd( imag, Ar );

*Cr = real;
*Ci = imag;
}

No permute is required. The key thing to note is that I do two/four complex multiplies at a time in proper SIMD fashion, unlike PNI based methods. Thus, throughput is 3 v...
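The excerpt above keeps real and imaginary parts in separate vectors, so each `__m128d` carries two reals or two imaginaries and no shuffle is needed. A minimal self-contained sketch of that idea follows. The function name `complex_mul_planar`, the planar array layout, the unaligned loads, and the scalar fallback are assumptions made for illustration and are not taken from the patch.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2: __m128d, _mm_mul_pd, _mm_sub_pd, _mm_add_pd */
#define HAVE_SSE2 1
#endif

/* Hypothetical helper (not from the patch): c[k] = a[k] * b[k] for n complex
 * doubles stored as separate real and imaginary arrays (planar layout). */
static void complex_mul_planar(const double *ar, const double *ai,
                               const double *br, const double *bi,
                               double *cr, double *ci, size_t n)
{
    size_t k = 0;
#ifdef HAVE_SSE2
    /* Two complex products per iteration: 4 multiplies, 1 subtract, 1 add. */
    for (; k + 2 <= n; k += 2) {
        __m128d Ar = _mm_loadu_pd(ar + k);
        __m128d Ai = _mm_loadu_pd(ai + k);
        __m128d Br = _mm_loadu_pd(br + k);
        __m128d Bi = _mm_loadu_pd(bi + k);

        /* Cr = Ar * Br - Ai * Bi */
        __m128d real = _mm_sub_pd(_mm_mul_pd(Ar, Br), _mm_mul_pd(Ai, Bi));
        /* Ci = Ai * Br + Ar * Bi */
        __m128d imag = _mm_add_pd(_mm_mul_pd(Ai, Br), _mm_mul_pd(Ar, Bi));

        _mm_storeu_pd(cr + k, real);
        _mm_storeu_pd(ci + k, imag);
    }
#endif
    /* Scalar path: handles an odd tail element, or every element when SSE2
     * is not available at compile time. */
    for (; k < n; ++k) {
        cr[k] = ar[k] * br[k] - ai[k] * bi[k];
        ci[k] = ai[k] * br[k] + ar[k] * bi[k];
    }
}
```

The output arrays are assumed not to alias the inputs. With aligned data, `_mm_load_pd` and `_mm_store_pd` could replace the unaligned variants.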

2004 Aug 06

[PATCH] Make SSE Run Time option.

...hworld.wolfram.com/ComplexMultiplication.html
> // Cr = Ar * Br - Ai * Bi
> // Ci = Ai * Br + Ar * Bi
>
> __m128d real = _mm_mul_pd( Ar, Br );
> __m128d imag = _mm_mul_pd( Ai, Br );
>
> Ai = _mm_mul_pd( Ai, Bi );
> Ar = _mm_mul_pd( Ar, Bi );
>
> real = _mm_sub_pd( real, Ai );
> imag = _mm_add_pd( imag, Ar );
>
> *Cr = real;
> *Ci = imag;
> }
>
> No permute is required. The key thing to note is that I do two/four
> complex multiplies at a time in proper SIMD fashion, unlike PNI based
> methods. Thus, throughput is 3 vecto...

2004 Aug 06

[PATCH] Make SSE Run Time option.

> Personally, I don't think much of PNI. The complex arithmetic stuff they
> added sets you up for a lot of permute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,

Actually, the new instructions make it possible to do complex multiplies without the need to permute and separate the add and subtract. The really useful
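For comparison, here is a hedged sketch of one common way to multiply a single interleaved complex double with the PNI (SSE3) instructions `_mm_movedup_pd` and `_mm_addsub_pd`. It is not necessarily the formulation either poster had in mind. The function name `complex_mul_interleaved` and the scalar fallback are illustrative assumptions. This variant fuses the add and subtract into one instruction, but it still uses a shuffle to swap the halves of `b`.

```c
#if defined(__SSE3__)
#include <pmmintrin.h> /* SSE3: _mm_movedup_pd, _mm_addsub_pd (pulls in SSE2) */
#define HAVE_SSE3 1
#endif

/* Hypothetical helper (not from the thread): c = a * b, where each operand
 * is one complex double stored interleaved as {real, imag}. */
static void complex_mul_interleaved(const double a[2], const double b[2],
                                    double c[2])
{
#ifdef HAVE_SSE3
    __m128d va   = _mm_loadu_pd(a);             /* (ar, ai) */
    __m128d vb   = _mm_loadu_pd(b);             /* (br, bi) */
    __m128d a_re = _mm_movedup_pd(va);          /* (ar, ar) */
    __m128d a_im = _mm_unpackhi_pd(va, va);     /* (ai, ai) */
    __m128d b_sw = _mm_shuffle_pd(vb, vb, 1);   /* (bi, br) */
    __m128d t1   = _mm_mul_pd(a_re, vb);        /* (ar*br, ar*bi) */
    __m128d t2   = _mm_mul_pd(a_im, b_sw);      /* (ai*bi, ai*br) */
    /* addsub subtracts in the low lane and adds in the high lane:
     * (ar*br - ai*bi, ar*bi + ai*br) */
    _mm_storeu_pd(c, _mm_addsub_pd(t1, t2));
#else
    /* Scalar fallback when SSE3 is not enabled at compile time. */
    double re = a[0] * b[0] - a[1] * b[1];
    double im = a[0] * b[1] + a[1] * b[0];
    c[0] = re;
    c[1] = im;
#endif
}
```

With GCC or Clang the SSE3 path is compiled only when the build enables it (for example with `-msse3`); otherwise the scalar branch is used.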