Displaying 7 results from an estimated 7 matches for "__m128d".
2004 Aug 06 · 2 replies · [PATCH] Make SSE Run Time option.
...e it with a code sample?
I suppose if I make such a demand that it would only be sporting if I
provide what I believe to be the more efficient competing method that uses
only SSE/SSE2. Double precision is shown. For Single precision simply
replace all "pd" with "ps" and "__m128d" with "__m128".
//For C[] = A[] * B[]
//The real and imaginary parts of A, B and C are stored in
//different arrays, not interleaved
inline void ComplexMultiply( __m128d *Cr, __m128d *Ci,
__m128d Ar, __m128d Ai...
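The excerpt is cut off mid-signature. As a hedged reconstruction (the remaining Br/Bi parameters and the body are guesses from the stated data layout, not the original patch), a split-format complex multiply in SSE2 needs no shuffles at all, since the real/imaginary identities apply lane-wise:

```c
#include <emmintrin.h>

/* Hypothetical completion of the truncated snippet; Br/Bi and the body
 * are assumptions. With non-interleaved storage the identities
 *   Cr = Ar*Br - Ai*Bi,  Ci = Ar*Bi + Ai*Br
 * map directly onto packed ops, with no permutes. */
static inline void ComplexMultiply(__m128d *Cr, __m128d *Ci,
                                   __m128d Ar, __m128d Ai,
                                   __m128d Br, __m128d Bi)
{
    *Cr = _mm_sub_pd(_mm_mul_pd(Ar, Br), _mm_mul_pd(Ai, Bi));
    *Ci = _mm_add_pd(_mm_mul_pd(Ar, Bi), _mm_mul_pd(Ai, Br));
}
```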
2004 Aug 06 · 0 replies · [PATCH] Make SSE Run Time option.
...>
> I suppose if I make such a demand that it would only be sporting if I
> provide what I believe to be the more efficient competing method that uses
> only SSE/SSE2. Double precision is shown. For Single precision simply
> replace all "pd" with "ps" and "__m128d" with "__m128".
>
> //For C[] = A[] * B[]
> //The real and imaginary parts of A, B and C are stored in
> //different arrays, not interleaved
> inline void ComplexMultiply( __m128d *Cr, __m128d *Ci,
> __m128d Ar, __m128d Ai,
> __m128d Br, __m128d B...
2008 Nov 26 · 1 reply · SSE2 code won't compile in VC
Jean-Marc,
At least VS2005 (what I'm using) won't compile resample_sse.h with
_USE_SSE2 defined because it refuses to cast __m128 to __m128d and vice
versa. While there are intrinsics to do the casts, I thought it would be
simpler to just use an intrinsic that accomplishes the same thing
without all the casting. Thanks,
--John
@@ -91,7 +91,7 @@ static inline double inner_product_double(const float
*a, const float *b, unsign...
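For reference, the cast intrinsics John alludes to do exist in SSE2's <emmintrin.h>: _mm_castps_pd and _mm_castpd_ps reinterpret the 128 register bits and emit no instructions, which is the form MSVC insists on where GCC tolerates a C-style cast. A minimal round-trip sketch (illustrative, not from the patch):

```c
#include <emmintrin.h>

/* _mm_castpd_ps / _mm_castps_pd are bit-for-bit reinterprets (SSE2);
 * they compile to nothing, so a round trip preserves the values exactly. */
static inline __m128d roundtrip_pd(__m128d v)
{
    __m128 as_ps = _mm_castpd_ps(v);  /* same bits, viewed as 4 floats */
    return _mm_castps_pd(as_ps);      /* back to the 2-double view */
}
```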
2004 Aug 06 · 5 replies · [PATCH] Make SSE Run Time option.
> Personally, I don't think much of PNI. The complex arithmetic stuff they
> added sets you up for a lot of permute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,
Actually, the new instructions make it possible to do complex multiplies
without the need to permute and separate the add and subtract. The really useful...
2009 Oct 26 · 1 reply · [PATCH] Fix miscompile of SSE resampler
...include <emmintrin.h>
#define OVERRIDE_INNER_PRODUCT_DOUBLE
-static inline double inner_product_double(const float *a, const float *b, unsigned int len)
+static inline void inner_product_double(double *ret, const float *a, const float *b, unsigned int len)
{
int i;
- double ret;
__m128d sum = _mm_setzero_pd();
__m128 t;
for (i=0;i<len;i+=8)
@@ -92,14 +87,12 @@ static inline double inner_product_double(const float *a, const float *b, unsign
sum = _mm_add_pd(sum, _mm_cvtps_pd(_mm_movehl_ps(t, t)));
}
sum = _mm_add_sd(sum, _mm_unpackhi_pd(sum, sum));
- _mm...
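Pieced together from the hunks above, the patched function plausibly looks like the following. The tail of the excerpt is truncated, so the final _mm_store_sd is an assumption, as is the requirement that len be a multiple of 8 (implied by the stride of the loop):

```c
#include <emmintrin.h>

/* Hedged reconstruction of the patched shape: the result leaves through
 * a pointer instead of a double return value, which is what the patch
 * changes.  Assumes len is a multiple of 8. */
static inline void inner_product_double(double *ret, const float *a,
                                        const float *b, unsigned int len)
{
    unsigned int i;
    __m128d sum = _mm_setzero_pd();
    __m128 t;
    for (i = 0; i < len; i += 8)
    {
        /* products of floats, widened to doubles two at a time */
        t = _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i));
        sum = _mm_add_pd(sum, _mm_cvtps_pd(t));
        sum = _mm_add_pd(sum, _mm_cvtps_pd(_mm_movehl_ps(t, t)));

        t = _mm_mul_ps(_mm_loadu_ps(a+i+4), _mm_loadu_ps(b+i+4));
        sum = _mm_add_pd(sum, _mm_cvtps_pd(t));
        sum = _mm_add_pd(sum, _mm_cvtps_pd(_mm_movehl_ps(t, t)));
    }
    /* horizontal add of the two lanes, then store lane 0 */
    sum = _mm_add_sd(sum, _mm_unpackhi_pd(sum, sum));
    _mm_store_sd(ret, sum);
}
```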
2008 May 03 · 2 replies · Resampler (no api)
..._shuffle_ps(sum, sum, 0x55));
+ _mm_store_ss(&ret, sum);
+ return ret;
+}
+
+#ifdef _USE_SSE2
+#include <emmintrin.h>
+#define OVERRIDE_INNER_PRODUCT_DOUBLE
+
+static inline double inner_product_double(const float *a, const float *b, unsigned int len)
+{
+ int i;
+ double ret;
+ __m128d sum = _mm_setzero_pd();
+ __m128 t;
+ for (i=0;i<len;i+=8)
+ {
+ t = _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i));
+ sum = _mm_add_pd(sum, _mm_cvtps_pd(t));
+ sum = _mm_add_pd(sum, _mm_cvtps_pd(_mm_movehl_ps(t, t)));
+
+ t = _mm_mul_ps(_mm_loadu_ps(a+i+4), _mm_loadu...
2008 May 03 · 0 replies · Resampler, memory only variant
..._shuffle_ps(sum, sum, 0x55));
+ _mm_store_ss(&ret, sum);
+ return ret;
+}
+
+#ifdef _USE_SSE2
+#include <emmintrin.h>
+#define OVERRIDE_INNER_PRODUCT_DOUBLE
+
+static inline double inner_product_double(const float *a, const float *b, unsigned int len)
+{
+ int i;
+ double ret;
+ __m128d sum = _mm_setzero_pd();
+ __m128 t;
+ for (i=0;i<len;i+=8)
+ {
+ t = _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i));
+ sum = _mm_add_pd(sum, _mm_cvtps_pd(t));
+ sum = _mm_add_pd(sum, _mm_cvtps_pd(_mm_movehl_ps(t, t)));
+
+ t = _mm_mul_ps(_mm_loadu_ps(a+i+4), _mm_loadu...