search for: __m128

Displaying 20 results from an estimated 59 matches for "__m128".

2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...#define SHUF _mm_shuffle_ps #define VLIT4(a,b,c,d) _mm_set_ps(a,b,c,d) #define SWAP(d) SHUF(d,d,_MM_SHUFFLE(2,3,0,1)) #define UNPACK2LO(a,b) SHUF(a,b,_MM_SHUFFLE(1,0,1,0)) #define UNPACK2HI(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,3,2)) #define HALFBLEND(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,1,0)) __INLINE void TX2(__m128 *a, __m128 *b) { __m128 TX2_t0 = UNPACK2LO(*a, *b); __m128 TX2_t1 = UNPACK2HI(*a,*b); *a = TX2_t0; *b = TX2_t1; } __INLINE void FMA(__m128 *Rd, __m128 Rn, __m128 Rm) { *Rd = ADD(*Rd, MULT(Rn,Rm)); } __INLINE void FMS(__m128 *Rd, __m128 Rn, __m128 Rm) { *Rd = SUB(*Rd, MULT(Rn,Rm)); } __INL...
2008 Nov 26
1
SSE2 code won't compile in VC
Jean-Marc, At least VS2005 (what I'm using) won't compile resample_sse.h with _USE_SSE2 defined because it refuses to cast __m128 to __m128d and vice versa. While there are intrinsics to do the casts, I thought it would be simpler to just use an intrinsic that accomplishes the same thing without all the casting. Thanks, --John @@ -91,7 +91,7 @@ static inline double inner_product_double(const float *a, const float *b, u...
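The cast intrinsics mentioned in this message can be sketched as follows. This is a minimal illustration, not the approach the patch takes (the patch avoids the casts altogether). It assumes an SSE2 target and the `<emmintrin.h>` header; the helper names `roundtrip_through_pd` and `same_bits` are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128d, _mm_castps_pd, _mm_castpd_ps */
#include <string.h>

/* Hypothetical helper: view four floats as two doubles and back.
 * The cast intrinsics only reinterpret the 128 bits; no value conversion
 * takes place. */
static __m128 roundtrip_through_pd(__m128 v)
{
    __m128d as_pd = _mm_castps_pd(v); /* __m128 -> __m128d, bits unchanged */
    return _mm_castpd_ps(as_pd);      /* __m128d -> __m128, bits unchanged */
}

/* Hypothetical helper: compare two vectors bit for bit. */
static int same_bits(__m128 a, __m128 b)
{
    float fa[4], fb[4];
    assert(sizeof fa == sizeof(__m128));
    _mm_storeu_ps(fa, a);
    _mm_storeu_ps(fb, b);
    return memcmp(fa, fb, sizeof fa) == 0;
}
```

Because the casts are pure reinterpretations, a round trip should leave every bit of the vector intact.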
2016 Sep 12
4
[X86] FMA transformation restrictions
I noticed that the operand commuting code in X86InstrInfo.cpp treats scalar FMA intrinsics specially. It prevents operand commuting on these scalar instructions because the scalar FMA instructions preserve the upper bits of the vector. Presumably, the restrictions are there because commuting operands potentially changes the result upper bits. However, AFAIK the Intel and GNU FMA intrinsics
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...cked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include <emmintrin.h> #endif #include <stdint.h> #include <assert.h> #if SPILLING_ENSUES == 1 static int32_t geti(const __m128i v, const size_t i) { switch (i) { case 0: return _mm_cvtsi128_si32(v); case 1: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5)); case 2: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe6)); case 3: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe7)); } assert(0); return -1; } #else static...
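The shuffle immediates in this test case can be read as follows. The sketch below restates only the visible branch of the snippet under a hypothetical name, `get_lane`, and assumes SSE2 (`<emmintrin.h>`). It illustrates what the immediates select; it makes no claim about whether a given compiler spills.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_shuffle_epi32, _mm_cvtsi128_si32 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical restatement of the lane extraction in the snippet.
 * Each immediate leaves lanes 1..3 in place (high six bits 11 10 01) and
 * uses its low two bits to pick the lane copied into lane 0:
 * 0xe5 -> lane 1, 0xe6 -> lane 2, 0xe7 -> lane 3. */
static int32_t get_lane(__m128i v, size_t i)
{
    assert(i < 4);
    switch (i) {
    case 0:  return _mm_cvtsi128_si32(v);
    case 1:  return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5));
    case 2:  return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe6));
    default: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe7));
    }
}
```

Note that `_mm_set_epi32` takes its arguments from the highest lane down, so `_mm_set_epi32(40, 30, 20, 10)` places 10 in lane 0.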
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...will have a fit if it's accessing uninitialized memory. Here's a version I wrote a few days ago you're welcome to use that doesn't suffer from that problem: static inline void xcorr_kernel(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j; __m128 xsum1 = _mm_loadu_ps(sum); __m128 xsum2 = _mm_setzero_ps(); for (j = 0; j < len-3; j += 4) { const __m128 x0 = _mm_loadu_ps(x+j); const __m128 y0 = _mm_loadu_ps(y+j); const __m128 y3 = _mm_loadu_ps(y+j+3); xsum1 = _mm_add_ps(xsum1,_mm_mul_ps(_mm_s...
2017 Jun 14
2
Default FPENV state
Hi, We are interested in expanding some vector operations directly in IR form as constants (https://reviews.llvm.org/D33406). For example, _mm256_cmp_ps("any input", "any input", _CMP_TRUE_UQ) should produce a -1, -1, -1, ... vector, but for some values, for example "1.00 -nan", this operation triggers an exception if FPU exceptions are enabled. Here is the question:
2020 Aug 19
2
Question about llvm vectors
...m __b /// A 128-bit vector of [4 x float] containing one of the source operands. /// The horizontal sums of the values are stored in the upper bits of the /// destination. /// \returns A 128-bit vector of [4 x float] containing the horizontal sums of /// both operands. static __inline__ __m128 __DEFAULT_FN_ATTRS _mm_hadd_ps(__m128 __a, __m128 __b) { return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b); } Here clang will translate _mm_hadd_ps to a CPU specific feature. Why not create __builtin_vector_hadd(a, b) which would select the CPU specific instruction or a fallback generic imp...
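A generic fallback of the kind the question asks about might look like the sketch below. It is an assumption for illustration only: `hadd_ps_fallback` is a hypothetical name, not a Clang builtin, and it uses only SSE1 shuffles from `<xmmintrin.h>` to produce the lane layout of `haddps` (a0+a1, a2+a3, b0+b1, b2+b3).

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps, _mm_add_ps */

/* Hypothetical SSE1-only stand-in for _mm_hadd_ps (which needs SSE3). */
static __m128 hadd_ps_fallback(__m128 a, __m128 b)
{
    /* Gather the even-indexed lanes: a0 a2 b0 b2 */
    __m128 even = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));
    /* Gather the odd-indexed lanes:  a1 a3 b1 b3 */
    __m128 odd  = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1));
    /* Pairwise sums: a0+a1, a2+a3, b0+b1, b2+b3 */
    return _mm_add_ps(even, odd);
}

/* Usage check with small exactly representable values. */
static void check_hadd_fallback(void)
{
    float out[4];
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);     /* lanes 1 2 3 4 */
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f); /* lanes 10 20 30 40 */
    _mm_storeu_ps(out, hadd_ps_fallback(a, b));
    assert(out[0] == 3.0f && out[1] == 7.0f);
    assert(out[2] == 30.0f && out[3] == 70.0f);
}
```

Two shuffles plus an add cost more than a single `haddps`, which is the trade-off a target-independent builtin would have to hide.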
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...uninitialized memory. Here's a version > I wrote a few days ago you're welcome to use that doesn't suffer from > that problem: > > static inline void xcorr_kernel(const opus_val16 *x, const opus_val16 > *y, opus_val32 sum[4], int len) > { > int j; > __m128 xsum1 = _mm_loadu_ps(sum); > __m128 xsum2 = _mm_setzero_ps(); > > for (j = 0; j < len-3; j += 4) { > const __m128 x0 = _mm_loadu_ps(x+j); > const __m128 y0 = _mm_loadu_ps(y+j); > const __m128 y3 = _mm_loadu_ps(y+j+3); > > xs...
2007 Aug 22
1
SSE bug on Win32 with GCC 4.2.1
...natvig.com> wrote: > > Jean-Marc Valin wrote: > >> I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. > >> > >> In libspeex/cb_search_sse.h, the following union is used: > >> > >> union { > >> float __a[4]; > >> __m128 __v; > >> } __u; > >> > >> For some odd reason, this particular version of GCC will not 16-byte > >> align the union. IE; the alignment requirement of __v isn't propagated. > >> Changing it into this: > >> > >> union { > >>...
2007 Aug 20
2
SSE bug on Win32 with GCC 4.2.1
Jean-Marc Valin wrote: >> I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. >> >> In libspeex/cb_search_sse.h, the following union is used: >> >> union { >> float __a[4]; >> __m128 __v; >> } __u; >> >> For some odd reason, this particular version of GCC will not 16-byte >> align the union. IE; the alignment requirement of __v isn't propagated. >> Changing it into this: >> >> union { >> float __a[4]; >> __m128 _...
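One way to detect the misalignment described in this report is a runtime check on the union's address. The sketch below is an assumption for illustration, not the fix proposed in the thread (which is cut off in the snippet). The names `vec4`, `is_aligned16`, and `first_lane_via_union` are hypothetical and avoid the reserved double-underscore identifiers of the original.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps */

/* Same shape as the union in the report, under hypothetical names. */
union vec4 {
    float a[4];
    __m128 v;
};

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Load four floats through the union and return lane 0. */
static float first_lane_via_union(const float src[4])
{
    union vec4 u;
    /* On a compiler that propagates the alignment of __m128 this holds
     * trivially and may be folded away; on the behavior described in the
     * report it would fail. */
    assert(is_aligned16(&u));
    u.v = _mm_loadu_ps(src); /* unaligned load: src need not be aligned */
    return u.a[0];
}
```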
2007 Aug 20
3
SSE bug on Win32 with GCC 4.2.1
...t (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. In libspeex/cb_search_sse.h, the following union is used: union { float __a[4]; __m128 __v; } __u; [...] Content analysis details: (4.0 points, 3.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [sc...
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
...#include <xmmintrin.h> #define OVERRIDE_INNER_PRODUCT_SINGLE -static inline float inner_product_single(const float *a, const float *b, unsigned int len) +static inline void inner_product_single(float *ret, const float *a, const float *b, unsigned int len) { int i; - float ret; __m128 sum = _mm_setzero_ps(); for (i=0;i<len;i+=8) { @@ -49,14 +48,12 @@ static inline float inner_product_single(const float *a, const float *b, unsigne } sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum)); sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55)); - _mm_store_ss(&r...
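The reduction at the end of this diff sums the four lanes of the accumulator in two steps. The sketch below isolates that step in a hypothetical helper, `hsum_ps`; it is not the patched `inner_product_single` and assumes only SSE1 (`<xmmintrin.h>`).

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: _mm_movehl_ps, _mm_shuffle_ps, _mm_add_ss */

/* Hypothetical helper: horizontal sum of the four lanes s0..s3. */
static float hsum_ps(__m128 sum)
{
    float ret;
    /* movehl copies the upper two lanes down: lane0 = s0+s2, lane1 = s1+s3 */
    sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
    /* 0x55 broadcasts lane 1, so lane0 becomes (s0+s2) + (s1+s3) */
    sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55));
    _mm_store_ss(&ret, sum); /* write lane 0 to memory */
    return ret;
}

/* Usage check with exactly representable values: 1 + 2 + 3 + 4. */
static void check_hsum(void)
{
    assert(hsum_ps(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f)) == 10.0f);
}
```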
2009 Jul 23
1
[LLVMdev] Case where VSETCC DAGCombiner hack doesn't work
On Jul 21, 2009, at 11:14 PM, Eli Friedman wrote: > Testcase (compile with clang >= r76726): > #include <emmintrin.h> > __m128i a(__m128 a, __m128 b) { return a==a & b==b; } > > CodeGen ends up scalarizing the comparison, which is really bad, and > AFAIK different from what we did before vsetcc was removed. The ideal > code is a single cmpordps, although I don't think clang ever generated > that for...
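The "ordered" comparison that the testcase spells as `a==a & b==b` can also be requested directly with an intrinsic. The sketch below is a hand-written illustration of what `cmpordps` computes, not a statement about what clang generates for the testcase; `ordered_mask` is a hypothetical helper and only SSE1 is assumed.

```c
#include <assert.h>
#include <math.h>      /* NAN */
#include <xmmintrin.h> /* SSE: _mm_cmpord_ps, _mm_movemask_ps */

/* Hypothetical helper: bit i of the result is set when lane i of both
 * a and b is not NaN, which is what a==a & b==b tests lane by lane. */
static int ordered_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpord_ps(a, b));
}

/* Usage check: lane 1 of a and lane 2 of b are NaN, so only lanes 0 and 3
 * are ordered. */
static void check_ordered_mask(void)
{
    const float a[4] = { 1.0f, NAN, 3.0f, 4.0f };
    const float b[4] = { 1.0f, 2.0f, NAN, 4.0f };
    assert(ordered_mask(_mm_loadu_ps(a), _mm_loadu_ps(b)) == 0x9);
}
```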
2005 Jan 25
1
"spx_word16_t *" is incompatible with parameter of type "float *"
..., float *resp2, spx_word32_t *E, int shape_cb_size, int subvect_size, char *stack) But then calls it with resp2 being defined as type spx_word16_t * in line 185: compute_weighted_codebook(shape_cb, r, resp, resp2, E, shape_cb_size, subvect_size, stack); Defined on line 103: #ifdef _USE_SSE __m128 *resp2; __m128 *E; #else spx_word16_t *resp2; spx_word32_t *E; #endif I have downloaded the speex version 1.1.6 and have made no changes; I used a command line option to define FIXED_POINT. *** Is there a fix for this discrepancy? *** Change to float (a site search of "resp2"...
2014 Sep 03
2
[LLVMdev] Questions on the llvm 'vector' types and resulting SIMD instructions
...hat when I JIT, it will generates the best SIMD instructions available on the host it's running on? For example, when running on a machine supporting SSE, it does seem to generate SSE instructions, and this successfully turns into a function callable from C with a signature that looks like __m128 simd_mul (__m128 a, __m128 b); But the vector documentation is a little sketchy, and I am not sure about a few things: * Will it really autodetect and use the best SIMD available on my machine? (For example, SSE4.2 vs SSE2, etc.?) Is there anything I need to tell the JIT or the ExecutionEngine to...
2020 Aug 20
2
Question about llvm vectors
...e >> operands. >> /// The horizontal sums of the values are stored in the upper bits of >> the >> /// destination. >> /// \returns A 128-bit vector of [4 x float] containing the horizontal >> sums of >> /// both operands. >> static __inline__ __m128 __DEFAULT_FN_ATTRS >> _mm_hadd_ps(__m128 __a, __m128 __b) >> { >> return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b); >> } >> >> Here clang will translate _mm_hadd_ps to a CPU specific feature. >> Why not create __builtin_vector_hadd(a, b) which would...
2019 Jun 03
2
[cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
...lowing code: > > > ``` > // AArch64 Advanced SIMD compilation > double foo(double) __attribute__(simd_variant("nN2v","neon_foo")); > float64x2_t neon_foo(float64x2_t x) {…} > > // x86 SSE compilation > double foo(double) __attribute__(simd_variant("aN2v","sse_foo")); > __m128 sse_foo(__m128 x) {…} > ``` > > The attribute would use the “core” tokens of the mangled names (without > _ZGV prefix and the scalar function name postfix) to describe the vector > function provided in the redirection. > > Formal syntax: > > ``` > __attribute__(simd_va...
2016 May 31
2
[PATCH 1/2] Modify autoconf tests for intrinsics to stop clang from optimizing them away.
...x"1"], @@ -521,10 +522,13 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[ [OPUS_X86_MAY_HAVE_SSE], [OPUS_X86_PRESUME_SSE], [[#include <xmmintrin.h> + #include <time.h> ]], [[ - static __m128 mtest; - mtest = _mm_setzero_ps(); + __m128 mtest; + mtest = _mm_set1_ps((float)time(NULL)); + mtest = _mm_mul_ps(mtest, mtest); + return _mm_cvtss_si32(mtest); ]] ) AS_IF([test x"$OPUS_X86_MAY_HAVE_SSE" =...
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
...mathops.h" #include "pitch.h" -#if defined(OPUS_X86_MAY_HAVE_SSE4_1) -#include <smmintrin.h> -#include "x86cpu.h" - -opus_val32 celt_inner_prod_sse4_1(const opus_val16 *x, const opus_val16 *y, - int N) -{ - opus_int i, dataSize16; - opus_int32 sum; - __m128i inVec1_76543210, inVec1_FEDCBA98, acc1; - __m128i inVec2_76543210, inVec2_FEDCBA98, acc2; - __m128i inVec1_3210, inVec2_3210; - - sum = 0; - dataSize16 = N & ~15; - - acc1 = _mm_setzero_si128(); - acc2 = _mm_setzero_si128(); - - for (i=0;i<dataSize16;i+=16) { - i...
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
...mathops.h" #include "pitch.h" -#if defined(OPUS_X86_MAY_HAVE_SSE4_1) -#include <smmintrin.h> -#include "x86cpu.h" - -opus_val32 celt_inner_prod_sse4_1(const opus_val16 *x, const opus_val16 *y, - int N) -{ - opus_int i, dataSize16; - opus_int32 sum; - __m128i inVec1_76543210, inVec1_FEDCBA98, acc1; - __m128i inVec2_76543210, inVec2_FEDCBA98, acc2; - __m128i inVec1_3210, inVec2_3210; - - sum = 0; - dataSize16 = N & ~15; - - acc1 = _mm_setzero_si128(); - acc2 = _mm_setzero_si128(); - - for (i=0;i<dataSize16;i+=16) { - i...