thr3ads.net - search: "__m128"

Displaying 20 results from an estimated 59 matches for "__m128".

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

...#define SHUF _mm_shuffle_ps
#define VLIT4(a,b,c,d) _mm_set_ps(a,b,c,d)
#define SWAP(d) SHUF(d,d,_MM_SHUFFLE(2,3,0,1))
#define UNPACK2LO(a,b) SHUF(a,b,_MM_SHUFFLE(1,0,1,0))
#define UNPACK2HI(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,3,2))
#define HALFBLEND(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,1,0))
__INLINE void TX2(__m128 *a, __m128 *b) { __m128 TX2_t0 = UNPACK2LO(*a, *b); __m128 TX2_t1 = UNPACK2HI(*a,*b); *a = TX2_t0; *b = TX2_t1; }
__INLINE void FMA(__m128 *Rd, __m128 Rn, __m128 Rm) { *Rd = ADD(*Rd, MULT(Rn,Rm)); }
__INLINE void FMS(__m128 *Rd, __m128 Rn, __m128 Rm) { *Rd = SUB(*Rd, MULT(Rn,Rm)); }
__INL...
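The sketch below makes the TX2 and FMA helpers from this excerpt compile on their own. It makes two assumptions: ADD and MULT, which are defined outside the excerpt, are taken to be _mm_add_ps and _mm_mul_ps, and static inline stands in for the __INLINE macro. The FMA helper is a separate multiply followed by an add. It is not a fused instruction.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Assumption: ADD and MULT map to the plain SSE add/multiply intrinsics. */
#define ADD _mm_add_ps
#define MULT _mm_mul_ps
#define SHUF _mm_shuffle_ps
#define UNPACK2LO(a, b) SHUF(a, b, _MM_SHUFFLE(1, 0, 1, 0)) /* a0 a1 b0 b1 */
#define UNPACK2HI(a, b) SHUF(a, b, _MM_SHUFFLE(3, 2, 3, 2)) /* a2 a3 b2 b3 */

/* Transpose of 64-bit halves: a = {a0,a1,b0,b1}, b = {a2,a3,b2,b3}. */
static inline void TX2(__m128 *a, __m128 *b)
{
    __m128 t0 = UNPACK2LO(*a, *b);
    __m128 t1 = UNPACK2HI(*a, *b);
    *a = t0;
    *b = t1;
}

/* Rd += Rn * Rm, computed as a separate multiply and add (not fused). */
static inline void FMA(__m128 *Rd, __m128 Rn, __m128 Rm)
{
    *Rd = ADD(*Rd, MULT(Rn, Rm));
}
```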

SSE2 code won't compile in VC

2008 Nov 26

Jean-Marc, At least VS2005 (what I'm using) won't compile resample_sse.h with _USE_SSE2 defined because it refuses to cast __m128 to __m128d and vice versa. While there are intrinsics to do the casts, I thought it would be simpler to just use an intrinsic that accomplishes the same thing without all the casting. Thanks, --John
@@ -91,7 +91,7 @@ static inline double inner_product_double(const float *a, const float *b, u...
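The cast intrinsics the post mentions are available in SSE2's emmintrin.h as _mm_castps_pd and _mm_castpd_ps. They only reinterpret the 128 bits and do not convert any values. A value conversion is a different operation: _mm_cvtps_pd turns two floats into two doubles. The sketch shows both operations. It does not establish whether the VS2005 headers provide the cast intrinsics.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128d plus the cast/convert intrinsics */

/* Bit-level reinterpretation: same 128 bits, new type, no value change. */
static __m128d as_pd(__m128 v) { return _mm_castps_pd(v); }
static __m128 as_ps(__m128d v) { return _mm_castpd_ps(v); }

/* Value conversion: widen the four floats of v into two __m128d halves. */
static void widen_ps(__m128 v, __m128d *lo, __m128d *hi)
{
    *lo = _mm_cvtps_pd(v);                   /* lanes 0 and 1 */
    *hi = _mm_cvtps_pd(_mm_movehl_ps(v, v)); /* lanes 2 and 3 */
}
```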

[X86] FMA transformation restrictions

2016 Sep 12

I noticed that the operand commuting code in X86InstrInfo.cpp treats scalar FMA intrinsics specially. It prevents operand commuting on these scalar instructions because the scalar FMA instructions preserve the upper bits of the vector. Presumably, the restrictions are there because commuting operands could change the upper bits of the result. However, AFAIK the Intel and GNU FMA intrinsics...
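The upper-bits issue can be shown without FMA hardware. Intel documents _mm_fmadd_ss as copying the upper three lanes from its first argument. The sketch uses the SSE scalar add _mm_add_ss as a stand-in because it follows the same merge rule. It computes lane 0 and copies lanes 1 to 3 from the first operand. As a result, swapping the operands leaves lane 0 unchanged but changes the upper lanes.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Scalar SSE ops compute lane 0 only; lanes 1..3 of the result are copied
   from the FIRST operand, so operand order matters for the upper lanes. */
static void add_ss_both_orders(__m128 a, __m128 b, float ab[4], float ba[4])
{
    _mm_storeu_ps(ab, _mm_add_ss(a, b)); /* {a0+b0, a1, a2, a3} */
    _mm_storeu_ps(ba, _mm_add_ss(b, a)); /* {b0+a0, b1, b2, b3} */
}
```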

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

...cked other targets). Here's a test case with spilling/no-spilling code put on conditional compile:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include <emmintrin.h>
#endif
#include <stdint.h>
#include <assert.h>
#if SPILLING_ENSUES == 1
static int32_t geti(const __m128i v, const size_t i) {
    switch (i) {
    case 0: return _mm_cvtsi128_si32(v);
    case 1: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5));
    case 2: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe6));
    case 3: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe7));
    }
    assert(0); return -1;
}
#else
static...
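In the test case, the immediates 0xe5, 0xe6 and 0xe7 move lane 1, 2 or 3 into lane 0, where _mm_cvtsi128_si32 reads it. The _mm_shuffle_epi32 control must be a compile-time constant, which is why a runtime index needs the switch. The SSE2 sketch below puts that approach next to a store-and-index alternative. The function names are hypothetical, and the sketch makes no claim about which version avoids the spill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Shuffle lane i into lane 0, then move lane 0 to a general register.
   The shuffle control must be an immediate, hence one case per lane. */
static int32_t geti_shuffle(__m128i v, size_t i)
{
    switch (i) {
    case 0: return _mm_cvtsi128_si32(v);
    case 1: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5));
    case 2: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe6));
    case 3: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe7));
    }
    assert(0 && "lane index out of range");
    return -1;
}

/* Runtime-indexed alternative: store the vector and index the array. */
static int32_t geti_store(__m128i v, size_t i)
{
    int32_t lanes[4];
    assert(i < 4);
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[i];
}
```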

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

...will have a fit if it's accessing uninitialized memory. Here's a version I wrote a few days ago you're welcome to use that doesn't suffer from that problem:
static inline void xcorr_kernel(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len)
{
    int j;
    __m128 xsum1 = _mm_loadu_ps(sum);
    __m128 xsum2 = _mm_setzero_ps();
    for (j = 0; j < len-3; j += 4) {
        const __m128 x0 = _mm_loadu_ps(x+j);
        const __m128 y0 = _mm_loadu_ps(y+j);
        const __m128 y3 = _mm_loadu_ps(y+j+3);
        xsum1 = _mm_add_ps(xsum1,_mm_mul_ps(_mm_s...
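The posted kernel is cut off, so the code below is not a reconstruction of it. It is a simpler sketch with two assumptions. First, the kernel's job is taken to be accumulating four correlation lags, sum[k] += x[j]*y[j+k] for k = 0..3. Second, float stands in for opus_val16 and opus_val32. The SSE version broadcasts x[j] and multiplies it by an unaligned load of y[j..j+3], so it never reads past y[len+2].

```c
#include <assert.h>
#include <xmmintrin.h>

/* Scalar reference (assumed semantics): sum[k] += x[j] * y[j + k], k = 0..3.
   Reads x[0..len-1] and y[0..len+2]. */
static void xcorr_kernel_ref(const float *x, const float *y, float sum[4], int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
}

/* SSE sketch: broadcast x[j], multiply by y[j..j+3], accumulate per lag.
   Lane k of acc tracks sum[k]; no element beyond y[len+2] is loaded. */
static void xcorr_kernel_sse(const float *x, const float *y, float sum[4], int len)
{
    __m128 acc = _mm_loadu_ps(sum);
    assert(len >= 0);
    for (int j = 0; j < len; j++) {
        __m128 xj = _mm_set1_ps(x[j]);
        acc = _mm_add_ps(acc, _mm_mul_ps(xj, _mm_loadu_ps(y + j)));
    }
    _mm_storeu_ps(sum, acc);
}
```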

Default FPENV state

2017 Jun 14

Hi, we are interested in expanding some vector operations directly in the IR form as constants (https://reviews.llvm.org/D33406). For example, _mm256_cmp_ps("any input", "any input", _CMP_TRUE_UQ) should produce a -1, -1, -1, ... vector, but for some values, for example "1.00 -nan", this operation triggers an exception if FPU exceptions are enabled. Here is the question:...
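Folding to all-ones is correct for the result values because _CMP_TRUE_UQ is true in every lane by definition, whether or not an input is NaN. The sketch uses an SSE analogue instead of the AVX _mm256_cmp_ps so that it runs without AVX. Each lane is either ordered or unordered, so OR-ing _mm_cmpord_ps with _mm_cmpunord_ps always gives the all-ones mask. The sketch does not model the exception-flag side effect that the question is about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Per lane: "ordered" (no NaN) OR "unordered" (some NaN) is always true,
   so every lane of the result is the all-ones bit pattern. */
static __m128 always_true_mask(__m128 a, __m128 b)
{
    return _mm_or_ps(_mm_cmpord_ps(a, b), _mm_cmpunord_ps(a, b));
}

/* Copy the raw lane bit patterns out as 32-bit integers. */
static void mask_bits(__m128 m, uint32_t out[4])
{
    float f[4];
    _mm_storeu_ps(f, m);
    memcpy(out, f, sizeof f);
}
```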

Question about llvm vectors

2020 Aug 19

...m __b
/// A 128-bit vector of [4 x float] containing one of the source operands.
/// The horizontal sums of the values are stored in the upper bits of the
/// destination.
/// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
/// both operands.
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_hadd_ps(__m128 __a, __m128 __b)
{
  return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
}
Here clang will translate _mm_hadd_ps to a CPU specific feature. Why not create __builtin_vector_hadd(a, b) which would select the CPU specific instruction or a fallback generic imp...
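The __builtin_vector_hadd proposed in the post is not an existing builtin. To show what a generic fallback could look like, here is a scalar reference with the _mm_hadd_ps lane layout, plus an SSE1-only version built from two shuffles and one add.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE1 only: no SSE3 haddps needed */

/* Scalar reference with the _mm_hadd_ps lane layout:
   r = {a0+a1, a2+a3, b0+b1, b2+b3}. */
static void hadd_ref(const float a[4], const float b[4], float r[4])
{
    r[0] = a[0] + a[1];
    r[1] = a[2] + a[3];
    r[2] = b[0] + b[1];
    r[3] = b[2] + b[3];
}

/* SSE1 fallback: gather the even and odd lanes of (a, b), then add. */
static __m128 hadd_sse1(__m128 a, __m128 b)
{
    __m128 even = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0)); /* a0 a2 b0 b2 */
    __m128 odd  = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1)); /* a1 a3 b1 b3 */
    return _mm_add_ps(even, odd);
}
```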

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

...uninitialized memory. Here's a version > I wrote a few days ago you're welcome to use that doesn't suffer from > that problem: > > static inline void xcorr_kernel(const opus_val16 *x, const opus_val16 > *y, opus_val32 sum[4], int len) > { > int j; > __m128 xsum1 = _mm_loadu_ps(sum); > __m128 xsum2 = _mm_setzero_ps(); > > for (j = 0; j < len-3; j += 4) { > const __m128 x0 = _mm_loadu_ps(x+j); > const __m128 y0 = _mm_loadu_ps(y+j); > const __m128 y3 = _mm_loadu_ps(y+j+3); > > xs...

SSE bug on Win32 with GCC 4.2.1

2007 Aug 22

...natvig.com> wrote: > > Jean-Marc Valin wrote: > >> I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. > >> > >> In libspeex/cb_search_sse.h, the following union is used: > >> > >> union { > >> float __a[4]; > >> __m128 __v; > >> } __u; > >> > >> For some odd reason, this particular version of GCC will not 16-byte > >> align the union. IE; the alignment requirement of __v isn't propagated. > >> Changing it into this: > >> > >> union { > >&g...

SSE bug on Win32 with GCC 4.2.1

2007 Aug 20

Jean-Marc Valin wrote: >> I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. >> >> In libspeex/cb_search_sse.h, the following union is used: >> >> union { >> float __a[4]; >> __m128 __v; >> } __u; >> >> For some odd reason, this particular version of GCC will not 16-byte >> align the union. IE; the alignment requirement of __v isn't propagated. >> Changing it into this: >> >> union { >> float __a[4]; >> __m128 _...
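The thread's own fix is truncated above. The sketch shows two alternatives that do not depend on the union inheriting the alignment of __m128. The first requests the alignment explicitly using C11 alignas; older GCC releases such as 4.2 would need __attribute__((aligned(16))) instead. The second skips the union and copies the lanes out with the unaligned _mm_storeu_ps. The names vec4_union and sum_lanes are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Option 1: state the 16-byte alignment explicitly on the union (C11). */
typedef union {
    alignas(16) float a[4];
    __m128 v;
} vec4_union;

/* Option 2: no union; an unaligned store works for any float array. */
static float sum_lanes(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```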

SSE bug on Win32 with GCC 4.2.1

2007 Aug 20

...t (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: I recently found a .. weird bug on Win32 SSE with GCC 4.2.1. In libspeex/cb_search_sse.h, the following union is used: union { float __a[4]; __m128 __v; } __u; [...] Content analysis details: (4.0 points, 3.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [sc...

[PATCH] Fix miscompile of SSE resampler

2009 Oct 26

...#include <xmmintrin.h>
#define OVERRIDE_INNER_PRODUCT_SINGLE
-static inline float inner_product_single(const float *a, const float *b, unsigned int len)
+static inline void inner_product_single(float *ret, const float *a, const float *b, unsigned int len)
{
int i;
- float ret;
__m128 sum = _mm_setzero_ps();
for (i=0;i<len;i+=8) {
@@ -49,14 +48,12 @@ static inline float inner_product_single(const float *a, const float *b, unsigne
}
sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55));
- _mm_store_ss(&r...
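Here is a compilable sketch of the patched signature, which writes the result through ret instead of returning a float. The excerpt elides the loop body, so three details are assumptions: two 4-wide multiply-adds per iteration, unaligned loads, and the requirement that len be a multiple of 8. The horizontal reduction is the one shown in the diff.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Assumptions: len % 8 == 0; loop body reconstructed, not copied from
   the patch. The dot product is stored through ret. */
static inline void inner_product_single_sketch(float *ret, const float *a,
                                               const float *b, unsigned int len)
{
    unsigned int i;
    __m128 sum = _mm_setzero_ps();
    assert(len % 8 == 0);
    for (i = 0; i < len; i += 8) {
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4)));
    }
    /* Reduction from the diff: lanes become {s0+s2, s1+s3, ...},
       then lane 1 is broadcast and added into lane 0. */
    sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
    sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55));
    _mm_store_ss(ret, sum);
}
```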

[LLVMdev] Case where VSETCC DAGCombiner hack doesn't work

2009 Jul 23

On Jul 21, 2009, at 11:14 PM, Eli Friedman wrote: > Testcase (compile with clang >= r76726): > #include <emmintrin.h> > __m128i a(__m128 a, __m128 b) { return a==a & b==b; } > > CodeGen ends up scalarizing the comparison, which is really bad, and > AFAIK different from what we did before vsetcc was removed. The ideal > code is a single cmpordps, although I don't think clang ever generated > that for...
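In C, the cmpordps instruction mentioned above is exposed as _mm_cmpord_ps. A lane comes out all-ones exactly when neither input lane is NaN, which makes it the vector form of (a == a) & (b == b). The sketch below includes a scalar cross-check. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Lane i is all-ones iff a[i] and b[i] are both non-NaN. */
static __m128 both_ordered(__m128 a, __m128 b)
{
    return _mm_cmpord_ps(a, b);
}

/* Scalar reference for one lane: x == x is false only for NaN. */
static int both_ordered_scalar(float x, float y)
{
    return (x == x) & (y == y);
}
```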

"spx_word16_t *" is incompatible with parameter of type "float *"

2005 Jan 25

..., float *resp2, spx_word32_t *E, int shape_cb_size, int subvect_size, char *stack) But then calls it with resp2 being defined as type spx_word16_t * in line 185: compute_weighted_codebook(shape_cb, r, resp, resp2, E, shape_cb_size, subvect_size, stack); Defined on line 103: #ifdef _USE_SSE __m128 *resp2; __m128 *E; #else spx_word16_t *resp2; spx_word32_t *E; #endif I have downloaded the speex version 1.1.6 and have made no changes; I used a command line option to define FIXED_POINT. *** Is there a fix for this discrepancy? *** Change to float (a site search of "resp2"...

[LLVMdev] Questions on the llvm 'vector' types and resulting SIMD instructions

2014 Sep 03

...hat when I JIT, it will generate the best SIMD instructions available on the host it's running on? For example, when running on a machine supporting SSE, it does seem to generate SSE instructions, and this successfully turns into a function callable from C with a signature that looks like __m128 simd_mul (__m128 a, __m128 b); But the vector documentation is a little sketchy, and I am not sure about a few things: * Will it really autodetect and use the best SIMD available on my machine? (For example, SSE4.2 vs SSE2, etc.?) Is there anything I need to tell the JIT or the ExecutionEngine to...
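On the C side, the signature in the question can be implemented in two ways: with the SSE intrinsic, or with the GCC/Clang vector extension, which Clang lowers to LLVM's <4 x float> type. The sketch shows both and assumes an x86 target with SSE. The name simd_mul_generic is hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Explicit intrinsic: a 4 x float multiply (MULPS on SSE targets). */
__m128 simd_mul(__m128 a, __m128 b) { return _mm_mul_ps(a, b); }

/* GCC/Clang vector extension: the same multiply written as an operator;
   Clang represents this type as LLVM's <4 x float>. */
typedef float v4sf __attribute__((vector_size(16)));
static v4sf simd_mul_generic(v4sf a, v4sf b) { return a * b; }
```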

Question about llvm vectors

2020 Aug 20

...e >> operands. >> /// The horizontal sums of the values are stored in the upper bits of >> the >> /// destination. >> /// \returns A 128-bit vector of [4 x float] containing the horizontal >> sums of >> /// both operands. >> static __inline__ __m128 __DEFAULT_FN_ATTRS >> _mm_hadd_ps(__m128 __a, __m128 __b) >> { >> return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b); >> } >> >> Here clang will translate _mm_hadd_ps to a CPU specific feature. >> Why not create __builtin_vector_hadd(a, b) which would...

[cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

2019 Jun 03

...lowing code: > > > ``` > // AArch64 Advanced SIMD compilation > double foo(double) __attribute__(simd_variant(“nN2v”,”neon_foo”)); > float64x2_t neon_foo(float64x2_t x) {…} > > // x86 SSE compilation > double foo(double) __attribute__(simd_variant(“aN2v”,”sse_foo”)); > __m128 sse_foo(__m128 x) {…} > ``` > > The attribute would use the “core” tokens of the mangled names (without > _ZGV prefix and the scalar function name postfix) to describe the vector > function provided in the redirection. > > Formal syntax: > > ``` > __attribute__(simd_va...
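The simd_variant attribute is only a proposal in this RFC, so the sketch below does not use it. It shows the kind of pairing the attribute would declare: a scalar function and a hand-written SSE2 variant with two double lanes, held in __m128d, the SSE2 type for two doubles. The body of foo and the helper foo_array are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical scalar function. */
static double foo(double x) { return x * x + 1.0; }

/* Hand-written 2-lane SSE2 variant: lane i of the result is foo(x[i]). */
static __m128d sse_foo(__m128d x)
{
    return _mm_add_pd(_mm_mul_pd(x, x), _mm_set1_pd(1.0));
}

/* What a vectorizer could do with the pairing: call sse_foo on pairs
   and finish any leftover element with the scalar foo. */
static void foo_array(const double *in, double *out, int n)
{
    int i = 0;
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, sse_foo(_mm_loadu_pd(in + i)));
    for (; i < n; i++)
        out[i] = foo(in[i]);
}
```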

[PATCH 1/2] Modify autoconf tests for intrinsics to stop clang from optimizing them away.

2016 May 31

...x"1"],
@@ -521,10 +522,13 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[
[OPUS_X86_MAY_HAVE_SSE],
[OPUS_X86_PRESUME_SSE],
[[#include <xmmintrin.h>
+ #include <time.h>
]],
[[
- static __m128 mtest;
- mtest = _mm_setzero_ps();
+ __m128 mtest;
+ mtest = _mm_set1_ps((float)time(NULL));
+ mtest = _mm_mul_ps(mtest, mtest);
+ return _mm_cvtss_si32(mtest);
]]
)
AS_IF([test x"$OPUS_X86_MAY_HAVE_SSE" =...
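The patched probe works because its result depends on a runtime value, which stops the compiler from folding the intrinsics away. Below is the same body written as a standalone C function. In configure, this body is wrapped in the autoconf macro shown in the excerpt. The names sse_probe and sse_probe_from are hypothetical.

```c
#include <assert.h>
#include <time.h>
#include <xmmintrin.h>

/* The result depends on seed, so the SSE operations cannot be folded
   into a constant at compile time. */
static int sse_probe_from(float seed)
{
    __m128 mtest = _mm_set1_ps(seed);
    mtest = _mm_mul_ps(mtest, mtest);
    /* Rounds per MXCSR (nearest by default); out-of-range values yield
       INT_MIN, which is fine since only running the code matters. */
    return _mm_cvtss_si32(mtest);
}

/* Mirrors the patched test body: seed from time(NULL). */
int sse_probe(void) { return sse_probe_from((float)time(NULL)); }
```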

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

...mathops.h"
#include "pitch.h"
-#if defined(OPUS_X86_MAY_HAVE_SSE4_1)
-#include <smmintrin.h>
-#include "x86cpu.h"
-
-opus_val32 celt_inner_prod_sse4_1(const opus_val16 *x, const opus_val16 *y,
- int N)
-{
- opus_int i, dataSize16;
- opus_int32 sum;
- __m128i inVec1_76543210, inVec1_FEDCBA98, acc1;
- __m128i inVec2_76543210, inVec2_FEDCBA98, acc2;
- __m128i inVec1_3210, inVec2_3210;
-
- sum = 0;
- dataSize16 = N & ~15;
-
- acc1 = _mm_setzero_si128();
- acc2 = _mm_setzero_si128();
-
- for (i=0;i<dataSize16;i+=16) {
- i...
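The removed SSE4.1 function is truncated, so the code below is not a reconstruction of it. It sketches one common way to vectorize a 16-bit dot product with 32-bit accumulation. It assumes a fixed-point build where opus_val16 and opus_val32 behave like int16_t and int32_t. Only SSE2 is needed, because _mm_madd_epi16 multiplies eight pairs and adds adjacent products into four 32-bit lanes. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 is enough for _mm_madd_epi16 */

/* Dot product of int16 vectors with 32-bit accumulation. */
static int32_t inner_prod_i16_sse2(const int16_t *x, const int16_t *y, int N)
{
    int i;
    int n8 = N & ~7;                 /* largest multiple of 8 <= N */
    __m128i acc = _mm_setzero_si128();
    int32_t sum;
    assert(N >= 0);
    for (i = 0; i < n8; i += 8) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(vx, vy));
    }
    /* Horizontal sum of the four 32-bit lanes into lane 0. */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
    sum = _mm_cvtsi128_si32(acc);
    for (; i < N; i++)               /* scalar tail for the last N % 8 */
        sum += (int32_t)x[i] * y[i];
    return sum;
}
```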

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12
