thr3ads.net - search: "__m256"

Displaying 12 results from an estimated 12 matches for "__m256".

RFC: Adding Support For Vectorcall Calling Convention

2016 Nov 30

...ion of HVA Types
--------------------------------------
A Homogeneous Vector Aggregate (HVA) type is a composite type of up to four data members that have identical vector types. An HVA type has the same alignment requirement as the vector type of its members. For example:

typedef struct {
  __m256 x;
  __m256 y;
  __m256 z;
} hva3; // HVA type with 3 __m256 elements

Vectorcall Extension
----------------------------
Vectorcall extends the standard x64 calling convention while adding support for HVA and vector types. There are four main differences:

- Floating-point types are consid...
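The alignment rule quoted in the snippet above can be checked at compile time. The following is only a minimal sketch, not code from the RFC: it assumes GCC or Clang, and it uses a hypothetical stand-in type `vec8f` (built with the `vector_size` extension) in place of `__m256` so that it compiles without `<immintrin.h>` or AVX flags. The names `hva3_sketch`, `hva3_size`, and `hva3_alignment` are likewise invented for illustration.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */

/* Hypothetical stand-in for __m256: eight packed floats, 32 bytes.
   Assumption: GCC/Clang vector extension; MSVC does not accept this syntax. */
typedef float vec8f __attribute__((vector_size(32)));

/* Three members of one identical vector type, modeled on the RFC's hva3. */
typedef struct {
    vec8f x;
    vec8f y;
    vec8f z;
} hva3_sketch;

/* The aggregate inherits the alignment of its vector members... */
static_assert(_Alignof(hva3_sketch) == _Alignof(vec8f),
              "aggregate alignment should match its vector member");
/* ...and, with identical members, needs no padding between them. */
static_assert(sizeof(hva3_sketch) == 3 * sizeof(vec8f),
              "three identical vector members should pack without padding");

/* Small accessors so the layout can also be inspected at run time. */
size_t hva3_size(void)      { return sizeof(hva3_sketch); }
size_t hva3_alignment(void) { return _Alignof(hva3_sketch); }
```

This checks layout only. Whether such a struct is actually passed in vector registers depends on the calling convention in effect, which is the subject of the RFC itself.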

[PATCH 1/2] Modify autoconf tests for intrinsics to stop clang from optimizing them away.

2016 May 31

...= x"1"],
@@ -576,10 +585,13 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[
 [OPUS_X86_MAY_HAVE_AVX],
 [OPUS_X86_PRESUME_AVX],
 [[#include <immintrin.h>
+ #include <time.h>
 ]],
 [[
- static __m256 mtest;
- mtest = _mm256_setzero_ps();
+ __m256 mtest;
+ mtest = _mm256_set1_ps((float)time(NULL));
+ mtest = _mm256_addsub_ps(mtest, mtest);
+ return _mm_cvtss_si32(_mm256_extractf128_ps(mtest, 0));
 ]]
 )
 AS_IF([test...

[RFC] Expose user provided vector function for auto-vectorization.

2019 Jun 10

...t the vectorizer should care about? For the case mentioned earlier:

float MyAdd(float* a, int b) { return *a + b; }
__declspec(vector_variant(implements(MyAdd(float *a, int b)),
                          linear(a), vectorlength(8),
                          nomask, processor(core_2nd_gen_avx)))
__m256 __regcall MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2)

If FE emitted

;; Alwaysinline
define <8 x float> @MyAddVec.abi_wrapper(float* %v_a, <8 x i32> %v_b) {
  ;; Not sure about the exact values in the mask parameter.
  %v_b1 = shufflevector <8 x i32> %v_b, <8 x i32> un...

[RFC] Expose user provided vector function for auto-vectorization.

2019 Jun 10

...eloper-guide-and-reference-vector-variant:
>
> float MyAdd(float* a, int b) { return *a + b; }
> __declspec(vector_variant(implements(MyAdd(float *a, int b)),
> linear(a), vectorlength(8),
> nomask, processor(core_2nd_gen_avx)))
> __m256 __regcall MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2)
>
> We need to somehow communicate which lanes of the widened "b" would map to the b1 parameter and which would go to b2. If we only care about a single ABI (like the one mandated by the OMP) then such things could be put into TTI...

[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86

2014 Dec 15

...clang and GCC in the way vector parameters are passed on 32-bit x86. (This is documented in PR21510.) Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3. In theory, it would apply to __m512 as well, but currently clang doesn't support passing __m512 in x86 mode at all. ICC has the same behavior as GCC, and it seems that MSVC in 32-bit mode only *allows* up to 3 vector parameters per function (when not using __vectorcall),...

[RFC] Expose user provided vector function for auto-vectorization.

2019 Jun 07

...en-us/cpp-compiler-developer-guide-and-reference-vector-variant:

float MyAdd(float* a, int b) { return *a + b; }
__declspec(vector_variant(implements(MyAdd(float *a, int b)),
                          linear(a), vectorlength(8),
                          nomask, processor(core_2nd_gen_avx)))
__m256 __regcall MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2)

We need to somehow communicate which lanes of the widened "b" would map to the b1 parameter and which would go to b2. If we only care about a single ABI (like the one mandated by the OMP) then such things could be put into TTI, but wha...

RFC: Interface user provided vector functions with the vectorizer.

2019 Jun 11