thr3ads.net - search: "emmintrin"

[LLVMdev] [clang] SSE2 intrinsics (emmintrin.h): _mm_movpi64_pi64 should be _mm_movpi64_epi64?

2013 Nov 22

0

Hi there, I've recently encountered a piece of code that uses some SSE2 intrinsics and builds with gcc46, but not clang: clang can't find _mm_movpi64_epi64(), while gcc46 defines it in its lib/gcc46/gcc/.../4.6.3/include/emmintrin.h: extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movpi64_epi64 (__m64 __A) { return _mm_set_epi64 ((__m64)0LL, __A); } extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_epi64x (long l...

[LLVMdev] instcombine does silly things with vector x+x

2011 Oct 28

2

...lvm.x86.mmx.padd.b), and the saturating 128 bit version (llvm.x86.sse2.padds.b). I would just give up and use inline assembly, but it seems I can't JIT that. I'm using the latest llvm 3.1 from svn. I get similar behavior at llvm.org/demo using the following equivalent C code: #include <emmintrin.h> __m128i f(__m128i a) { return _mm_add_epi8(a, a); } The no-optimization compilation of this is better than the optimized version. Any ideas? Should I just not use this pass? - Andrew
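For readers who want to check what the `x+x` case in the post computes, here is a minimal sketch. It assumes only SSE2; `doubles_each_byte` is a hypothetical helper (not from the thread) that compares the vector result with a scalar reference, where each byte is doubled with wraparound.

```c
#include <emmintrin.h>
#include <stdint.h>

/* The function from the post: adds a vector of 16 bytes to itself. */
static __m128i f(__m128i a) { return _mm_add_epi8(a, a); }

/* Hypothetical helper: returns 1 if every lane of f(v) equals the
   scalar value (uint8_t)(2 * x), i.e. doubling modulo 256. */
static int doubles_each_byte(const uint8_t in[16]) {
    uint8_t out[16];
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, f(v));
    for (int i = 0; i < 16; ++i)
        if (out[i] != (uint8_t)(in[i] * 2))
            return 0;
    return 1;
}
```

Because the non-saturating add wraps, any rewrite an optimizer picks for `x+x` has to preserve this per-byte modulo-256 behavior.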

[LLVMdev] Case where VSETCC DAGCombiner hack doesn't work

2009 Jul 23

1

On Jul 21, 2009, at 11:14 PM, Eli Friedman wrote: > Testcase (compile with clang >= r76726): > #include <emmintrin.h> > __m128i a(__m128 a, __m128 b) { return a==a & b==b; } > > CodeGen ends up scalarizing the comparison, which is really bad, and > AFAIK different from what we did before vsetcc was removed. The ideal > code is a single cmpordps, although I don't think clang ever gener...
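The "single cmpordps" mentioned in the quoted message can be written directly with an intrinsic. The sketch below is an illustration, not code from the thread: `ordered_lanes` and `ordered_lanes_scalar` are hypothetical names, and the result is reduced to a 4-bit mask so it is easy to inspect.

```c
#include <math.h>
#include <xmmintrin.h>

/* Bit i of the result is set when lane i of both a and b is not NaN.
   _mm_cmpord_ps is the intrinsic form of cmpordps, which is what
   "a==a & b==b" is asking for. */
static int ordered_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpord_ps(a, b));
}

/* Scalar reference for the same predicate. */
static int ordered_lanes_scalar(const float a[4], const float b[4]) {
    int mask = 0;
    for (int i = 0; i < 4; ++i)
        if (!isnan(a[i]) && !isnan(b[i]))
            mask |= 1 << i;
    return mask;
}
```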

[PATCH 1/2] Modify autoconf tests for intrinsics to stop clang from optimizing them away.

2016 May 31

2

...OPUS_X86_MAY_HAVE_SSE" = x"1" && test x"$OPUS_X86_PRESUME_SSE" != x"1"], @@ -539,10 +543,13 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[ [OPUS_X86_MAY_HAVE_SSE2], [OPUS_X86_PRESUME_SSE2], [[#include <emmintrin.h> + #include <time.h> ]], [[ - static __m128i mtest; - mtest = _mm_setzero_si128(); + __m128i mtest; + mtest = _mm_set1_epi32((int)time(NULL)); + mtest = _mm_mul_epu32(mtest, mtest); + return...
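The idea in the diff, as far as the visible part shows, is to feed the intrinsics a value the compiler cannot know at build time. A rough sketch of that pattern follows. The function names are hypothetical, and the return expression is an assumption, since the snippet is cut off before its own `return`.

```c
#include <emmintrin.h>
#include <time.h>

/* Hypothetical helper: squares the low 32 bits of seed using SSE2. */
static int sse2_probe(int seed) {
    __m128i mtest;
    mtest = _mm_set1_epi32(seed);
    mtest = _mm_mul_epu32(mtest, mtest); /* 32x32 -> 64-bit unsigned multiply */
    /* Assumption: returning a lane keeps the computation live. */
    return _mm_cvtsi128_si32(mtest);     /* low 32 bits of the product */
}

/* The seed comes from time(), so the result is only known at run time
   and the intrinsic calls cannot be constant-folded away. */
static int sse2_probe_now(void) {
    return sse2_probe((int)time(NULL));
}
```

With a constant input such as `_mm_setzero_si128()`, a compiler is free to drop the SSE2 instructions entirely, which defeats a link-time check for intrinsic support.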

[LLVMdev] instcombine does silly things with vector x+x

2011 Oct 28

0

...aturating 128 bit version (llvm.x86.sse2.padds.b). I would just give > up and use inline assembly, but it seems I can't JIT that. > > I'm using the latest llvm 3.1 from svn. I get similar behavior at > llvm.org/demo using the following equivalent C code: > > #include <emmintrin.h> > __m128i f(__m128i a) { > return _mm_add_epi8(a, a); > } > > The no-optimization compilation of this is better than the optimized version. > > Any ideas? Should I just not use this pass? > > - Andrew > _______________________________________________ > LLV...

[LLVMdev] long double type on ARM

2009 Sep 30

2

...LLVM LOCAL end ;; ... arm*-*-*) cpu_type=arm extra_headers="mmintrin.h" ;; ... i[34567]86-*-*) cpu_type=i386 # LLVM LOCAL begin out_cxx_file=i386/llvm-i386.cpp # LLVM LOCAL end # APPLE LOCAL begin 5612787 mainline sse4 extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h nmmintrin.h" (out_cxx_file variable is empty for ARM target) I wonder if llvm-gcc 4.2 front-end support bitcode conversion for ARM target. Thank you. Best regards, Jin-Gu Kang ________________________________ From: Bo...

[LLVMdev] instcombine does silly things with vector x+x

2011 Oct 30

1

...aturating 128 bit version (llvm.x86.sse2.padds.b). I would just give > up and use inline assembly, but it seems I can't JIT that. > > I'm using the latest llvm 3.1 from svn. I get similar behavior at > llvm.org/demo using the following equivalent C code: > > #include <emmintrin.h> > __m128i f(__m128i a) { > return _mm_add_epi8(a, a); > } > > The no-optimization compilation of this is better than the optimized version. > > Any ideas? Should I just not use this pass? > > - Andrew > _______________________________________________ > LLV...

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

1

...PL[OPUS_ARCHMASK + 1])( #endif #endif + +#endif diff --git a/celt/x86/pitch_sse.c b/celt/x86/pitch_sse.c index e3bc6d7..20e7312 100644 --- a/celt/x86/pitch_sse.c +++ b/celt/x86/pitch_sse.c @@ -29,223 +29,157 @@ #include "config.h" #endif -#include <xmmintrin.h> -#include <emmintrin.h> - #include "macros.h" #include "celt_lpc.h" #include "stack_alloc.h" #include "mathops.h" #include "pitch.h" -#if defined(OPUS_X86_MAY_HAVE_SSE4_1) -#include <smmintrin.h> -#include "x86cpu.h" - -opus_val32 celt_inner_pr...

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12

1

...PL[OPUS_ARCHMASK + 1])( #endif #endif + +#endif diff --git a/celt/x86/pitch_sse.c b/celt/x86/pitch_sse.c index e3bc6d7..20e7312 100644 --- a/celt/x86/pitch_sse.c +++ b/celt/x86/pitch_sse.c @@ -29,223 +29,157 @@ #include "config.h" #endif -#include <xmmintrin.h> -#include <emmintrin.h> - #include "macros.h" #include "celt_lpc.h" #include "stack_alloc.h" #include "mathops.h" #include "pitch.h" -#if defined(OPUS_X86_MAY_HAVE_SSE4_1) -#include <smmintrin.h> -#include "x86cpu.h" - -opus_val32 celt_inner_pr...

[LLVMdev] long double type on ARM

2009 Sep 30

0

...> extra_headers="mmintrin.h" > ;; > ... > i[34567]86-*-*) > cpu_type=i386 > # LLVM LOCAL begin > out_cxx_file=i386/llvm-i386.cpp > # LLVM LOCAL end > # APPLE LOCAL begin 5612787 mainline sse4 > extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h > pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h > nmmintrin.h" > > (out_cxx_file variable is empty for ARM target) > I wonder if llvm-gcc 4.2 front-end support bitcode conversion for > ARM target. > > Thank you. > > Best regards, > ...

[PATCH] Fix miscompile of SSE resampler

2009 Oct 26

1

...uct_single(const float *a, const float *b, u sum = _mm_mul_ps(f, sum); sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum)); sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55)); - _mm_store_ss(&ret, sum); - return ret; + _mm_store_ss(ret, sum); } #ifdef _USE_SSE2 #include <emmintrin.h> #define OVERRIDE_INNER_PRODUCT_DOUBLE -static inline double inner_product_double(const float *a, const float *b, unsigned int len) +static inline void inner_product_double(double *ret, const float *a, const float *b, unsigned int len) { int i; - double ret; __m128d sum = _mm_set...
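The three-instruction reduction visible in the diff (`_mm_movehl_ps`, `_mm_shuffle_ps` with `0x55`, `_mm_add_ss`) is a horizontal sum of four floats. A self-contained sketch of it is given below. `hsum_ps` and `inner_product` are hypothetical names, the loop is a simplification that assumes `len` is a multiple of 4, and the value is returned directly rather than through the pointer the patch introduces.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Sums the four float lanes of v. */
static float hsum_ps(__m128 v) {
    float ret;
    /* lanes 0 and 1 become v0+v2 and v1+v3 */
    v = _mm_add_ps(v, _mm_movehl_ps(v, v));
    /* 0x55 broadcasts lane 1, so lane 0 becomes (v0+v2)+(v1+v3) */
    v = _mm_add_ss(v, _mm_shuffle_ps(v, v, 0x55));
    _mm_store_ss(&ret, v);
    return ret;
}

/* Hypothetical inner product; len is assumed to be a multiple of 4. */
static float inner_product(const float *a, const float *b, unsigned int len) {
    __m128 sum = _mm_setzero_ps();
    assert(len % 4 == 0);
    for (unsigned int i = 0; i < len; i += 4)
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    return hsum_ps(sum);
}
```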

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

2

...spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include <emmintrin.h> #endif #include <stdint.h> #include <assert.h> #if SPILLING_ENSUES == 1 static int32_t geti(const __m128i v, const size_t i) { switch (i) { case 0: return _mm_cvtsi128_si32(v); case 1: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5)); case 2: return _mm_cvtsi128_si32(_mm_shuf...
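The SSE2 extraction pattern in the test case moves the wanted lane into position 0 with `_mm_shuffle_epi32` and then reads it with `_mm_cvtsi128_si32`. A complete version of such a function might look like the sketch below. Cases 0 and 1 follow the snippet; the immediates for lanes 2 and 3 (`0xe6`, `0xe7`) are assumptions, since the original is truncated at that point.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 32-bit lane i (0..3) of v using SSE2 only. The low two bits
   of each shuffle immediate select the source lane that lands in lane 0. */
static int32_t geti(const __m128i v, const size_t i) {
    switch (i) {
    case 0: return _mm_cvtsi128_si32(v);
    case 1: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe5));
    case 2: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe6)); /* assumed */
    case 3: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xe7)); /* assumed */
    }
    assert(0 && "lane index out of range");
    return 0;
}
```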

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 02

13

The attached patch cleans up Opus's x86 intrinsics configury. It: * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in

gcc ubsan alignement test --minimal gcc version?

2015 Oct 14

0

...make clean But I'm not sure how to proceed from there. I tried the obvious thing: Rdevel CMD build my_offending_package Rdevel CMD check --as-cran my_offending_package But I have not been able to replicate the bug I see on cran: /usr/local/gcc5/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/include/emmintrin.h:140:21: runtime error: load of misaligned address 0x61800007fc84 for type 'const double', which requires 8 byte alignment 0x61800007fc84: note: pointer points here 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f **Though the original...
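The UBSan message reports a `double` read from an address that is only 4-byte aligned. The snippet does not show the offending package code, so the following is only a generic sketch of one common way to avoid that class of report: copy the bytes with `memcpy` before handing them to the intrinsic. `load2_unaligned` is a hypothetical name.

```c
#include <emmintrin.h>
#include <string.h>

/* Loads two doubles from an address with no alignment guarantee.
   memcpy is well-defined for any alignment, so no misaligned
   'double' access is ever performed. */
static __m128d load2_unaligned(const void *p) {
    double tmp[2];
    memcpy(tmp, p, sizeof tmp);
    return _mm_loadu_pd(tmp);
}
```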

[PATCH 3a/3] Add shadow VRAM

2006 Mar 16

0

...a/tools/ioemu/hw/vga.c Tue Mar 14 19:33:45 2006 +0100 +++ b/tools/ioemu/hw/vga.c Thu Mar 16 14:15:07 2006 -0700 @@ -21,6 +21,10 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include <signal.h> +#include <setjmp.h> +#include <emmintrin.h> + #include "vl.h" #include "vga_int.h" @@ -149,6 +153,8 @@ static uint8_t expand4to8[16]; VGAState *vga_state; int vga_io_memory; + +int sse2_ok = 1; static uint32_t vga_ioport_read(void *opaque, uint32_t addr) { @@ -1340,6 +1346,80 @@ void vga_invalidate_scanl...

gcc ubsan alignement test --minimal gcc version?

2015 Oct 13

2

Dear All, I'm trying to implement the section of the manual pertaining to the gcc-ubsan test carried out by CRAN on my local computer (Ubuntu 14.04): http://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/README.txt I was wondering whether someone could tell me what the minimal version of the gcc tool chain needed to run the gcc-ASAN and gcc-UBSAN alignment tests on one's local

[LLVMdev] long double type on ARM

2009 Sep 30

0

Unlike llvm itself, llvm-gcc needs to be configured for a particular target architecture. It looks like you're using a copy of llvm-gcc that was built to generate x86 code. On Sep 30, 2009, at 6:27 AM, Jin Gu Kang wrote: > Dear LLVM members. > > I am compiling coreutils-7.4 package for ARM linux using LLVM 2.5 > version. > > When I compiled 'od' program in

[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10

2015 Mar 18

5

Hi All, Since I continue to base my work on top of Jonathan's patch, and my previous Ne10 fft/ifft/mdct_forward/backward patches, I thought it would be better to just post all new patches as a patch series. Please let me know if anyone disagrees with this approach. You can see wip branch of all latest patches at https://git.linaro.org/people/viswanath.puttagunta/opus.git Branch:

[LLVMdev] Compiling llvm and Clang on Linux

2012 Jul 11

0

It's an undocumented FAQ, if you are using RHEL5 (or a clone). - install gcc44-c++ - Build with CC=gcc44 CXX=g++44 - You may need "CC=clang -std=gnu89" to use clang with its glibc. Have fun! ps. AFAIK, clang can be built more easily on CentOS 6. ...Takumi 2012/7/11 Sitvanit Ruah <RUAH at il.ibm.com>: > > Hello all, > I am new to this mailing list so I hope this is

[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series

2015 Mar 31

6

Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that does intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2] For reference, you can use the Ne10 pre-built libraries at [3] Note that I am working with Phil at ARM to get my patch at [4]
