search for: vxorp

Displaying 20 results from an estimated 23 matches for "vxorp".

2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
..., float 0.000000e+00,
> float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,
> i32 6, i32 7>
> ret <4 x float> %2
> }
>
>
> llc -march=x86-64 -mattr=+avx test.ll -o -
>
> test: # @test
> vxorps %xmm2, %xmm2, %xmm2
> vmovss %xmm0, %xmm2, %xmm2
> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
> retl
>
> test2: # @test2
> vi...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,
>> i32 6, i32 7>
>> ret <4 x float> %2
>> }
>>
>>
>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>
>> test: # @test
>> vxorps %xmm2, %xmm2, %xmm2
>> vmovss %xmm0, %xmm2, %xmm2
>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>> retl
>>
>> test2:...
2014 Mar 26
3
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
...0000e+00
> .text
> .globl _Z21inspect_singleton_sqrd
> .align 16, 0x90
> .type _Z21inspect_singleton_sqrd,@function
> _Z21inspect_singleton_sqrd: # @_Z21inspect_singleton_sqrd
> .cfi_startproc
> # BB#0:
> vmulsd %xmm0, %xmm0, %xmm1
> vxorpd .LCPI1_0(%rip), %xmm1, %xmm0
> ret
> .Ltmp1:
> .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd
> .cfi_endproc
>
> I realize this is unsupported behavior, but it would be nice to still
> be able to use clang to do numerical computation. Is there a...
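For context, the directed-rounding trick that this transformation defeats looks roughly like the following sketch (hypothetical helper, not the thread's actual code). With the rounding mode forced upward, x*x gives an upper bound of x*x and -((-x) * x) gives a lower bound; rewriting (-x)*x into -(x*x) -- the vmulsd followed by a sign-flipping vxorpd shown above -- is harmless under the default rounding mode but moves the rounding onto the wrong operation here, so the computed "lower bound" is no longer conservative.

#include <fenv.h>

/* Hypothetical helper, not the function from the thread. */
void square_bounds(double x, double *lo, double *hi)
{
    fesetround(FE_UPWARD);     /* every multiply below now rounds up      */
    *hi = x * x;               /* upper bound of x*x                      */
    *lo = -((-x) * x);         /* (-x)*x rounds the exact -x*x upward, so
                                  negating it yields x*x rounded down --
                                  unless the optimizer folds the sign
                                  through the multiply                    */
    fesetround(FE_TONEAREST);
}

Clang assumes the default floating-point environment unless told otherwise, which is the "unsupported behavior" the poster alludes to.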
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...t> %1, <4 x i32> <i32 4, i32 1,
>>> i32 6, i32 7>
>>> ret <4 x float> %2
>>> }
>>>
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>> vxorps %xmm2, %xmm2, %xmm2
>>> vmovss %xmm0, %xmm2, %xmm2
>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>> retl
>>>
>>> test2:...
2018 Dec 01
2
Restrict global constructors to base ISA
...ng++-mp-5.0 ... -c chacha.cpp
/opt/local/bin/clang++-mp-5.0 ... -mavx2 -c chacha_avx.cpp
/opt/local/bin/clang++-mp-5.0 ... -msse2 -c chacha_simd.cpp
...

At runtime we catch a SIGILL due to chacha_avx.cpp as shown below. It looks like global constructors are using instructions from AVX (vxorps), which is beyond what the machine supports. How do we tell Clang to use the base ISA for global constructors? Thanks in advance.

==========

Here's the full command line used for a typical file:

/opt/local/bin/clang++-mp-5.0 -DNDEBUG -g2 -O3 -fPIC -pthread -pipe -c cryptlib.cpp

Here'...
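For illustration (hypothetical types and names, not the project's actual source), the hazard being described looks roughly like this: a file-scope object defined in the translation unit built with -mavx2. Its dynamic initializer runs before main(), i.e. before any runtime CPU-feature check, and the compiler is entitled to vectorize it with AVX instructions such as vxorps.

extern float seed();            // defined elsewhere; keeps the initializer
                                // from being folded away at compile time

struct LaneState {
    float lanes[8];
    LaneState() {
        float s = seed();
        for (int i = 0; i < 8; ++i)
            lanes[i] = s * static_cast<float>(i);   // vectorizable loop
    }
};

static LaneState g_state;       // dynamic initializer emitted into the AVX TU

One common workaround, offered here only as a general observation, is to keep such globals out of the ISA-specific files, or to initialize them lazily behind the existing CPU-feature check, so that the start-up path stays on the base ISA.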
2013 Apr 09
1
[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics
Hello, LLVM generates two additional instructions for 128->256 bit typecasts (e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of the YMM register corresponding to the source XMM register.

vxorps xmm2,xmm2,xmm2
vinsertf128 ymm0,ymm2,xmm0,0x0

Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't generate any extra moves for 128-bit->256-bit typecast intrinsics. None of these compilers zero-extend the upper 128 bits of the 256-...
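For reference, the intrinsic in question only reinterprets a register; a minimal sketch with a hypothetical helper (not code from the thread):

#include <immintrin.h>

// Reinterpret a 128-bit vector as the low lane of a 256-bit one.  Intel
// documents the upper 128 bits of the result as undefined, so in principle
// the cast needs no instructions at all; the complaint above is that LLVM
// emitted vxorps + vinsertf128 to zero them anyway.
__m256i widen(__m128i lo)
{
    return _mm256_castsi128_si256(lo);
}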
2011 Jun 01
4
[LLVMdev] AVX Status?
...2510540: v8f32 = bitcast 0x2532270 [ID=16]
  0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
    0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
      0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
...

The same counts for or and xor where VXORPS etc. should be selected. There seems to be some code for this because

xor <8 x i32> %m, %m

works, probably because it can get rid of all bitcasts. Ideally, I guess we would want code like this instead of the intrinsics at some point:

define <8 x float> @test3(<8 x float> %a,...
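The generic form the poster is asking for can also be reached from C or C++ via the Clang/GCC vector extensions; a small sketch (hypothetical function, not from the thread) that the frontend lowers to exactly such a plain 256-bit integer xor:

typedef int v8si __attribute__((vector_size(32)));   // 8 x i32

v8si mask_xor(v8si a, v8si b)
{
    return a ^ b;   // emitted as "xor <8 x i32>"; the AVX backend should be
                    // able to select VXORPS for it once the type is legalized
}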
2015 Mar 25
2
[LLVMdev] Optimization puzzle...
...Z18sampleNullOperator5PointS_
> .cfi_startproc
> ## BB#0: ## %_ZN15SamplingClosureD1Ev.exit
> push rbp
> Ltmp0:
> .cfi_def_cfa_offset 16
> Ltmp1:
> .cfi_offset rbp, -16
> mov rbp, rsp
> Ltmp2:
> .cfi_def_cfa_register rbp
> vxorps xmm0, xmm0, xmm0
> vxorps xmm1, xmm1, xmm1
> pop rbp
> ret

I am wondering because I think that it might explain why the LLVM IR code shown below does not get simplified to a single "ret { <2 x float>, float } zeroinitializer" instruction. It seems to me that...
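A hypothetical C++ reduction (not the poster's actual source) of what such a function boils down to: a small aggregate returned by value and fully zeroed. Under the x86-64 SysV ABI the two floats come back in xmm0 and the third float in xmm1, which is exactly the pair of vxorps instructions in the listing above.

struct Sample {
    float direction[2];
    float pdf;
};

Sample makeNullSample()
{
    return Sample{};   // value-initialization: every member is zero
}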
2011 Jun 02
0
[LLVMdev] AVX Status?
...0 [ID=16]
>   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
> ...
>
> The same counts for or and xor where VXORPS etc. should be selected.

Please file bug reports!

> There seems to be some code for this because
> xor <8 x i32> %m, %m
> works, probably because it can get rid of all bitcasts.
>
> Ideally, I guess we would want code like this instead of the intrinsics
> at some point: ...
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...t> %1, <4 x i32> <i32 4, i32 1,
>>> i32 6, i32 7>
>>> ret <4 x float> %2
>>> }
>>>
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>> vxorps %xmm2, %xmm2, %xmm2
>>> vmovss %xmm0, %xmm2, %xmm2
>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>> retl
>>>
>>> test2:...
2015 Mar 25
3
[LLVMdev] Optimization puzzle...
...sureD1Ev.exit
>> > push rbp
>> > Ltmp0:
>> > .cfi_def_cfa_offset 16
>> > Ltmp1:
>> > .cfi_offset rbp, -16
>> > mov rbp, rsp
>> > Ltmp2:
>> > .cfi_def_cfa_register rbp
>> > vxorps xmm0, xmm0, xmm0
>> > vxorps xmm1, xmm1, xmm1
>> > pop rbp
>> > ret
>>
>>
>> I am wondering because I think that it might explain why the LLVM IR code
>> shown below does not get simplified to a single "ret { <2 x f...
2011 Jun 03
2
[LLVMdev] AVX Status?
...270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>>     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>>       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
>> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same counts for or and xor where VXORPS etc. should be selected.
>
> Please file bug reports!

It's a problem with integer code. There are no 256-bit integer bitwise instructions in AVX. There are no 256-bit integer instructions period. What's missing is the legalize code to handle this. I have it in our tree.

>>...
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
...one to some length to keep it in a register, and it has spilled a value to the stack. It would have been cheaper to simply fold the constant load into the 3 uses. This is not the only example. Later on we can see this:

vmovaps .LCPI0_1(%rip), %xmm6 # xmm6 = [2147483648,2147483648,...]
vxorps %xmm6, %xmm2, %xmm3
...
vandps %xmm6, %xmm5, %xmm2
...
vmovaps %xmm1, -56(%rsp) # 16-byte Spill
vmovaps %xmm6, %xmm1
...
vmovaps -56(%rsp), %xmm0 # 16-byte Reload
...
vxorps %xmm1, %xmm3, %xmm4
...

Here, we have a spill and reload to keep t...
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...32 7>
>>>>> ret <4 x float> %2
>>>>> }
>>>>>
>>>>>
>>>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>>>
>>>>> test: # @test
>>>>> vxorps %xmm2, %xmm2, %xmm2
>>>>> vmovss %xmm0, %xmm2, %xmm2
>>>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>>>> retl
>>>...
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
...printf("skipped\n"); + memset(res + 2, 0x77, 8); + } + + printf("%-40s", "Testing vmovaps (%edx),%ymm7..."); + if ( stack_exec && cpu_has_avx ) + { + extern const unsigned char vmovaps_from_mem[]; + + asm volatile ( "vxorps %%ymm7, %%ymm7, %%ymm7\n" + ".pushsection .test, \"a\", @progbits\n" + "vmovaps_from_mem: vmovaps (%0), %%ymm7\n" + ".popsection" :: "d" (NULL) ); + + memcpy(instr, vmova...
2011 Jun 07
2
[LLVMdev] AVX Status?
...'d have to dig deeper into the failure. The fact that there are inconsistencies like this is one of the motivations behind the SIMD reorg. There are plenty of such inconsistencies in the existing SSE spec. Hopefully after the reorg, implementing a pattern like VANDPS given an existing one for VXORPS is trivial.

> Anyway, I am looking forward to testing your patches.

So am I. :)

> Would it be possible to send around a notification when the stuff goes
> upstream?
> Thanks a lot :).

I try to put [AVX] in the subject of patch mailings (to -commits) and commit messages. Once in a...
2011 Jun 04
0
[LLVMdev] AVX Status?
Hi David,

>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> Yes, we are! I am doing a lot of tuning work at the moment. We have
> been rather swamped with work for new products and I am now just getting
> out
2011 Jun 03
1
[LLVMdev] AVX Status?
...0: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>> 0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>> 0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
>> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same counts for or and xor where VXORPS etc. should be selected.
>
> Please file bug reports!
>
>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.
>>
>> Ideally, I guess we would want code like this instead of the...
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...gnificant regression in our internal codebase. In one particular case I observed a slowdown (around 1%); here is what I found when investigating this slowdown.

1. With the new shuffle lowering, there is one case where we end up producing the following sequence:

vmovss .LCPxx(%rip), %xmm1
vxorps %xmm0, %xmm0, %xmm0
vblendps $1, %xmm1, %xmm0, %xmm0

Before, we used to generate a simpler:

vmovss .LCPxx(%rip), %xmm1

In this particular case, the 'vblendps' is redundant since the vmovss would zero the upper bits in %xmm1. I am not sure why we get this poor codegen with your new...