thr3ads.net - search: "sse41"

Displaying 20 results from an estimated 37 matches for "sse41".

2012 Sep 04

[LLVMdev] branch on vector compare?

...%15, %cr > %17 = sext <4 x i1> %16 to <4 x i32> > %18 = bitcast <4 x i32> %17 to i128 > %19 = icmp ne i128 %18, 0 > br i1 %19, label %true1, label %false2 > > should do the trick (one cmpps + one ptest + one br instruction). > This, however, requires sse41 which I don't know if you have - you say > the extractelements go through memory which I've never seen then again > our code didn't try to extract the i1 directly (even without fixes for > ptest the above sequence will result in only 2 extraction steps instead > of 4 if you're...

[LLVMdev] branch on vector compare?

2012 Sep 04

...%16 = fcmp ogt <4 x float> %15, %cr %17 = sext <4 x i1> %16 to <4 x i32> %18 = bitcast <4 x i32> %17 to i128 %19 = icmp ne i128 %18, 0 br i1 %19, label %true1, label %false2 should do the trick (one cmpps + one ptest + one br instruction). This, however, requires sse41 which I don't know if you have - you say the extractelements go through memory which I've never seen then again our code didn't try to extract the i1 directly (even without fixes for ptest the above sequence will result in only 2 extraction steps instead of 4 if you're on x64 and th...
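To see why the quoted IR answers "is any lane true?", it can be modelled in portable C: sign-extending each i1 gives 0 or all-ones per 32-bit lane, so the 128-bit value is nonzero exactly when at least one lane compared true. This is a minimal scalar sketch with a hypothetical function name (any_lane_greater); it is not the code LLVM emits. Per the post, the intended SSE4.1 lowering is one cmpps, one ptest and one branch. Portable C has no standard 128-bit integer, so the i128 is modelled as two 64-bit halves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of:
 *   %16 = fcmp ogt <4 x float> %15, %cr
 *   %17 = sext <4 x i1> %16 to <4 x i32>
 *   %18 = bitcast <4 x i32> %17 to i128
 *   %19 = icmp ne i128 %18, 0                                        */
static bool any_lane_greater(const float v[4], const float cr[4])
{
    assert(v != NULL && cr != NULL);
    int32_t lanes[4];
    for (int i = 0; i < 4; ++i) {
        /* fcmp ogt is false if either operand is NaN; C's > agrees. */
        bool gt = v[i] > cr[i];
        /* sext i1 -> i32: true becomes all-ones (-1), false becomes 0. */
        lanes[i] = gt ? -1 : 0;
    }
    /* bitcast <4 x i32> -> i128: reinterpret the same 16 bytes,
     * here as two 64-bit halves. */
    uint64_t halves[2];
    memcpy(halves, lanes, sizeof halves);
    /* icmp ne i128 %18, 0: true if any bit of any lane is set. */
    return (halves[0] | halves[1]) != 0;
}
```

Because the reduction only asks "is any bit set?", it needs no per-lane extraction, which is what lets the backend use a single ptest instead of extractelement steps.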

[LLVMdev] branch on vector compare?

2012 Sep 03

> > which goes through memory. Is there some idiom I'm missing so that it would use > > for instance movmsk for SSE or vcmpgt & cr6 for altivec? > > I don't think you are missing anything: LLVM IR has no support for horizontal > operations like or'ing the elements of a vector of boolean together. The code > generators do try to recognize a few idioms and...
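For comparison, the movmsk idiom the question mentions can be written directly with SSE intrinsics when working in C rather than IR. A minimal sketch, assuming the GCC/Clang `__SSE__` predefined macro for target detection; the name any_gt4 is hypothetical and the scalar branch is only a fallback for non-SSE targets. cmpps turns each lane into all-ones or zero, and movmskps gathers the four lane sign bits into one integer, so a single test of that integer replaces per-lane extraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* True if any lane of a is greater than the matching lane of b:
 * the horizontal OR of a <4 x i1> compare result. */
static bool any_gt4(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* cmpps: each lane becomes all-ones where a > b, else zero. */
    __m128 mask = _mm_cmpgt_ps(va, vb);
    /* movmskps: lane sign bits land in bits 0..3 of the result. */
    return _mm_movemask_ps(mask) != 0;
#else
    /* Portable fallback for targets without SSE. */
    int bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= (a[i] > b[i]) << i;
    return bits != 0;
#endif
}
```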

[LLVMdev] branch on vector compare?

2012 Sep 05

...sext <4 x i1> %16 to <4 x i32> >> %18 = bitcast <4 x i32> %17 to i128 >> %19 = icmp ne i128 %18, 0 >> br i1 %19, label %true1, label %false2 >> >> should do the trick (one cmpps + one ptest + one br instruction). >> This, however, requires sse41 which I don't know if you have - you say >> the extractelements go through memory which I've never seen then again >> our code didn't try to extract the i1 directly (even without fixes for >> ptest the above sequence will result in only 2 extraction steps instead...

[LLVMdev] use AVX automatically if present

2012 May 24

I wonder why AVX is not used automatically when it is available on the host machine. In contrast, SSE41 instructions (like pmulld) are used automatically if the host machine supports SSE41. E.g. $ cat avx.ll define void @_fun1(<8 x float>*, <8 x float>*) { _L1: %x = load <8 x float>* %0 %y = load <8 x float>* %1 %z = fadd <8 x float> %x, %y store <8 x...
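One way a client can avoid depending on whatever host detection a given LLVM version performs is to probe the host itself and pass the features explicitly. A minimal sketch, assuming GCC or Clang on x86 (`__builtin_cpu_supports` is a compiler builtin, not an LLVM API); the helper host_mattr and the macro HAVE_CPU_SUPPORTS are hypothetical, and the feature spellings should be checked against `llc -mattr=help` for the LLVM version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_CPU_SUPPORTS 1
#endif

/* Builds an -mattr style string such as "+sse4.1,+avx" describing the
 * host CPU. On non-x86 or unsupported compilers the result is empty. */
static void host_mattr(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
#ifdef HAVE_CPU_SUPPORTS
    __builtin_cpu_init();  /* harmless outside constructors */
    if (__builtin_cpu_supports("sse4.1"))
        strncat(buf, "+sse4.1,", len - strlen(buf) - 1);
    if (__builtin_cpu_supports("avx"))
        strncat(buf, "+avx,", len - strlen(buf) - 1);
#endif
    size_t n = strlen(buf);
    if (n > 0 && buf[n - 1] == ',')
        buf[n - 1] = '\0';  /* drop the trailing comma */
}
```

The resulting string could then be handed to the code generator (for example as an -mattr value), making the feature set explicit rather than inferred.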

bad identification of the CPU pentium dual core ( penryn instead of core2 )

2015 Oct 21

...s the good behaviour. The LLVM git commit that introduced this bug is cd83d5b5071f072882ad06cc4b904b2d27d1e54a (https://github.com/llvm-mirror/llvm/commit/cd83d5b5071f072882ad06cc4b904b2d27d1e54a). This faulty commit deleted a crucial SSE4 test for CPU family 6, model 23: return HasSSE41 ? "penryn" : "core2"; The solution is simply to re-add this test for CPU family 6, model 23. Here is the patch: --- a/lib/Support/Host.cpp 2015-10-14 07:13:52.381374679 +0200 +++ b/lib/Support/Host.cpp 2015-10-14 07:13:28.224708323 +0200 @@ -332,6 +332,8 @@...
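The restored check is small enough to model on its own. A minimal sketch of the decision the post quotes for Intel family 6, model 23: the same model number covers parts with and without SSE4.1, so the name must depend on the SSE4.1 bit, otherwise a Pentium Dual-Core would be reported as "penryn". The function name and the "generic" default for other models are hypothetical; the real Host.cpp handles many more models.

```c
#include <assert.h>
#include <stdbool.h>

/* Name selection for Intel CPU family 6, as described in the post:
 * model 23 (0x17) is "penryn" only when SSE4.1 is present. */
static const char *intel_family6_cpu_name(unsigned model, bool has_sse41)
{
    switch (model) {
    case 23:
        return has_sse41 ? "penryn" : "core2";
    default:
        return "generic";  /* other models omitted from this sketch */
    }
}
```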

x86_64 SSE2/SSE41 optim not used

2014 Mar 11

Hi Guys, In stream_decoder.c, when assigning the lpc restore function, only IA32 processors benefit from the SSE2 and SSE4.1 optimizations. Shouldn't that be the case for x86_64 processors as well? Thanks, -- Olivier TRISTAN uvi.net
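The question is about a dispatch step: pick a function pointer once, based on CPU flags. A minimal sketch of that pattern, with every name hypothetical (flac's real FLAC__CPUInfo, stream_decoder.c functions and restore signatures differ); the placeholder routines only report which variant was chosen. Whether extending the SIMD paths to x86_64 pays off is a measurement question; a reply in the same thread reports it did not make decoding faster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CPU description. */
typedef enum { ARCH_IA32, ARCH_X86_64, ARCH_OTHER } cpu_arch;

typedef struct {
    cpu_arch arch;
    bool sse2;
    bool sse41;
} cpu_desc;

/* Placeholder "restore" routines standing in for real implementations. */
typedef const char *(*restore_fn)(void);
static const char *restore_plain(void) { return "plain C"; }
static const char *restore_sse2(void)  { return "SSE2"; }
static const char *restore_sse41(void) { return "SSE4.1"; }

/* With allow_x86_64 == false only IA32 gets the SIMD variants, as the
 * post describes; true models the proposed extension to x86_64. */
static restore_fn pick_restore(const cpu_desc *ci, bool allow_x86_64)
{
    assert(ci != NULL);
    bool eligible = ci->arch == ARCH_IA32 ||
                    (allow_x86_64 && ci->arch == ARCH_X86_64);
    if (eligible && ci->sse41)
        return restore_sse41;
    if (eligible && ci->sse2)
        return restore_sse2;
    return restore_plain;
}
```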

x86_64 SSE2/SSE41 optim not used

2014 Mar 12

Olivier Tristan wrote: > In stream_decoder.c when assigning lpc restore function, > only IA32 processor benefits from SSE2 and SSE4.1 optimization. > > Shouldn't it be the case for x86_64 processor as well ? I tried, and it didn't make decoding faster. (And even SSE4.1 for IA-32 is... questionable) OTOH, flac decoding is really very fast. It's very hard to make it even...

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

Hi, When setting -mattr option on X86, I would like to treat MMX separately from SSE levels. This would allow a client who sets the attributes directly to set the SSE level independent of MMX, e.g., llc -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If anyone objects to this change, please let me know. Thanks, -- Mon Ping
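The proposed semantics can be modelled as two independent pieces of state: an ordered SSE level, where requesting a level implies all lower ones, and a separate MMX flag that no SSE level turns on. This is a minimal sketch of the behaviour the post describes, not LLVM's actual feature-implication code; the type and function names are hypothetical and the attribute spellings follow the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { NO_SSE, SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42 } sse_level;

typedef struct {
    sse_level sse;  /* highest enabled SSE level; lower levels implied */
    bool mmx;       /* independent of the SSE level */
} x86_features;

static void apply_mattr(x86_features *f, const char *attr)
{
    static const struct { const char *name; sse_level level; } levels[] = {
        {"sse", SSE1},     {"sse2", SSE2},   {"sse3", SSE3},
        {"ssse3", SSSE3},  {"sse41", SSE41}, {"sse42", SSE42},
    };
    assert(f != NULL && attr != NULL);
    if (strcmp(attr, "mmx") == 0) {
        f->mmx = true;  /* MMX only when requested explicitly */
        return;
    }
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; ++i) {
        /* Raising the level implies every lower SSE level. */
        if (strcmp(attr, levels[i].name) == 0 && levels[i].level > f->sse)
            f->sse = levels[i].level;
    }
}
```

Under this model, -mattr=sse41 yields SSE4.1 with MMX off, and -mattr=mmx plus -mattr=sse42 yields both, matching the two examples in the post.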

[LLVMdev] Using intrinsics with memory operands

2008 Aug 01

...movsxbd instruction. One variant takes two XMM > registers, while another has a 32-bit memory location as source operand. The > latter is quite interesting if you know you're reading from memory anyway, > and if it's not 16-byte aligned. It looks like LLVM's > Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So > how do I achieve using the variant taking a memory operand? A load+insertelement+pmovsx sequence should codegen into a single instruction, but it looks like that isn't working. I guess the pattern-matching magic should kick in and tak...

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost (30) is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, TRUNCATE v16i32 to v16i8 appears to be very expensive in SSE41 compared with SSE2. I feel this number should be the same as, or close to, the cost listed for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost of the same operation in SSE2 mode. http://reviews.llvm.org/rL...

[LLVMdev] Using intrinsics with memory operands

2008 Aug 01

...example the SSE4.1 pmovsxbd instruction. One variant takes two XMM registers, while another has a 32-bit memory location as source operand. The latter is quite interesting if you know you're reading from memory anyway, and if it's not 16-byte aligned. It looks like LLVM's Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So how do I achieve using the variant taking a memory operand? Thanks a bunch, Nicolas Capens
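At the C level, the memory-operand form corresponds to a 32-bit unaligned load feeding pmovsxbd. A minimal sketch, assuming the `__SSE4_1__` macro that GCC/Clang define under -msse4.1; the function name is hypothetical. Whether the compiler folds the movd load into pmovsxbd's m32 operand depends on the compiler and version (the reply in this thread notes the equivalent pattern was not being matched at the time). Without SSE4.1 a scalar loop gives the same result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Sign-extends 4 packed bytes at src into 4 int32 values, the operation
 * pmovsxbd performs. Reads exactly 32 bits; src need not be aligned. */
static void sext_4xi8_to_4xi32(const int8_t *src, int32_t out[4])
{
    assert(src != NULL && out != NULL);
#if defined(__SSE4_1__)
    int32_t word;
    memcpy(&word, src, sizeof word);      /* unaligned 32-bit load */
    __m128i v = _mm_cvtsi32_si128(word);  /* movd into the low lane */
    __m128i r = _mm_cvtepi8_epi32(v);     /* pmovsxbd */
    _mm_storeu_si128((__m128i *)out, r);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)src[i];         /* int8_t -> int32_t sign-extends */
#endif
}
```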

LLVM and Xeon Skylake v5

2017 May 08

Hi, I have a JIT compiler using the legacy JIT on LLVM 3.5 that, when run on the Xeon v5 Skylakes produces "Cannot select: intrinsic %llvm.x86.sse41.round.sd". Note, this does not occur on i7 Kabylakes. To get this far I had to disable AVX512 code gen. Upgrading the system I am looking at from 3.5 to a later version is a big job that I'd prefer not to have on my critical path. Does anyone have any tips on where I would look to debug...

[LLVMdev] use AVX automatically if present

2012 May 24

...pper > ret > .Ltmp5: > .size _fun1, .Ltmp5-_fun1 > .cfi_endproc > > > .section ".note.GNU-stack","",@progbits > > > > > I guess your answer is that I did not specify a target triple. However why is > SSE41 automatically detected and AVX is not?

[LLVMdev] New TargetSpec 'llvmnote'

2011 Feb 23

...quot;Feature Delta" field, using "+" to add features but using a charactar other than "-" to remove them is unfortunate. How about just prohibiting "-" in CPU names? Or for another idea, how about prefixing negative features with "no-", as in "core2+sse41+no-cmov"? Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110223/b045a55e/attachment.html>

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

...2008-20-11 at 02:57 -0500, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while > llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If > anyone objects to this change, please let me know. > > Thanks, > -- Mon Ping > _______________________________________________ > LLVM Developers mailing list...

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

...19, 2008, at 11:57 PM PST, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while > llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If > anyone objects to this change, please let me know. I don't object, but please don't change the defaults. You're likely to find places the SSE code assumes MMX exis...

Performance and precompute_partition_info_sums_32bit_asm_ia32_()

2013 Sep 17

...time in seconds, smaller=better):

  SSE level   disabled   enabled
  no SSE      53.9       55.2
  SSE1        53.9       55.3
  SSE2        51.9       53.1
  SSE3        51.8       53.2
  SSSE3       45.7       51.4
  SSE41       46.1       51.6
  SSE42       46.1       51.6

Conclusions:
1) flac is always faster when precompute_partition_info_sums_32bit_asm_ia32_() is disabled.
2) Some C code benefits noticeably from SSSE3 instructions; at least when compiled with GCC...
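As a quick arithmetic check on conclusion 1, the relative cost of enabling the routine can be recomputed from the timings quoted above. The helper names are hypothetical; the numbers are copied from the post. By this calculation the slowdown is roughly 2.3% to 2.7% for the no-SSE through SSE3 builds and roughly 12% for SSSE3 and above, so the asm routine hurts most exactly where the compiled C code got faster.

```c
#include <assert.h>
#include <stdio.h>

/* Timings from the post (seconds, smaller is better). */
static const struct {
    const char *build;
    double disabled;
    double enabled;
} runs[] = {
    {"no SSE", 53.9, 55.2}, {"SSE1", 53.9, 55.3},  {"SSE2", 51.9, 53.1},
    {"SSE3", 51.8, 53.2},   {"SSSE3", 45.7, 51.4}, {"SSE41", 46.1, 51.6},
    {"SSE42", 46.1, 51.6},
};

#define NRUNS (sizeof runs / sizeof runs[0])

/* Relative slowdown, in percent, from enabling the routine. */
static double slowdown_pct(double disabled, double enabled)
{
    assert(disabled > 0.0);
    return (enabled - disabled) / disabled * 100.0;
}

static void print_slowdowns(void)
{
    for (size_t i = 0; i < NRUNS; ++i)
        printf("%-7s %+5.1f%%\n", runs[i].build,
               slowdown_pct(runs[i].disabled, runs[i].enabled));
}
```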

[PATCH] for cpu.c

2014 Aug 07

This patch moves all info->ia32.fxsr = info->ia32.sse = info->ia32.sse2 = info->ia32.sse3 = info->ia32.ssse3 = info->ia32.sse41 = info->ia32.sse42 = false; expressions into a static function disable_sse(FLAC__CPUInfo *info). -------------- next part -------------- A non-text attachment was scrubbed... Name: simplify_cpu_c.zip Type: application/zip Size: 1163 bytes Desc: not available Url : http://lists.xiph.org/pipermai...
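The refactoring described is a plain extract-function step. A minimal sketch of what such a helper looks like; the post's real signature takes a FLAC__CPUInfo pointer, whereas the struct here is a hypothetical trimmed-down stand-in that models only the ia32 fields named in the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FLAC__CPUInfo: only the ia32 SSE-related
 * fields named in the patch description. */
typedef struct {
    bool fxsr, sse, sse2, sse3, ssse3, sse41, sse42;
} ia32_info;

typedef struct {
    ia32_info ia32;
} cpu_info_sketch;

/* One helper replaces the repeated chained assignments. */
static void disable_sse(cpu_info_sketch *info)
{
    assert(info != NULL);
    info->ia32.fxsr = info->ia32.sse = info->ia32.sse2 = info->ia32.sse3 =
        info->ia32.ssse3 = info->ia32.sse41 = info->ia32.sse42 = false;
}
```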

RFC: [X86] Can we begin removing AutoUpgrade support for x86 intrinsics added in early 3.X versions

2017 Sep 20

...cases for this; x86.sse2.pcmpgt.* - we have no test cases for this; x86.avx2.pcmpeq.* - we have no test cases; x86.avx2.pcmpgt.* - we have no test cases for this; x86.avx.vpermil.* - we do test this. 3.2 added upgrades for: x86.avx.movnt.* - we have tests for this; x86.xop.vpcom* - we have tests for this. x86.sse41.ptest.* had its signature changed and we upgrade from the old signature. We don't have tests for the old signature. x86.xop.vfrcz.ss/sd had an argument dropped that we upgrade for. We don't have any tests for the old signature. 3.3 had no upgrades. 3.4 removed: x86.sse42.crc32.64.8 we do h...
