thr3ads.net - similar to: "[LLVMdev] Poor floating point optimizations?"

Displaying 20 results from an estimated 700 matches similar to: "[LLVMdev] Poor floating point optimizations?"

[LLVMdev] Poor floating point optimizations?

2010 Nov 20

[LLVMdev] Poor floating point optimizations?

And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&and instead of movss (which is

[LLVMdev] Poor floating point optimizations?

2010 Nov 20

[LLVMdev] Poor floating point optimizations?

On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote: > And also the resulting assembly code is very poor: > > 00460013 movss xmm0,dword ptr [esp+8] > 00460019 movaps xmm1,xmm0 > 0046001C addss xmm1,xmm1 > 00460020 pxor xmm2,xmm2 > 00460024 addss xmm2,xmm1 > 00460028 addss xmm2,xmm0 > 0046002C movss dword ptr

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

[LLVMdev] X86 - Help on fixing a poor code generation bug

Hi all, I noticed that the x86 backend tends to emit unnecessary vector insert instructions immediately after sse scalar fp instructions like addss/mulss. For example: ///////////////////////////////// __m128 foo(__m128 A, __m128 B) { _mm_add_ss(A, B); } ///////////////////////////////// produces the sequence: addss %xmm0, %xmm1 movss %xmm1, %xmm0 which could be easily optimized into

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are generated when input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays packed instructions are generated. Although it should not be required I tried

[LLVMdev] better code for IV

2014 Feb 19

[LLVMdev] better code for IV

Hi Andrew, The issue below refers to LSR, so I'll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them. Thanks, Anat _____________________________________________ From: Shemer, Anat Sent: Tuesday, February 18, 2014 15:07 To: 'llvmdev at

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

Thanks Eli. I forgot to bring up the strict FP questions which I was working on when I found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp emitted by promoting load/store/bitcast need to be strict versions of fp_to_f16/f16_to_fp. And if so where do we get the chain, especially for the bitcast case which isn't a chained node. ~Craig On Tue, Dec 10, 2019 at 3:18 PM

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations. Thanks, Nadav On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 04

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generaetd by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

2015 Jul 29

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

[LLVMdev] X86 - Help on fixing a poor code generation bug

Hi Andrea, Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS does an ADD plus BLEND operations, and you can easily

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

2015 Jul 29

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

This load instruction assumes the default ABI alignment for the <4 x float> type, which is 16: %15 = load <4 x float>* %14 You can set the alignment of loads to something lower than 16 in your frontend, and this will make LLVM use movups instructions: %15 = load <4 x float>* %14, align 4 If some LLVM mid-level pass is introducing this load without proving that the vector is

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Hi, I've attached 2 .ll files which are supposed to be equivalent but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash, when I run the crashing test it in isolation it passes, when I run the full suite it crashes; it boggles the mind. Below I provide the optimized asm that is produced from

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Using MM registers is wrong unless the user has specifically asked for it, which doesn't seem to be the case here. In the awesome MMX architecture, touching an MM register makes subsequent x87 operations fail unless an EMMS instruction is issued first; none of the compilers here are smart enough to insert EMMS instructions in the right places, so the only safe thing is not to use

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Here's the optimized versions: $ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o - [...] define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 { %roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 19

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explicitly said that we currently lack of

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 20

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

> This seems like it was done for perf reason (mispredict). Conditional-to-cmov transformation should keep > from introducing additional observable side-effects, and it's clear that whatever did this did not account > for floating point exception. That’s a very reasonable statement, but I’m not sure it corresponds to the way we have typically approached this sort of problem. In

similar to: [LLVMdev] Poor floating point optimizations?