Displaying 20 results from an estimated 700 matches similar to: "[LLVMdev] Poor floating point optimizations?"
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor:
00460013 movss xmm0,dword ptr [esp+8]
00460019 movaps xmm1,xmm0
0046001C addss xmm1,xmm1
00460020 pxor xmm2,xmm2
00460024 addss xmm2,xmm1
00460028 addss xmm2,xmm0
0046002C movss dword ptr [esp],xmm2
00460031 fld dword ptr [esp]
Especially pxor&and instead of movss (which is
2010 Nov 20
3
[LLVMdev] Poor floating point optimizations?
On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote:
> And also the resulting assembly code is very poor:
>
> 00460013 movss xmm0,dword ptr [esp+8]
> 00460019 movaps xmm1,xmm0
> 0046001C addss xmm1,xmm1
> 00460020 pxor xmm2,xmm2
> 00460024 addss xmm2,xmm1
> 00460028 addss xmm2,xmm0
> 0046002C movss dword ptr
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all,
I noticed that the x86 backend tends to emit unnecessary vector insert
instructions immediately after sse scalar fp instructions like
addss/mulss.
For example:
/////////////////////////////////
__m128 foo(__m128 A, __m128 B) {
_mm_add_ss(A, B);
}
/////////////////////////////////
produces the sequence:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
which could be easily optimized into
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
For the following C code
__fp16 x, y, z, w;
void foo() {
x = y + z;
x = x + w;
}
clang produces IR that extends each operand to float and then truncates to
half before assigning to x. Like this
define dso_local void @foo() #0 !dbg !18 {
%1 = load half, half* @y, align 2, !dbg !21
%2 = fpext half %1 to float, !dbg !21
%3 = load half, half* @z, align 2, !dbg !22
%4 = fpext half %3 to float, !dbg
2013 Apr 03
2
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Hi,
I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are generated when input arrays are integer, but not when they are float or double.
If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays packed instructions are generated. Although it should not be required I tried
2014 Feb 19
2
[LLVMdev] better code for IV
Hi Andrew,
The issue below refers to LSR, so I'll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.
Thanks, Anat
_____________________________________________
From: Shemer, Anat
Sent: Tuesday, February 18, 2014 15:07
To: 'llvmdev at
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
Thanks Eli.
I forgot to bring up the strict FP questions which I was working on when I
found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp
emitted by promoting load/store/bitcast need to be strict versions of
fp_to_f16/f16_to_fp. And if so where do we get the chain, especially for
the bitcast case which isn't a chained node.
~Craig
On Tue, Dec 10, 2019 at 3:18 PM
2013 Apr 03
0
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations.
Thanks,
Nadav
On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote:
> Hi,
>
> I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
2013 Apr 04
1
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Thanks, that did it!
Are there any plans to enable the loop vectorizer by default?
From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Wednesday, April 03, 2013 13:33 PM
To: Nowicki, Tyler
Cc: LLVM Developers Mailing List
Subject: Re: Packed instructions generaetd by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile attached IR with LLVM 3.6
llc -march=x86-64 -o f.S f.ll
it generates an aligned ADDPS with unaligned address. See attached f.S,
here an extract:
addq $12, %r9 # $12 is not a multiple of 4, thus for
xmm0 this is unaligned
xorl %esi, %esi
.align 16, 0x90
.LBB0_1: # %loop2
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi Andrea,
Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS does an ADD plus BLEND operations, and you can easily
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello,
Depending on how I extract integer lanes from an x86_64 xmm register, the
backend may spill that register in order to load scalars. The effect was
observed on two targets: corei7-avx and btver1 (I haven't checked other
targets).
Here's a test case with spilling/no-spilling code put on conditional
compile:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
This load instruction assumes the default ABI alignment for the <4 x float>
type, which is 16:
%15 = load <4 x float>* %14
You can set the alignment of loads to something lower than 16 in your
frontend, and this will make LLVM use movups instructions:
%15 = load <4 x float>* %14, align 4
If some LLVM mid-level pass is introducing this load without proving that
the vector is
2010 Aug 31
5
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Hi,
I've attached 2 .ll files which are supposed to be equivalent but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash, when I run the crashing test it in isolation it passes, when I run the full suite it crashes; it boggles the mind.
Below I provide the optimized asm that is produced from
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Using MM registers is wrong unless the user has specifically asked for
it, which doesn't seem to be the case here.
In the awesome MMX architecture, touching an MM register makes
subsequent x87 operations fail unless an EMMS instruction is issued
first; none of the compilers here are smart enough to insert EMMS
instructions in the right places, so the only safe thing is not to use
2010 Aug 31
2
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Here's the optimized versions:
$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -
[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
%roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Changing the list from cfe-dev to llvm-dev
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
>
> I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>
> I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
Thanks for fixing the problem with the insertps mask.
Generally the new shuffle lowering looks promising, however there are
some cases where the codegen is now worse causing runtime performance
regressions in some of our internal codebase.
You have already mentioned how the new shuffle lowering is missing
some features; for example, you explicitly said that we currently lack
of
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2017 Apr 20
4
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
> This seems like it was done for perf reason (mispredict). Conditional-to-cmov transformation should keep
> from introducing additional observable side-effects, and it's clear that whatever did this did not account
> for floating point exception.
That’s a very reasonable statement, but I’m not sure it corresponds to the way we have typically approached this sort of problem.
In