Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address"
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
This load instruction assumes the default ABI alignment for the <4 x float>
type, which is 16:
%15 = load <4 x float>* %14
You can set the alignment of loads to something lower than 16 in your
frontend, and this will make LLVM use movups instructions:
%15 = load <4 x float>* %14, align 4
If some LLVM mid-level pass is introducing this load without proving that
the vector is
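A minimal C sketch of the distinction described in this excerpt (illustrative, not from the thread): dereferencing a vector pointer lets the compiler assume the 16-byte ABI alignment, while copying through memcpy makes no alignment promise, so the load is emitted with a low alignment and lowers to movups.
#include <string.h>
typedef float v4sf __attribute__((vector_size(16)));
/* Direct dereference: the load inherits the vector type's 16-byte
   ABI alignment, so the backend is free to emit an aligned movaps. */
v4sf load_assumed_aligned(const v4sf *p) {
    return *p;
}
/* memcpy promises nothing beyond align 1, so the resulting IR load
   is unaligned and the backend selects movups. */
v4sf load_unaligned(const float *p) {
    v4sf v;
    memcpy(&v, p, sizeof v);
    return v;
}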
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello,
Depending on how I extract integer lanes from an x86_64 xmm register, the
backend may spill that register in order to load scalars. The effect was
observed on two targets: corei7-avx and btver1 (I haven't checked other
targets).
Here's a test case with spilling/no-spilling code put on conditional
compile:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
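The test case is cut off above; as a hedged reconstruction of the pattern under discussion (not the original code), the two extraction styles below differ in whether the compiler can keep the vector in a register or is pushed toward a stack round trip:
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32 */
/* Register-friendly: each lane is read directly with pextrd. */
int sum_lanes_extract(__m128i v) {
    return _mm_extract_epi32(v, 0) + _mm_extract_epi32(v, 1)
         + _mm_extract_epi32(v, 2) + _mm_extract_epi32(v, 3);
}
/* Memory round trip: storing the vector and reloading scalars is
   the kind of code that shows up as a spill in the generated asm. */
int sum_lanes_via_memory(__m128i v) {
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}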
2007 Sep 28
2
[LLVMdev] Vector troubles
Hola LLVMers,
I'm working on engaging SSE via the LLVM vector ops on x86. I had some
questions a while back that you all helped out on, but I'm seeing
similar issues and was hoping you'd have some ideas. Below is the dump
of the LLVM IR of a program which is designed to take a vector stored in
a float*, build an LLVM vector from it, copy it to another vector, and
then take it
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All,
Attached is a patch that does two things. First, it makes the use
of the current SSE code a run-time option through
speex_decoder_ctl() and speex_encoder_ctl().
It does this in two parts. First, there is a modification to the configure.in
script which introduces a platform-based check. It will compile in the
SSE assembly if you are on an i?86-based platform by making a
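The description is truncated, but as a rough sketch of the run-time side (not the actual patch, which routes the choice through speex_encoder_ctl()/speex_decoder_ctl()), an SSE check on x86 keys off CPUID feature bit EDX[25]:
#include <cpuid.h>  /* GCC/Clang wrapper for the CPUID instruction */
/* Returns nonzero if the CPU reports SSE support (CPUID.1:EDX bit 25).
   A codec can consult this once and dispatch to SSE or C routines. */
static int cpu_has_sse(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 25) & 1;
}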
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC has three SSE-accelerated functions, FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order to be less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also falls back to the unaccelerated C function.
I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
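For context, the computation these routines accelerate is plain autocorrelation; a scalar sketch of its shape (modeled loosely on the C fallback, not copied from libFLAC):
/* autoc[l] = sum over i of data[i] * data[i - l]; the _lag_N
   assembly variants vectorize this inner loop for lpc_order < N. */
void autocorrelation(const float data[], unsigned len,
                     unsigned lag, float autoc[]) {
    for (unsigned l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (unsigned i = l; i < len; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}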
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
The resulting assembly code is also very poor:
00460013 movss xmm0,dword ptr [esp+8]
00460019 movaps xmm1,xmm0
0046001C addss xmm1,xmm1
00460020 pxor xmm2,xmm2
00460024 addss xmm2,xmm1
00460028 addss xmm2,xmm0
0046002C movss dword ptr [esp],xmm2
00460031 fld dword ptr [esp]
Especially pxor&addss instead of movss (which is
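One reason a zeroed register survives here: without fast-math, LLVM cannot fold 0.0f + x to x, because the fold is wrong for x == -0.0f (0.0f + -0.0f is +0.0f). A hedged guess at C source with the shape of the listing above (illustrative only, not from the thread):
/* An accumulation seeded with 0.0f: under strict IEEE semantics the
   initial zero cannot be optimized away, so the compiler materializes
   it with pxor and keeps the addss chain. */
float accumulate3(float x) {
    float sum = 0.0f;
    sum += x + x;   /* addss: x + x, then added into sum */
    sum += x;       /* one more addss                    */
    return sum;     /* returned via x87 on 32-bit x86    */
}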
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Using MM registers is wrong unless the user has specifically asked for
it, which doesn't seem to be the case here.
In the awesome MMX architecture, touching an MM register makes
subsequent x87 operations fail unless an EMMS instruction is issued
first; none of the compilers here are smart enough to insert EMMS
instructions in the right places, so the only safe thing is not to use
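A small sketch of the constraint being described (illustrative, not from the thread): any MMX use must be followed by _mm_empty(), which emits EMMS, before x87 floating-point code runs.
#include <mmintrin.h>
/* MM registers alias the x87 register stack; after MMX use the x87
   tag word marks the stack in-use until EMMS resets it. */
float mmx_then_fp(__m64 a, __m64 b, float x) {
    __m64 sum = _mm_add_pi32(a, b);  /* paddd on MM registers */
    (void)sum;                       /* result unused in this sketch */
    _mm_empty();                     /* EMMS: mandatory before x87 math */
    return x + 1.0f;                 /* safe again after EMMS */
}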
2013 Feb 26
2
[LLVMdev] passing vector of booleans to functions
Hi all,
I'm currently trying to figure out the best way to pass vectors of
booleans to other functions. Take this small example:
define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
%cmp = fcmp olt <4 x float> %a, %b
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
ret <4 x
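The same pattern written with Clang/GCC vector extensions, for comparison (a sketch, not code from the thread); on recent compilers the vector ternary produces exactly the fcmp-plus-select shape shown in the IR:
typedef float v4sf __attribute__((vector_size(16)));
/* a < b yields a per-lane mask; the ternary selects a + b where the
   comparison holds and a elsewhere: fcmp olt + fadd + select. */
v4sf vcmp_add(v4sf a, v4sf b) {
    return a < b ? a + b : a;
}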
2010 Aug 31
2
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Here's the optimized versions:
$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -
[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
%roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;
2010 Aug 31
5
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Hi,
I've attached two .ll files which are supposed to be equivalent, but 'unopt-fail.ll' causes a crash in WebKit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash: when I run the crashing test in isolation it passes, but when I run the full suite it crashes; it boggles the mind.
Below I provide the optimized asm that is produced from
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Changing the list from cfe-dev to llvm-dev
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
>
> I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>
> I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic
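A minimal reproducer shape for this class of bug (values illustrative): converting an exactly representable float to unsigned long long must leave FE_INEXACT clear.
#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON
int main(void) {
    volatile float f = 65536.0f;  /* converts to ULL with no rounding */
    feclearexcept(FE_ALL_EXCEPT);
    unsigned long long u = (unsigned long long)f;
    /* A correct FPToUI lowering must not raise FE_INEXACT here. */
    printf("u = %llu, inexact = %d\n",
           u, fetestexcept(FE_INEXACT) != 0);
    return 0;
}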
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
Thanks for fixing the problem with the insertps mask.
Generally the new shuffle lowering looks promising; however, there are
some cases where the codegen is now worse, causing runtime performance
regressions in some of our internal codebases.
You have already mentioned how the new shuffle lowering is missing
some features; for example, you explicitly said that we currently lack
of
2013 Feb 26
0
[LLVMdev] passing vector of booleans to functions
Hi Roland,
> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float>
%b) {
> entry:
> %add = fadd <4 x float> %a, %b
> %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> ret <4 x float> %sel
> }
>
> I will get:
>
> addps %xmm1, %xmm2
> pslld $31, %xmm0
>
2009 Nov 20
1
[LLVMdev] Spilling & UNPCKLPS Question
I'm working on adding some more annotations to asm output, and I
came across this odd construct generated for X86/split-vector-rem.ll:
movss %xmm0, 32(%rsp) # Scalar Spill
[...]
unpcklps 48(%rsp), %xmm0 # Vector Folded Reload
[...]
movaps %xmm0, 16(%rsp) # Vector Spill
[...]
unpcklps
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive amount of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>
> Sure,
>
> Here is the command line:
> clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: 28 March 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 9, 2014, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Tue, Sep 9, 2014 at 12:53 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Chandler,
>
> I had observed some improvements and regressions with the new lowering.
>
> Here are the numbers for an Ivy Bridge machine fixed at