Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Greedy Register Allocation in LLVM 3.0"
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile the attached IR with LLVM 3.6
llc -march=x86-64 -o f.S f.ll
it generates an aligned ADDPS with unaligned address. See attached f.S,
here an extract:
addq $12, %r9 # $12 is not a multiple of 16, thus for xmm0 this is unaligned
xorl %esi, %esi
.align 16, 0x90
.LBB0_1: # %loop2
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive amount of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
This load instruction assumes the default ABI alignment for the <4 x float>
type, which is 16:
%15 = load <4 x float>* %14
You can set the alignment of loads to something lower than 16 in your
frontend, and this will make LLVM use movups instructions:
%15 = load <4 x float>* %14, align 4
If some LLVM mid-level pass is introducing this load without proving that
the vector is
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>> [...]
>>> movaps 32(%rdi), %xmm3
>>> movaps 48(%rdi), %xmm2
>>>
2013 Feb 26
2
[LLVMdev] passing vector of booleans to functions
Hi all,
I'm currently trying to figure out the best way to pass vector of
booleans to other functions. Take this small example:
define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
%cmp = fcmp olt <4 x float> %a, %b
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
ret <4 x
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> I've noticed that LLVM tends to generate suboptimal code and spill an
> excessive amount of registers in large functions, such as in those
> that are automatically generated by FFTW.
One problem might be that we're forcing the 16 stores to the out array to happen in source order, which
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
Hi,
I've noticed that LLVM tends to generate suboptimal code and spill an
excessive amount of registers in large functions, such as in those
that are automatically generated by FFTW.
LLVM generates good code for a function that computes an 8-point
complex FFT, but from 16-point upwards, icc or gcc generates much
better code. Here is an example of a sequence of instructions from a
32-point
2013 Feb 26
0
[LLVMdev] passing vector of booleans to functions
Hi Roland,
> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> entry:
> %add = fadd <4 x float> %a, %b
> %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> ret <4 x float> %sel
> }
>
> I will get:
>
> addps %xmm1, %xmm2
> pslld $31, %xmm0
>
2011 Sep 27
2
[LLVMdev] Poor code generation for odd sized vectors
Hi all,
I'm compiling LLVM IR code like this on x86-64:
define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8
{
entry:
%result.80 = fadd <16 x float> %a.78, %a.79
ret <16 x float> %result.80
}
This works really well when the vector length (16 in the above) is
an integer multiple of the SSE vector
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
Hi,
While investigating a performance issue with an internal codebase I
came across what looks to be poor register allocation. I have
constructed a small(ish) reproducer which demonstrates the issue
(see test.ll attached).
I have spent some time going through the register allocator to
understand what is happening. I have also experimented with some
small changes to try and improve the
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello,
Depending on how I extract integer lanes from an x86_64 xmm register, the
backend may spill that register in order to load scalars. The effect was
observed on two targets: corei7-avx and btver1 (I haven't checked other
targets).
Here's a test case with the spilling and non-spilling code selected by
conditional compilation:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: den 28 mars 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
While doing the performance measurement on an Ivy Bridge, I ran into compile-time errors.
I saw a bunch of "cannot select" errors in the LLVM test suite with -march=core-avx-i.
E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at -O3 -march=core-avx-i with:
fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
I've run the SingleSource test suite for core-avx-i and have no failures
here so a preprocessed file + commandline would be very useful if this
reproduces for you still.
On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:
> I'm having trouble reproducing this. I'm trying to get LNT to actually
> run, but manually compiling the given source
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>
> Sure,
>
> Here is the command line:
> clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
wrote:
> Hi Chandler,
>
> I've been looking at the regressions Quentin mentioned, and filed a PR
> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
>
> As for the others, I'm working on reducing them, but for now, here are
> some raw observations, in case any of
2008 Jul 12
2
[LLVMdev] Shuffle regression
Hi all,
I think I found a regression in the shuffle instruction. I've attached a
replacement of fibonacci.cpp to reproduce the issue. It runs fine on release
2.3 but revision 52648 fails, and I suspect that the issue is still present.
2.3 generates the following x86 code:
03A10010 push ebp
03A10011 mov ebp,esp
03A10013 and esp,0FFFFFFF0h
03A10019