similar to: [LLVMdev] Spilling & UNPCKLPS Question

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] Spilling & UNPCKLPS Question"

2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
This load instruction assumes the default ABI alignment for the <4 x float> type, which is 16: %15 = load <4 x float>* %14 You can set the alignment of loads to something lower than 16 in your frontend, and this will make LLVM use movups instructions: %15 = load <4 x float>* %14, align 4 If some LLVM mid-level pass is introducing this load without proving that the vector is
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
RFC: A proposal for vectorizing loops with calls to math functions using SVML (short vector math library). ========= Overview ========= Very simply, SVML (Intel short vector math library) functions are vector variants of scalar math functions that take vector arguments, apply an operation to each element, and store the result in a vector register. These vector variants can be generated by the
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&and instead of movss (which is
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>
2007 Sep 28
2
[LLVMdev] Vector troubles
Hola LLVMers, I'm working on engaging SSE via the LLVM vector ops on x86. I had some questions a while back that you all helped out on, but I'm seeing similar issues and was hoping you'd have some ideas. Below is the dump of the LLVM IR of a program which is designed to take a vector stored in a float*, build an LLVM vector from it, copy it to another vector, and then take it
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: > > On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > >> I've noticed that LLVM tends to generate suboptimal code and spill an >> excessive amount of registers in large functions, such as in those >> that are automatically generated by FFTW. >
2007 Jul 21
0
[LLVMdev] Seg faulting on vector ops
On Fri, 20 Jul 2007, Chuck Rose III wrote: > I'm looking to make use of the vectorization primitives in the Intel > chip with the code we generate from LLVM and so I've started > experimenting with it. What is the state of the machine code generated > for vectors? In my tinkering, I seem to be getting some wonky machine > instructions, but I'm most likely just doing
2007 Jul 20
5
[LLVMdev] Seg faulting on vector ops
Hola LLVMers, I'm looking to make use of the vectorization primitives in the Intel chip with the code we generate from LLVM and so I've started experimenting with it. What is the state of the machine code generated for vectors? In my tinkering, I seem to be getting some wonky machine instructions, but I'm most likely just doing something wrong and I'm hoping you can set me in
2007 Jul 24
2
[LLVMdev] Seg faulting on vector ops
Hrm. This problem shouldn't be target specific. I am pretty sure prologue / epilogue inserter aligns stack correctly if there are stack objects with greater than default stack alignment requirement. Seems to be the initial alloca() instruction should specify 16 byte alignment? Evan On Jul 21, 2007, at 2:51 PM, Chris Lattner wrote: > On Fri, 20 Jul 2007, Chuck Rose III wrote:
2004 Aug 06
2
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
Here are our notes on 1.1.4 testing on Windows 1. Compile Error with regular mode (FIXED_POINT undefined) at lsp.c line 104 static inline spx_word16_t spx_cos(spx_word16_t x) . VS6 does not like the inline keyword here. Removing it allows compiling. same with cb_search_sse.h line 34. 2. Compile Error with quant_lsp.c line 55. M_PI is undefined. Either it needs to be included
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > I've noticed that LLVM tends to generate suboptimal code and spill an > excessive amount of registers in large functions, such as in those > that are automatically generated by FFTW. One problem might be that we're forcing the 16 stores to the out array to happen in source order, which
2017 Apr 20
4
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
> This seems like it was done for perf reason (mispredict). Conditional-to-cmov transformation should keep > from introducing additional observable side-effects, and it's clear that whatever did this did not account > for floating point exception. That’s a very reasonable statement, but I’m not sure it corresponds to the way we have typically approached this sort of problem. In
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly
2007 Jul 20
0
[LLVMdev] Seg faulting on vector ops
Hi Chuck! On Jul 20, 2007, at 11:36 AM, Chuck Rose III wrote: > Hola LLVMers, > > > > I’m looking to make use of the vectorization primitives in the > Intel chip with the code we generate from LLVM and so I’ve started > experimenting with it. What is the state of the machine code > generated for vectors? In my tinkering, I seem to be getting some > wonky
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
2007 Jul 26
0
[LLVMdev] Seg faulting on vector ops
I am fairly certain this is right. Chuck, can you do a quick experiment for me? Go back to your original code but make sure the alloca instruction specify 16-byte alignment. The code should work. If not, please file a bug. Thanks, Evan On Jul 24, 2007, at 1:58 PM, Evan Cheng wrote: > Hrm. This problem shouldn't be target specific. I am pretty sure > prologue / epilogue inserter