similar to: [LLVMdev] Misaligned SSE store problem (with reduced source)

Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] Misaligned SSE store problem (with reduced source)"

2011 Nov 11
0
[LLVMdev] Misaligned SSE store problem (with reduced source)
On Thu, Nov 10, 2011 at 6:13 PM, Aaron Dwyer <Aaron.Dwyer at imgtec.com> wrote: > Using LLVM 2.9, the following LLVM IR produces invalid x86 32 bit assembly > (a misaligned SSE store). > ; ModuleID = 'MisalignedStore' > define void @MisalignedStore() nounwind readnone { > entry: >   %v = alloca <4 x float>, align 16 >   store <4 x float>
2011 Nov 11
1
[LLVMdev] Misaligned SSE store problem (with reduced source)
On 11 November 2011 02:33, Eli Friedman <eli.friedman at gmail.com> wrote: > It's a known issue that the x86 backend doesn't know how to generate > dynamic stack realignment code for functions using allocas outside the > entry block.  Don't know the bug number off the top of my head. These ones look relevant: http://llvm.org/bugs/show_bug.cgi?id=2962
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
the clang 3.5 loop optimizer seems to jump in unintentional for simple loops the very simple example ---- const int SIZE = 3; int the_func(int* p_array) { int dummy = 0; #if defined(ITER) for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p; #else for(int i = 0; i < SIZE; ++i) dummy += p_array[i]; #endif return dummy; } int main(int argc, char** argv) {
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after
2010 Nov 14
1
[LLVMdev] Pesudo X86 instructions used for generating constants
Hi, I noticed a bunch of psuedo instructions used for creation of constants without generating loads. e.g. pxor xmm0, xmm0 Here is an example of what i am referring to snipped from X86InstrSSE.td: def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "", [(set FR32:$dst, fp32imm0)]>, Requires<[HasSSE1]>, TB, OpSize; My question is
2010 Nov 20
2
[LLVMdev] Poor floating point optimizations?
I wanted to use LLVM for my math parser but it seems that floating point optimizations are poor. For example consider such C code: float foo(float x) { return x+x+x; } and here is the code generated with "optimized" live demo: define float @foo(float %x) nounwind readnone { entry: %0 = fmul float %x, 2.000000e+00 ; <float> [#uses=1] %1 = fadd float %0, %x
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
Hi, I'm building a little JIT that creates functions to do array manipulations, eg. sum all the elements of a double* array. I'm writing this in python, generating llvm assembly intructions and piping that through a call to ParseAssemblyString, ExecutionEngine, etc. It's working OK on integer values, but i'm getting nasty floating point exceptions when i try this on double*
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: den 28 mars 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
After some attempt to implement reduce-add in LLVM, I found out a easier way to detect reduce-add without introducing new IR operations. The basic idea is annotating phi node instead of add (so that it is easier to handle other reduction operations). In PHINode class, we can add a flag indicating if the phi node is a reduction one (the flag can be set in loop vectorizer for vectorized phi nodes).
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. ようこそ! > I have tried SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect for the example below to use MOVDQA, PAND &c. > (On nehalem, ANDPS is extremely slower than PAND) Are you sure? The
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&and instead of movss (which is
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
On Wed, Nov 25, 2015 at 2:32 PM, Hal Finkel <hfinkel at anl.gov> wrote: > Hi Cong, > > After reading the original RFC and this update, I'm still not entirely sure I understand the semantics of the flag you're proposing to add. Does it having something to do with the ordering of the reduction operations? The flag is only useful for vectorized reduction for now. I'll give
2007 Oct 19
2
[LLVMdev] llvm_fcmp_ord and llvm_fcmp_uno and assembly code generation
Hi, The C backend in llc generates code like: static inline int llvm_fcmp_ord(double X, double Y) { return X == X && Y == Y; } static inline int llvm_fcmp_uno(double X, double Y) { return X != X || Y != Y; } First of all it generates a warning by clang and gcc (with certain flags): x.cbe.c:130: warning: comparing floating point with == or != is unsafe Now, C99 provides a macro for this
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
----- Original Message ----- > From: "Xinliang David Li" <davidxl at google.com> > To: "Cong Hou" <congh at google.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, November 25, 2015 5:17:58 PM > Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add
2010 Nov 20
3
[LLVMdev] Poor floating point optimizations?
On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote: > And also the resulting assembly code is very poor: > > 00460013 movss xmm0,dword ptr [esp+8] > 00460019 movaps xmm1,xmm0 > 0046001C addss xmm1,xmm1 > 00460020 pxor xmm2,xmm2 > 00460024 addss xmm2,xmm1 > 00460028 addss xmm2,xmm0 > 0046002C movss dword ptr
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2004 Aug 06
5
SIMD interest
Greetings, <p>my apologies for putting this trash in the mailing list but the topic about SSE run-time option interested me pretty much. Looks like some people is really experienced on the topic. I would really appreciate if somebody could point me to good resources about SSE and Altivec (not necessarly on the net, I'm ready to invest some money if necessary). I already have intel
2018 Jan 19
7
how to search r-help?
I am new to this listand am unable to get the search tools listed on https://stat.ethz.ch/mailman/listinfo/r-help towork. What do people use to search the help archives? 1. The google search box on http://tolstoy.newcastle.edu.au/~rking/R/ returns a 404 error. 2. The http://finzi.psych.upenn.edu/ site has many references but I don't see how to search r-help from there. 3. The