thr3ads.net - similar to: "[LLVMdev] Missuse of xmm register on X86-64"

Displaying 20 results from an estimated 1200 matches similar to: "[LLVMdev] Missuse of xmm register on X86-64"

[LLVMdev] Scheduled Instructions go missing

2010 May 19

[LLVMdev] Scheduled Instructions go missing

All, I'm working on a new scheduler. I have a basic block for which my scheduler generates bad code. The C code looks like int j, *p; if ((j = *p++) != 0) {...} My scheduler emits (x86, AT&T) mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04, p je ... Notice there is no test instruction. The default list scheduler generates mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04,

[LLVMdev] Instruction Itineraries

2010 Feb 04

[LLVMdev] Instruction Itineraries

All, I am working on a scheduler for X86 and would like to include instruction latencies. It appears that this information is gathered from instruction itineraries, but that there isn't an itinerary for X86. I also can't seem to find documentation on how to add this for X86. Any pointers would be helpfull. Aran -------------- next part -------------- A non-text attachment was

[LLVMdev] X86 Target instruction definitions

2010 Mar 26

[LLVMdev] X86 Target instruction definitions

All, Where are the SSE instructions defined? Specifically, I cannot find the def for ADDSDrr. Aran -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 194 bytes Desc: This is a digitally signed message part. URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100326/130b2e02/attachment.sig>

[LLVMdev] llvm-gcc 4.2

2010 Feb 13

[LLVMdev] llvm-gcc 4.2

All, I'm trying to build llvm-gcc 4.2 from svn (as of about a week ago). I'm getting: ../../llvm-gcc-4.2/libcpp/expr.c: In function 'num_negate': ../../llvm-gcc-4.2/libcpp/expr.c:1114: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. I would like to do some debugging, but I don't see where

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

[LLVMdev] X86 - Help on fixing a poor code generation bug

Hi all, I noticed that the x86 backend tends to emit unnecessary vector insert instructions immediately after sse scalar fp instructions like addss/mulss. For example: ///////////////////////////////// __m128 foo(__m128 A, __m128 B) { _mm_add_ss(A, B); } ///////////////////////////////// produces the sequence: addss %xmm0, %xmm1 movss %xmm1, %xmm0 which could be easily optimized into

[LLVMdev] Poor floating point optimizations?

2010 Nov 20

[LLVMdev] Poor floating point optimizations?

And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&and instead of movss (which is

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

[LLVMdev] X86 - Help on fixing a poor code generation bug

Hi Andrea, Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS does an ADD plus BLEND operations, and you can easily

[LLVMdev] llvm-gcc + abi stuff

2008 Jan 24

[LLVMdev] llvm-gcc + abi stuff

<moving this to llvmdev instead of commits> On Jan 22, 2008, at 11:23 PM, Duncan Sands wrote: >> Okay, well we already get many other x86-64 issues wrong already, but >> Evan is chipping away at it. How do you pass an array by value in C? >> Example please, > > I find the x86-64 ABI hard to interpret, but it seems to say that > aggregates are classified

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are generated when input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays packed instructions are generated. Although it should not be required I tried

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

[PATCH] Make SSE Run Time option. Add Win32 SSE code

All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a

SCO 5.0.5 (i686-pc-sco3.2v5.0.5), scp and the -n option

2001 Feb 06

SCO 5.0.5 (i686-pc-sco3.2v5.0.5), scp and the -n option

Ok, using openssh-SNAP-20010126.tar.gz, two versions of the server both compiled with the configure commands as below, one with USE_PIPES defined and one without. This is on SCO OpenServer 5.0.5 (using SCO dev environment, SCO make, etc.) The client is always linux, openssh 2.3.0p1. export CCFLAGS='-L/usr/local/lib -I/usr/local/include' ./configure --sysconfdir=/etc/ssh

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations. Thanks, Nadav On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are

[LLVMdev] Poor floating point optimizations?

2010 Nov 20

[LLVMdev] Poor floating point optimizations?

I wanted to use LLVM for my math parser but it seems that floating point optimizations are poor. For example consider such C code: float foo(float x) { return x+x+x; } and here is the code generated with "optimized" live demo: define float @foo(float %x) nounwind readnone { entry: %0 = fmul float %x, 2.000000e+00 ; <float> [#uses=1] %1 = fadd float %0, %x

Trouble when suppressing a portion of fast-math-transformations

2017 Sep 29

Trouble when suppressing a portion of fast-math-transformations

Hi all, In a mailing-list post last November: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html I raised some concerns that having the IR-level fast-math-flag 'fast' act as an "umbrella" to implicitly turn on all the lower-level fast-math-flags, causes some fundamental problems. Those fundamental problems are related to situations where a user wants to

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 04

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generaetd by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

Thanks Eli. I forgot to bring up the strict FP questions which I was working on when I found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp emitted by promoting load/store/bitcast need to be strict versions of fp_to_f16/f16_to_fp. And if so where do we get the chain, especially for the bitcast case which isn't a chained node. ~Craig On Tue, Dec 10, 2019 at 3:18 PM

[LLVMdev] Passing a 256 bit integer vector with XMM registers

2013 Sep 20

[LLVMdev] Passing a 256 bit integer vector with XMM registers

I am implementing a new calling convention for X86 which requires to pass a 256 bit integer vector with two XMM registers rather than one YMM register. For example define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

2013 Apr 15

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

Hi , I have a test case (and a micro benchmark made out of the test case) to check if loop unrolling and loop vectorization is efficiently done on LLVM. Here is the test case (credits: Tyler Nowicki) {code} extern float * array; extern int array_size; float g() { int i; float total = 0; for(i = 0; i < array_size; i++) { total += array[i]; } return total; } {code} When

[LLVMdev] Poor floating point optimizations?

2010 Nov 20

[LLVMdev] Poor floating point optimizations?

On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote: > And also the resulting assembly code is very poor: > > 00460013 movss xmm0,dword ptr [esp+8] > 00460019 movaps xmm1,xmm0 > 0046001C addss xmm1,xmm1 > 00460020 pxor xmm2,xmm2 > 00460024 addss xmm2,xmm1 > 00460028 addss xmm2,xmm0 > 0046002C movss dword ptr

similar to: [LLVMdev] Missuse of xmm register on X86-64