thr3ads.net - similar to: "[LLVMdev] Miscompilation on MingW32"

[LLVMdev] Misaligned SSE store problem (with reduced source)

2011 Nov 11

3

[LLVMdev] Misaligned SSE store problem (with reduced source)

Using LLVM 2.9, the following LLVM IR produces invalid x86 32 bit assembly (a misaligned SSE store).

    ; ModuleID = 'MisalignedStore'
    define void @MisalignedStore() nounwind readnone {
    entry:
      %v = alloca <4 x float>, align 16
      store <4 x float> zeroinitializer, <4 x float>* %v, align 16
      br label %post-block
    post-block:
      %f = alloca float
      ret void
    }

If I feed

[LLVMdev] Misaligned SSE store problem (with reduced source)

2011 Nov 11

0

[LLVMdev] Misaligned SSE store problem (with reduced source)

On Thu, Nov 10, 2011 at 6:13 PM, Aaron Dwyer <Aaron.Dwyer at imgtec.com> wrote:
> Using LLVM 2.9, the following LLVM IR produces invalid x86 32 bit assembly
> (a misaligned SSE store).
> ; ModuleID = 'MisalignedStore'
> define void @MisalignedStore() nounwind readnone {
> entry:
> %v = alloca <4 x float>, align 16
> store <4 x float>

[LLVMdev] Build issues on Solaris

2009 Aug 25

2

[LLVMdev] Build issues on Solaris

On 19/08/2009, at 4:00 AM, Anton Korobeynikov wrote:
> Hello, Nathan
>
>> or if it should be a configure test, which might be safer. Are there
>> any x86 platforms (other than apple) that don't need PLT-indirect
>> calls?
> Yes, mingw. However just tweaking the define is not enough - we're not

Ok, so configure might be the way to go then, maybe something

[LLVMdev] Runtime linker issue wtih X11R6 on i386 with -O3 optimization

2012 Mar 20

0

[LLVMdev] Runtime linker issue wtih X11R6 on i386 with -O3 optimization

I was told that my writeup lacked an example and details so I reproduced the code that X uses and I was able to boil down the issue to a couple of lines of code. Sorry again for the length of this email. Code was compiled on OpenBSD with clang 3.0-release. ======================================================================== With -O0 which works as X expects:

How to call an (x86) cleanup/catchpad funclet

2016 Apr 04

2

How to call an (x86) cleanup/catchpad funclet

I've modified llvm to emit vc++ compatible SEH structures for my personality on x86/Windows and my handler works fine, but the only thing I can't figure out is how to call these funclets, they look like:

Catch:

    "?catch$3@?0?m3@4HA":
    LBB4_3:                 # %BasicBlock26
        pushl %ebp
        pushl %eax
        addl $12, %ebp
        movl %esp, -28(%ebp)
        movl $LBB4_5, %eax

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

2014 Dec 21

2

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

Which performance guidelines are you referring to? I'm not that familiar with decade-old CPUs, but to the best of my knowledge, this is not true on current hardware. There is one specific circumstance where PUSHes should be avoided - for Atom/Silvermont processors, the memory form of PUSH is inefficient, so the register-freeing optimization below may not be profitable (see 14.3.3.6 and

[klibc 24/43] i386 support for klibc

2006 Jun 26

0

[klibc 24/43] i386 support for klibc

The parts of klibc specific to the i386 architecture.

Signed-off-by: H. Peter Anvin <hpa at zytor.com>

---
commit bd0599e5290ca1a16bb7a68f7c362d395c612eb3
tree 8f33afdd02a14c22e7a3984da2bad13184e3f729
parent 84f6a72f42cf41e32daa59871a0b5424572093e4
author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:21 -0700
committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun

[LLVMdev] operator overloading fails while debugging with gdb for i386

2012 Dec 01

0

[LLVMdev] operator overloading fails while debugging with gdb for i386

The problem is not limited to operator overloading; it also occurs when a struct is returned by value. While debugging, gdb expects the return value in eax. gcc does return it in eax, but Clang returns it in edx (this can be checked in gdb by printing the contents of edx).

Code (sample code):

    struct A1 { int x; int y; };

    A1 sum(const A1 one, const A1 two) {
        A1 plus = {0,0};
        plus.x = one.x + two.x;
        plus.y

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

0

[LLVMdev] Suboptimal code due to excessive spilling

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?

/Patrik Hägglund

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: 28 March 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling

Hi, I have run into the following strange behavior

[PATCH] (with benchmarks) binary patching of paravirt_ops call sites

2007 Apr 18

1

[PATCH] (with benchmarks) binary patching of paravirt_ops call sites

Hi all, Sorry for the delay. This implements binary patching of call sites for interrupt-related paravirt ops, since no doubt Andi wasn't the only one to believe this approach is slow. The benchmarks were done on a UP 3GHz Pentium 4 with 512MB of RAM. 2.6.17-rc4 vs 2.6.17-rc4 with CONFIG_PARAVIRT=y vs 2.6.17-rc4 CONFIG_PARAVIRT=y with patch. Summary: with binary patching, the difference

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

2

[LLVMdev] Suboptimal code due to excessive spilling

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

[LLVMdev] operator overloading fails while debugging with gdb for i386

2012 Dec 02

0

[LLVMdev] operator overloading fails while debugging with gdb for i386

Hi, As you said, the function ends up returning void. I just confirmed it in the IR, where the function is defined as:

    define void @_Z3sum2A1S_(%struct.A1* noalias sret %agg.result, %struct.A1* byval align 4 %one, %struct.A1* byval align 4 %two)

But when I checked the register values with g++, eax contains a stack address, which points to the value (object) returned by sum. That is if we

[LLVMdev] GVNPRE /PRE is not effective

2013 Dec 13

0

[LLVMdev] GVNPRE /PRE is not effective

Hi All, The PRE or GVNPRE is not effective for the below use case.

    int sum;
    int phi = 30;

    void f (int i, int *a) {
        if ((a[i] << (1)) > -15)
            sum = (phi + 0x7fffffffL) / a[i];
        if ((a[i] << (2)) > -15)
            sum = (phi + 0x7fffffffL) / a[i];
    }

respective asm (clang on trunk)

    #clang -O3 -S test.c

    BB#0: # %entry
        pushl %edi
        pushl %esi

Patch: use .pushsection/.popsection

2007 Apr 18

1

Patch: use .pushsection/.popsection

I think this might fix the X bug... -------------- next part -------------- diff -r e698e6ee2fa1 arch/i386/kernel/entry.S --- a/arch/i386/kernel/entry.S Tue Aug 08 10:18:34 2006 -0700 +++ b/arch/i386/kernel/entry.S Tue Aug 08 10:36:17 2006 -0700 @@ -162,17 +162,17 @@ 2: popl %es; \ 2: popl %es; \ CFI_ADJUST_CFA_OFFSET -4;\ /*CFI_RESTORE es;*/\ -.section .fixup,"ax"; \ +.pushsection

[LLVMdev] operator overloading fails while debugging with gdb for i386

2012 Dec 01

2

[LLVMdev] operator overloading fails while debugging with gdb for i386

Hi, Structures are passed by pointer, so the return value is not actually in eax. That code gets transformed into something like:

    void sum(A1 *out, const A1 one, const A1 two) {
        out->x = one.x + two.x;
        out->y = one.y + two.y;
    }

So actually the function ends up returning void and operating on a hidden parameter, so %eax is dead at the end of the function and should not be being relied
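A minimal C sketch of the transformation this reply describes, assuming the by-value function from the thread's sample code is completed in the obvious way (the archived snippet is cut off). The names `sum_lowered` and `check_forms_agree` are hypothetical and exist only for illustration; what a compiler actually emits may differ.

```c
#include <assert.h>

/* The two-int struct from the thread's sample code. */
typedef struct A1 { int x; int y; } A1;

/* Source-level form: the struct is returned by value.
   The tail of this function is an assumed completion of the truncated sample. */
A1 sum(const A1 one, const A1 two) {
    A1 plus = {0, 0};
    plus.x = one.x + two.x;
    plus.y = one.y + two.y;
    return plus;
}

/* Hand-written stand-in for the lowered form (hypothetical name):
   the caller supplies the result slot and nothing is returned. */
void sum_lowered(A1 *out, const A1 one, const A1 two) {
    out->x = one.x + two.x;
    out->y = one.y + two.y;
}

/* Hypothetical helper: checks that both forms produce the same struct. */
void check_forms_agree(A1 one, A1 two) {
    A1 by_value = sum(one, two);
    A1 via_pointer;
    sum_lowered(&via_pointer, one, two);
    assert(by_value.x == via_pointer.x);
    assert(by_value.y == via_pointer.y);
}
```

Calling `check_forms_agree` with any two values exercises both forms. The point of the sketch is only that the result travels through the caller-provided slot, so nothing in the lowered form defines a return register.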

[LLVMdev] GCC vs. LLVM difference on simple code example

2011 Mar 24

2

[LLVMdev] GCC vs. LLVM difference on simple code example

Hi, I have a question on why gcc and llvm-gcc compile the following simple code snippet differently:

    extern int a;
    extern int *b;

    void foo() {
        int i;
        for (i = 1; i < 100; ++i)
            a += b[i];
    }

gcc compiles this function hoisting the load of the global variable "b" outside of the loop, while llvm-gcc keeps it inside the loop. This results in slower code on the part of
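A rough C sketch of what hoisting the load of `b` looks like when done by hand. It assumes the stores to `a` never change `b`, which is an assumption of this sketch and not something the post states. `foo_hoisted` and `run_variant` are hypothetical names, and the definitions of `a` and `b` are stand-ins so the example builds on its own.

```c
#include <assert.h>

/* Stand-in definitions so the sketch links on its own;
   the thread only declares these as extern. */
int a;
int *b;

/* The loop as posted: b is named inside the loop body. */
void foo(void) {
    int i;
    for (i = 1; i < 100; ++i)
        a += b[i];
}

/* Hand-hoisted variant (hypothetical name): b is read once into a local.
   Assumes the stores to a never change b. */
void foo_hoisted(void) {
    int *base = b;
    int i;
    for (i = 1; i < 100; ++i)
        a += base[i];
}

/* Hypothetical driver: points b at an array with data[k] == k,
   runs one variant from a == 0, and returns the resulting a. */
int run_variant(void (*variant)(void)) {
    static int data[100];
    int k;
    assert(variant != 0);
    for (k = 0; k < 100; ++k)
        data[k] = k;
    b = data;
    a = 0;
    variant();
    return a;
}
```

Both variants compute the same sum for this input. The difference the post asks about is only whether the pointer `b` is reloaded from memory on every iteration.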

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

2014 Dec 21

5

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

Hello all,

In r223757 I've committed a patch that performs, for the 32-bit x86 calling convention, the transformation of MOV instructions that push function arguments onto the stack into actual PUSH instructions. For example, it will transform this:

    subl $16, %esp
    movl $4, 12(%esp)
    movl $3, 8(%esp)
    movl $2, 4(%esp)
    movl $1, (%esp)
    calll _func
    addl $16, %esp

changing definition of paravirt_ops.iret

2007 May 21

2

changing definition of paravirt_ops.iret

I'm implementing a more efficient version of the Xen iret paravirt_op, so that it can use the real iret instruction where possible. I really need to get access to per-cpu variables, so I can set the event mask state in the vcpu_info structure, but unfortunately at the point where INTERRUPT_RETURN is used in entry.S, the usermode %fs has already been restored. How would you feel if we changed
