thr3ads.net - similar to: "[LLVMdev] A bug in LLVM-GCC 4.2 with inlining __exchange_and

Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] A bug in LLVM-GCC 4.2 with inlining __exchange_and_add"

[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels

2011 Nov 12

This would be best reported to Apple's Radar bug database at http://bugreport.apple.com/ but its whole website has been down for a while. I have a 100% reproducible Thumb-2 code generation error that occurs at all of the levels of optimization available in the Xcode 4.2 for Snow Leopard build settings GUI: -O0, -O1, -O2, -O3 and -Os. However the bad machine code only occurs in Release

[LLVMdev] Simple NEON optimization

2010 Nov 12

Hi folks, me again, So, I want to implement a simple optimization in a NEON case I've seen these days, most as a matter of exercise, but it also simplifies (just a bit) the code generated. The case is simple: uint32x2_t x, res; res = vceq_u32(x, vcreate_u32(0)); This will generate the following code: ; zero d16 vmov.i32 d16, #0x0 ; load a

[LLVMdev] Simple NEON optimization

2010 Nov 12

On Nov 12, 2010, at 7:23 AM, Renato Golin wrote: > Hi folks, me again, > > So, I want to implement a simple optimization in a NEON case I've seen > these days, most as a matter of exercise, but it also simplifies (just > a bit) the code generated. > > The case is simple: > > uint32x2_t x, res; > res = vceq_u32(x, vcreate_u32(0)); > > This

[LLVMdev] Is shortening a load a bug?

2014 Sep 11

When the IR specifies a 32 bit load can it be changed to a narrower load? What if the load is from memory (e.g. a peripheral) that only supports 32-bit access? Consider the following IR: ---- target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32" target triple = "thumbv7m-unknown-unknown" @f = external global i32 define zeroext i8 @bar() nounwind { L.0:

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 10

Hi everyone, happy new year. This note is to announce that support for PC relative reloc tags for movw/movt is nearing completion (hopefully <48hrs!). This work is from Jan Voung, David Meyer and myself. Unfortunately, to test this change, we need to patch ARM/AsmParser to address http://llvm.org/bugs/show_bug.cgi?id=8721 Locally, we have hacked up a solution to 8721, but it's not ideal

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 10

-llvmcommits On Mon, Jan 10, 2011 at 3:21 PM, Renato Golin <renato.golin at arm.com> wrote: > Btw, I know this is for ELF printing, but can the same infrastructure > you're using to print the hi/lo be used to print relocation in Asm > output? Or is this a completely separate subject? Hi Renato, If I am understanding you correctly, then the answer is no, because .s output

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

Hi Andy, thanks for your help!! The code scheduled by method A is the same as B when using the new machine model. That makes sense, but there is another problem: the scheduled code is bad. Load/store instructions always reuse the same register. Source: #define N 2000000 static double b[N], c[N]; void Scale () { double scalar = 3.0; for (int j=0;j<N;j++) b[j] =

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 21

Hello Tim, thanks for response ---------------------------------------- > Date: Mon, 20 Apr 2015 11:45:03 -0700 > Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32) > From: t.p.northover at gmail.com > To: alexey.perevalov at hotmail.com > CC: llvmdev at cs.uiuc.edu > > On 20 April 2015 at 11:09, Alexey Perevalov > <alexey.perevalov at

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 11

On 10 January 2011 23:54, Jason Kim <jasonwkim at google.com> wrote: > If I am understanding you correctly, then the answer is no, because .s > output doesn't care about relocations per se... Hi Jason, That's not entirely true. ;) If you only use the GNU toolchain, that is correct. However, CodeSourcery's GCC changed a bit on how it works for ARM because the ARM toolchain

[LLVMdev] [llvm-commits] [patch] ARM/MC/ELF add new stub for movt/movw in ARMFixupKinds

2010 Nov 17

+llvmdev -llvmcommits On Fri, Nov 12, 2010 at 8:03 AM, Jim Grosbach <grosbach at apple.com> wrote: > Sorta. getBinaryCodeForInst() is auto-generated by tablegen, so shouldn't be modified directly. The target can register hooks for instruction operands for any special encoding needs, including registering fixups, using the EncoderMethod string. For an example, have a look at the

[LLVMdev] Is shortening a load a bug?

2014 Sep 12

On 09/11/2014 05:33 PM, Quentin Colombet wrote: > Hi Brian, > > On Sep 11, 2014, at 3:03 PM, Bagel <bagel99 at gmail.com> wrote: > >> When the IR specifies a 32 bit load can it be changed to a narrower >> load? What if the load is from memory (e.g. a peripheral) that only >> supports 32-bit access? Consider the following IR: ---- target datalayout >> =

PBQP register allocation and copy propagation

2016 Jun 02

Hi Lang and Arnaud, I've been testing out the PBQP allocator for Thumb-2 and have run into a problem I'd love to get your input on. The problem is exemplified in the codegen for the function @bar in the attached IR file: bar: push {r4, lr} sub sp, #12 (1) movw r2, :lower16:.L_MergedGlobals (1) movt r2, :upper16:.L_MergedGlobals ldm.w r2,

[LLVMdev] ARM assembler's syntax in clang

2013 Mar 08

> And be warned that the PC doesn't point at the next instruction when you use it like this - I believe you don't need to modify it at all if you swap the pop and the .long. Bernie, is it related to the ARM pipeline? I'm interested in this; is there any additional information? On Fri, Mar 8, 2013 at 4:59 AM, Tim Northover <t.p.northover at gmail.com> wrote: > Hi Ashi,

PBQP register allocation and copy propagation

2016 Jun 03

Hi James, I’ve tried to play in the past with the allocation order, which can definitely be tweaked and improved. The metric we use for spill cost being what it is (i.e. not targeted for PBQP, but that’s a different subject), I found it made real sense to use some other heuristics to sort nodes (on top of the spill cost) when there was a tie between them. Of course, that’s a heuristic and it can

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

2011 Feb 18

On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

2011 Feb 18

Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Besides, I propose a codegen optimization based on them, which removes the redundant comparison in patterns like orr

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I met this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function will be scheduled as good code, but if opt inlines this function, the inlined part will be scheduled as bad code. A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 14

Hi all, I met this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched. The small function will be scheduled as good code, but if opt inlines this function, the inlined part will be scheduled as bad code. So I wrote a simple example, attached as foo.c, and compiled it with two different methods: *method A:* *$clang -O3 foo.c -static -S

PBQP register allocation and copy propagation

2016 Jun 03

Hi, > > I think one idea to improve the situation is to consider the cost vector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 20

Dear community, I encountered code generated by llvm whose assembly instructions rely on 8-byte alignment for structures on the stack. Part of the Objective C code follows: -(void)getCharacters:(unichar *)unicode { NSRange range; range.location = 0; range.length = [self length]; printf("%p, %p\n", &range.location, &range.length); And
