thr3ads.net - search: "upper16"

[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels

2011 Nov 12

2

...tinuously]": Ltmp265: Lfunc_begin24: .loc 1 380 0 .loc 1 380 1 prologue_end push {r4, r5, r6, r7, lr} add r7, sp, #12 push.w {r8, r10, r11} vpush {d8} sub sp, #4 .loc 1 382 2 Ltmp266: movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4)) Ltmp267: mov r4, r0 Ltmp268: movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4)) movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4)) movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4)) LPC24_0: add r1, pc LPC24_1: add r0, pc ldr r1, [r1] ldr r0, [r0] blx _objc_msgSend movw r1, :lower16:(L_OBJC_SELECTOR...

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

2011 Feb 18

0

On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

2011 Feb 18

2

Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Besides, I propose a codegen optimization based on them, which removes the redundant comparison in patterns like orr

[LLVMdev] Simple NEON optimization

2010 Nov 12

2

...es (just a bit) the code generated. The case is simple: uint32x2_t x, res; res = vceq_u32(x, vcreate_u32(0)); This will generate the following code: ; zero d16 vmov.i32 d16, #0x0 ; load a into d17 movw r0, :lower16:a movt r0, :upper16:a vld1.32 {d17}, [r0] ; compare two registers vceq.i32 d17, d17, d16 But, because the vector is zero, and there is a NEON instruction to compare against an immediate zero (VCEQZ), we could combine the two instructions: ; load a into d17 movw r0, :...

[LLVMdev] A bug in LLVM-GCC 4.2 with inlining __exchange_and_add

2013 Feb 03

2

...add r7, sp, #12 00000004 e92d0d00 stmdb sp!, {r8, sl, fp} 00000008 ed2d8b10 vstmdb sp!, {d8-d15} 0000000c b094 sub sp, #80 0000000e f2405088 movw r0, :lower16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc 00000012 2300 movs r3, #0 00000014 f2c00000 movt r0, :upper16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc 00000018 f2407140 movw r1, :lower16:0x770-0x2c+0xfffffffc 0000001c f2c00100 movt r1, :upper16:0x770-0x2c+0xfffffffc 00000020 f24052c8 movw r2, :lower16:__ZTV10EvActivate-0x34+0xfffffffc 00000024 4478 add r0, pc 00000...

[LLVMdev] Is shortening a load a bug?

2014 Sep 11

2

...32-i64:32:32" target triple = "thumbv7m-unknown-unknown" @f = external global i32 define zeroext i8 @bar() nounwind { L.0: %rv.0 = alloca i8 %0 = load i32* @f %1 = trunc i32 %0 to i8 ret i8 %1 } ---- Which for the ARM Cortex-M3 generates: ---- bar: movw r0, :lower16:f movt r0, :upper16:f ldrb r0, [r0] bx lr ---- Although we are only interested in the low 8 bits, the load MUST be a 32-bit load. Using a "load volatile" fixes this, but this is overkill as the memory location is not volatile. Am I missing something, or is this a bug? brian
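
As an illustration of the question above, here is a minimal Python sketch, assuming a little-endian target and ordinary (non-volatile) memory. It models why a 32-bit load followed by a truncation to 8 bits yields the same value as a single byte load from the same address. The helper names `load_i32_then_trunc` and `load_i8` are hypothetical, and the model says nothing about the access width the hardware observes, which is the poster's actual concern.

```python
import struct


def load_i32_then_trunc(memory: bytes, addr: int) -> int:
    """Model `load i32` followed by `trunc i32 ... to i8` (little-endian)."""
    (word,) = struct.unpack_from("<I", memory, addr)
    return word & 0xFF


def load_i8(memory: bytes, addr: int) -> int:
    """Model a single byte load (like `ldrb`) from the same address."""
    return memory[addr]


# Hypothetical contents of the global `f`, stored little-endian.
memory = struct.pack("<I", 0x12345678)
assert load_i32_then_trunc(memory, 0) == load_i8(memory, 0)
```

Both functions return the same value for any contents, which is why a compiler may treat the narrowing as value-preserving. Whether the narrower access is acceptable for a given memory region is a separate question that this sketch does not address.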

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

0

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched. > > The small function is scheduled well, but if opt inlines this function, the inlined part is scheduled badly. A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 14

2

Hi all, I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched. The small function is scheduled well, but if opt inlines this function, the inlined part is scheduled badly. So I wrote a simple test case, attached as a link (foo.c), and compiled it with two different methods: *method A:* *$clang -O3 foo.c -static -S

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 10

2

...e nontrivial, so what I will do is commit the "hack" patch to 8721 separately, and then the main patch, as 8721 is blocking the testing. The interim hack for 8721 can then be rolled back separately once someone (ddunbar? pdox? me? :) gets around to refactoring MCExpr so that :lower16: and :upper16: can apply to arbitrary expressions. Thanks -jason

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 10

2

...te subject? Hi Renato, If I am understanding you correctly, then the answer is no, because .s output doesn't care about relocations per se... BUT... it's also yes, because the asmwriter will sometimes need to generate sequences like the one below: foo: movw r0, :lower16:bar-foo movt r0, :upper16:bar-foo The subtraction implies that the value bar-foo is implicitly pc-relative (at least according to GNU as). Thanks! -jason
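
To make the `:lower16:` / `:upper16:` pair concrete, here is a small Python sketch of how a `movw`/`movt` sequence builds a 32-bit value from two 16-bit halves, including a symbol difference such as `bar-foo`. The function names and the addresses used for `foo` and `bar` are hypothetical and chosen only for illustration. The sketch models the arithmetic, not the relocation machinery discussed in the thread.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def lower16(value: int) -> int:
    """Bits 0..15 of a 32-bit value, as selected by :lower16:."""
    return value & MASK16


def upper16(value: int) -> int:
    """Bits 16..31 of a 32-bit value, as selected by :upper16:."""
    return (value >> 16) & MASK16


def movw(imm16: int) -> int:
    """movw writes the low half and zeroes the top half of the register."""
    return imm16 & MASK16


def movt(reg: int, imm16: int) -> int:
    """movt writes the top half and keeps the low half of the register."""
    return ((imm16 & MASK16) << 16) | (reg & MASK16)


def materialize(value: int) -> int:
    """Model `movw r0, :lower16:value` followed by `movt r0, :upper16:value`."""
    value &= MASK32
    r0 = movw(lower16(value))
    return movt(r0, upper16(value))


# Hypothetical addresses for the labels in the snippet above.
foo, bar = 0x8000, 0x12340
assert materialize(bar - foo) == (bar - foo) & MASK32
```

A negative difference wraps to its two's-complement form, so the upper half becomes `0xFFFF` rather than zero. This is why both halves of the expression have to be evaluated on the full difference and not on each symbol separately.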

[LLVMdev] Simple NEON optimization

2010 Nov 12

0

...simple: > > uint32x2_t x, res; > res = vceq_u32(x, vcreate_u32(0)); > > This will generate the following code: > > ; zero d16 > vmov.i32 d16, #0x0 > ; load a into d17 > movw r0, :lower16:a > movt r0, :upper16:a > vld1.32 {d17}, [r0] > ; compare two registers > vceq.i32 d17, d17, d16 > > But, because the vector is zero, and there is a NEON instruction to > compare against an immediate zero (VCEQZ), we could combine the two > instructions: > >...

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 20

2

Dear community, I came across code generated by LLVM whose assembly instructions rely on 8-byte alignment for structures on the stack. Part of the Objective-C code follows: -(void)getCharacters:(unichar *)unicode { NSRange range; range.location = 0; range.length = [self length]; printf("%p, %p\n", &range.location, &range.length); And

[LLVMdev] [llvm-commits] [patch] ARM/MC/ELF add new stub for movt/movw in ARMFixupKinds

2010 Nov 17

1

...hod string to declare a special case handler. At the current time, for the assembly printing, MCAsmStreamer::EmitInstruction(const MCInst &Inst) calls out to MCExpr::print(raw_ostream &OS), which then calls out to MCSymbolRefExpr::getVariantKindName() to print the magic :lower16: and :upper16: asm tags for .s emission. Currently, movt/movw emission works correctly in .s, but not in .o emission. This led me to believe that the correct place to put the code to handle MCSymbolRefExpr::VK_ARM_(HI||LO)16 for the .o path was to place a case in getMachineOpValue() (i.e. not ARMMCCodeEmitter::g...

PBQP register allocation and copy propagation

2016 Jun 02

2

...BQP allocator for Thumb-2 and have run into a problem I'd love to get your input on. The problem is exemplified in the codegen for the function @bar in the attached IR file: bar: push {r4, lr} sub sp, #12 (1) movw r2, :lower16:.L_MergedGlobals (1) movt r2, :upper16:.L_MergedGlobals ldm.w r2, {r0, r1, r3, r12, lr} ldrd r4, r2, [r2, #20] strd lr, r4, [sp] str r2, [sp, #8] (2) mov r2, r3 **** mov r3, r12 **** bl baz add sp, #12 pop {r4, pc} The tw...

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

3

....c -static -S -o foo.s -mllvm -unroll-count=4 -mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize --target=arm -mfloat-abi=hard -mllvm -enable-misched -mllvm -scheditins=false per-operand cost model : Scale: push {lr} movw r12, :lower16:c movw lr, :lower16:b movw r3, #9216 movt r12, :upper16:c mov r1, #0 vmov.f64 d16, #3.000000e+00 movt lr, :upper16:b movt r3, #244 .LBB0_1: add r0, r12, r1 add r2, lr, r1 *vldr d17, [r0]* add r1, r1, #32 vmul.f64 d17, d17, d16 cmp r1, r3 vstr d17, [r2] * vldr d17, [r0, #8]* vmul.f64 d17, d17, d16 * * vstr d17, [r2, #8]...

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 21

2

...t: armv7l-unknown-linux-gnueabi Thread model: posix ----- And we get the following assembly code: main: push {r11, lr} mov r11, sp sub sp, sp, #24 mov r0, #0 str r0, [r11, #-4] add r1, sp, #8 movw r2, :lower16:.Lmain.mStruct movt r2, :upper16:.Lmain.mStruct vldr d16, [r2] vstr d16, [sp, #8] orr r2, r1, #4 movw r3, :lower16:.L.str movt r3, :upper16:.L.str str r0, [sp, #4] mov r0, r3 bl printf ldr r1, [sp, #4] str r0, [sp] mov r0, r1 mov sp, r11 pop ...

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 10

0

On 10 January 2011 22:59, Jason Kim <jasonwkim at google.com> wrote: > Hi everyone, happy new year. > > This note is to announce that support for PC relative reloc tags for > movw/movt is nearing completion (hopefully <48hrs!). This work is > from Jan Voung, David Meyer and myself. Hi Jason, Happy new year! That seems a long patch... with many changes... can't

[LLVMdev] ARM/MC/ELF Support for pcrel movw/movt coming soon

2011 Jan 11

0

...arget1) and exception handling table symbols (prel31) are clearly disregarded by gas and subsequently discarded by armlink. > it's also yes because the asmwriter will sometimes need to > generate sequences like below > > foo: > movw r0, :lower16:bar-foo > movt r0, :upper16:bar-foo > > The subtraction implies that the value bar-foo is implicitly > pc-relative (at least according to GNU as). That was the other part of my question: will your new MC-relocationator also print the current ASM relocations? ;) cheers, --renato

PBQP register allocation and copy propagation

2016 Jun 03

2

...BQP allocator for Thumb-2 and have run into a problem I'd love to get your input on. The problem is exemplified in the codegen for the function @bar in the attached IR file: bar: push {r4, lr} sub sp, #12 (1) movw r2, :lower16:.L_MergedGlobals (1) movt r2, :upper16:.L_MergedGlobals ldm.w r2, {r0, r1, r3, r12, lr} ldrd r4, r2, [r2, #20] strd lr, r4, [sp] str r2, [sp, #8] (2) mov r2, r3 **** mov r3, r12 **** bl baz add sp, #12 pop {r4, pc} The tw...

[LLVMdev] ARM assembler's syntax in clang

2013 Mar 08

0

...thumb .thumb_func foo: /* these lines are from compiler's assembly output($(CC) -S): * extern int data_table[]; * int *wheres_data_table(void) { * return &data_table[0]; * } */ movw r1, :lower16:(L_data_table$non_lazy_ptr-(LPC0_0+4)) movt r1, :upper16:(L_data_table$non_lazy_ptr-(LPC0_0+4)) LPC0_0: add r1, pc ldr r1, [r1] bx lr .section __DATA,__nl_symbol_ptr,non_lazy_symbol_pointers .align 2 L_data_table$non_lazy_ptr: .indirect_symbol _data_table .long 0 .subsections_via_symbols /* ==e...
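
The `SYMBOL-(LPC0_0+4)` expression in the snippet above can be checked with a little arithmetic. The following Python sketch assumes Thumb state, where reading `pc` in `add r1, pc` yields the address of that instruction plus 4. It shows that loading the offset with `movw`/`movt` and then adding `pc` recovers the symbol's absolute address. The function names and the example addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF
THUMB_PC_BIAS = 4  # in Thumb state, pc reads as the instruction address + 4


def pcrel_offset(symbol_addr: int, add_pc_addr: int) -> int:
    """Value of SYMBOL-(LPC+4), where LPC labels the `add rN, pc` instruction."""
    return (symbol_addr - (add_pc_addr + THUMB_PC_BIAS)) & MASK32


def run_sequence(offset: int, add_pc_addr: int) -> int:
    """Model movw/movt loading `offset`, then `add r1, pc` at add_pc_addr."""
    r1 = offset & 0xFFFF                    # movw r1, :lower16:offset
    r1 |= ((offset >> 16) & 0xFFFF) << 16   # movt r1, :upper16:offset
    pc = add_pc_addr + THUMB_PC_BIAS        # value read from pc
    return (r1 + pc) & MASK32               # add r1, pc


# Hypothetical layout: the pointer slot lives at 0x3000, LPC0_0 is at 0x1008.
offset = pcrel_offset(0x3000, 0x1008)
assert run_sequence(offset, 0x1008) == 0x3000
```

Because the result is computed modulo 2^32, the same sequence works when the symbol sits below the code, in which case the encoded offset is the two's-complement form of a negative distance.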