search for: ldrh

Displaying 20 results from an estimated 27 matches for "ldrh".

2017 Dec 06
2
[LLD] Slow callstacks in gdb
...With most current host architectures, handling packed_endian_specific_integral is fairly efficient. For example, on x86_64, reading 32 bits with 1-, 2- and 4-byte alignment produces in all cases:

    movl (%rdi), %eax

But on armv6 the aligned case is

    ldr  r0, [r0]

the 2-byte-aligned case is

    ldrh r1, [r0, #2]
    ldrh r0, [r0]
    orr  r0, r0, r1, lsl #16

and the unaligned case is

    ldrb r1, [r0]
    ldrb r2, [r0, #1]
    ldrb r3, [r0, #2]
    ldrb r0, [r0, #3]
    orr  r1, r1, r2, lsl #8
    orr  r0, r3, r0, lsl #8
    orr  r0, r1, r0, lsl #16

On armv7 it is a single ldr o...
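The byte-wise fallback that the armv6 unaligned sequence implements can be sketched in C. The helper names below are illustrative, not LLVM's actual code; the bytewise form is what a compiler lowers to the four-ldrb/orr sequence quoted above, while the memcpy form lets the compiler pick the best load the target's alignment rules allow:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a possibly unaligned pointer,
 * one byte at a time. Portable regardless of host endianness. */
static uint32_t read32le_bytes(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* memcpy form: compilers lower this to a single mov on x86_64 and to
 * whatever the target's unaligned-access rules require elsewhere.
 * Returns host byte order, so it matches read32le_bytes only on a
 * little-endian host. */
static uint32_t read32le_memcpy(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On armv7 and later, where unaligned word loads are legal, both forms can collapse to a single ldr, which is the point the message above is making.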
2014 Feb 08
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
On Fri, 7 Feb 2014, Timothy B. Terriberry wrote:
> Martin Storsjo wrote:
>> This is required in order to build using the built-in assembler
>> in clang.
>
> These patches break the gcc build (with "Error: bad instruction").

Ah, right, sorry about that.

> Documentation I've seen is contradictory on which order ({cond}{size} or
> {size}{cond}) is correct.
2018 Feb 22
2
Sink redundant spill after RA
...// 8-byte Folded Spill
    ldrsw x8, [x0, #4424]
    sxtw  x10, w2            <------------- w2 is the use of spilled value before spill.
    sxtw  x12, w1
    madd  x8, x8, x10, x12
    ldr   x9, [x0, #8]
    add   x9, x9, x8, lsl #2
    ldrh  w11, [x9]
    ldrh  w10, [x0, #16]
    str   x2, [sp, #120]     // 8-byte Folded Spill   <------------- spill !!!
    cmp   w11, w10
    b.eq  .LBB2_32
// %bb.1:                    // %if.end
    ldr   x13, [sp, #120]    // 8-byte Fol...
2013 Mar 04
1
[LLVMdev] Custom Lowering of ARM zero-extending loads
Hi, For my research, I need to reshape the current ARM backend to support armv2a. The zero-extending half-word load (ldrh) is not supported by armv2a, so I need to make code generation not emit ldrh instructions. I want to replace all those instances with a 32-bit load (ldr) and then AND the result with 0xffff to mask out the upper bits. These are the modifications that I have made to accomplish that: 1....
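The ldr-plus-mask replacement described above amounts to the following C sketch. The helper name is hypothetical, and the sketch assumes a little-endian target and a word-aligned address, so the halfword sits in the low 16 bits of the loaded word; a complete armv2a lowering would also need to handle the halfword in the high half of the word (mask-free, via a right shift):

```c
#include <assert.h>
#include <stdint.h>

/* Emulate a zero-extending halfword load (ldrh) with a word load (ldr)
 * followed by a mask:  ldr r0, [rN] ; and r0, r0, #0xFFFF */
static uint32_t load_zext_u16(const uint32_t *word_ptr) {
    return *word_ptr & 0xFFFFu;
}
```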
2018 Feb 22
2
Sink redundant spill after RA
...> sxtw x10, w2            <------------- w2 is the use of spilled value before spill.
> sxtw x12, w1
> madd x8, x8, x10, x12
> ldr  x9, [x0, #8]
> add  x9, x9, x8, lsl #2
> ldrh w11, [x9]
> ldrh w10, [x0, #16]
> str  x2, [sp, #120]        // 8-byte Folded Spill   <------------- spill !!!
> cmp  w11, w10
> b.eq .LBB2_32
> // %bb.1:                  // %if.end ...
2014 Feb 08
0
[PATCH v2] arm: Use the UAL syntax for instructions
...--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done
   SUBS   r2, r2, #1           ; j--
                               ; Stall
   SMLABB r6, r12, r10, r6     ; sum[0] = MAC16_16(sum[0],x,y_0)
-  LDRGTH r14, [r4], #2        ; r14 = *x++
+  LDRHGT r14, [r4], #2        ; r14 = *x++
   SMLABT r7, r12, r10, r7     ; sum[1] = MAC16_16(sum[1],x,y_1)
   SMLABB r8, r12, r11, r8     ; sum[2] = MAC16_16(sum[2],x,y_2)
   SMLABT r9, r12, r11, r9     ; sum[3] = MAC16_16(sum[3],x,y_3)
@@ -319,7 +319,7 @@ xcorr_kernel_edsp_process4_done...
2018 Feb 22
0
Sink redundant spill after RA
...// 8-byte Folded Spill
    ldrsw x8, [x0, #4424]
    sxtw  x10, w2            <------------- w2 is the use of spilled value before spill.
    sxtw  x12, w1
    madd  x8, x8, x10, x12
    ldr   x9, [x0, #8]
    add   x9, x9, x8, lsl #2
    ldrh  w11, [x9]
    ldrh  w10, [x0, #16]
    str   x2, [sp, #120]     // 8-byte Folded Spill   <------------- spill !!!
    cmp   w11, w10
    b.eq  .LBB2_32
// %bb.1:                    // %if.end
Presumably there is a redefinition of x2 somewhere...
2018 Feb 22
0
Sink redundant spill after RA
...<------------- w2 is the use of spilled value before spill.
> > sxtw x12, w1
> > madd x8, x8, x10, x12
> > ldr  x9, [x0, #8]
> > add  x9, x9, x8, lsl #2
> > ldrh w11, [x9]
> > ldrh w10, [x0, #16]
> > str  x2, [sp, #120]      // 8-byte Folded Spill   <------------- spill !!!
> > cmp  w11, w10
> > b.eq .LBB2_32
> > // %...
2014 Feb 07
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
...--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done
   SUBS   r2, r2, #1           ; j--
                               ; Stall
   SMLABB r6, r12, r10, r6     ; sum[0] = MAC16_16(sum[0],x,y_0)
-  LDRGTH r14, [r4], #2        ; r14 = *x++
+  LDRHGT r14, [r4], #2        ; r14 = *x++
   SMLABT r7, r12, r10, r7     ; sum[1] = MAC16_16(sum[1],x,y_1)
   SMLABB r8, r12, r11, r8     ; sum[2] = MAC16_16(sum[2],x,y_2)
   SMLABT r9, r12, r11, r9     ; sum[3] = MAC16_16(sum[3],x,y_3)
@@ -319,7 +319,7 @@ xcorr_kernel_edsp_process4_done...
2017 Dec 05
2
[LLD] Slow callstacks in gdb
Martin Richtarsky <s at martinien.de> writes:
> Output looks as follows [1]

Seems sh_offset is missing? That is what readelf prints as Off.

> [17] .rela.text  RELA  0000000000000000  071423  001728  18  1  4  8

The offset of .rela.text should have been aligned, but it is not. Can you report a bug on icc? As a workaround, use the gnu assembler if possible.
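The alignment complaint above can be stated as a small check; `section_offset_aligned` is an illustrative helper, not readelf code. For the quoted .rela.text row, sh_offset = 0x071423 and sh_addralign = 8, so the check fails (0x071423 mod 8 = 3):

```c
#include <assert.h>
#include <stdint.h>

/* An ELF section's file offset should be a multiple of its sh_addralign.
 * Values of 0 or 1 mean the section has no alignment constraint. */
static int section_offset_aligned(uint64_t sh_offset, uint64_t sh_addralign) {
    if (sh_addralign <= 1)
        return 1;
    return sh_offset % sh_addralign == 0;
}
```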
2007 Dec 12
2
Speex crashing on ARM with assembler optimization enabled.
...: blt 0x40030474 <open_loop_nbest_pitch+992>
0x40030328 <open_loop_nbest_pitch+660>: mov    r6, r3
0x4003032c <open_loop_nbest_pitch+664>: mov    r7, #0    ; 0x0
0x40030330 <open_loop_nbest_pitch+668>: ldr    lr, [r11, #-84]
0x40030334 <open_loop_nbest_pitch+672>: ldrh   r3, [r7, lr]
0x40030338 <open_loop_nbest_pitch+676>: smulbb r3, r3, r3
0x4003033c <open_loop_nbest_pitch+680>: mov    r3, r3, lsl #16
0x40030340 <open_loop_nbest_pitch+684>: mov    r12, r3, lsr #16
0x40030344 <open_loop_nbest_pitch+688>: ldr    r3, [r4, r10, lsl #2]
0...
2007 Dec 12
2
Speex crashing on ARM with assembler optimization enabled.
Hi, I'm trying to get speex working on an ARM board (ARM926EJ-Sid(wb) core, ARM 5TE architecture) and getting segfaults if built with the "--enable-fixed-point --enable-arm5e-asm" options. If I use just "--enable-fixed-point", then it runs fine, but once I add "--enable-arm5e-asm" it starts crashing (I use testenc to test it). Further investigation showed that it
2007 Dec 02
2
Optimised qmf_synth and iir_mem16
...stmia r9!, { r11-r12 }
    bne   0b
@ Copy alternate members of mem1 and mem2 to last part of xx1 and xx2
    mov   r14, r5               @ Loop counter is M
    add   r6, r6, #2
    add   r7, r7, #2
    stmdb sp!, { r6-r7 }        @ Stack &mem1[1], &mem2[1]
0:  ldrh  r10, [r6], #4
    ldrh  r11, [r6], #4
    ldrh  r12, [r7], #4         @ 1 cycle stall on Xscale
    orr   r10, r10, r11, lsl #16
    ldrh  r11, [r7], #4
    str   r10, [r8], #4
    subs  r14, r14, #4
    orr   r11, r12, r11, lsl #16
    str   r11, [r9], #4
    bne   0b
    sub...
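The ldrh/orr pairs in the loop above are just halfword packing: two 16-bit samples are combined into one 32-bit word so they can be stored with a single str. A C equivalent (hypothetical helper name; the lsl #16 places the second halfword in the high half, matching little-endian memory layout):

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit halfwords into one 32-bit word:
 *   orr rD, rLo, rHi, lsl #16 */
static uint32_t pack_halfwords(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```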
2018 Jan 18
0
[RFC] Half-Precision Support in the Arm Backends
...en FullFP16 is not supported. This is best illustrated with this existing test, which is a simple upconvert of f16 to f32:

    define float @test_extend32(half* %addr) {
      %val16 = load half, half* %addr
      %val32 = fpext half %val16 to float
      ret float %val32
    }

It should generate this code:

    ldrh  r0, [r0]            ; integer half word load
    vmov  s0, r0
    vcvtb.f32.f16 s0, s0
    vmov  r0, s0
    bx    lr

when we don't have the Armv8.2-A FP16 instructions available, and thus only have the conversion instructions. The problem is in the conversion rules, s...
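What vcvtb.f32.f16 computes can be mirrored in portable C for reference. This is a sketch under the usual IEEE-754 binary16 layout (1 sign, 5 exponent, 10 fraction bits); the helper name is illustrative and NaN payloads are passed through without special handling:

```c
#include <assert.h>
#include <stdint.h>

/* Convert an IEEE-754 binary16 bit pattern to a float. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t frac = h & 0x3FF;
    uint32_t bits;

    if (exp == 0) {
        if (frac == 0) {
            bits = sign;                              /* +/- zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears,
             * then rebias into a normal float exponent. */
            int e = -1;
            do { e++; frac <<= 1; } while ((frac & 0x400) == 0);
            frac &= 0x3FF;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (frac << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (frac << 13);     /* Inf / NaN */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (frac << 13);
    }
    union { uint32_t u; float f; } u = { bits };
    return u.f;
}
```

Every binary16 value is exactly representable as a float, which is why the hardware upconvert never rounds.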
2017 Dec 06
2
[RFC] Half-Precision Support in the Arm Backends
Thanks a lot for the suggestions! I will look into using vld1/vst1, sounds good. I am custom lowering the bitcasts, that's now the only place where FP_TO_FP16 and FP16_TO_FP nodes are created to avoid inefficient code generation. I will double check if I can't achieve the same without using these nodes (because I really would like to get completely rid of them). Cheers, Sjoerd.
2018 Jan 18
1
[RFC] Half-Precision Support in the Arm Backends
...en FullFP16 is not supported. This is best illustrated with this existing test, which is a simple upconvert of f16 to f32:

    define float @test_extend32(half* %addr) {
      %val16 = load half, half* %addr
      %val32 = fpext half %val16 to float
      ret float %val32
    }

It should generate this code:

    ldrh  r0, [r0]            ; integer half word load
    vmov  s0, r0
    vcvtb.f32.f16 s0, s0
    vmov  r0, s0
    bx    lr

when we don't have the Armv8.2-A FP16 instructions available, and thus only have the conversion instructions. The problem is in the conversion rules, s...
2006 Jun 26
0
[klibc 22/43] arm support for klibc
...0, #0
+	strcs	r2, [r3]
+	ldmfd	sp!,{r4,r5,r7,pc}
+
+	.balign	4
+1:
+	.word	errno
+
+#else
+	/* Thumb version - must still load r4 and r5 and run swi */
+
+	.thumb_func
+	.balign	2
+__syscall_common:
+	mov	r7, lr
+	ldr	r4, [sp,#16]
+	sub	r7, #1		/* Remove the Thumb bit */
+	ldr	r5, [sp,#20]
+	ldrh	r7, [r7]
+	swi	0
+	ldr	r1, 2f
+	cmp	r0, r1
+	bcc	1f
+	ldr	r1, 3f
+	neg	r2, r0
+	mov	r0, #1
+	str	r2, [r1]
+	neg	r0, r0
+1:
+	pop	{r4,r5,r7,pc}
+
+	.balign	4
+2:
+	.word	-4095
+3:
+	.word	errno
+
+#endif
diff --git a/usr/klibc/arch/arm/sysstub.ph b/usr/klibc/arch/arm/sysstub.ph
new file mode 100644...
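The cmp against -4095 in the stub above implements the standard Linux raw-syscall error convention: a return value in the range [-4095, -1] is a negated errno, which the wrapper turns into errno plus a -1 return. A C sketch of the same logic (illustrative names, not klibc's):

```c
#include <assert.h>

/* Translate a raw syscall return into the libc convention:
 * values in [-4095, -1] are errors (negated errno), everything
 * else (including large "negative-looking" pointers from mmap)
 * is a success value. The unsigned compare mirrors the asm's
 * cmp/bcc against -4095. */
static long syscall_return(long raw, int *errno_out) {
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        *errno_out = (int)-raw;
        return -1;
    }
    return raw;
}
```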
2018 Jan 24
2
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...memset-with-neon.ll
> llvm/trunk/test/CodeGen/ARM/2012-04-24-SplitEHCriticalEdge.ll
> llvm/trunk/test/CodeGen/ARM/Windows/memset.ll
> llvm/trunk/test/CodeGen/ARM/Windows/no-aeabi.ll
> llvm/trunk/test/CodeGen/ARM/arm-eabi.ll
> llvm/trunk/test/CodeGen/ARM/constantpool-promote-ldrh.ll
> llvm/trunk/test/CodeGen/ARM/constantpool-promote.ll
> llvm/trunk/test/CodeGen/ARM/crash-O0.ll
> llvm/trunk/test/CodeGen/ARM/debug-info-blocks.ll
> llvm/trunk/test/CodeGen/ARM/dyn-stackalloc.ll
> llvm/trunk/test/CodeGen/ARM/fast-isel-intrinsic.ll
> llvm/trunk/test/...
2018 Jan 24
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...t/CodeGen/ARM/2011-10-26-memset-with-neon.ll
llvm/trunk/test/CodeGen/ARM/2012-04-24-SplitEHCriticalEdge.ll
llvm/trunk/test/CodeGen/ARM/Windows/memset.ll
llvm/trunk/test/CodeGen/ARM/Windows/no-aeabi.ll
llvm/trunk/test/CodeGen/ARM/arm-eabi.ll
llvm/trunk/test/CodeGen/ARM/constantpool-promote-ldrh.ll
llvm/trunk/test/CodeGen/ARM/constantpool-promote.ll
llvm/trunk/test/CodeGen/ARM/crash-O0.ll
llvm/trunk/test/CodeGen/ARM/debug-info-blocks.ll
llvm/trunk/test/CodeGen/ARM/dyn-stackalloc.ll
llvm/trunk/test/CodeGen/ARM/fast-isel-intrinsic.ll
llvm/trunk/test/CodeGen/ARM/interval-update-re...
2018 Jan 25
2
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...>> llvm/trunk/test/CodeGen/ARM/2012-04-24-SplitEHCriticalEdge.ll
>> llvm/trunk/test/CodeGen/ARM/Windows/memset.ll
>> llvm/trunk/test/CodeGen/ARM/Windows/no-aeabi.ll
>> llvm/trunk/test/CodeGen/ARM/arm-eabi.ll
>> llvm/trunk/test/CodeGen/ARM/constantpool-promote-ldrh.ll
>> llvm/trunk/test/CodeGen/ARM/constantpool-promote.ll
>> llvm/trunk/test/CodeGen/ARM/crash-O0.ll
>> llvm/trunk/test/CodeGen/ARM/debug-info-blocks.ll
>> llvm/trunk/test/CodeGen/ARM/dyn-stackalloc.ll
>> llvm/trunk/test/CodeGen/ARM/fast-isel-intrinsic.ll
>> llvm/trunk/test/...