thr3ads.net - search: "r6"

{ARM} IfConversion does not detect BX instruction as a branch

2017 Oct 09

4

{ARM} IfConversion does not detect BX instruction as a branch

Hi all, I got a silly bug when compiling our project with the latest Clang. Here's the outputted assembly: > tst r3, #255 > strbeq r6, [r7] > ldreq r6, [r4, r6, lsl #2] > strne r6, [r7, #4] > ldr r6, [r4, r6, lsl #2] > bx r6 For the code to execute correctly, either the _ldr_ should be a _ldrne_ instruction or the _ldreq_ instruction should be removed. The error seems to come from the IfConversion MachinePass. Here...
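The failure described in this snippet can be traced with a small simulation. The following is a minimal Python sketch, not LLVM code: the function name `run_sequence`, the dictionary model of memory, and the sample addresses are all assumptions made for illustration. It shows that when the `tst` sets the Z flag, the unconditional `ldr` reloads r6 through the value that `ldreq` has just loaded, so the branch target is wrong.

```python
def run_sequence(r3, r4, r6, r7, mem, fixed=False):
    """Simulate the six instructions; mem maps word addresses to values.

    fixed=False models the emitted code (unconditional ldr).
    fixed=True models the suggested fix (ldr becomes ldrne).
    """
    mem = dict(mem)                    # work on a copy of the memory image
    z = (r3 & 255) == 0                # tst r3, #255: Z is set when the AND is zero
    if z:
        mem[r7] = r6 & 0xFF            # strbeq r6, [r7]
        r6 = mem[r4 + (r6 << 2)]       # ldreq r6, [r4, r6, lsl #2]
    else:
        mem[r7 + 4] = r6               # strne r6, [r7, #4]
    if not fixed or not z:
        r6 = mem[r4 + (r6 << 2)]       # ldr (emitted) or ldrne (fixed)
    return r6                          # bx r6: the value used as branch target


# Hypothetical table at address 100: entry 1 holds 2, entry 2 holds 0x8000.
table = {104: 2, 108: 0x8000}
print(hex(run_sequence(0x100, 100, 1, 1000, table)))              # 0x8000
print(hex(run_sequence(0x100, 100, 1, 1000, table, fixed=True)))  # 0x2
```

On the "ne" path both variants return the same value, because only one of the two loads executes there.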

{ARM} IfConversion does not detect BX instruction as a branch

2017 Oct 11

2

{ARM} IfConversion does not detect BX instruction as a branch

...Friedman, Eli via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On 10/9/2017 3:10 AM, Gaël Jobin via llvm-dev wrote: > > Hi all, > > I got a silly bug when compiling our project with the latest Clang. Here's > the outputted assembly: > > tst r3, #255 > strbeq r6, [r7] > ldreq r6, [r4, r6, lsl #2] > strne r6, [r7, #4] > ldr r6, [r4, r6, lsl #2] > bx r6 > > For the code to execute correctly, either the *ldr* should be a *ldrne* > instruction or the *ldreq* instruction should be removed. The error seems > to come from the IfConversion...

[PATCH] pmu/fuc: don't use movw directly anymore

2017 Nov 01

0

[PATCH] pmu/fuc: don't use movw directly anymore

...pmu/fuc/memx.fuc > +++ b/drm/nouveau/nvkm/subdev/pmu/fuc/memx.fuc > @@ -82,15 +82,15 @@ memx_train_tail: > // $r0 - zero > memx_func_enter: > #if NVKM_PPWR_CHIPSET == GT215 > - movw $r8 0x1610 > + mov $r8 0x1610 > nv_rd32($r7, $r8) > imm32($r6, 0xfffffffc) > and $r7 $r6 > - movw $r6 0x2 > + mov $r6 0x2 > or $r7 $r6 > nv_wr32($r8, $r7) > #else > - movw $r6 0x001620 > + mov $r6 0x001620 > imm32($r7, ~0x00000aa2); > nv_rd32($r8, $r6) >...

[PATCH] pmu/fuc: don't use movw directly anymore

2017 Nov 01

2

[PATCH] pmu/fuc: don't use movw directly anymore

...mx.fuc index ec03f9a4..1663bf94 100644 --- a/drm/nouveau/nvkm/subdev/pmu/fuc/memx.fuc +++ b/drm/nouveau/nvkm/subdev/pmu/fuc/memx.fuc @@ -82,15 +82,15 @@ memx_train_tail: // $r0 - zero memx_func_enter: #if NVKM_PPWR_CHIPSET == GT215 - movw $r8 0x1610 + mov $r8 0x1610 nv_rd32($r7, $r8) imm32($r6, 0xfffffffc) and $r7 $r6 - movw $r6 0x2 + mov $r6 0x2 or $r7 $r6 nv_wr32($r8, $r7) #else - movw $r6 0x001620 + mov $r6 0x001620 imm32($r7, ~0x00000aa2); nv_rd32($r8, $r6) and $r8 $r7 @@ -101,7 +101,7 @@ memx_func_enter: and $r8 $r7 nv_wr32($r6, $r8) - movw $r6 0x0026f0 + mov $r6...

[LLVMdev] Metadata

2010 Feb 11

0

[LLVMdev] Metadata

...seOptionalCommaAlign needs to know about general metadata. > > My inkling is to fix ParseOptionalCommaAlign. Sound reasonable? Well, that's a rat's nest. I backed up and thought maybe I have the metadata syntax wrong. So I tried a bunch of things: %r8 = load <2 x double>* %r6, align 16, metadata !"nontemporal" %r8 = load <2 x double>* %r6, align 16, metadata !nontemporal %r8 = load <2 x double>* %r6, align 16, !{ metadata !"nontemporal" } %r8 = load <2 x double>* %r6, align 16, !{ metadata !nontemporal } %r8 = load <2 x double>...

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

2014 Feb 08

3

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

On Fri, 7 Feb 2014, Timothy B. Terriberry wrote: > Martin Storsjo wrote: >> This is required in order to build using the built-in assembler >> in clang. > > These patches break the gcc build (with "Error: bad instruction"). Ah, right, sorry about that. > Documentation I've seen is contradictory on which order ({cond}{size} or > {size}{cond}) is correct.

[LLVMdev] Metadata

2010 Feb 11

3

[LLVMdev] Metadata

...uary 2010 13:31:58 David Greene wrote: > > Putting a bit (or multiple bits) in MachineMemOperand for this > > would also make sense. > > Is there any chance a MachineMemOperand will be shared by multiple > instructions? So I tried to do this: %r8 = load <2 x double>* %r6, align 16, !"nontemporal" and the assembler doesn't like it. Do I need to use named metadata? That would be rather inconvenient. The problem is this code in llvm-as: int LLParser::ParseLoad(Instruction *&Inst, PerFunctionState &PFS, bool isVolatile...

Plans to improve reference classes?

2015 Jun 23

3

Plans to improve reference classes?

...mings are in microseconds, so one would need a thousand objects before it started to be noticeable. Some motivating use cases would help. Thanks, Michael On Mon, Jun 22, 2015 at 7:06 AM, Hadley Wickham <h.wickham at gmail.com> wrote: > Apart from speed, the most important advantage of R6 over ref classes > is that it's easy to subclass a class defined in package A in package > B. This is currently difficult with ref classes because of the way it > does scoping. (And I think it's difficult to fix without fundamentally > changing how ref classes work) > > Ha...

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 20

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

...4-asm_64.S >> +++ b/arch/x86/crypto/aes-x86_64-asm_64.S >> @@ -48,8 +48,12 @@ >> #define R10 %r10 >> #define R11 %r11 >> >> +/* Hold global for PIE suport */ >> +#define RBASE %r12 >> + >> #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \ >> ENTRY(FUNC); \ >> + pushq RBASE; \ >> movq r1,r2; \ >> leaq KEY+48(r8),r9; \ >> movq r10,r11; \ >> @@ -74,54 +78,63 @@ >...

[LLVMdev] How can I remove these redundant copy between registers?

2015 May 21

2

[LLVMdev] How can I remove these redundant copy between registers?

Hi, I've been working on a Blackfin backend (llvm-3.6.0) based on the previous one that was removed in llvm-3.1. llc generates code like this: 29 p1 = r2; 30 r5 = [p1]; 31 p1 = r2; 32 r6 = [p1 + 4]; 33 r5 = r6 + r5; 34 r6 = [p0 + -4]; 35 r5 *= r6; 36 p1 = r2; 37 r6 = [p1 + 8]; 38 p1 = r2; p1 and r2 are in different register classes. A p* register can be used to load/store values from memory while an r* register cannot. As we can see, lines 31, 36, 38...
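To make the redundancy in the listing above concrete, here is a minimal Python sketch of a copy-availability scan over a single straight-line block. It is an illustration only and is not how any LLVM pass is implemented; the function name `redundant_copies`, the regular expressions, and the assumption that register names look like a letter followed by digits are all hypothetical. A copy `dst = src` is flagged when `dst` already holds `src` and neither register has been redefined since the earlier copy.

```python
import re

# The numbered statements from the post, kept with their original line numbers.
LISTING = """\
29 p1 = r2
30 r5 = [p1]
31 p1 = r2
32 r6 = [p1 + 4]
33 r5 = r6 + r5
34 r6 = [p0 + -4]
35 r5 *= r6
36 p1 = r2
37 r6 = [p1 + 8]
38 p1 = r2
"""

STMT = re.compile(r"^(\d+)\s+(\w+)\s*([*+-]?=)\s*(.+)$")
REG = re.compile(r"^[a-z]\d+$")  # assumed register naming: letter plus digits


def redundant_copies(listing):
    """Return the line numbers of register copies that repeat a live copy."""
    available = {}  # dst -> src for copies whose operands are still unchanged
    flagged = []
    for line in listing.splitlines():
        m = STMT.match(line.strip())
        if not m:
            continue
        num, dst, op, rhs = m.groups()
        is_copy = op == "=" and REG.match(rhs) is not None
        if is_copy and available.get(dst) == rhs:
            flagged.append(int(num))  # dst already holds src: copy is redundant
            continue
        # dst is redefined here, so drop every recorded copy that involves it.
        available = {d: s for d, s in available.items() if d != dst and s != dst}
        if is_copy:
            available[dst] = rhs
    return flagged


print(redundant_copies(LISTING))  # [31, 36, 38]
```

The sketch ignores control flow and memory effects, which a real backend would have to account for.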

MMX IDCT for theora-exp

2005 Jul 20

1

MMX IDCT for theora-exp

...tants[45] = idctconstants[46] = idctconstants[47] = IdctAdjustBeforeShift; +} + + +#define MtoSTR(s) #s + +#define Dump "call MMX_dump\n" + +#define BeginIDCT "#BeginIDCT\n"\ + \ + " movq " I(3)","r2"\n" \ + \ + " movq " C(3)","r6"\n" \ + " movq " r2","r4"\n" \ + " movq " J(5)","r7"\n" \ + " pmulhw " r6","r4"\n" \ + " movq " C(5)","r1"\n" \ + " pmulhw " r7","r6"\n"...

Plans to improve reference classes?

2015 Jun 22

3

Plans to improve reference classes?

...and we therefore want to use a construction like reference classes in this project. However, we observed that the speed performance of our implementation (using reference classes) for a simple test case is rather poor compared to a non-OOP implementation. Further, turning the reference classes into R6 classes (using the R6 package) gave the best performance. As speed is an issue in our project, this would for us be an important reason to use R6 classes instead of reference classes. The drawback, of course, is that the R6 package is developed by a single developer and that further development is...

[PATCH v2] arm: Use the UAL syntax for instructions

2014 Feb 08

0

[PATCH v2] arm: Use the UAL syntax for instructions

...--git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s index 09917b1..598e45b 100644 --- a/celt/arm/celt_pitch_xcorr_arm.s +++ b/celt/arm/celt_pitch_xcorr_arm.s @@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done SUBS r2, r2, #1 ; j-- ; Stall SMLABB r6, r12, r10, r6 ; sum[0] = MAC16_16(sum[0],x,y_0) - LDRGTH r14, [r4], #2 ; r14 = *x++ + LDRHGT r14, [r4], #2 ; r14 = *x++ SMLABT r7, r12, r10, r7 ; sum[1] = MAC16_16(sum[1],x,y_1) SMLABB r8, r12, r11, r8 ; sum[2] = MAC16_16(sum[2],x,y_2) SMLABT...
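The mnemonic change in this diff (`LDRGTH` to `LDRHGT`) moves the condition code after the size suffix, which is the order UAL uses. As a rough illustration, the rewrite could be sketched in Python as below. The helper `to_ual` and its regular expression are hypothetical, cover only conditional `LDR`/`STR` forms with a size suffix in upper case, and are not part of the patch.

```python
import re

# ARM condition codes, and the size suffixes handled by this sketch.
CONDS = "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE|AL"
PRE_UAL = re.compile(rf"\b(LDR|STR)({CONDS})(SH|SB|H|B|D)\b")


def to_ual(line):
    """Rewrite pre-UAL {cond}{size} ordering to the UAL {size}{cond} ordering."""
    return PRE_UAL.sub(r"\1\3\2", line)


print(to_ual("LDRGTH r14, [r4], #2 ; r14 = *x++"))
# LDRHGT r14, [r4], #2 ; r14 = *x++
```

Lines that are already in UAL order, such as `LDRHGT`, are left unchanged because the text after `LDR` does not start with a condition code.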

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

2014 Feb 07

3

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

...--git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s index 09917b1..3c4b950 100644 --- a/celt/arm/celt_pitch_xcorr_arm.s +++ b/celt/arm/celt_pitch_xcorr_arm.s @@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done SUBS r2, r2, #1 ; j-- ; Stall SMLABB r6, r12, r10, r6 ; sum[0] = MAC16_16(sum[0],x,y_0) - LDRGTH r14, [r4], #2 ; r14 = *x++ + LDRHGT r14, [r4], #2 ; r14 = *x++ SMLABT r7, r12, r10, r7 ; sum[1] = MAC16_16(sum[1],x,y_1) SMLABB r8, r12, r11, r8 ; sum[2] = MAC16_16(sum[2],x,y_2) SMLABT...

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Apr 07

2

[ARM] Register pressure with -mthumb forces register reload before each call

...I'm understanding what's going on in this test correctly, what's happening is: * ARMTargetLowering::LowerCall prefers indirect calls when a function is called at least 3 times in minsize * In thumb 1 (without -fno-omit-frame-pointer) we have effectively only 3 callee-saved registers (r4-r6) * The function has three arguments, so those three plus the register we need to hold the function address is more than our callee-saved registers * Therefore something needs to be spilt * The function address can be rematerialized, so we spill that and insert an LDR before each call If we did...

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Apr 15

4

[ARM] Register pressure with -mthumb forces register reload before each call

...oldMemoryOperand twice, and thus converts two calls from blx to bl. callMI->dump() shows the function name "bar" correctly, however in generated assembly call to bar is garbled: (compiled with -Oz --target=arm-linux-gnueabi -march=armv6-m): add r7, sp, #16 mov r6, r2 mov r5, r1 mov r4, r0 bl "<90>w\n " mov r1, r2 mov r2, r5 bl "<90>w\n " mov r0, r5 mov r1, r4 mov r2, r6 ldr r6, .LCPI0_0...

Optimised qmf_synth and iir_mem16

2007 Dec 02

2

Optimised qmf_synth and iir_mem16

...movgt r14, r5 @ Clip positive cmn r14, r5 rsblt r14, r5, #0 @ Clip negative strh r14, [r2], #2 @ Write result to y[i] ldrsh r4, [r1] ldrsh r0, [r1, #2] rsb r14,r14,#0 @ r14 = -y[i] mla r5, r4, r14,r6 @ mem[0] = mem[1] - den[0]*y[i] ldrsh r4, [r1, #4] mla r6, r0, r14,r7 @ mem[1] = mem[2] - den[1]*y[i] ldrsh r0, [r1, #6] mla r7, r4, r14,r8 @ mem[2] = mem[3] - den[2]*y[i] ldrsh r4, [r1, #8] mla r8, r0, r14,r9 @ mem[3] =...

[LLVMdev] [compiler-rt] CMake bug in building ARM builtins library

2014 Jul 17

2

[LLVMdev] [compiler-rt] CMake bug in building ARM builtins library

On 7/16/14, 6:09 PM, sgundapa wrote: > I see a couple of issues here. > > If I include .S files for ARM, the -no-integrated-as path complains about > Assembler errors. > > The integrated-as path works fine though. > These are very likely just differences between the old ARM assembler syntax and the new 'Unified' syntax. Can you use an assembler that accepts UAL

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Mar 31

2

[ARM] Register pressure with -mthumb forces register reload before each call

...-case, which is a reduced version of uECC_shared_secret from tinycrypt library [1], with --target=arm-linux-gnueabi -march=armv6-m -Oz -S results in reloading of register holding function's address before every call to blx: ldr r3, .LCPI0_0 blx r3 mov r0, r6 mov r1, r5 mov r2, r4 ldr r3, .LCPI0_0 blx r3 ldr r3, .LCPI0_0 mov r0, r6 mov r1, r5 mov r2, r4 blx r3 .LCPI0_0: .long foo From dump of regalloc (attached), AFAIU, what...
