thr3ads.net - search: "r10"

Tweaking the Register Allocator's spill placement

2017 Jan 09

4

...e'll call them FXLV). In one important kernel (snippet below), register allocation needs to spill values resulting from FXLV. The spiller is unaware of FXLV's latency, and thus naively inserts those spills immediately after the FXLV, incurring huge and unnecessary data stalls. FXLV r10, 0(r3.0) SV r10, 0(r63.1) # spill to stack slot FXLV r10, 16(r3.0) SV r10, 16(r63.1) # spill to stack slot FXLV r10, 32(r3.0) SV r10, 32(r63.1) # spill to stack slot FXLV r10, 48(r3.0) SV r10, 48(r63.1) # spill to stack slot ... Note also how the register allocator u...

[LLVMdev] nested function's static link gets clobbered

2008 Oct 31

3

Fellow developers, I'm parallelizing loops to be called by pthread. The thread body that I pass to pthread_create looks like define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) parent_frame is pointer to shared variables in original function 0x00007f0de11c41f0: mov (%r10),%rax 0x00007f0de11c41f3: cmpl $0x63,(%rax) 0x00007f0de11c41f6: jg 0x7f0de11c420c 0x00007f0de11c41fc: mov 0x8(%r10),%rax 0x00007f0de11c4200: incl (%rax) 0x00007f0de11c4202: mov (%r10),%rax 0x00007f0de11c4205: incl (%rax) 0x00007f0de11c4207: jmpq 0x7f0de...

Optimised qmf_synth and iir_mem16

2007 Dec 02

2

...mla r6, r0, r14,r7 @ mem[1] = mem[2] - den[1]*y[i] ldrsh r0, [r1, #6] mla r7, r4, r14,r8 @ mem[2] = mem[3] - den[2]*y[i] ldrsh r4, [r1, #8] mla r8, r0, r14,r9 @ mem[3] = mem[4] - den[3]*y[i] ldrsh r0, [r1, #10] mla r9, r4, r14,r10 @ mem[4] = mem[5] - den[4]*y[i] ldrsh r4, [r1, #12] mla r10, r0, r14,r11 @ mem[5] = mem[6] - den[5]*y[i] ldrsh r0, [r1, #14] mla r11, r4, r14,r12 @ mem[6] = mem[7] - den[6]*y[i] subs r3, r3, #1 mul r12, r0, r14 @ mem[7] = -...

[LLVMdev] nested function's static link gets clobbered

2008 Nov 01

0

...;m parallelizing loops to be called by pthread. The thread body that I pass > to pthread_create looks like > > define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) > parent_frame is pointer to shared variables in original function > > 0x00007f0de11c41f0: mov (%r10),%rax > 0x00007f0de11c41f3: cmpl $0x63,(%rax) > 0x00007f0de11c41f6: jg 0x7f0de11c420c > 0x00007f0de11c41fc: mov 0x8(%r10),%rax > 0x00007f0de11c4200: incl (%rax) > 0x00007f0de11c4202: mov (%r10),%rax > 0x00007f0de11c4205: incl (%rax) > 0x0...

[klibc 25/43] ia64 support for klibc

2006 Jun 26

0

...ype \ +name (void) \ +{ \ + register long _r8 asm ("r8"); \ + register long _r10 asm ("r10"); \ + register long _r15 asm ("r15") = __NR_##name; \ + long _retval; \ + __asm __volatile (__IA64_BREAK \ +...

[patch 1/2] grub-0.97: btrfs support for a single device configuration

2009 Sep 24

6

[LLVMdev] Possible missed optimization on function calling?

2010 Sep 21

1

...a); extern int mdiv(int a, int b); int foo(int a, int b) { int a4 = mdiv(mcos(a), msin(b)); return a4; } I noticed this while testing it for the backend i'm currently developing, but it produces exactly the same code for other targets: march = msp430: push.w r11 push.w r10 push.w r9 push.w r8 mov.w r14, r11 mov.w r15, r10 ; store a mov.w r13, r15 mov.w r12, r14 ; pass b call #msin mov.w r15, r9 mov.w r14, r8 ; store msin(b) mov.w r10, r15 mov.w r11, r14 ; pass a call #mcos m...

flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

2004 Oct 06

3

Sadly, the latest optimization broke everything completely. The asm code isn't gas-compliant, and the libFLAC linker script has a typo; disabling the asm optimization and/or altivec still won't give a correct build. Instant fixes for the asm: run sed -i -e"s:;:\#:" on lpc_asm.s; to load an address, instead of addis+ori you could use lis and la, and PLEASE use the @l(register)

altivec lpc_restore_signal

2004 Sep 10

1

...0xfffffc00) mtspr 256,r31 ; declare VRs in vrsave cmplw cr0,r8,r4 ; i<data_len bc 4,0,L1400 ; load coefficients into v0-v7 and initial history into v8-v15 li r31,0xf and r31,r8,r31 ; r31: data%4 li r11,16 subf r31,r31,r11 ; r31: 4-(data%4) slwi r31,r31,3 ; convert to bits for vsro li r10,-4 stw r31,-4(r9) lvewx v0,r10,r9 vspltisb v18,-1 vsro v18,v18,v0 ; v18: mask vector li r31,0x8 lvsl v0,0,r31 vsldoi v0,v0,v0,12 li r31,0xc lvsl v1,0,r31 vspltisb v2,0 vspltisb v3,-1 vmrglw v2,v2,v3 vsel v0,v1,v0,v2 ; v0: reversal permutation vector add r10,r5,r6 lvsl v17,0,r5 ; v1...

[LLVMdev] Question on optimizeThumb2JumpTables

2013 Jul 23

2

...continue; I am trying to figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds: 8944B BB#53: derived from LLVM BB %172 Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 Predecessors according to CFG: BB#52 8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg 8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14, pred:%noreg, opt:%noreg 9004B %LR<def> = t2MOVi 1, pred:14,...

[PATCH 05/14] arm: implement exception and hypercall entries.

2012 Feb 13

0

...); + DEFINE(OFFSET_VCPU_R6, offsetof(struct vcpu_guest_context, r6)); + DEFINE(OFFSET_VCPU_R7, offsetof(struct vcpu_guest_context, r7)); + DEFINE(OFFSET_VCPU_R8, offsetof(struct vcpu_guest_context, r8)); + DEFINE(OFFSET_VCPU_R9, offsetof(struct vcpu_guest_context, r9)); + DEFINE(OFFSET_VCPU_R10, offsetof(struct vcpu_guest_context, r10)); + DEFINE(OFFSET_VCPU_R11, offsetof(struct vcpu_guest_context, r11)); + DEFINE(OFFSET_VCPU_R12, offsetof(struct vcpu_guest_context, r12)); + DEFINE(OFFSET_VCPU_R13, offsetof(struct vcpu_guest_context, r13)); + DEFINE(OFFSET_VCPU_R14, offsetof(str...

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

2014 Feb 08

3

On Fri, 7 Feb 2014, Timothy B. Terriberry wrote: > Martin Storsjo wrote: >> This is required in order to build using the built-in assembler >> in clang. > > These patches break the gcc build (with "Error: bad instruction"). Ah, right, sorry about that. > Documentation I've seen is contradictory on which order ({cond}{size} or > {size}{cond}) is correct.

[LLVMdev] System call miscompilation using the fast register allocator

2013 Oct 22

1

...ain(i32 %argc, i8** nocapture %argv) unnamed_addr nounwind uwtable { entry: %val = alloca i32, align 4 store i32 1, i32* %val, align 4 %0 = ptrtoint i32* %val to i64 call void asm sideeffect "", "{r8}"(i64 4) nounwind call void asm sideeffect "", "{r10}"(i64 %0) nounwind call void asm sideeffect "", "{rdx}"(i64 3) nounwind call void asm sideeffect "", "{rsi}"(i64 1) nounwind call void asm sideeffect "", "{rdi}"(i64 -1) nounwind %1 = call i64 asm sideeffect "",...

[LLVMdev] Undef registers in dependency graph

2012 Nov 01

2

Hi, I see that currently physical register uses marked as "undef" can still cause dependencies. Is this intentional? SU(9): %D5<def,undef> = LDrid %R0, 0, %R10<imp-def>, %R11<imp-def> # preds left : 0 # succs left : 11 # rdefs left : 0 Latency : 1 Depth : 0 Height : 0 Successors: ... val SU(14): Latency=1 val SU(14): Latency=1 val SU(14): Latency=1 ....

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

2014 Nov 05

2

> On Oct 31, 2014, at 5:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Kevin, > > Thank you for starting this discussion! Yes, sorry for being unresponsive for a few days. Sanjoy summarized the issues perfectly. > I think the distinction is really between whether the live values are > "live on call" or "live on return".

[LLVMdev] Question on optimizeThumb2JumpTables

2013 Jul 29

0

...restriction of > LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly > restrictive. For example, here is a case where it succeeds:**** > > ** ** > > 8944B BB#53: derived from LLVM BB %172**** > > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11**** > > Predecessors according to CFG: BB#52**** > > 8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg*** > * > > 8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14, > pred:%noreg, opt:%noreg****...

Machine Scheduler on Power PC: Latency Limit and Register Pressure

2017 Oct 13

2

...by those loads we are forced to spill. Here is the final assembly. -- # BB#0: # %entry std r30, -16(r1) # 8-byte Folded Spill ld r5, 0(r3) ld r6, 0(r4) ld r7, 8(r3) ld r8, 8(r4) ld r9, 16(r3) ld r10, 16(r4) ld r11, 24(r3) ld r0, 32(r3) ld r12, 24(r4) ld r30, 32(r4) ld r3, 40(r3) ld r4, 40(r4) divd r5, r5, r6 divd r6, r7, r8 divd r7, r9, r10 divd r9, r0, r30 divd r4, r3, r4 divd r8, r11, r12...

Token.c appears to have a bug.

2003 Oct 14

1

...he sign bit for the character that resulted from the cast. Is this really what is intended? Or should there be parenthesis around (n >> 8) to make sure that it happens before the most significant part of "n" is discarded? The actual assembly code generated is: LDL R10, n ; R10, 16(FP) SLL R10, 56, R10 SRA R10, 63, R10 STQ R10, temp_byte ; R10, 8(FP) -John wb8tyw@qsl.network Personal Opinion Only

[LLVMdev] Question on optimizeThumb2JumpTables

2013 Jul 29

1

...figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds: > > > > 8944B BB#53: derived from LLVM BB %172 > > Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11 > > Predecessors according to CFG: BB#52 > > 8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg > > 8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14, pred:%noreg, opt:%noreg > > 9004B...

[klibc 23/43] cris support for klibc

2006 Jun 26

0

.../* + * In 2's complement arithmetric, -x == (~x + 1), so + * -{h,l} = (~{h,l} + {0,1) + * -{h,l} = {~h,~l} + {0,1} + * -{h,l} = {~h + cy, ~l + 1} + * ... where cy = (l == 0) + * -{h,l} = {~h + cy, -l} + */ + + .text + .balign 4 + .type __negdi2, at function + .globl __negdi2 +__negdi2: + neg.d $r10,$r10 + seq $r12 + not $r11 + ret + add.d $r12,$r11 + + .size __negdi2, .-__negdi2 diff --git a/usr/klibc/arch/cris/crt0.S b/usr/klibc/arch/cris/crt0.S new file mode 100644 index 0000000..22cb9b4 --- /dev/null +++ b/usr/klibc/arch/cris/crt0.S @@ -0,0 +1,27 @@ +# +# arch/cris/crt0.S +# +# Does arch...