Displaying 20 results from an estimated 1410 matches for "r10".
2017 Jan 09
4
Tweaking the Register Allocator's spill placement
...e'll call them FXLV). In one important kernel (snippet below), register allocation needs to spill values resulting from FXLV. The spiller is unaware of FXLV's latency, and thus naively inserts those spills immediately after the FXLV, incurring huge and unnecessary data stalls.
FXLV r10, 0(r3.0)
SV r10, 0(r63.1) # spill to stack slot
FXLV r10, 16(r3.0)
SV r10, 16(r63.1) # spill to stack slot
FXLV r10, 32(r3.0)
SV r10, 32(r63.1) # spill to stack slot
FXLV r10, 48(r3.0)
SV r10, 48(r63.1) # spill to stack slot
...
Note also how the register allocator u...
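The stall pattern described in this snippet can be illustrated with a toy timing model. This is a minimal sketch under loud assumptions: a single-issue, in-order machine, a made-up load latency of 10 cycles (the thread does not state FXLV's real latency), and enough free registers to delay the stores, which the original kernel may not have. The names `total_cycles`, `immediate` and `delayed` are hypothetical and not part of LLVM.

```python
LOAD_LATENCY = 10  # assumed value; the thread does not give FXLV's latency


def total_cycles(program, load_latency=LOAD_LATENCY):
    """Cycles consumed on a single-issue, in-order machine.

    program is a list of (op, register) pairs, op being "load" or "store".
    A store stalls until the register it reads has been filled by its load.
    """
    ready = {}  # register -> cycle at which its loaded value is available
    cycle = 0   # next free issue slot
    for op, reg in program:
        if op == "store":
            cycle = max(cycle, ready.get(reg, 0))  # wait for the load result
        if op == "load":
            ready[reg] = cycle + load_latency
        cycle += 1
    return cycle


# Spill placed immediately after each load, as in the listing above.
immediate = [("load", "r10"), ("store", "r10")] * 4

# Spills delayed until all loads have issued (assumes four free registers).
delayed = [("load", f"r{n}") for n in range(10, 14)] + [
    ("store", f"r{n}") for n in range(10, 14)
]

if __name__ == "__main__":
    print("immediate:", total_cycles(immediate))
    print("delayed:  ", total_cycles(delayed))
```

Under these assumptions the immediate placement pays the full load latency once per spill, while the delayed placement overlaps the latencies and pays it roughly once in total.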
2008 Oct 31
3
[LLVMdev] nested function's static link gets clobbered
Fellow developers,
I'm parallelizing loops to be called by pthread. The thread body that I pass
to pthread_create looks like
define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg)
parent_frame is pointer to shared variables in original function
0x00007f0de11c41f0: mov (%r10),%rax
0x00007f0de11c41f3: cmpl $0x63,(%rax)
0x00007f0de11c41f6: jg 0x7f0de11c420c
0x00007f0de11c41fc: mov 0x8(%r10),%rax
0x00007f0de11c4200: incl (%rax)
0x00007f0de11c4202: mov (%r10),%rax
0x00007f0de11c4205: incl (%rax)
0x00007f0de11c4207: jmpq 0x7f0de...
2007 Dec 02
2
Optimised qmf_synth and iir_mem16
...mla r6, r0, r14,r7 @ mem[1] = mem[2] - den[1]*y[i]
ldrsh r0, [r1, #6]
mla r7, r4, r14,r8 @ mem[2] = mem[3] - den[2]*y[i]
ldrsh r4, [r1, #8]
mla r8, r0, r14,r9 @ mem[3] = mem[4] - den[3]*y[i]
ldrsh r0, [r1, #10]
mla r9, r4, r14,r10 @ mem[4] = mem[5] - den[4]*y[i]
ldrsh r4, [r1, #12]
mla r10, r0, r14,r11 @ mem[5] = mem[6] - den[5]*y[i]
ldrsh r0, [r1, #14]
mla r11, r4, r14,r12 @ mem[6] = mem[7] - den[6]*y[i]
subs r3, r3, #1
mul r12, r0, r14 @ mem[7] = -...
2008 Nov 01
0
[LLVMdev] nested function's static link gets clobbered
...I'm parallelizing loops to be called by pthread. The thread body that I pass
> to pthread_create looks like
>
> define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg)
> parent_frame is pointer to shared variables in original function
>
> 0x00007f0de11c41f0: mov (%r10),%rax
> 0x00007f0de11c41f3: cmpl $0x63,(%rax)
> 0x00007f0de11c41f6: jg 0x7f0de11c420c
> 0x00007f0de11c41fc: mov 0x8(%r10),%rax
> 0x00007f0de11c4200: incl (%rax)
> 0x00007f0de11c4202: mov (%r10),%rax
> 0x00007f0de11c4205: incl (%rax)
> 0x0...
2006 Jun 26
0
[klibc 25/43] ia64 support for klibc
...ype \
+name (void) \
+{ \
+ register long _r8 asm ("r8"); \
+ register long _r10 asm ("r10"); \
+ register long _r15 asm ("r15") = __NR_##name; \
+ long _retval; \
+ __asm __volatile (__IA64_BREAK \
+...
2009 Sep 24
6
[patch 1/2] grub-0.97: btrfs support for a singe device configuration
2010 Sep 21
1
[LLVMdev] Possible missed optimization on function calling?
...a);
extern int mdiv(int a, int b);
int foo(int a, int b)
{
int a4 = mdiv(mcos(a), msin(b));
return a4;
}
I noticed this while testing it for the backend i'm currently developing,
but it produces exactly the same code for other targets:
march = msp430:
push.w r11
push.w r10
push.w r9
push.w r8
mov.w r14, r11
mov.w r15, r10 ; store a
mov.w r13, r15
mov.w r12, r14 ; pass b
call #msin
mov.w r15, r9
mov.w r14, r8 ; store msin(b)
mov.w r10, r15
mov.w r11, r14 ; pass a
call #mcos
m...
2004 Oct 06
3
flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)
Sadly the latest optimization broke completely everything.
The asm code isn't gas compliant. the libFLAC linker script has a typo,
disabling the asm optimization and/or altivec won't let a correct build
anyway.
Instant fixes for the asm stuff:
sed -i -e"s:;:\#:" on the lpc_asm.s
to load address instead of addis+ori you could use
lis and la and PLEASE use the @l(register)
2004 Sep 10
1
altivec lpc_restore_signal
...0xfffffc00)
mtspr 256,r31 ; declare VRs in vrsave
cmplw cr0,r8,r4 ; i<data_len
bc 4,0,L1400
; load coefficients into v0-v7 and initial history into v8-v15
li r31,0xf
and r31,r8,r31 ; r31: data%4
li r11,16
subf r31,r31,r11 ; r31: 4-(data%4)
slwi r31,r31,3 ; convert to bits for vsro
li r10,-4
stw r31,-4(r9)
lvewx v0,r10,r9
vspltisb v18,-1
vsro v18,v18,v0 ; v18: mask vector
li r31,0x8
lvsl v0,0,r31
vsldoi v0,v0,v0,12
li r31,0xc
lvsl v1,0,r31
vspltisb v2,0
vspltisb v3,-1
vmrglw v2,v2,v3
vsel v0,v1,v0,v2 ; v0: reversal permutation vector
add r10,r5,r6
lvsl v17,0,r5 ; v1...
2013 Jul 23
2
[LLVMdev] Question on optimizeThumb2JumpTables
...continue;
I am trying to figure out why the restriction of
LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly
restrictive. For example, here is a case where it succeeds:
8944B BB#53: derived from LLVM BB %172
Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11
Predecessors according to CFG: BB#52
8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg
8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14,
pred:%noreg, opt:%noreg
9004B %LR<def> = t2MOVi 1, pred:14,...
2012 Feb 13
0
[PATCH 05/14] arm: implement exception and hypercall entries.
...);
+ DEFINE(OFFSET_VCPU_R6, offsetof(struct vcpu_guest_context, r6));
+ DEFINE(OFFSET_VCPU_R7, offsetof(struct vcpu_guest_context, r7));
+ DEFINE(OFFSET_VCPU_R8, offsetof(struct vcpu_guest_context, r8));
+ DEFINE(OFFSET_VCPU_R9, offsetof(struct vcpu_guest_context, r9));
+ DEFINE(OFFSET_VCPU_R10, offsetof(struct vcpu_guest_context, r10));
+ DEFINE(OFFSET_VCPU_R11, offsetof(struct vcpu_guest_context, r11));
+ DEFINE(OFFSET_VCPU_R12, offsetof(struct vcpu_guest_context, r12));
+ DEFINE(OFFSET_VCPU_R13, offsetof(struct vcpu_guest_context, r13));
+ DEFINE(OFFSET_VCPU_R14, offsetof(str...
2014 Feb 08
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
On Fri, 7 Feb 2014, Timothy B. Terriberry wrote:
> Martin Storsjo wrote:
>> This is required in order to build using the built-in assembler
>> in clang.
>
> These patches break the gcc build (with "Error: bad instruction").
Ah, right, sorry about that.
> Documentation I've seen is contradictory on which order ({cond}{size} or
> {size}{cond}) is correct.
2013 Oct 22
1
[LLVMdev] System call miscompilation using the fast register allocator
...ain(i32 %argc, i8** nocapture %argv) unnamed_addr nounwind
uwtable {
entry:
%val = alloca i32, align 4
store i32 1, i32* %val, align 4
%0 = ptrtoint i32* %val to i64
call void asm sideeffect "", "{r8}"(i64 4) nounwind
call void asm sideeffect "", "{r10}"(i64 %0) nounwind
call void asm sideeffect "", "{rdx}"(i64 3) nounwind
call void asm sideeffect "", "{rsi}"(i64 1) nounwind
call void asm sideeffect "", "{rdi}"(i64 -1) nounwind
%1 = call i64 asm sideeffect "",...
2012 Nov 01
2
[LLVMdev] Undef registers in dependency graph
Hi,
I see that currently physical register uses marked as "undef" can still
cause dependencies. Is this intentional?
SU(9): %D5<def,undef> = LDrid %R0, 0, %R10<imp-def>, %R11<imp-def>
# preds left : 0
# succs left : 11
# rdefs left : 0
Latency : 1
Depth : 0
Height : 0
Successors:
...
val SU(14): Latency=1
val SU(14): Latency=1
val SU(14): Latency=1
....
2014 Nov 05
2
[LLVMdev] Stackmaps: caller-save-registers passed as deopt args
> On Oct 31, 2014, at 5:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> Hi Kevin,
>
> Thank you for starting this discussion!
Yes, sorry for being unresponsive for a few days. Sanjoy summarized the issues perfectly.
> I think the distinction is really between whether the live values are
> "live on call" or "live on return".
2013 Jul 29
0
[LLVMdev] Question on optimizeThumb2JumpTables
...restriction of
> LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly
> restrictive. For example, here is a case where it succeeds:
>
> 8944B BB#53: derived from LLVM BB %172
>
> Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11
>
> Predecessors according to CFG: BB#52
>
> 8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg
>
> 8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14,
> pred:%noreg, opt:%noreg...
2017 Oct 13
2
Machine Scheduler on Power PC: Latency Limit and Register Pressure
...by those loads we are forced
to spill. Here is the final assembly.
--
# BB#0: # %entry
std r30, -16(r1) # 8-byte Folded Spill
ld r5, 0(r3)
ld r6, 0(r4)
ld r7, 8(r3)
ld r8, 8(r4)
ld r9, 16(r3)
ld r10, 16(r4)
ld r11, 24(r3)
ld r0, 32(r3)
ld r12, 24(r4)
ld r30, 32(r4)
ld r3, 40(r3)
ld r4, 40(r4)
divd r5, r5, r6
divd r6, r7, r8
divd r7, r9, r10
divd r9, r0, r30
divd r4, r3, r4
divd r8, r11, r12...
2003 Oct 14
1
Token.c appears to have a bug.
...he sign bit for the character that resulted from the cast.
Is this really what is intended? Or should there be parenthesis around
(n >> 8) to make sure that it happens before the most significant part
of "n" is discarded?
The actual assembly code generated is:
LDL R10, n ; R10, 16(FP)
SLL R10, 56, R10
SRA R10, 63, R10
STQ R10, temp_byte ; R10, 8(FP)
-John
wb8tyw@qsl.network
Personal Opinion Only
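The precedence question raised in this snippet can be modelled outside C. In C a cast binds more tightly than `>>`, so an expression of the form `(char) n >> 8` narrows `n` to one byte first and shifts afterwards, which is consistent with the SLL/SRA pair in the listing: only the sign of the low byte survives. A minimal Python sketch, assuming a signed 8-bit `char` and an arithmetic right shift; the helper names are hypothetical and not taken from Token.c.

```python
def to_int8(value):
    """Model a C cast to a signed 8-bit char: keep the low byte, reinterpret as signed."""
    low = value & 0xFF
    return low - 0x100 if low & 0x80 else low


def cast_then_shift(n):
    """Model `(char) n >> 8`: the cast applies first, so only the sign is left."""
    return to_int8(n) >> 8


def shift_then_cast(n):
    """Model `(char)(n >> 8)`: select the second byte, then narrow it."""
    return to_int8(n >> 8)


if __name__ == "__main__":
    for n in (0x1234, 0x12F4):
        print(hex(n), cast_then_shift(n), shift_then_cast(n))
```

In this model the unparenthesized form can only produce 0 or -1, while the parenthesized form returns the second byte of `n`.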
2013 Jul 29
1
[LLVMdev] Question on optimizeThumb2JumpTables
...figure out why the restriction of LeaMI->getOperand(0).getReg() != BaseReg is there. It seems this is overly restrictive. For example, here is a case where it succeeds:
>
>
>
> 8944B BB#53: derived from LLVM BB %172
>
> Live Ins: %R4 %R6 %D8 %Q5 %R9 %R7 %R8 %R10 %R5 %R11
>
> Predecessors according to CFG: BB#52
>
> 8976B %R1<def> = t2LEApcrelJT <jt#2>, 2, pred:14, pred:%noreg
>
> 8992B %R1<def> = t2ADDrs %R1<kill>, %R10, 18, pred:14, pred:%noreg, opt:%noreg
>
> 9004B...
2006 Jun 26
0
[klibc 23/43] cris support for klibc
.../*
+ * In 2's complement arithmetic, -x == (~x + 1), so
+ * -{h,l} = (~{h,l} + {0,1})
+ * -{h,l} = {~h,~l} + {0,1}
+ * -{h,l} = {~h + cy, ~l + 1}
+ * ... where cy = (l == 0)
+ * -{h,l} = {~h + cy, -l}
+ */
+
+ .text
+ .balign 4
+ .type __negdi2, @function
+ .globl __negdi2
+__negdi2:
+ neg.d $r10,$r10
+ seq $r12
+ not $r11
+ ret
+ add.d $r12,$r11
+
+ .size __negdi2, .-__negdi2
diff --git a/usr/klibc/arch/cris/crt0.S b/usr/klibc/arch/cris/crt0.S
new file mode 100644
index 0000000..22cb9b4
--- /dev/null
+++ b/usr/klibc/arch/cris/crt0.S
@@ -0,0 +1,27 @@
+#
+# arch/cris/crt0.S
+#
+# Does arch...
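The two-word negation identity in the `__negdi2` comment of this patch can be checked numerically. A minimal Python sketch, assuming `h` and `l` are the unsigned 32-bit high and low halves of a 64-bit value; `neg64_by_halves` and `neg64_reference` are hypothetical names, and the sketch models the arithmetic only, not the CRIS instruction sequence.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def neg64_by_halves(h, l):
    """Negate {h,l} as {~h + cy, -l}, where cy = (l == 0)."""
    cy = 1 if l == 0 else 0
    new_l = (-l) & MASK32
    new_h = (~h + cy) & MASK32
    return new_h, new_l


def neg64_reference(h, l):
    """Negate the same value as a single 64-bit integer, for comparison."""
    negated = (-((h << 32) | l)) & MASK64
    return negated >> 32, negated & MASK32


if __name__ == "__main__":
    for h, l in [(0, 0), (0, 1), (1, 0), (0x12345678, 0x9ABCDEF0)]:
        print(hex(h), hex(l), neg64_by_halves(h, l) == neg64_reference(h, l))
```

The carry into the high word is needed only when the low word is zero, because that is the only case in which `~l + 1` overflows 32 bits.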