thr3ads.net - search: "r22"

2009 Dec 28

2

Modified R Code

...3 ranges i.e. j = 3). data_label = expand.grid(c("R11", "R12", "R13"), c("R21", "R22", "R23")) # gives the output like data_label Var1 Var2 1 R11 R21 2 R12 R21 3 R13 R21 4 R11 R22 5 R12 R22 6 R13 R22 7 R11 R23 8 R12 R23 9 R13 R23 If instead of two rates, suppose there are three rates, I will have to modify my code as data_label = expand.grid(c("R11", "R12", "R13"),...
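
A minimal sketch, in Python rather than R, of one way the label grid could be built for any number of rates without editing the call by hand. The helper names `rate_labels` and `expand_grid` are hypothetical and not from the thread; the row order is chosen to mimic `expand.grid`, where the first column varies fastest.

```python
from itertools import product


def rate_labels(n_rates, n_ranges):
    """Build one label column per rate: R<rate><range>, e.g. R11, R12, R13."""
    return [
        [f"R{rate}{rng}" for rng in range(1, n_ranges + 1)]
        for rate in range(1, n_rates + 1)
    ]


def expand_grid(*columns):
    """All combinations of the columns, first column varying fastest."""
    # itertools.product varies its LAST argument fastest, so the columns are
    # reversed on the way in and each combination is reversed on the way out.
    return [tuple(reversed(combo)) for combo in product(*reversed(columns))]


# Two rates with three ranges each, as in the snippet above.
grid = expand_grid(*rate_labels(2, 3))
for row in grid:
    print(*row)
```

Moving to three rates would then only change the first argument, `rate_labels(3, 3)`, rather than the call itself.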

[PATCH] vfork() for parisc

2006 Jul 24

1

[PATCH] vfork() for parisc

...%rp is saved and restored across the syscall, thankfully. + * + */ + + .text + .align 64 ; cache-width aligned + .globl vfork + .type vfork,@function +vfork: + /* pid_t vfork(void) */ + ble 0x100(%sr2, %r0) ; jump to gateway page + nop + + ldi -0x1000,%r19 ; %r19 = -4096 + sub %r0,%ret0,%r22 ; %r22 = -%ret0 + cmpb,>>=,n %r19,%ret0,1f ; if %ret0 >= -4096UL + ldi -1,%ret0 ; nullified on taken forward + + /* store %r22 to errno... */ + ldil L%errno,%r1 + ldo R%errno(%r1),%r1 + stw %r22,0(%r1) +1: + bv %r0(%rp) ; jump back + nop + + .size vfork,.-vfork
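
The error handling after the gateway call can be modelled in a few lines. This is a rough sketch in Python, assuming 32-bit registers and reading the unsigned compare against -4096 as "values from -4095 to -1 are negated error codes"; the function name `decode_syscall_return` is hypothetical and not part of the patch.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are treated as 32 bits wide


def decode_syscall_return(ret0):
    """Return (value, errno) as the epilogue in the snippet would leave them."""
    ret0 &= MASK32                     # view the register as unsigned
    if ret0 > (-4096 & MASK32):        # unsigned compare: -4095..-1 is an error
        errno = (-ret0) & MASK32       # sub %r0,%ret0,%r22
        return -1, errno               # ldi -1,%ret0, then %r22 stored to errno
    return ret0, None                  # branch taken, ldi nullified, errno untouched


print(decode_syscall_return(-2))
print(decode_syscall_return(1234))
```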

[PATCH 0/4] ia64/xen: paravirtualization of hand written assembly code

2008 Feb 25

6

[PATCH 0/4] ia64/xen: paravirtualization of hand written assembly code

Hi. The patch I sent before was too large, so it was dropped from the mailing list. I'm sending it again in a smaller size. This patch set is the Xen paravirtualization of hand written assembly code. I expect that much cleanup is necessary before merge. We really need feedback before starting the actual cleanup, as Eddie already said. Eddie discussed how to clean up and suggested

[LLVMdev] Register Pairing

2010 Dec 02

0

[LLVMdev] Register Pairing

Hi Borja, > Without doing what i mentioned and letting LLVM expand all operations wider > than 8 bits as you asked, the code produced is excellent supposing that many > of the moves there should be 16 bit moves reducing code size and right > register allocation, also something important for me is that the code is > better than gcc's. When i say right reg allocation it doesnt

[LLVMdev] Register Pairing

2010 Dec 01

2

[LLVMdev] Register Pairing

...then proceed and combine two 8 bit instructions into a 16 bit one as Jeff pointed out in his email? For example i want to combine a 16 bit add like this: // b = b + 1: (b stored in r25:r24) add r24, 1 adc r25, 0 into adw r25:r24, 1 and the one im talking all my mails about move r25, r23 move r24, r22 into movw r25:r24, r23:r22 or move r18, r2 move r19, r3 into movw r19:r18, r3:r2 any combination of moves with reg pairs are valid. I wrote a function pass to test, it scanned for moves and checked if next instruction was a move to see if globally it was a 16 move and replace those 2 insts with a...

[LLVMdev] Register Pairing

2010 Dec 05

1

[LLVMdev] Register Pairing

...s example is exactly why i need the constraints and how to combine instructions. typedef short t; extern t mcos(t a); extern t mdiv(t a, t b); t foo(t a, t b) { short p1 = mcos(b); short p2 = mcos(a); return mdiv(p1&p2, p1^p2); } This C code produces: ; a<- r25:r24 b<--r23:r22 mov r18, r24 mov r19, r25 <-- can be combined into a movw r19:r18, r25:r24 mov r25, r23 mov r24, r22 <-- can be combined into a movw r25:r24, r23:r22 call mcos ; here we have the case i was explaining, pairs dont match because they're the other way round...

[LLVMdev] Register Pairing

2010 Nov 27

3

[LLVMdev] Register Pairing

...nto 8bit operations the 16bit instructions never get selected, also the reg allocator should allocate adjacent regs to form the pairs. The most important 16 bit instruction is data movement, this instruction can move register pairs in a single cycle, doing something like this: mov r25, r23 mov r24, r22 into: movw r25:r24, r23:r22 The key point here is that the movw instruction can only move data of fixed pairs in this way: movw Rdest+1:Rdest, Rorig+1:Rorig, so movw R25:R23, R21:R18 is illegal because the registers in each pair aren't adjacent. Explaining this as if it was for x86 may make things more c...
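
The pairing rule above can be sketched as a small peephole over assembly text. This is an illustrative Python model, not the LLVM pass discussed in the thread; `is_pair` and `combine_moves` are hypothetical names, and the requirement that the low register of a pair be even is an assumption taken from the AVR `movw` encoding rather than from the snippet.

```python
import re

MOV = re.compile(r"^mov\s+r(\d+),\s*r(\d+)$")


def is_pair(hi, lo):
    """True if hi:lo is a fixed pair (assumed: even low register, hi = lo + 1)."""
    return lo % 2 == 0 and hi == lo + 1


def combine_moves(lines):
    """Fuse two neighbouring 8-bit movs into one movw when both sides are pairs."""
    out, i = [], 0
    while i < len(lines):
        first = MOV.match(lines[i].strip())
        second = MOV.match(lines[i + 1].strip()) if i + 1 < len(lines) else None
        if first and second:
            a = tuple(map(int, first.groups()))   # (dest, src) of the first mov
            b = tuple(map(int, second.groups()))  # (dest, src) of the second mov
            # Try both orders: high byte moved first, or low byte moved first.
            for (dh, sh), (dl, sl) in ((a, b), (b, a)):
                if is_pair(dh, dl) and is_pair(sh, sl):
                    out.append(f"movw r{dh}:r{dl}, r{sh}:r{sl}")
                    i += 2
                    break
            else:
                out.append(lines[i].strip())
                i += 1
            continue
        out.append(lines[i].strip())
        i += 1
    return out


print(combine_moves(["mov r25, r23", "mov r24, r22"]))
```

Because pairs are aligned under this assumption, two distinct pairs never share a register, so fusing the two moves does not change which values are read.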

[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code

2008 Feb 26

8

[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code

Hi. I rewrote the patch according to the comments. I adopted in-place code generation because it looks like the quickest way. The point Eddie wanted to discuss is how to generate code and its ABI, i.e. in-place generation vs. direct jump vs. indirect function call. An indirect function call doesn't make sense because ivt.S is compiled multiple times. And it is up to pv instances to choose in-place

Fix syscalls with more than four arguments on parisc

2005 Nov 25

0

Fix syscalls with more than four arguments on parisc

...* + * %r20 contains the system call number, %r2 contains whence we came + * + */ + + .text + .align 64 ; cache-width aligned + .globl __syscall_common + .type __syscall_common,@function +__syscall_common: + ldo 0x40(%sp),%sp + stw %rp,-0x54(%sp) ; save return pointer + + ldw -0x74(%sp),%r22 ; %arg4 + ldw -0x78(%sp),%r21 ; %arg5 + + ble 0x100(%sr2, %r0) ; jump to gateway page + nop ; can we move a load here? + + ldi -0x1000,%r19 ; %r19 = -4096 + sub %r0,%ret0,%r22 ; %r22 = -%ret0 + cmpb,>>=,n %r19,%ret0,1f ; if %ret0 >= -4096UL + ldi -1,%ret0 ; nullified on tak...

[PATCH] make all performance counter per-cpu

2007 Mar 27

0

[PATCH] make all performance counter per-cpu

...FLECT_CNT - movl r20=perfcounters+FAST_REFLECT_PERFC_OFS+((0x3000>>8)*4);; + movl r20=PERFC(fast_reflect + (0x3000>>8));; ld4 r21=[r20];; adds r21=1,r21;; st4 [r20]=r21;; @@ -597,7 +596,7 @@ END(fast_break_reflect) // r31 == pr ENTRY(fast_reflect) #ifdef FAST_REFLECT_CNT - movl r22=perfcounters+FAST_REFLECT_PERFC_OFS; + movl r22=PERFC(fast_reflect); shr r23=r20,8-2;; add r22=r22,r23;; ld4 r21=[r22];; @@ -938,7 +937,7 @@ fast_tlb_no_tr_match: (p7) br.cond.spnt.few page_not_present;; #ifdef FAST_REFLECT_CNT - movl r21=perfcounter+FAST_VHPT_TRANSLATE_PERFC_OFS;; + movl...

How to finalize instruction lowering after register allocation.

2018 Apr 10

1

How to finalize instruction lowering after register allocation.

...a pair of RegClassA/RegClassB registers or constants. For this purpose, I only have a MOV instruction operating either on UnitA or UnitB. So from the BUILD_VECTOR, I have to generate two MOV instructions, one for the UnitA and UnitB. Assuming that %vreg3 (of class RegClassAB) will be allocated to R22, %vreg3<def> = BUILD_VECTOR 2.7172, 3.1416 should be lowered into: MOV A 2.7172, R22 ; // RegisterA_22 = 2.7172 MOV B 3.1416, R22; // RegisterB_22 = 3.1416 Generating the two instruction could be easily done by custom inserter at end of Instruction Selection. It...

[LLVMdev] process_root.

2004 Aug 18

2

[LLVMdev] process_root.

...pointer in semispace.c (I've implemented a small collector, hopefully :). The fault occurs when I do: printf("process_root[0x%p] = 0x%p\n", (void*) Root, (void*) *Root); I.e, when I reference Root. My frontend creates llvm assembly with llvm-gcroot in the following manner: ... %r22 = alloca uint ;; typetagged integers/pointers %r23 = cast uint* %r22 to sbyte** call void %llvm.gcroot(sbyte** %r23, sbyte* null) ... So Root should be a valid pointer to the program stack. It traces two roots before the segmentation fault. Any ideas? Btw, can I run llvm-db with llvm assembly...

[LLVMdev] Register Pairing

2010 Nov 29

0

[LLVMdev] Register Pairing

...'t have a number of 16-bit instructions. [...] > typedef unsigned short t; > t foo(t a, t b, t c) > { > return a+b; > } [...] > This is fine until we get to the register allocation stage, there it does: > BB#0: derived from LLVM BB %entry > Live Ins: %R25R24 %R23R22 > %R18<def> = COPY %R24 > %R19<def> = COPY %R25 > %R24<def> = COPY %R22<kill>, %R25R24<imp-def> > %R24<def> = ADDRdRr %R24, %R18<kill>, %SREG<imp-def> > %R25<def> = COPY %R23<kill> > %R25<def>...

Legalising seems to lose critical information needed for lowering return values properly?

2019 Mar 02

3

Legalising seems to lose critical information needed for lowering return values properly?

...ret i32 %1 } declare i32 @myExternalFunction1(i32, i32) Is being lowered to this assembly language... setServoAngle3: ; @setServoAngle3 ; %bb.0: ; %entry ldi r18, 119 ldi r19, 0 ldi r20, 0 ldi r21, 0 call myExternalFunction1 mov r18, r22 mov r19, r23 mov r22, r24 mov r23, r25 mov r24, r18 mov r25, r19 ret Which is clearly wrong. It should just be... setServoAngle3: ; @setServoAngle3 ; %bb.0: ; %entry ldi r18, 119 ldi r19, 0 ldi r20, 0 ldi r21, 0 call myExternalFunction...

[LLVMdev] Register coalescer and reg_sequence (virtual super-regs)

2013 May 31

2

[LLVMdev] Register coalescer and reg_sequence (virtual super-regs)

...he virtual super reg interferes with sub reg instances, even though in reality they shouldn't conflict. That is, they are individual registers and would be better compared as such for register coalescing decisions (CoalescerPair::Partial = 0). For example, I have a super reg that has r20, r21, r22, and r23 physical registers. This super reg is the dest of a reg_sequence which generates 4 COPY MIs. The first COPY coalesces (merging into r20), but the vregs for r21-r23 (SUPER_RC:%vreg50:subreg1..subreg3) are never coalesced after that because doing so generates interference on %vreg50, the "...

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

1

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

...p4 = p4 * d; } . . Compiling with NVCC, Ocelot, and LLVM, I can confirm the interleaved instruction schedule with a four-instruction reuse distance. An excerpt follows: . . %r1500 = fmul float %r1496, %r24 ; compute %1500 %r1501 = fmul float %r1497, %r23 %r1502 = fmul float %r1498, %r22 %r1503 = fmul float %r1499, %r21 %r1504 = fmul float %r1500, %r24 ; first use of %1500 %r1505 = fmul float %r1501, %r23 %r1506 = fmul float %r1502, %r22 %r1507 = fmul float %r1503, %r21 %r1508 = fmul float %r1504, %r24 ; first use of %1504 . . The JIT compiler, however, seems t...

[klibc 30/43] parisc support for klibc

2006 Jun 26

0

[klibc 30/43] parisc support for klibc

...+ * %r20 contains the system call number, %r2 contains whence we came + * + */ + + .text + .align 64 ; cache-width aligned + .globl __syscall_common + .type __syscall_common,@function +__syscall_common: + ldo 0x40(%sp),%sp + stw %rp,-0x54(%sp) ; save return pointer + + ldw -0x74(%sp),%r22 ; %arg4 + ldw -0x78(%sp),%r21 ; %arg5 + + ble 0x100(%sr2, %r0) ; jump to gateway page + nop ; can we move a load here? + + ldi -0x1000,%r19 ; %r19 = -4096 + sub %r0,%ret0,%r22 ; %r22 = -%ret0 + cmpb,>>=,n %r19,%ret0,1f ; if %ret0 >= -4096UL + ldi -1,%ret0 ; nullified on tak...

[PATCH 00/15] ia64/pv_ops take 5

2008 Apr 30

16

[PATCH 00/15] ia64/pv_ops take 5

Hi. This patchset implements ia64/pv_ops support, which is the framework for virtualization support. All the comments so far have now been addressed, with only a few exceptions. On x86 various ways to support virtualization were proposed, and eventually pv_ops won. So on ia64 the pv_ops strategy is appropriate too. Later I'll post the patchset which implements xen domU based on ia64/pv_ops.
