thr3ads.net - search: "r15"

Help required regarding IPRA and Local Function optimization

2016 Jun 30

4

...rge function with recursion and object-oriented code so I am not able to find a pattern which is causing the failure. So I tried the following simple case to understand the expected behavior of this optimization. Consider the following code: define void @bar() #0 { call void asm sideeffect "movl %ecx, %r15d", "~{r15}"() #0 call void @foo() call void asm sideeffect "movl %r15d, %ebx", "~{rbx}"() #0 ret void } define internal void @foo() #0 { call void asm sideeffect "movl %r14d, %r15d", "~{r15}"() #0 ret void } and its generated assemb...

[LLVMdev] Possible missed optimization? 2.0

2010 Sep 09

2

...new possible missed optimization while testing more trivial code. This time it's not with a xor but with a multiplication instruction, and the example is a little bit more involved. C code: typedef short t; t foo(t a, t b) { t a4 = a*b; return a4; } argument "a" is passed in R15:R14, argument "b" in R13:R12, the return value is stored in R15:R14. The mul instruction takes in two 8-bit regs and returns a 16-bit result in R1:R0; this is handled in the selectionDAG the same way as x86 (btw mul is marked as commutable). Asm code: mul r12, r15 mov r8, r0...
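The excerpt above describes a target whose multiply takes two 8-bit registers and yields a 16-bit result, while `foo` needs a 16-bit product. As a rough illustration of how such a product can be assembled from 8-bit multiplies, here is a minimal sketch in C. The names `mul8x8` and `mul16` are hypothetical and chosen only for this example; the sketch makes no claim about the instruction sequence the backend in the thread actually emits.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware multiply that takes two 8-bit
 * operands and produces a full 16-bit result. */
static uint16_t mul8x8(uint8_t x, uint8_t y) {
    return (uint16_t)((uint16_t)x * (uint16_t)y);
}

/* 16-bit x 16-bit product truncated to 16 bits, built from three 8x8
 * multiplies. The a_hi * b_hi term only affects bits 16 and above, so
 * it is dropped. */
static uint16_t mul16(uint16_t a, uint16_t b) {
    uint8_t a_lo = (uint8_t)(a & 0xFF);
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF);
    uint8_t b_hi = (uint8_t)(b >> 8);

    uint16_t low   = mul8x8(a_lo, b_lo);
    /* Only the low 8 bits of the cross terms survive the shift by 8. */
    uint16_t cross = (uint16_t)(mul8x8(a_hi, b_lo) + mul8x8(a_lo, b_hi));

    return (uint16_t)(low + (uint16_t)(cross << 8));
}
```

Because the result is truncated to 16 bits, the same routine serves signed and unsigned operands alike; a caller working with `short` would cast the arguments to `uint16_t` and the result back.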

[LLVMdev] how to declare that two registers must be different

2006 Sep 18

4

...the ARM ARM. MUL (and MLA, MuLtiply and Accumulate) are the two well known ones. The very unofficial, but historically good, http://www.pinknoise.demon.co.uk/ARMinstrs/ARMinstrs.html#Multiplication says "The destination register shall not be the same as the operand register Rm. R15 shall not be used as an operand or as the destination register." Then there's the long multiplication instruction on some ARM architectures, (U|S)(MUL|MLA)L Rl, Rh, Rm, Rs which calculate Rm * Rs and overwrite or accumulate the 64-bit result into Rl and Rh. pinknoise says...

[LLVMdev] Possible missed optimization on function calling?

2010 Sep 21

1

...mdiv(mcos(a), msin(b)); return a4; } I noticed this while testing it for the backend I'm currently developing, but it produces exactly the same code for other targets: march = msp430: push.w r11 push.w r10 push.w r9 push.w r8 mov.w r14, r11 mov.w r15, r10 ; store a mov.w r13, r15 mov.w r12, r14 ; pass b call #msin mov.w r15, r9 mov.w r14, r8 ; store msin(b) mov.w r10, r15 mov.w r11, r14 ; pass a call #mcos mov.w r9, r13 ; pass msin(b) mov.w r8, r12 call #mdiv...

Help required regarding IPRA and Local Function optimization

2016 Jun 30

0

...ervation I have identified one bug: ... 0x10002d8ff <+1855>: movl -0x74(%rbp), %r13d 0x10002d903 <+1859>: movq -0x30(%rbp), %r12 ; this contains address of a structure 0x10002d907 <+1863>: movq -0x38(%rbp), %r14 0x10002d90b <+1867>: movq -0x58(%rbp), %r15 0x10002d90f <+1871>: leaq -0x150(%rbp), %rdi 0x10002d916 <+1878>: movq -0x50(%rbp), %rsi 0x10002d91a <+1882>: callq 0x10001a940 ; sqlite3ExprResolveNames at sqlite3.c:47419 this function preserves callee saved regs 0x10002d91f <+1887>: test...

[GE users] Apple Leopard has dtrace -- anyone used the SGE probes/scripts yet?

2007 Nov 14

10

Hi, Chris (cc) and I are trying to get the SGE master monitor to work with Apple Leopard dtrace. Unfortunately we are stuck with the error message below. Does anyone have an idea what the cause could be? What I can rule out as a cause is function inlining, for the reasons explained below. Background information on the SGE master monitor implementation is under http://wiki.gridengine.info/wiki/index.php/Dtrace

Debug symbols are missing in elf

2020 Apr 18

2

...00000 imm 0 8: R_MICROBLAZE_64 .rodata.str1.1 c: a0c00000 ori r6, r0, 0 10: f8a10028 swi r5, r1, 40 14: b0000000 imm 0 14: R_MICROBLAZE_64_PCREL printf 18: b9f40000 brlid r15, 0 1c: 10a60000 addk r5, r6, r0 printf("Successfully ran Hello World application"); 20: b0000000 imm 0 20: R_MICROBLAZE_64 .rodata.str1.1+0xe 24: a0a00000 ori r5, r0, 0 28: b0000000 imm 0...

[LLVMdev] how to declare that two registers must be different

2006 Sep 18

0

> "The destination register shall not be the same as the operand > register Rm. R15 shall not be used as an operand or as the > destination register." The ARM ARM has this "Operand restriction" on MUL: Specifying the same register for <Rd> and <Rm> has UNPREDICTABLE results. > Then, for the load and store multiple instructions, LDM and STM, th...

PIC and mcmodel=large on x86 doesn't use any relocations

2016 Oct 27

1

...For example, static int src; // Lsrc: .long static int dst; // Ldst: .long extern int *dptr; // .extern dptr void DataLoadAndStore() { // Large Memory Model code sequences from AMD64 ABI // Figure 3.22: Position-Independent Global Data Load and Store // // Assume that %r15 has been loaded with GOT address by // function prologue. // movabs $Lsrc@GOTOFF,%rax ; R_X86_64_GOTOFF64 // movabs $Ldst@GOTOFF,%rdx ; R_X86_64_GOTOFF64 // movl (%rax,%r15),%ecx // movl %ecx,(%rdx,%r15) dst = src; // movabs $dptr@GOT,%rax ; R_X86_64_GOT64 // mo...

[PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020 Jul 17

1

...mov %rcx,%rax 6: 65 48 03 05 d4 0e ca add %gs:0x70ca0ed4(%rip),%rax # 0x70ca0ee2 d: 70 e: 48 8b 70 08 mov 0x8(%rax),%rsi 12: 48 39 f2 cmp %rsi,%rdx 15: 75 e7 jne 0xfffffffffffffffe 17: 4c 8b 38 mov (%rax),%r15 1a: 4d 85 ff test %r15,%r15 1d: 0f 84 8f 01 00 00 je 0x1b2 23: 8b 45 20 mov 0x20(%rbp),%eax 26: 48 8b 7d 00 mov 0x0(%rbp),%rdi 2a:* 49 8b 1c 07 mov (%r15,%rax,1),%rbx <-- trapping instruction 2e: 40 f6 c7 0f...

[LLVMdev] Explicit register usage in LLVM assembly

2011 Apr 02

2

Hello! Is there a way to force explicit register usage (e.g. %r15 in the amd64 architecture) in LLVM assembly code? It was suggested to me in the #llvm channel at irc.oftc.net that I use inline assembly, but I find it rather impractical in my case. Is there any other way? Thanks, ~y.

Debug symbols are missing in elf

2020 Apr 18

2

...8: R_MICROBLAZE_64 .rodata.str1.1 >> c: a0c00000 ori r6, r0, 0 >> 10: f8a10028 swi r5, r1, 40 >> 14: b0000000 imm 0 >> 14: R_MICROBLAZE_64_PCREL printf >> 18: b9f40000 brlid r15, 0 >> 1c: 10a60000 addk r5, r6, r0 >> printf("Successfully ran Hello World application"); >> 20: b0000000 imm 0 >> 20: R_MICROBLAZE_64 .rodata.str1.1+0xe >> 24: a0a00000 ori r5, r0, 0...

[LLVMdev] Possible missed optimization?

2010 Sep 04

3

On Sep 4, 2010, at 11:21 AM, Borja Ferrer wrote: > I've noticed this pattern happening with other operators as well, but used xor in this example. As I said before, I tried with different register allocation orders, but it always produces the same result. GCC is emitting longer code, but since LLVM is so near to the optimal code sequence I wanted to reach it. In LLVM, copies are

[LLVMdev] Explicit register usage in LLVM assembly

2011 Apr 02

0

Hello Yiannis, You could write a custom backend that doesn't allocate %r15 for general usage. The normal way to do this is to set up a custom calling convention for all functions that keeps a sentinel in %r15 so that it always holds the sentinel. This is how new operating systems are supported with custom ABIs. The only problem is that you cannot be assured that %r...

SCEV and LoopStrengthReduction Formulae

2018 Apr 03

4

...is should stand alone as its own pass: // Example which can be optimized via cmp/jmp fusion. // clang -O3 -S test.c extern void g(int); void f(int *p, long long n) { do { g(*p++); } while (--n); } LLVM currently generates the following sequence for x86_64 targets: .LBB0_1: movl (%r15,%rbx,4), %edi callq g addq $1, %rbx cmpq %rbx, %r14 jne .LBB0_1 LLVM can perform compare-jump fusion (it already does in certain cases), but not in the case above. We can remove the cmp above if we were to perform the following transformation: 1.0) Initialize the induction variable, %rbx, to b...

[PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020 Jul 18

0

...03 05 d4 0e ca add %gs:0x70ca0ed4(%rip),%rax # 0x70ca0ee2 > d: 70 > e: 48 8b 70 08 mov 0x8(%rax),%rsi > 12: 48 39 f2 cmp %rsi,%rdx > 15: 75 e7 jne 0xfffffffffffffffe > 17: 4c 8b 38 mov (%rax),%r15 > 1a: 4d 85 ff test %r15,%r15 > 1d: 0f 84 8f 01 00 00 je 0x1b2 > 23: 8b 45 20 mov 0x20(%rbp),%eax > 26: 48 8b 7d 00 mov 0x0(%rbp),%rdi > 2a:* 49 8b 1c 07 mov (%r15,%rax,1),%rbx <-- trapping instructio...

TableGen register class

2016 Feb 03

2

Hi, Assume I define registers R0...R15 and two register classes RegA and RegB. RegA contains R0 to R7 while RegB contains R0 to R15. When I check the machine instructions, it seems that in some cases %vreg0 belongs to RegB, while in other cases %vreg1 belongs to RegA_RegB. Can you tell me how TableGen decides which is which? At first, I...

SCEV and LoopStrengthReduction Formulae

2018 Apr 04

0

...ch can be optimized via cmp/jmp fusion. > // clang -O3 -S test.c > extern void g(int); > void f(int *p, long long n) { > do { > g(*p++); > } while (--n); > } > > LLVM currently generates the following sequence for x86_64 targets: > LBB0_1: > movl (%r15,%rbx,4), %edi > callq g > addq $1, %rbx > cmpq %rbx, %r14 > jne .LBB0_1 > > LLVM can perform compare-jump fusion, it already does in certain cases, but > not in the case above. We can remove the cmp above if we were to perform > the following transformation: > 1.0)...

[LLVMdev] Need a clue to improve the optimization of some C code

2015 Mar 03

2

...ks for any feedback. Ciao Nat! P.S. In case someone is interested, here is the assembler code and the IR that produced it. Relevant LLVM generated x86_64 assembler portion with -Os ~~~ testq %r12, %r12 je LBB0_5 ## BB#1: movq -8(%r12), %rcx movq (%rcx), %rax movq -8(%rax), %rdx andq %r15, %rdx cmpq %r15, (%rax,%rdx) je LBB0_2 ## BB#3: addq $8, %rcx jmp LBB0_4 LBB0_2: leaq 8(%rdx,%rax), %rcx LBB0_4: movq %r12, %rdi movq %r15, %rsi movq %r14, %rdx callq *(%rcx) movq %rax, %rbx LBB0_5: ~~~ Better/tighter assembler code would be (saves 2 instructions, one jump less) ~~~ tes...

HiPE calling convention

2017 Sep 29

2

...stion to the HiPE calling convention. I am trying to enable HiPE calls for the Rust compiler. That presentation mentioned that: Virtual registers with “special” use, pinned to hardware registers (unallocatable). VM Register AMD64 Register Native stack pointer %nsp Heap pointer %r15 Process pointer %rbp Reading that, I am under the impression that both r15 and rbp should not be used in functions marked with the "HiPE" calling convention. That is, it looks like r15 and rbp are reserved for some purpose (like addressing dynamic language arguments/locals). However when tr...
