Displaying 20 results from an estimated 410 matches for "jne".
2016 May 13
4
RFC: callee saved register verifier
...p
movabsq $0xCA5FCA5FCA5FCA5F, %rbx # can also be movq %rbp, %rbx etc.
movabsq $0xCA5FCA5FCA5FCA5F, %r12
movabsq $0xCA5FCA5FCA5FCA5F, %r13
movabsq $0xCA5FCA5FCA5FCA5F, %r14
movabsq $0xCA5FCA5FCA5FCA5F, %r15
callq foo
movabsq $0xCA5FCA5FCA5FCA5F, %rax
cmpq %rax, %rbp
jne .LBB1_5
movabsq $0xCA5FCA5FCA5FCA5F, %rax
cmpq %rax, %rbx
jne .LBB1_5
movabsq $0xCA5FCA5FCA5FCA5F, %rax
cmpq %rax, %r12
jne .LBB1_5
movabsq $0xCA5FCA5FCA5FCA5F, %rax
cmpq %rax, %r13
jne .LBB1_5
movabsq $0xCA5FCA5FCA5FCA5F, %rax
cmpq %rax,...
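The fill/call/compare sequence above can be sketched at the C level. This is a hedged illustration of the verifier's idea, not the generated code: all names here (`regs`, `callee_saved_ok`, the two callees) are made up, and memory slots stand in for the callee-saved registers (rbp, rbx, r12-r15 on x86-64 SysV).

```c
#include <assert.h>
#include <stdint.h>

/* The RFC's sentinel value. */
#define SENTINEL 0xCA5FCA5FCA5FCA5FULL

/* C-level model (hypothetical): one slot per callee-saved register. */
static uint64_t regs[6];

static void well_behaved_callee(void) { /* preserves every "register" */ }
static void clobbering_callee(void)   { regs[3] = 0; /* trashes "r13" */ }

/* Fill every slot with the sentinel, call foo, then verify each slot --
 * the same fill/call/compare shape as the movabsq/cmpq/jne sequence. */
static int callee_saved_ok(void (*foo)(void)) {
    for (int i = 0; i < 6; i++)
        regs[i] = SENTINEL;          /* movabsq $0xCA5F..., %reg */
    foo();                           /* callq foo */
    for (int i = 0; i < 6; i++)
        if (regs[i] != SENTINEL)
            return 0;                /* corresponds to jne .LBB1_5 */
    return 1;
}
```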
2009 Aug 02
2
[LLVMdev] code-altering Passes for llc
Greetings,
I am extending llc to include runtime checks for calls (in X86). So a call
'call target' is altered to look like this:
[some check]
jne error_function
call target
I've done this by implementing a MachineFunctionPass that is instantiated
and added to the PassManager in X86TargetMachine::addPreRegAlloc.
In order to create the jne-instruction I need some BasicBlock that contains
the error routine. So I tried to create a Module...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...// these 4 lines are
crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor edx, eax
m...
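The C source is truncated in the snippet above; below is a hedged sketch of a bit-at-a-time CRC-32 with the same `crc >>= 1` inner-loop shape being complained about. The polynomial, init value, and final inversion are assumptions (the thread's `crc32be` itself uses the left-shifting form visible in the `shl edx, 24` assembly); this is the pattern, not the poster's exact code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative reflected bit-at-a-time CRC-32 (poly 0xEDB88320 assumed). */
uint32_t crc32_sketch(const uint8_t *buf, size_t len) {
    uint32_t crc = ~0u;
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xEDB88320u; /* the inner-loop lines */
            else                                /* the post calls poor  */
                crc >>= 1;
        }
    }
    return ~crc;
}
```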
2007 Apr 30
0
[LLVMdev] Bootstrap Failure -- Expected Differences?
...> + 3bf: 73 65 jae 426 <__FUNCTION__.20866+0xa>
> 3c1: 5f pop %edi
> 3c2: 64 65 63 6c 00 2f arpl %bp,%fs:%gs:0x2f(%eax,%eax,1)
>
> 000003c7 <.str>:
> 3c7: 2f das
> - 3c8: 75 73 jne 43d <__FUNCTION__.21160+0x4>
> + 3c8: 75 73 jne 43d <__FUNCTION__.21073+0x4>
> 3ca: 65 gs
> - 3cb: 72 73 jb 440 <__FUNCTION__.21160+0x7>
> + 3cb: 72 73 jb 440 <__FUNCTION__.21073+0x7>...
2020 Oct 09
1
[PATCH] drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth
...0a 00 or (%rax),%al
2: 00 48 8b add %cl,-0x75(%rax)
5: 49 rex.WB
6: 48 c7 87 b8 00 00 00 movq $0x6,0xb8(%rdi)
d: 06 00 00 00
11: 80 b9 4d 0a 00 00 00 cmpb $0x0,0xa4d(%rcx)
18: 75 1e jne 0x38
1a: 83 fa 41 cmp $0x41,%edx
1d: 75 05 jne 0x24
1f: 48 85 c0 test %rax,%rax
22: 75 29 jne 0x4d
24: 8b 81 10 0d 00 00 mov 0xd10(%rcx),%eax
2a:* 39 06 cmp %eax,(%rs...
2013 Dec 20
2
[LLVMdev] Commutability of X86 FMA3 instructions.
...or the curious, the reason that I'm asking is that we currently
always select the 213 variant, but this introduces an extra copy in
accumulator-style loops. Something like:
while (...)
accumulator = x * y + accumulator;
yields:
loop:
vfmadd.213 y, x, acc
vmovaps acc, x
decl count
jne loop
instead of
loop:
vfmadd.231 acc, x, y
decl count
jne loop
I have started writing a patch to generate the 231 variant by default,
and I want to know whether I need to go to the trouble of adding
custom commute logic. If these things aren't commutable then I don't
need to worry...
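A C loop of the accumulator shape quoted above (a hypothetical example; the function name and shapes are made up): each iteration is `acc = x*y + acc`, so with FMA codegen the 213 form overwrites one of the multiplicands with the result, forcing the `vmovaps` to restore it, while the 231 form accumulates in place.

```c
#include <assert.h>

/* Accumulator-style loop: acc = x[i] * y[i] + acc each iteration. */
float dot(const float *x, const float *y, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc = x[i] * y[i] + acc;
    return acc;
}
```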
2020 Oct 13
1
[PATCH v2] drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth
...0a 00 or (%rax),%al
2: 00 48 8b add %cl,-0x75(%rax)
5: 49 rex.WB
6: 48 c7 87 b8 00 00 00 movq $0x6,0xb8(%rdi)
d: 06 00 00 00
11: 80 b9 4d 0a 00 00 00 cmpb $0x0,0xa4d(%rcx)
18: 75 1e jne 0x38
1a: 83 fa 41 cmp $0x41,%edx
1d: 75 05 jne 0x24
1f: 48 85 c0 test %rax,%rax
22: 75 29 jne 0x4d
24: 8b 81 10 0d 00 00 mov 0xd10(%rcx),%eax
2a:* 39 06 cmp %eax,(%rs...
2007 Apr 27
2
[LLVMdev] Bootstrap Failure -- Expected Differences?
The saga continues.
I've been tracking the interface changes and merging them with
the refactoring work I'm doing. I got as far as building stage3
of llvm-gcc but the object files from stage2 and stage3 differ:
warning: ./cc1-checksum.o differs
warning: ./cc1plus-checksum.o differs
(Are the above two ok?)
The list below is clearly bad. I think it's every object file in
the
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...poor!
>> }
>> return ~crc;
>> }
>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm>
>> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx,...
2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above. We can remove the cmp above if we were to perform
> the following transformation:
Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)?
Is there...
2016 May 13
2
RFC: callee saved register verifier
...%rbx etc.
> > movabsq $0xCA5FCA5FCA5FCA5F, %r12
> > movabsq $0xCA5FCA5FCA5FCA5F, %r13
> > movabsq $0xCA5FCA5FCA5FCA5F, %r14
> > movabsq $0xCA5FCA5FCA5FCA5F, %r15
> > callq foo
> > movabsq $0xCA5FCA5FCA5FCA5F, %rax
> > cmpq %rax, %rbp
> > jne .LBB1_5
> > movabsq $0xCA5FCA5FCA5FCA5F, %rax
> > cmpq %rax, %rbx
> > jne .LBB1_5
> > movabsq $0xCA5FCA5FCA5FCA5F, %rax
> > cmpq %rax, %r12
> > jne .LBB1_5
> > movabsq $0xCA5FCA5FCA5FCA5F, %rax
> > cmpq %rax, %r13
> >...
2015 Sep 01
2
[RFC] New pass: LoopExitValues
...-----------------
matrix_mul:
testl %edi, %edi
je .LBB0_5
xorl %r9d, %r9d
xorl %r8d, %r8d
.LBB0_2:
xorl %r11d, %r11d
.LBB0_3:
movl %r9d, %r10d
movl (%rdx,%r10,4), %eax
imull %ecx, %eax
movl %eax, (%rsi,%r10,4)
incl %r11d
incl %r9d
cmpl %r11d, %edi
jne .LBB0_3
incl %r8d
cmpl %edi, %r8d
jne .LBB0_2
.LBB0_5:
retq
Without LoopExitValues:
-----------------------------------
matrix_mul:
pushq %rbx # Eliminated by L.E.V. pass
.Ltmp0:
.Ltmp1:
testl %edi, %edi
je .LBB0_5
xorl %r8d, %r8d
xorl %r9d, %r9d
.LB...
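The C source of `matrix_mul` is not shown in the snippet; this is a hedged reconstruction matching the assembly (nested Size-trip loops, a linear element index, `imull` by a scalar). Parameter names are assumptions.

```c
#include <assert.h>

/* Hypothetical reconstruction: scale each of Size*Size elements of Src
 * by Val into Dest, via two nested loops each running Size iterations. */
void matrix_mul(unsigned Size, unsigned *Dest, const unsigned *Src,
                unsigned Val) {
    for (unsigned i = 0; i != Size; i++)
        for (unsigned j = 0; j != Size; j++)
            Dest[i * Size + j] = Src[i * Size + j] * Val;
}
```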
2009 Aug 02
0
[LLVMdev] code-altering Passes for llc
On Aug 2, 2009, at 7:09 AM, Artjom Kochtchi wrote:
>
> Greetings,
>
> I am extending llc to include runtime checks for calls (in X86). So
> a call
> 'call target' is altered to look like this:
>
> [some check]
> jne error_function
> call target
>
> I've done this by implementing a MachineFunctionPass that is
> instantiated
> and added to the PassManager in X86TargetMachine::addPreRegAlloc.
>
> In order to create the jne-instruction I need some BasicBlock that
> contains
> th...
2017 Oct 20
3
[PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support
...a pointer to the C function implementing the syscall.
>> * IRQs are on.
>> */
>> - cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
>> + leaq .Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
>> + cmpq %r11, (%rsp)
>> jne 1f
>>
>> /*
>> @@ -1172,7 +1176,8 @@ ENTRY(error_entry)
>> movl %ecx, %eax /* zero extend */
>> cmpq %rax, RIP+8(%rsp)
>> je .Lbstep_iret
>> - cmpq $.Lgs_change, RIP+8(%rsp)
>> + l...
2020 Jul 17
1
[PATCH] drm/nouveau: Accept 'legacy' format modifiers
...51 08 mov 0x8(%rcx),%edx
3: 48 89 c8 mov %rcx,%rax
6: 65 48 03 05 d4 0e ca add %gs:0x70ca0ed4(%rip),%rax # 0x70ca0ee2
d: 70
e: 48 8b 70 08 mov 0x8(%rax),%rsi
12: 48 39 f2 cmp %rsi,%rdx
15: 75 e7 jne 0xfffffffffffffffe
17: 4c 8b 38 mov (%rax),%r15
1a: 4d 85 ff test %r15,%r15
1d: 0f 84 8f 01 00 00 je 0x1b2
23: 8b 45 20 mov 0x20(%rbp),%eax
26: 48 8b 7d 00 mov 0x0(%rbp),%rdi
2a:* 49 8b 1c 07 mov (%r15,%...
2018 Apr 03
4
SCEV and LoopStrengthReduction Formulae
...n be optimized via cmp/jmp fusion.
// clang -O3 -S test.c
extern void g(int);
void f(int *p, long long n) {
do {
g(*p++);
} while (--n);
}
LLVM currently generates the following sequence for x86_64 targets:
LBB0_1:
movl (%r15,%rbx,4), %edi
callq g
addq $1, %rbx
cmpq %rbx, %r14
jne .LBB0_1
LLVM can perform compare-jump fusion, it already does in certain cases, but not
in the case above. We can remove the cmp above if we were to perform
the following transformation:
1.0) Initialize the induction variable, %rbx, to be 'n' instead of zero.
1.1) Negate the induction var...
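Steps 1.0/1.1 can be illustrated in source form (a sketch of the intended transformation, not compiler output; `f_transformed`, `end`, and the logging `g` are made up for the example): counting the induction variable up from -n toward zero lets `addq $1, %rbx` set ZF itself, so the `cmpq` disappears and the remaining add+jne pair can fuse.

```c
#include <assert.h>

/* Test scaffolding (hypothetical): record what g() receives. */
static int log_buf[16], log_n;
static void g(int v) { log_buf[log_n++] = v; }

/* Original loop: induction variable counts up from 0, so a cmpq
 * against n is needed before the jne. */
static void f(int *p, long long n) {
    do {
        g(*p++);
    } while (--n);
}

/* After the transformation: start at -n, count toward zero; the add
 * sets the flags the branch needs, so the compare is gone. */
static void f_transformed(int *p, long long n) {
    int *end = p + n;
    for (long long i = -n; i != 0; i++)
        g(end[i]);
}
```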
2011 Feb 18
1
[PATCH] core: Allow pasting from a VMware host by typing Ctrl-P
...inc b/core/ui.inc
index 2d44447..0e82779 100644
--- a/core/ui.inc
+++ b/core/ui.inc
@@ -125,6 +125,8 @@ not_ascii:
je print_version
cmp al,'X' & 1Fh ; <Ctrl-X>
je force_text_mode
+ cmp al,'P' & 1Fh ; <Ctrl-P>
+ je paste
cmp al,08h ; Backspace
jne get_char
backspace: cmp di,command_line ; Make sure there is anything
@@ -143,6 +145,10 @@ force_text_mode:
call vgaclearmode
jmp enter_command
+paste:
+ call vmware_paste
+ jmp get_char
+
set_func_flag:
mov byte [FuncFlag],1
jmp short get_char_2
@@ -568,6 +574,72 @@ getchar_time...
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
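A hypothetical example of the pattern that description targets (names made up): after the loop, `i`'s exit value equals `n`, which is already available, so the use of `i` below the loop can be rewritten to use `n` and the induction variable need not stay live past the loop.

```c
#include <assert.h>

/* Loop exit value example: *iterations reevaluates i after the loop,
 * but the exit value is simply n, which is already available. */
int sum_first_n(const int *a, int n, int *iterations) {
    int s = 0, i;
    for (i = 0; i < n; i++)
        s += a[i];
    *iterations = i; /* exit value; replaceable by n */
    return s;
}
```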
2013 Dec 20
0
[LLVMdev] Commutability of X86 FMA3 instructions.
...ently
> always select the 213 variant, but this introduces an extra copy in
> accumulator-style loops. Something like:
>
> while (...)
> accumulator = x * y + accumulator;
>
> yields:
>
> loop:
> vfmadd.213 y, x, acc
> vmovaps acc, x
> decl count
> jne loop
>
> instead of
>
> loop:
> vfmadd.231 acc, x, y
> decl count
> jne loop
>
> I have started writing a patch to generate the 231 variant by default,
> and I want to know whether I need to go to the trouble of adding
> custom commute logic. If these things a...