Displaying 20 results from an estimated 1874 matches for "esi".
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
...[i] = 5; // xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
}
return s;
}
====== Output A ======
======================
foo: # @foo
.Ltmp12:
.cfi_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
mo...
2005 Feb 22
5
[LLVMdev] Area for improvement
...[i][j]='.';
for (i=0;i<COLS;i++)
b[i][ROWS]=0;
}
This generates the following X86 code:
.text
.align 16
.globl init_board
.type init_board, @function
init_board:
subl $4, %esp
movl %esi, (%esp)
movl 8(%esp), %eax
movl $0, %ecx
.LBBinit_board_1: # loopexit.1
imull $7, %ecx, %edx
movl %eax, %esi
addl %edx, %esi
movb $46, (%esi)
imull $7, %ecx, %edx
movl %eax, %esi
addl %edx, %esi
leal 1(%esi), %ed...
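The repeated imull $7, %ecx, %edx in this listing recomputes the row offset for every single store. Below is a minimal sketch of a strength-reduced rewrite that walks one pointer instead; the declaration of the board and the exact ROWS/COLS values are guesses reconstructed from the assembly (only the 7-byte row stride is actually implied by the constant 7), not the original poster's source.

/* Hypothetical reconstruction: only the 7-byte row stride is implied by
 * the imull $7 above; the exact ROWS/COLS values are guesses. */
#define ROWS 6
#define COLS 7

static char b[COLS][ROWS + 1];

/* Same net effect as init_board, but with the two loops fused and the
 * i*7 offset replaced by a single walking pointer, i.e. the
 * strength-reduced form one would hope the optimizer produces. */
void init_board_reduced(void)
{
    char *p = &b[0][0];
    int i, j;
    for (i = 0; i < COLS; i++) {
        for (j = 0; j < ROWS; j++)
            *p++ = '.';
        *p++ = 0;            /* b[i][ROWS] = 0 */
    }
}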
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
...[i] = 5; // xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
}
return s;
}
====== Output A ======
======================
foo: # @foo
.Ltmp12:
.cfi_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
mo...
2005 Feb 22
0
[LLVMdev] Area for improvement
...for (i=0;i<COLS;i++)
> for (j=0;j<ROWS;j++)
> b[i][j]='.';
> for (i=0;i<COLS;i++)
> b[i][ROWS]=0;
> }
>
> This generates the following X86 code:
> imull $7, %ecx, %edx
> movl %eax, %esi
> addl %edx, %esi
> movb $46, (%esi)
> imull $7, %ecx, %edx
> movl %eax, %esi
> addl %edx, %esi
> leal 1(%esi), %edx
... (many many copies of this, see the end of the email for full output)
...
> The code generated by GCC is much faster. L...
2005 Feb 22
2
[LLVMdev] Area for improvement
...for (j=0;j<ROWS;j++)
>> b[i][j]='.';
>> for (i=0;i<COLS;i++)
>> b[i][ROWS]=0;
>> }
>>
>> This generates the following X86 code:
>> imull $7, %ecx, %edx
>> movl %eax, %esi
>> addl %edx, %esi
>> movb $46, (%esi)
>> imull $7, %ecx, %edx
>> movl %eax, %esi
>> addl %edx, %esi
>> leal 1(%esi), %edx
>
> ... (many many copies of this, see the end of the email for full
> output) ...
>
>> ...
2007 May 15
1
esi cache server built on mongrel
Hi all,
I just wanted to share a project I've been working on for a few
months. It's still far from complete, but it has already been very useful for
me and my coworkers. I'm calling it mongrel-esi. ESI stands for Edge Side
Includes; the spec lives at http://www.w3.org/TR/esi-lang. I currently
only have support for <esi:include, <esi:try, <esi:attempt, and
<esi:except. Also, it supports HTTP_COOKIES, so fragments can be
conditionally cached by a cookie.
I have implemente...
2005 Aug 18
2
Nasty Bug (BIOS?).
...th the known EBIOS/CBIOS problem.
The symptom was exactly the same (hangs at ...EBIOS). As 3.10-pre8 and
3.10-pre9, contrary to what was mentioned on the ML, did not bring any
improvement, I looked deeper into what could be my specific problem.
I found out that the program just halted at 'cmp [esi],edx' (line 658;
ldlinux.asm 3.10-pre9)! By replacing that code with 'cmp [si],edx' the
problems were gone; it worked like a charm ;-) The only thing I can think
of that could cause this is that the high 16 bits of esi are incorrect. (An
opcode bug for this instruction seemed out of the question.)
Af...
2005 Feb 22
0
[LLVMdev] Area for improvement
...<COLS;i++)
> b[i][ROWS]=0;
> }
>
> This generates the following X86 code:
>
> .text
> .align 16
> .globl init_board
> .type init_board, @function
> init_board:
> subl $4, %esp
> movl %esi, (%esp)
> movl 8(%esp), %eax
> movl $0, %ecx
> .LBBinit_board_1: # loopexit.1
> imull $7, %ecx, %edx
> movl %eax, %esi
> addl %edx, %esi
> movb $46, (%esi)
> imull $7, %ecx, %edx
> movl %eax, %esi
>...
2009 Mar 11
4
[LLVMdev] Bug in X86CompilationCallback_SSE
...X86CompilationCallback_SSE:
0xb74b98e0 <X86CompilationCallback_SSE+0>: push %ebp
0xb74b98e1 <X86CompilationCallback_SSE+1>: mov %esp,%ebp
0xb74b98e3 <X86CompilationCallback_SSE+3>: sub $0x78,%esp
0xb74b98e6 <X86CompilationCallback_SSE+6>: mov %esi,-0x8(%ebp)
0xb74b98e9 <X86CompilationCallback_SSE+9>: lea 0x17(%esp),%esi
0xb74b98ed <X86CompilationCallback_SSE+13>: and $0xfffffff0,%esi
0xb74b98f0 <X86CompilationCallback_SSE+16>: mov %ebx,-0xc(%ebp)
0xb74b98f3 <X86CompilationCallback_SSE+19>: mo...
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...push r12
push rbx
sub rsp, 18h
mov ebx, 0FFFFFFFFh
cmp edi, 2
jnz loc_100000F29
mov rdi, [rsi+8] ; char *
xor r14d, r14d
xor esi, esi ; char **
mov edx, 0Ah ; int
call _strtol
mov r15, rax
shl rax, 20h
mov rsi, offset __mh_execute_header
add rsi, rax
sar rsi, 20h ; si...
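For readers decoding the calling convention: the annotated operands (rdi = char *, esi = char **, edx = int) are exactly the arguments of a strtol call on the first command-line argument. A rough sketch of source that would produce this prologue follows; everything beyond the strtol call itself is an assumption, not taken from the thread.

#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2)          /* cmp edi, 2 / jnz ...  */
        return -1;          /* mov ebx, 0FFFFFFFFh   */

    /* rdi = argv[1], rsi = NULL, edx = 10 in the listing above. */
    long v = strtol(argv[1], NULL, 10);

    /* The shl rax, 20h / sar ..., 20h pair that follows in the listing is
     * the usual sign-extension of the low 32 bits, e.g. after an (int) cast. */
    return (int)v;
}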
2005 Feb 22
0
[LLVMdev] Area for improvement
...)
>>> b[i][j]='.';
>>> for (i=0;i<COLS;i++)
>>> b[i][ROWS]=0;
>>> }
>>>
>>> This generates the following X86 code:
>>> imull $7, %ecx, %edx
>>> movl %eax, %esi
>>> addl %edx, %esi
>>> movb $46, (%esi)
>>> imull $7, %ecx, %edx
>>> movl %eax, %esi
>>> addl %edx, %esi
>>> leal 1(%esi), %edx
>>
>> ... (many many copies of this, see the end of the email for f...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...sub rsp, 18h
>> mov ebx, 0FFFFFFFFh
>> cmp edi, 2
>> jnz loc_100000F29
>> mov rdi, [rsi+8] ; char *
>> xor r14d, r14d
>> xor esi, esi ; char **
>> mov edx, 0Ah ; int
>> call _strtol
>> mov r15, rax
>> shl rax, 20h
>> mov rsi, offset __mh_execute_header
>> add...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...mov ebx, 0FFFFFFFFh
>>>> cmp edi, 2
>>>> jnz loc_100000F29
>>>> mov rdi, [rsi+8] ; char *
>>>> xor r14d, r14d
>>>> xor esi, esi ; char **
>>>> mov edx, 0Ah ; int
>>>> call _strtol
>>>> mov r15, rax
>>>> shl rax, 20h
>>>> mov rsi, offset __mh_execute...
2019 Aug 08
2
Suboptimal code generated by clang+llc in quite a common scenario (?)
...!tbaa !13
ret i32 0
}
According to that, the variable ‘scscx’ is loaded three times even though it is never modified. The resulting assembly code is this:
.globl _tst
_tst:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset %ebp, -8
movl %esp, %ebp
.cfi_def_cfa_register %ebp
pushl %esi
.cfi_offset %esi, -12
movb 16(%ebp), %al
movb 12(%ebp), %cl
movb 8(%ebp), %dl
movl _scscx, %esi
movb %dl, (%esi)
movl _scscx, %edx
movb %cl, 1(%edx)
movl _scscx, %ecx
movb %al, 2(%ecx)
xorl %eax, %eax
popl %esi
popl %ebp
retl
.cfi_endproc
.comm _pp,3,0
.section __DATA,__data
.glo...
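The three loads of _scscx follow from aliasing: each store through the pointer is a store of char, and a char store may alias any object, including the global pointer itself, so the compiler has to reload it before every store. Below is a minimal sketch of the pattern and of a rewrite that avoids the reloads; the declarations are reconstructed from the assembly and are assumptions, not the poster's exact source.

char *scscx;    /* assumed declaration of the global */

/* Each scscx[n] = ... is a char store, which may alias the pointer scscx
 * itself, so the compiler reloads _scscx before every store: the three
 * movl _scscx instructions in the listing above. */
int tst(char a, char b, char c)
{
    scscx[0] = a;
    scscx[1] = b;
    scscx[2] = c;
    return 0;
}

/* Copying the pointer into a local first makes the loaded value
 * unaliasable by the stores, so it is read from memory only once. */
int tst_once(char a, char b, char c)
{
    char *p = scscx;
    p[0] = a;
    p[1] = b;
    p[2] = c;
    return 0;
}

Whether such a workaround fits the original code depends on context; the point is only that the reloads come from potential aliasing rather than from a missed load-elimination pass.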
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...// these 4 lines is
crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor e...
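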
2007 Apr 30
0
[LLVMdev] Boostrap Failure -- Expected Differences?
...e4 alias_sets_conflict_p
> @@ -11617,23 +11617,23 @@
> 39c: R_386_32 gt_ggc_mx_varray_head_tag
> 3a0: R_386_32 gt_pch_nx_varray_head_tag
>
> -000003b8 <__FUNCTION__.20147>:
> +000003b8 <__FUNCTION__.20062>:
> 3b8: 66 69 6e 64 5f 62 imul $0x625f,0x64(%esi),%bp
> 3be: 61 popa
> - 3bf: 73 65 jae 426 <__FUNCTION__.20952+0xa>
> + 3bf: 73 65 jae 426 <__FUNCTION__.20866+0xa>
> 3c1: 5f pop %edi
> 3c2: 64 65 63 6c 00 2f arpl %bp,%fs:%gs:0x2f...
2007 Aug 06
0
cannot use winedbg on ubuntu feisty ?
...ng debugger...
Unhandled exception: page fault on read access to 0x00000000 in 32-bit
code (0xb7d5cc23).
Register dump:
CS:0073 SS:007b DS:007b ES:007b FS:0033 GS:003b
EIP:b7d5cc23 ESP:0033ef4c EBP:0033ef68 EFLAGS:00210246( - 00 -RIZP1)
EAX:00000000 EBX:7ec6850c ECX:00000000 EDX:012c0070
ESI:012cbcc4 EDI:00000000
Stack dump:
0x0033ef4c: 7ec5aa55 00000000 00000cf8 0033efa8
0x0033ef5c: 7ec6850c 012cbcc4 0013074c 0033efa8
0x0033ef6c: 7ec5bc4d 0013074c 00000000 0033efa8
0x0033ef7c: 7ec6850c 00000000 7ec6850c 0033efa8
0x0033ef8c: 7ec60295 0012f988 012cbca0 00000004
0x0033ef9c: 7ec6850...
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...crc;
}
unsigned int bar(unsigned int crc, unsigned int poly) {
if (crc & 1)
crc >>= 1, crc ^= poly;
else
crc >>= 1;
return crc;
}
and you get the perfect code for the left-shifting case!
foo: # @foo
lea eax, [rdi + rdi]
sar edi, 31
and edi, esi
xor eax, edi
ret
The right-shifting case, however, still leaves room for improvement!
bar: # @bar | bar: # @bar
mov eax, edi |
and eax, 1 |
neg eax |
shr edi | shr edi
| sbb eax...
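For reference, the branch-free computation that foo's four instructions implement (and that one would like to see for bar as well) can be written directly in C. This is a sketch reconstructed from the assembly above; the _branchless names are mine, and the signed shift relies on arithmetic right-shift behaviour, which is implementation-defined in ISO C but is what x86 targets do.

/* Left-shifting case, matching the lea/sar/and/xor above: sign-extend the
 * top bit of crc into a full 0/~0 mask and AND it with poly. */
unsigned int foo_branchless(unsigned int crc, unsigned int poly)
{
    unsigned int mask = (unsigned int)((int)crc >> 31);  /* sar edi, 31 */
    return (crc << 1) ^ (mask & poly);                   /* lea + and + xor */
}

/* Right-shifting case: the analogous mask comes from the low bit. */
unsigned int bar_branchless(unsigned int crc, unsigned int poly)
{
    unsigned int mask = 0u - (crc & 1u);                 /* 0 or ~0 */
    return (crc >> 1) ^ (mask & poly);
}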
2018 Dec 01
2
Where's the optimiser gone? (part 5.c): missed tail calls, and more...
...sh dword ptr [esp + 16] |
push dword ptr [esp + 16] |
push dword ptr [esp + 16] |
call __allrem | jmp __allrem
ret 16 |
__int64 __fastcall mul(__int64 foo, __int64 bar)
{
return foo * bar;
}
push esi | mov ecx, dword ptr [esp + 16]
mov ecx, dword ptr [esp + 16] | mov edx, dword ptr [esp + 12]
mov esi, dword ptr [esp + 8] | imul edx, dword ptr [esp + 8]
mov eax, ecx | mov eax, dword ptr [esp + 4]
imul e...
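The imul sequence in the right-hand column is the standard lowering of a 64 x 64 -> 64-bit product to 32-bit operations. A sketch of the arithmetic it implements is below; the helper name and the use of <stdint.h> types are mine, not from the thread.

#include <stdint.h>

/* result = lo(a)*lo(b)                            (one widening 32x32->64 mul)
 *        + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32)    (two 32-bit imuls)
 * The hi(a)*hi(b) term only contributes to bits >= 64 and is dropped. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return low + ((uint64_t)cross << 32);
}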
2008 May 27
3
[LLVMdev] Float compare-for-equality and select optimization opportunity
...a = b;
b = c;
c = t;
}
This is the resulting x86 assembly code:
movss xmm0,dword ptr [ecx+4]
ucomiss xmm0,dword ptr [ecx+8]
sete al
setnp dl
test dl,al
mov edx,edi
cmovne edx,ecx
cmovne ecx,esi
cmovne esi,edi
While I'm pleasantly surprised that my branch does get turned into several
select operations as intended (cmov - conditional move - in x86), I'm
confused why it uses the ucomiss instruction (unordered compare and set
flags). I only used IRBuilder::CreateFCmpOEQ. It...
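Part of the answer to the ucomiss question: "unordered" in the mnemonic refers to how the instruction treats NaN operands (ucomiss raises an invalid-operation exception only for signalling NaNs), not to the predicate being tested. The ordered-equal predicate is recovered from the flags: an unordered result sets the parity flag, so the sete/setnp/test sequence checks "ZF set and PF clear". A small C sketch of the semantics involved; the function names are mine.

#include <math.h>
#include <stdbool.h>

/* Ordered-equal (FCmpOEQ), what a plain a == b means in C: it must be
 * false when either operand is NaN, hence the extra parity-flag check
 * (setnp) after ucomiss in the listing above. */
static bool oeq(float a, float b)
{
    return a == b;
}

/* Unordered-equal (FCmpUEQ) is true when the operands are equal or when
 * either is NaN; that predicate would need only the ZF check. */
static bool ueq(float a, float b)
{
    return a == b || isnan(a) || isnan(b);
}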