> You're misreading the asm... nothing is touching memory. (BTW, "leal
> -1(%eax), %eax" isn't a memory operation; it's just subtracting one
> from %eax.) You might want to try reading the LLVM IR (which you can
> generate with llvm-gcc -S -emit-llvm); it tends to be easier to read.

I tried that, but I'm still learning LLVM. Seeing indvar, phi nodes, tail
calls on printfs, and nounwinds had me more confused than the asm.

> A taken and non-taken branch have roughly the same cost on any
> remotely recent x86 processor.

I was wondering if that might be the case.

The crux of the example still seems intact. From LLVM SVN, converted to asm via llc:

        .text
        .align  4,0x90
        .globl  _main
_main:
        subl    $12, %esp
        movl    $1999, %eax
        xorl    %ecx, %ecx
        movl    $1999, %edx
        .align  4,0x90
LBB1_1: ## loopto
        cmpl    $1, %eax
        leal    -1(%eax), %eax
        cmove   %edx, %eax
        incl    %ecx
        cmpl    $999999999, %ecx
        jne     LBB1_1  ## loopto
LBB1_2: ## bb1
        movl    %eax, 4(%esp)
        movl    $LC, (%esp)
        call    _printf
        xorl    %eax, %eax
        addl    $12, %esp
        ret
        .section __TEXT,__cstring,cstring_literals
LC:     ## LC
        .asciz  "Timeout: %i\n"
        .subsections_via_symbols

Setting the loops to decl instead of cmove/incl might seem like more work, but appears to be faster:

        .text
        .align  4,0x90
        .globl  _main
_main:
        subl    $12, %esp
        movl    $2000, %eax
        movl    $1000000000, %ecx
        .align  4,0x90
LBB1_3:
        movl    $2000, %eax
LBB1_1: ## loopto
        decl    %eax
        jz      LBB1_3
        decl    %ecx
        jnz     LBB1_1  ## loopto
LBB1_2: ## bb1
        movl    %eax, 4(%esp)
        movl    $LC, (%esp)
        call    _printf
        xorl    %eax, %eax
        addl    $12, %esp
        ret
        .section __TEXT,__cstring,cstring_literals
LC:     ## LC
        .asciz  "Timeout: %i\n"
        .subsections_via_symbols

The first example is 1.7s, the second is 1.0s. That's on my dual core OS X box. I have a 2-processor quad-core Xeon box that runs Linux and also has very similar results.

Jonathan
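For readers who find the two listings hard to compare, here is a rough C model of what each loop computes. This is a sketch read off the asm in the message, not the original source program (which is not shown here); the function names and the `iterations` parameter are hypothetical and exist only so the loops can be run with small counts.

```c
/* Hypothetical C models of the two asm loops. Names and the iteration
   parameter are illustrative and do not come from the original program. */

/* Models the llc output: cmpl $1 / leal -1 / cmove, counter counts up. */
int loop_cmov(unsigned iterations)
{
    int timeout = 1999;                      /* movl $1999, %eax */
    for (unsigned i = 0; i != iterations; ++i) {
        /* cmpl $1, %eax ; leal -1(%eax), %eax ; cmove %edx, %eax */
        timeout = (timeout == 1) ? 1999 : timeout - 1;
    }
    return timeout;
}

/* Models the hand-written version: decl/jz, then decl/jnz.
   Requires iterations >= 1, as in the asm (which starts at 1000000000). */
int loop_branch(unsigned iterations)
{
    int timeout = 2000;                      /* LBB1_3: movl $2000, %eax */
    unsigned remaining = iterations;         /* movl $1000000000, %ecx */
    for (;;) {
        if (--timeout == 0) {                /* decl %eax ; jz LBB1_3 */
            timeout = 2000;                  /* reload; %ecx is not decremented on this pass */
            continue;
        }
        if (--remaining == 0)                /* decl %ecx ; jnz LBB1_1 */
            break;
    }
    return timeout;
}
```

Calling either function with a small count (for example `loop_cmov(1999)` or `loop_branch(2000)`) walks the timeout through one full wrap, which makes it easy to step through in a debugger next to the asm.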
On Mon, Mar 2, 2009 at 4:58 PM, Jonathan Turner <probata at hotmail.com> wrote:
> The crux of the example still seems intact.

Have you tried putting something non-trivial (like asm("nop;");) where
you'd put the code that runs on the timeout?

-Eli
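A minimal sketch of what that suggestion could look like in C, assuming a GCC-compatible compiler (llvm-gcc, gcc, clang) that accepts `__asm__ volatile`. The function name and the `iterations` parameter are hypothetical; the loop shape is a guess at the kind of test program under discussion, not the actual source.

```c
/* Hypothetical test loop; the name and the iteration parameter are
   illustrative only. */
int loop_with_nop(unsigned iterations)
{
    int timeout = 2000;
    for (unsigned i = 0; i < iterations; ++i) {
        if (--timeout == 0) {
            /* Stand-in for real work on the timeout path. volatile asks
               the compiler not to delete the statement. This is the
               GCC/Clang inline asm syntax, not standard C. */
            __asm__ volatile("nop");
            timeout = 2000;
        }
    }
    return timeout;
}
```

The point of the nop is only to give the timeout path an observable body, so the generated code can be compared against the empty-body version; it does not change the value the function returns.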
On Mar 2, 2009, at 4:58 PM, Jonathan Turner wrote:

>> You're misreading the asm... nothing is touching memory. (BTW, "leal
>> -1(%eax), %eax" isn't a memory operation; it's just subtracting one
>> from %eax.) You might want to try reading the LLVM IR (which you can
>> generate with llvm-gcc -S -emit-llvm); it tends to be easier to read.
>
> I tried that, but I'm still learning LLVM. Seeing indvar, phi nodes,
> tail calls on printfs, and nounwinds had me more confused than the asm.
>
>> A taken and non-taken branch have roughly the same cost on any
>> remotely recent x86 processor.
>
> I was wondering if that might be the case.
>
> The crux of the example still seems intact. From LLVM SVN,
> converted to asm via llc:
>
>         .align  4,0x90
> LBB1_1: ## loopto
>         cmpl    $1, %eax
>         leal    -1(%eax), %eax
>         cmove   %edx, %eax
>         incl    %ecx
>         cmpl    $999999999, %ecx
>         jne     LBB1_1  ## loopto
>
> LBB1_1: ## loopto
>         decl    %eax
>         jz      LBB1_3
>         decl    %ecx
>         jnz     LBB1_1  ## loopto

The main issue is that incl updates the EFLAGS condition code register,
but llvm x86 isn't taking advantage of that. This is a known issue;
hopefully someone will find the time to implement it before 2.6.

The second issue is that the leal -1 can be turned (back) into a decl.
Combined with the optimization described above, that can eliminate the
first cmpl.

Feel free to file a bugzilla for this. I'm hopeful this will be fixed
in the not too distant future.

Thanks,

Evan

> Jonathan
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
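To see why turning the leal back into a decl can make the first cmpl redundant, it may help to model one loop step both ways. The sketch below is only an illustration: the function names and the `reload` parameter are hypothetical, and the `zf` variable stands in for the zero flag that decl sets and leal leaves untouched.

```c
/* One loop step as llc emitted it: compare the OLD value with 1, then
   subtract via leal (which does not modify EFLAGS), then select. */
int step_cmp_lea(int timeout, int reload)
{
    int was_one = (timeout == 1);        /* cmpl $1, %eax */
    timeout = timeout - 1;               /* leal -1(%eax), %eax */
    return was_one ? reload : timeout;   /* cmove %edx, %eax */
}

/* The same step using decl: the zero flag comes from the NEW value,
   so no separate compare is needed. */
int step_decl(int timeout, int reload)
{
    timeout = timeout - 1;               /* decl %eax sets ZF when the result is 0 */
    int zf = (timeout == 0);             /* models ZF after decl */
    return zf ? reload : timeout;        /* cmove %edx, %eax */
}
```

"Old value equals 1" and "new value equals 0" are the same condition, which is why both functions return the same result for any input that does not overflow the subtraction.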
> Have you tried putting something non-trivial (like asm("nop;");) where
> you'd put the code that runs on the timeout?
>
> -Eli

Using an asm("nop") does fix the llvm output, which makes it sound like a
bug. At least by my expectations, a trivial loop should be faster than a
non-trivial one.

> The main issue is that incl updates the EFLAGS condition code register,
> but llvm x86 isn't taking advantage of that. This is a known issue;
> hopefully someone will find the time to implement it before 2.6.
>
> The second issue is that the leal -1 can be turned (back) into a decl.
> Combined with the optimization described above, that can eliminate the
> first cmpl.
>
> Feel free to file a bugzilla for this. I'm hopeful this will be fixed
> in the not too distant future.
>
> Thanks,
>
> Evan

Will do. Thanks.

Jonathan