I was playing around in x86 assembly the other day, looking at ways to optimize my cooperative multitasking system. Currently, it uses a 'timeout' counter that is decremented each time through a loop, letting me stop the loop and go to the next cooperative thread if the loop runs a little long.

The asm has two overlapping loops:

---
_main:
        mov ecx, 1000000000
timeoutloop:
        mov edx, 2000
loopto:
        dec edx
        jz timeoutloop
        dec ecx
        jnz loopto

        mov eax, 0
        ret
---

Which takes 1.0s on my machine.

To compare, I wanted to see what LLVM performance was like and if a similar technique would yield good performance. I cooked up this test in C:

---
#include <stdio.h>

int main() {
  int loop = 1000000000;
  int timeout;

timeoutloop:
  timeout = 2000;
loopto:
  if (--timeout == 0)
    goto timeoutloop;
  if (--loop != 0)
    goto loopto;

  printf("Timeout: %i\n", timeout);
  return 0;
}
---

With gcc -O3 4.2 and 4.4 we match 1.0s. The LLVM, after running it through opt -std-compile-opts, is around 1.7s.

Should I be looking at any particular optimization passes that aren't in -std-compile-opts to match the gcc speeds?

Thanks,
Jonathan
On Mon, Mar 2, 2009 at 11:38 AM, Jonathan Turner <probata at hotmail.com> wrote:
> With gcc -O3 4.2 and 4.4 we match 1.0s. The LLVM, after running it through
> opt -std-compile-opts, is around 1.7s.

Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing differently; I wouldn't be surprised if it's sensitive to the version of LLVM.

> Should I be looking at any particular optimization passes that aren't in
> -std-compile-opts to match the gcc speeds?

First, try looking at the generated code... the code LLVM generates is probably not what you're expecting. I'm getting the following for the main loop:

.LBB1_1:        # loopto
        cmpl    $1, %eax
        leal    -1(%eax), %eax
        cmove   %edx, %eax
        incl    %ecx
        cmpl    $999999999, %ecx
        jne     .LBB1_1 # loopto

LLVM is optimizing your oddly nested loops into a single loop which does some extra computation to keep track of the timeout variable. Since you'd normally be doing something non-trivial in the timeout portion of the loop, the results you're getting with this contrived testcase are irrelevant to your actual issue.

In general, you'll probably get better results from LLVM with properly nested loops; LLVM's loop optimizers don't know how to deal with overlapping loops. I'd suggest writing it more like the following:

    int timeout = 2000;
    int loopcond;
    do {
      timeoutwork();
      do {
        timeout--;
        loopcond = computationresult();
      } while (loopcond && timeout);
    } while (loopcond);

-Eli
> Date: Mon, 2 Mar 2009 13:41:45 -0800
> From: eli.friedman at gmail.com
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Tight overlapping loops and performance
>
> Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and
> llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing
> differently; I wouldn't be surprised if it's sensitive to the version
> of LLVM.

For which version of gcc? I should mention I'm on OS X and using the LLVM SVN.

> First, try looking at the generated code... the code LLVM generates is
> probably not what you're expecting. I'm getting the following for the
> main loop:

I was seeing the same thing, but wasn't sure what to make of it. It looks like values are being swapped into and out of memory and not held in registers. That's why I was asking about other optimization passes; at first glance -mem2reg looked like a good candidate, but I didn't notice any improvement using it blindly.

> int timeout = 2000;
> int loopcond;
> do {
>   timeoutwork();
>   do {
>     timeout--;
>     loopcond = computationresult();
>   } while (loopcond && timeout);
> } while (loopcond);

My current implementation uses something very similar, but notice the difference between this example and mine: here the branch that checks 'timeout' is taken in the majority case, whereas in mine it isn't. It can be checked separately for less cost, assuming the variables stay in registers.

Jonathan