I've attached two examples of patterns for updating an array. They're simplified examples of some code generation I'm doing where I need to pass a large, unknown number of arguments (ints, doubles, pointers to other information) to a function.

The two patterns are:

Stack[0] = 0;
Stack[1] = 1;
Stack[2] = 42;

and

int i = 0;
Stack[i++] = 0;
Stack[i++] = 1;
Stack[i++] = 42;

I've found that the optimizer is much faster at handling the second pattern than the first. With 3000 assignments, as in the attached examples, "opt -std-compile-opts -S func1.ll -o func1.opt" takes 10 seconds, while func2.ll takes 0.2 s. Anyone have any ideas why?

I've tested with LLVM 2.9 and LLVM 3.0 RC1.

Regards,
Michael

(Attachment: opt.zip, application/x-zip-compressed, 196440 bytes)
On Mon, Nov 21, 2011 at 8:58 AM, Michael Smith <Michael.Smith at synopsys.com> wrote:
> The two patterns are:
>
> Stack[0] = 0;
> Stack[1] = 1;
> Stack[2] = 42;
>
> and
>
> int i = 0;
> Stack[i++] = 0;
> Stack[i++] = 1;
> Stack[i++] = 42;

The optimizer is probably not sophisticated enough to figure out that the array indices in your first pattern are sequentially incrementing integers. The generated code has to store each individual array index constant somewhere, load it into a register, multiply it by the size of your array elements, then add it to the pointer to the base of the array. It does that for each array access.

On some architectures, loading constant values can be expensive because the machine code words do not provide enough bits to encode arbitrary immediate constants. The constants are most often stored just after the end of the function's executable code, then read into a register with PC-relative addressing. Other architectures allow one to load any immediate constant, but this comes at the cost of placing the value of the constant just after the instruction that loads it, which stalls the processor's pipeline until the constant has been loaded into the register.

In your second pattern, the simple way would be to have a fixed pointer to the base of your array and a register that is used as an offset. Most architectures offer autoincrement/autodecrement addressing modes for access patterns just as you describe, in which the index register is incremented by the size of your array elements just after it is used as an offset. In this case, though, the optimizer is smart enough to see that you're indexing sequentially through an array: rather than using a base register and an offset register, the initial value of i is added to the base register, and then that register is used all by itself, again with autoincrementing.
Not having to use a second register for the index frees that register for some other purpose, so other parts of your code can be optimized more effectively.

--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
Where does the -time-passes option say the time is going?

Sent from my iPhone

On Nov 21, 2011, at 8:58 AM, Michael Smith <Michael.Smith at synopsys.com> wrote:
> I've found that the optimizer is much faster at handling the second pattern than the first. With 3000 assignments, as in the attached examples, "opt -std-compile-opts -S func1.ll -o func1.opt" takes 10 seconds, while func2.ll takes 0.2 s. Anyone have any ideas why?
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
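Concretely, that means re-running the original command lines with -time-passes added; opt then prints a per-pass timing report to stderr when it exits (a sketch using the flags from the original message):

```shell
# Slow input: the timing report shows which pass dominates the 10 s.
opt -std-compile-opts -time-passes -S func1.ll -o func1.opt

# Fast input, for comparison.
opt -std-compile-opts -time-passes -S func2.ll -o func2.opt
```

Comparing the two reports pass by pass points at where the extra time is spent.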
Dead Code Elimination. I'm going to try reworking it following some of the insights Don Quixote provided.

From: Cameron Zwarich [mailto:zwarich at apple.com]
Sent: Monday, November 21, 2011 12:17 PM
To: Michael Smith
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Optimization of array access

> Where does the -time-passes option say the time is going?