I have a question about the performance of the implementation of the stack
protector in LLVM.

Consider the following C program:
====
void canary()
{
    char buf[20];
    buf[0]++;
}

int main()
{
    int i;
    for (i = 0; i < 1000000000; ++i)
        canary();
    return 0;
}
====

This should definitely run slower when stack protection is enabled, right?

I have measured the runtime of this program on two different systems, compiled
with GCC and LLVM. Here are the results (percentages are the difference with
the unprotected version of the program):

     | Desktop | Laptop |
-----+---------+--------+
GCC  | +13%    | +277%  |
LLVM | -3%(!)  | +330%  |

(These measurements are the median values of 10 runs.)

So the obvious question is: can anybody explain how it is possible that using
the stack protector causes the program to run 3% faster on my desktop?

I have tried profiling the program using valgrind (cachegrind & callgrind), but
the results show no reason at all for these measurements.

I have attached an archive with the source code and compiled binaries.

Here are the specs of the two systems:
* Desktop
  - Ubuntu 11.10
  - Linux 3.0.0-16-generic-pae
  - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2048K cache)
* Laptop
  - Ubuntu 11.10
  - Linux 3.0.0-16-generic
  - Intel(R) Atom(TM) CPU N450 @ 1.66GHz (512K cache)

Kind regards,
Job

-------------- next part --------------
A non-text attachment was scrubbed...
Name: canary.tgz
Type: application/x-gzip
Size: 4321 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120309/22de8da6/attachment.bin>
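For anyone who wants to reproduce a "median of 10 runs" figure inside the process rather than by timing the whole binary, here is a minimal sketch. It assumes a POSIX system with clock_gettime(). The names median, percent_diff, time_once, median_time and busy_work are made up for this example and are not taken from the attached archive.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Comparison callback for qsort() on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts the array in place. */
double median(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Difference of `other` relative to `base`, in percent. */
double percent_diff(double base, double other)
{
    return 100.0 * (other - base) / base;
}

/* Wall-clock seconds taken by one call to fn(). */
double time_once(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

enum { MAX_RUNS = 64 };

/* Median wall-clock time over `runs` calls to fn() (at most MAX_RUNS). */
double median_time(void (*fn)(void), size_t runs)
{
    double samples[MAX_RUNS];
    size_t i;
    assert(runs > 0 && runs <= MAX_RUNS);
    for (i = 0; i < runs; ++i)
        samples[i] = time_once(fn);
    return median(samples, runs);
}

/* Hypothetical workload; replace with the loop under test. */
static volatile unsigned long sink;
void busy_work(void)
{
    unsigned long i;
    for (i = 0; i < 1000; ++i)
        sink += i;
}
```

A caller would build the protected and unprotected variants separately, take median_time(workload, 10) in each, and feed the two medians to percent_diff(). A difference of a few percent in either direction is small enough that it is worth checking how much the individual samples vary before reading much into it.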
What optimization level are you using? -O0 is not interesting, and at -O1 the
optimizer nukes all the code. In your example, the stack variable and the stack
accesses are optimized away:

% ./build/Release+Asserts/bin/clang -O1 -S -emit-llvm -o - stack.c
define void @canary() nounwind uwtable readnone {
entry:
  ret void
}

define i32 @main() nounwind uwtable readnone {
for.end:
  ret i32 0
}

You need to prepare a more optimizer-resistant benchmark.

--kcc

On Fri, Mar 9, 2012 at 2:52 AM, Job Noorman <jobnoorman at gmail.com> wrote:
> I have a question about the performance of the implementation of the stack
> protector in LLVM.
> [...]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
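One way to make the benchmark more optimizer-resistant is sketched below. It assumes GCC or Clang, since it relies on their extended inline asm syntax. The helper escape() and the driver run_canary_loop() are names invented for this sketch. Unlike the original program, buf[0] is initialized first so that the increment reads a defined value. Whether the stack protector still instruments this function should be confirmed by inspecting the generated assembly.

```c
/* Empty asm statement with a "memory" clobber (GCC/Clang extension).
 * The compiler has to assume the asm may read or write the memory
 * behind p, so loads and stores around it cannot be dropped. */
static inline void escape(void *p)
{
    __asm__ volatile("" : : "r"(p) : "memory");
}

void canary(void)
{
    char buf[20];
    buf[0] = 0;      /* added: the original incremented an uninitialized byte */
    escape(buf);     /* buf is now visible to the "outside world" */
    buf[0]++;
    escape(buf);     /* forces the incremented value to be stored */
}

/* Calls canary() `iterations` times and returns the number of calls made. */
unsigned long run_canary_loop(unsigned long iterations)
{
    unsigned long i;
    for (i = 0; i < iterations; ++i)
        canary();
    return i;
}
```

Calling run_canary_loop(1000000000UL) from main() mirrors the original loop. Compiling with -O1 -S, once with -fstack-protector and once with -fno-stack-protector, shows whether the buffer accesses and the canary check survive.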
If you compile this with optimizations, then the 'canary()' function should be
totally inlined into the 'main()' function. In that case, the cost of the
stack protectors will be very small compared to the loop.

-bw

On Mar 9, 2012, at 2:52 AM, Job Noorman <jobnoorman at gmail.com> wrote:
> I have a question about the performance of the implementation of the stack
> protector in LLVM.
> [...]
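If the goal is to measure the per-call cost even with optimizations on, inlining can be suppressed. The sketch below assumes GCC or Clang, which both accept the noinline function attribute. The names canary_noinline and call_canary are invented here. The buffer is written through a volatile-qualified pointer so that the function keeps a visible side effect; without one, the compiler may still drop calls to a function it can prove does nothing.

```c
/* noinline (GCC/Clang attribute) asks the compiler to keep this function
 * out of line, so every loop iteration pays for the call and for the
 * function's prologue and epilogue. */
__attribute__((noinline)) void canary_noinline(void)
{
    char buf[20];
    volatile char *p = buf;   /* volatile accesses must be emitted */
    p[0] = 0;                 /* added: give the increment a defined input */
    p[0]++;
}

/* Calls canary_noinline() `iterations` times; returns the call count. */
unsigned long call_canary(unsigned long iterations)
{
    unsigned long i;
    for (i = 0; i < iterations; ++i)
        canary_noinline();
    return i;
}
```

Looking for a call instruction inside the loop in the -O2 -S output is a quick way to check that the attribute had the intended effect.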
On Friday 09 March 2012 20:17:08 you wrote:
> If you compile this with optimizations, then the 'canary()' function should
> be totally inlined into the 'main()' function. In that case, the cost of
> the stack protectors will be very small compared to the loop.

Yes, I know. I'm just really interested in an explanation of how it is possible
that the use of canaries results in faster code in the binaries I attached to
my original message (which are unoptimized).

If you look at the binaries, you see that the bodies of canary() are exactly
the same, except that the protected binary has some extra stuff in the
prologue/epilogue. So, how can it be that a function that does exactly the
same plus something extra runs faster?
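For readers unfamiliar with what the "extra stuff" in the prologue/epilogue does, here is a rough C model of the logic. This is only a sketch: the compiler emits the real check at the machine-code level, and the variable demo_guard and the function canary_modeled are hypothetical stand-ins invented for illustration. On glibc systems the real failure handler is __stack_chk_fail; abort() is used here so the sketch stays self-contained.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the guard value that the real runtime
 * sets up at program startup. */
static unsigned long demo_guard = 0x5a5aa5a5UL;

/* Models a protected canary(); returns the incremented byte so the
 * result can be observed. */
int canary_modeled(void)
{
    unsigned long saved = demo_guard;   /* prologue: copy guard into the frame */
    char buf[20];

    buf[0] = 0;                         /* added: defined starting value */
    buf[0]++;                           /* the original function body */

    if (saved != demo_guard)            /* epilogue: verify the saved copy */
        abort();                        /* real code calls __stack_chk_fail */
    return buf[0];
}
```

The model shows why the protected function does strictly more work per call: one extra load and store on entry, and one extra load, compare and branch on exit. It does not model where the saved copy sits relative to buf in the stack frame, which is what makes the real check detect overflows.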