2011 Jan 04
[LLVMdev] Is PIC code defeating the branch predictor?
...r returns
> in the function because calls and returns no longer are matched.
According to benchmarks by Apple, it's nevertheless faster on modern
x86 processors than the trampoline-based alternative (except maybe on
Atom, as mentioned in another reply): http://lists.apple.com/archives/perfoptimization-dev/2007/Nov/msg00005.html
At the time of that post, Apple's version of GCC still generated
trampolines (hence the remark). They switched that to the above
pattern afterwards.
Jonas
2011 Jan 04
[LLVMdev] Is PIC code defeating the branch predictor?
I noticed that we generate code like this for i386 PIC:
        calll   L0$pb
L0$pb:
        popl    %eax
        movl    %eax, -24(%ebp)         ## 4-byte Spill
I worry that this defeats the return address prediction for returns in the function because calls and returns no longer are matched.
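To make the concern concrete, here is a toy model of a return address stack, written as a minimal Python sketch. It assumes, naively, that every call pushes a predicted return target and every return pops one. The class ReturnStack, the function run, and the labels "outer+5" and "middle+5" are hypothetical names used only for illustration. This is not a model of any real processor, and actual hardware may treat a call to the next instruction differently, which would be consistent with the benchmarks cited in the reply.

```python
# Toy model of a return address stack (RAS). Illustration only:
# it assumes every call pushes an entry and every return pops one.
class ReturnStack:
    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []

    def call(self, return_addr):
        # A call pushes its fall-through address as the predicted return target.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # the oldest entry is lost on overflow
        self.entries.append(return_addr)

    def ret(self, actual_target):
        # A return pops the newest prediction; True means it was correct.
        predicted = self.entries.pop() if self.entries else None
        return predicted == actual_target


def run(function_uses_call_pop):
    """Count mispredicted returns for outer -> middle -> PIC function."""
    ras = ReturnStack()
    ras.call("outer+5")    # outer calls middle (hypothetical label)
    ras.call("middle+5")   # middle calls the PIC function (hypothetical label)
    if function_uses_call_pop:
        # calll L0$pb pushes a prediction; popl %eax removes the address
        # from the program stack but, in this model, not from the RAS.
        ras.call("L0$pb")
    mispredicts = 0
    for target in ("middle+5", "outer+5"):
        if not ras.ret(target):
            mispredicts += 1
    return mispredicts


if __name__ == "__main__":
    print("without call/pop:", run(False))
    print("with call/pop:   ", run(True))
```

In this model the stale L0$pb entry shifts every later prediction by one slot, so each return up the call chain is mispredicted. Whether a given CPU behaves this way is exactly the question the thread is asking.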
From Intel's Optimization Reference Manual:
"The return address stack mechanism augments the static and dynamic