Why does LLVM struggle to optimize the following stack direction check?
    #include <stdint.h>

    inline int stack_direction() {
        int x = 0;
        int y = 0;
        return uintptr_t(&x) < uintptr_t(&y);
    }

    int main(int argc, const char * argv[]) {
        return stack_direction();
    }

It generates the following assembly:

    main:                                   # @main
            lea     rcx, [rsp - 8]
            lea     rdx, [rsp - 4]
            xor     eax, eax
            cmp     rdx, rcx
            setb    al
            ret

It seems to me it should be possible, because clearly LLVM knows the layout
of x and y at compile time.
Shared code: https://godbolt.org/z/ZQKESy
Thanks
Samuel

Hi Samuel,

On Tue, 25 Jun 2019 at 12:39, Samuel Williams via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Why does LLVM struggle to optimize the following stack direction check?

Probably because it's a niche operation. You can't rely on x and y
being laid out in any particular order, even across multiple calls to
stack_direction within a single compilation, so there's really not
much you can do with such a comparison. That makes it a rare case.

On the LLVM side, the optimization isn't something that would happen
naturally through generic analyses, so someone would have to sit down
specifically to write it into LLVM, and no-one has.

The only way I can think of to reliably detect the direction is using
an __attribute__((noinline)) function to compare locals from two
different, nested frames (even that's iffy though on a semantic
level). If there turned out to be a compelling enough use-case, an
intrinsic could be added to get the result more efficiently.

Cheers.

Tim.
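
The nested-frame idea above could be sketched roughly as follows. This
is only an illustration: the names callee_local_is_lower and
stack_grows_down are hypothetical and do not come from the thread, and
it assumes a GCC- or Clang-compatible compiler that honours
__attribute__((noinline)). As noted above, comparing addresses of
unrelated objects is semantically shaky, so treat the result as a
heuristic rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the thread): runs in a deeper frame and
// compares one of its own locals against a local owned by its caller.
__attribute__((noinline))
bool callee_local_is_lower(const volatile char *caller_local) {
    assert(caller_local != nullptr);
    volatile char callee_local = 0;
    return reinterpret_cast<std::uintptr_t>(&callee_local) <
           reinterpret_cast<std::uintptr_t>(caller_local);
}

// Hypothetical wrapper: returns true if the nested frame sits at a lower
// address, i.e. the stack appears to grow downwards on this target.
__attribute__((noinline))
bool stack_grows_down() {
    volatile char caller_local = 0;
    // Passing the address of a local prevents this from becoming a tail call.
    return callee_local_is_lower(&caller_local);
}
```

Because the two locals live in different frames and neither function
can be inlined, the compiler cannot reorder them relative to each
other the way it can reorder x and y inside a single function.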

On Tue, 25 Jun 2019 at 13:22, Tim Northover <t.p.northover at gmail.com> wrote:
> The only way I can think of to reliably detect the direction is using
> an __attribute__((noinline)) function to compare locals from two
> different, nested frames (even that's iffy though on a semantic
> level). If there turned out to be a compelling enough use-case, an
> intrinsic could be added to get the result more efficiently.

Actually,

    (uintptr_t)__builtin_frame_address(0) < (uintptr_t)__builtin_frame_address(1)

would probably work too.

Cheers.

Tim.
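
For completeness, the __builtin_frame_address expression might be
wrapped like this. It is a sketch under assumptions, not a tested
recipe: the function name caller_frame_is_higher is hypothetical, and
the GCC documentation warns that calling __builtin_frame_address with
a nonzero argument can have unpredictable effects. In particular, the
level-1 value is only meaningful if the caller maintains a frame
pointer, so code built with frame pointers omitted may produce a
meaningless answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper (not from the thread): compares this function's
// frame address with the frame address reported for its caller.
__attribute__((noinline))
bool caller_frame_is_higher() {
    void *current = __builtin_frame_address(0);  // this function's frame
    void *caller = __builtin_frame_address(1);   // caller's frame, if a frame-pointer chain exists
    assert(current != nullptr);
    return reinterpret_cast<std::uintptr_t>(current) <
           reinterpret_cast<std::uintptr_t>(caller);
}
```

A true result would suggest that the stack grows downwards, subject to
the frame-pointer caveat above.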