Why does LLVM struggle to optimize the following stack direction check?

    #include <stdint.h>

    inline int stack_direction() {
        int x = 0;
        int y = 0;

        return uintptr_t(&x) < uintptr_t(&y);
    }

    int main(int argc, const char * argv[]) {
        return stack_direction();
    }

It generates the following assembly:

    main:                                   # @main
            lea     rcx, [rsp - 8]
            lea     rdx, [rsp - 4]
            xor     eax, eax
            cmp     rdx, rcx
            setb    al
            ret

It seems to me it should be possible, because clearly LLVM knows the layout of x and y at compile time.

Shared code: https://godbolt.org/z/ZQKESy <https://t.co/8wdz6ftAm7?amp=1>

Thanks
Samuel
Hi Samuel,

On Tue, 25 Jun 2019 at 12:39, Samuel Williams via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Why does LLVM struggle to optimize the following stack direction check?

Probably because it's a niche operation. You can't rely on x and y being laid out in any particular order even during multiple calls to stack_direction within a single compilation, so there's really not much you can do with such a comparison. That makes it a rare case.

On the LLVM side, the optimization isn't something that would happen naturally through generic analyses so someone would have to sit down specifically to write it into LLVM; and no-one has.

The only way I can think of to reliably detect the direction is using an __attribute__((noinline)) function to compare locals from two different, nested frames (even that's iffy though on a semantic level). If there turned out to be a compelling enough use-case, an intrinsic could be added to get the result more efficiently.

Cheers.

Tim.
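For illustration, the noinline idea described above might look roughly like the sketch below. The function names are hypothetical and do not come from the thread, and the result is only a heuristic: comparing addresses of unrelated objects has no guaranteed meaning in the language, and instrumentation such as sanitizers may place locals somewhere other than the ordinary stack.

```cpp
#include <stdint.h>

// Hypothetical helper (name is an assumption): runs in a frame nested inside
// its caller's and compares the address of one of its own locals with the
// address of a local that lives in the caller's frame.
__attribute__((noinline))
static bool callee_is_below(const volatile char *caller_local) {
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

// Returns true if the nested frame sits at a lower address than the outer
// one, i.e. the stack appears to grow towards lower addresses.
__attribute__((noinline))
bool stack_grows_down() {
    volatile char caller_local = 0;
    return callee_is_below(&caller_local);
}
```

Because the two locals live in different frames, the answer no longer depends on how the compiler happens to order x and y within a single frame, which was the weakness of the original check. The price is a real call at run time rather than a compile-time constant.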
On Tue, 25 Jun 2019 at 13:22, Tim Northover <t.p.northover at gmail.com> wrote:
> The only way I can think of to reliably detect the direction is using
> an __attribute__((noinline)) function to compare locals from two
> different, nested frames (even that's iffy though on a semantic
> level). If there turned out to be a compelling enough use-case, an
> intrinsic could be added to get the result more efficiently.

Actually,

    (uintptr_t)__builtin_frame_address(0) < (uintptr_t)__builtin_frame_address(1)

would probably work too.

Cheers.

Tim.
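Wrapped in a function, the expression above could be used as in the following sketch. The function name is hypothetical. Note that GCC and Clang both treat a nonzero argument to __builtin_frame_address as unsafe and may warn about it, and the value for level 1 is only meaningful when the caller maintains a frame pointer (for example when building with -fno-omit-frame-pointer), so this is an assumption-laden sketch rather than a dependable check.

```cpp
#include <stdint.h>

// Hypothetical wrapper (name is an assumption) around the expression from
// the message above. Level 0 is this function's own frame, level 1 is the
// caller's frame as found through the saved frame pointer.
__attribute__((noinline))
bool stack_grows_down_by_frames() {
    uintptr_t inner = (uintptr_t)__builtin_frame_address(0);
    uintptr_t outer = (uintptr_t)__builtin_frame_address(1);
    // If the inner frame is at the lower address, the stack grows down.
    return inner < outer;
}
```

Keeping the wrapper noinline ensures that level 0 and level 1 refer to two distinct frames; if it were inlined, level 1 would refer to the frame of whatever called the enclosing function.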