David Blaikie via llvm-dev
2020-Jul-24 00:46 UTC
[llvm-dev] Zero length function pointer equality
LLVM can produce zero length functions from cases like this (when optimizations are enabled):

    void f1() { __builtin_unreachable(); }
    int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie: &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2) gets optimized into an unconditional assertion failure).

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that would allow distinct functions to have identical addresses - so should we do something about this in the LLVM backend? Add a little padding? A nop instruction? (If we're adding an instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions & thinking about if/how this should be supported.)
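The situation described above can be probed with a small sketch. It assumes a GCC or Clang toolchain (for `__builtin_unreachable`); `f3`, `same_address` and `self_check` are hypothetical names introduced here for illustration, and `f3` stands in for `f2` so that both functions share one pointer type.

```cpp
#include <cassert>

// Neither function is ever called; only its address is taken.
// __builtin_unreachable() is a GCC/Clang builtin.
void f1() { __builtin_unreachable(); }
void f3() { __builtin_unreachable(); }  // hypothetical second function

using Fn = void (*)();

// Compare two function addresses at run time. Reading the pointers
// through volatile objects keeps the compiler from folding the
// comparison into a constant.
bool same_address(Fn a, Fn b) {
    Fn volatile x = a;
    Fn volatile y = b;
    return x == y;
}

// A pointer always compares equal to itself, whatever the backend emits.
void self_check() { assert(same_address(&f1, &f1)); }
```

Calling `same_address(&f1, &f3)` in an optimized build is the interesting case: if both bodies are emitted with zero length and placed back to back, the call may return true even though the language rules say the addresses differ. Whether that happens depends on the target, the optimization level and the section layout.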
Richard Smith via llvm-dev
2020-Jul-24 02:17 UTC
[llvm-dev] Zero length function pointer equality
On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would allow distinct functions to have identical addresses - so
> should we do something about this in the LLVM backend? add a little
> padding? a nop instruction? (if we're adding an instruction anyway,
> perhaps we might as well make it an int3?)

Yes, I think at least if the optimizer turns a non-empty function into an empty function, that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr, so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.
David Blaikie via llvm-dev
2020-Jul-24 03:28 UTC
[llvm-dev] Zero length function pointer equality
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
> Yes, I think at least if the optimizer turns a non-empty function into
> an empty function,

What about functions that are already empty? (Well, I guess at the LLVM IR level, no function can be empty, because every basic block must end in some terminator instruction - is that the distinction you're drawing?)

> that's a miscompile for C and C++ source-language programs. My
> (possibly flawed) understanding is that LLVM is obliged to give a
> different address to distinct globals if neither of them is marked
> unnamed_addr,

It seems like other LLVM passes make this assumption too - which is how "f1 == f2" can be folded to a constant false. I haven't checked to see exactly where that constant folding happens. (Hmm, looks like it happens in some constant folding utility - happens in the inliner if there's inlining, happens at IR generation if there's no function indirection, etc.)

> so it seems to me that this is a backend bug. Generating a ud2 function
> body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an object file as being zero-length? (Or I guess not, because otherwise when linked it could still end up with the same address as the function that comes after it.)
David Chisnall via llvm-dev
2020-Jul-24 09:41 UTC
[llvm-dev] Zero length function pointer equality
On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would allow distinct functions to have identical addresses - so
> should we do something about this in the LLVM backend? add a little
> padding? a nop instruction? (if we're adding an instruction anyway,
> perhaps we might as well make it an int3?)

This is also a problem with identical function merging in the linker, which link.exe does quite aggressively. The special case of zero-length functions seems less common than the more general case of merging; in both cases you will end up with a single implementation in the binary that has two symbols for the same address. For example, consider the following trivial program:

    #include <stdio.h>

    int a()
    {
        return 42;
    }

    int b()
    {
        return 42;
    }

    int main()
    {
        printf("a == b? %d\n", a == b);
        return 0;
    }

Compiled with cl.exe /Gy, this prints:

    a == b? 1

Given that functions are immutable, it's a somewhat odd decision at the abstract machine level to assume that they have identity that is distinct from their value (though it can simplify debugging - back traces in Windows executables are sometimes quite confusing when you see a call into a function that is structurally correct but nominally incorrect).

Given that link.exe can happily violate this guarantee in the general case, I'm not too concerned that LLVM can violate it in the special case. From the perspective of a programmer, I'm not sure what kind of logic would be broken by function equality returning true when two functions with different names but identical behaviour are invoked. I'm curious if you have any examples.

David
Hans Wennborg via llvm-dev
2020-Jul-24 13:16 UTC
[llvm-dev] [cfe-dev] Zero length function pointer equality
Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
David Blaikie via llvm-dev
2020-Jul-25 01:36 UTC
[llvm-dev] Zero length function pointer equality
On Fri, Jul 24, 2020 at 2:42 AM David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> This is also a problem with identical function merging in the linker,
> which link.exe does quite aggressively.

Yeah, though that's a choice of the Windows linker to be non-conforming (& can be disabled), both with the LLVM IR semantics and the C++ semantics - which doesn't necessarily mean Clang and LLVM should also be non-conforming.

> The special case of zero-length
> functions seems less common than the more general case of merging,

On Windows, to be sure - on Linux, for instance, not as much.

> in both cases you will end up with a single implementation in the binary
> that has two symbols for the same address. For example, consider the
> following trivial program:
>
> #include <stdio.h>
>
> int a()
> {
>     return 42;
> }
>
> int b()
> {
>     return 42;
> }
>
> int main()
> {
>     printf("a == b? %d\n", a == b);
>     return 0;
> }
>
> Compiled with cl.exe /Gy, this prints:
>
> a == b? 1
>
> Given that functions are immutable, it's a somewhat odd decision at the
> abstract machine level to assume that they have identity that is
> distinct from their value (though it can simplify debugging - back
> traces in Windows executables are sometimes quite confusing when you see
> a call into a function that is structurally correct but nominally
> incorrect).

Yep, when I used to work on Windows, my teammates and I disabled the linker feature to make development/debugging/backtraces easier to read.

I think there's value in LLVM's decision here - for debuggability, and correctly implementing C++ semantics. I don't think it'd be great if we went the other direction (defining LLVM IR to have no naming importance - so that merging two LLVM modules could merge function implementations and redirect function calls to the singular remaining instance). Opt-in, maybe (I guess you could opt in by marking all functions unnamed_addr - indeed that's why unnamed_addr was introduced, I think, to allow identical code folding to be implemented in a way that was correct for C++).

> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case. From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked. I'm
> curious if you have any examples.

I don't have any concrete examples of C++ code that depends on pointer inequality between zero-length functions, no. (Though we do lots of work to make Clang conforming in other ways even without code that requires such conformance.)
Richard Smith via llvm-dev
2020-Jul-25 01:39 UTC
[llvm-dev] Zero length function pointer equality
On Fri, 24 Jul 2020 at 02:42, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case. From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked. I'm
> curious if you have any examples.

This is a well-known conformance-violating bug in link.exe; LLVM should not be making things worse by introducing a similar bug itself.

Smarter linkers (for example, I think both lld and gold) will do identical function combining only if all but one of the function symbols is only used as the target of calls (and not to actually observe the address).

And yes, this non-conforming behavior (rarely) breaks things in practice. See this research paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36912.pdf
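As a hypothetical illustration of the kind of logic that relies on address identity (it is not taken from this thread or from the linked paper), consider a registry keyed on function pointers. `Handler`, `Registry`, `register_all` and the `on_*` functions are all names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

using Handler = int (*)();

int on_open()  { return 42; }
int on_close() { return 42; }  // same body as on_open: a folding candidate
int on_error() { return -1; }

// Hypothetical registry that uses the handler's address as its identity.
class Registry {
public:
    void add(Handler h, std::string name) {
        assert(h != nullptr);
        names_[h] = std::move(name);  // a repeated address overwrites the entry
    }
    std::size_t size() const { return names_.size(); }
    const std::string& name_of(Handler h) const { return names_.at(h); }

private:
    std::map<Handler, std::string> names_;
};

// Registers the three handlers above and reports how many distinct
// addresses the registry ended up holding.
std::size_t register_all(Registry& r) {
    r.add(&on_open, "open");
    r.add(&on_close, "close");
    r.add(&on_error, "error");
    return r.size();
}
```

If `on_open` and `on_close` keep distinct addresses, `register_all` returns 3. If a linker folds them onto one address, the second `add` overwrites the first, the function returns 2, and `name_of(&on_open)` yields "close". The code is silently wrong without any change to the source.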
David Blaikie via llvm-dev
2020-Jul-25 01:39 UTC
[llvm-dev] [cfe-dev] Zero length function pointer equality
Looks perfect to me!

Well, a couple of questions: Why a noop, rather than int3/ud2/etc? Might be worth using the existing code that places such an instruction when building at -O0?

& you mention that this causes problems on Windows - but ICF done by the Windows linker does not cause such problems? (I'd have thought they'd result in the same situation - two functions described as being at the same address?) Is there a quick summary of why those two cases turn out differently?

On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <hans at chromium.org> wrote:
> Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330