David Blaikie via llvm-dev
2020-Jul-24 03:28 UTC
[llvm-dev] Zero length function pointer equality
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)
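The mismatch described above, a comparison folded at compile time versus addresses chosen by the backend, can be sketched as follows. This is only an illustration, assuming a GCC- or Clang-compatible compiler that provides `__builtin_unreachable()`; the helper names `distinct_when_folded` and `distinct_at_runtime` are hypothetical and do not come from the thread.

```cpp
#include <cassert>

// Neither function is ever called, so the unreachable bodies are never
// executed. With optimizations enabled the backend may emit no
// instructions at all for them.
void f1() { __builtin_unreachable(); }
void f2() { __builtin_unreachable(); }

// Hypothetical helper: a direct comparison that the optimizer is free to
// fold to a constant, on the assumption that distinct functions have
// distinct addresses.
bool distinct_when_folded() { return &f1 != &f2; }

// Hypothetical helper: routes both addresses through volatile pointers so
// the comparison has to use the addresses actually assigned in the binary.
bool distinct_at_runtime() {
    void (*volatile p1)() = &f1;
    void (*volatile p2)() = &f2;
    return p1 != p2;
}
```

Whether `distinct_at_runtime()` agrees with `distinct_when_folded()` depends on the compiler version, optimization level, and target, which is exactly the inconsistency under discussion, so no expected output is given for it here.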
Richard Smith via llvm-dev
2020-Jul-24 04:10 UTC
[llvm-dev] Zero length function pointer equality
On Thu, 23 Jul 2020 at 20:28, David Blaikie <dblaikie at gmail.com> wrote:

> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
> >
> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
> >>
> >> LLVM can produce zero length functions from cases like this (when
> >> optimizations are enabled):
> >>
> >> void f1() { __builtin_unreachable(); }
> >> int f2() { /* missing return statement */ }
> >>
> >> This code is valid, so long as the functions are never called.
> >>
> >> I believe C++ requires that all functions have a distinct address (ie:
> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> >> gets optimized into an unconditional assertion failure)
> >>
> >> But these zero length functions can end up with identical addresses.
> >>
> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> >> would indicate that would allow distinct functions to have identical
> >> addresses - so should we do something about this in the LLVM backend?
> >> add a little padding? a nop instruction? (if we're adding an
> >> instruction anyway, perhaps we might as well make it an int3?)
> >>
> >> (I came across this due to DWARF issues with zero length functions &
> >> thinking about if/how this should be supported)
> >
> >
> > Yes, I think at least if the optimizer turns a non-empty function into an empty function,
>
> What about functions that are already empty? (well, I guess at the
> LLVM IR level, no function can be empty, because every basic block
> must end in some terminator instruction - is that the distinction
> you're drawing?)
>

Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start out empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.

> > that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,
>
> It seems like other LLVM passes make this assumption too - which is
> how "f1 == f2" can be folded to a constant false. I haven't checked to
> see exactly where that constant folding happens. (hmm, looks like it
> happens in some constant folding utility - happens in the inliner if
> there's inlining, happens at IR generation if there's no function
> indirection, etc)
>
> > so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.
>
> Guess that still leaves the possibility of the last function in an
> object file as being zero-length? (or I guess not, because otherwise
> when linked it could still end up with the same address as the
> function that comes after it)
>

Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global.
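As a rough source-level analogue of the "ud2 function body" idea quoted above, one can give each otherwise-empty function a trapping body by hand. This is a sketch under assumptions, not the backend change being proposed: it relies on the GCC/Clang builtin `__builtin_trap()`, which typically lowers to `ud2` on x86, and the names `f1_trapping`, `f2_trapping`, and `trapping_functions_have_distinct_addresses` are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for functions that would otherwise be emitted
// with zero length. Each body now contains at least one instruction,
// so each function occupies its own bytes in the text section.
void f1_trapping() { __builtin_trap(); }
void f2_trapping() { __builtin_trap(); }

// Compare the addresses at run time. The volatile pointers keep the
// optimizer from folding the comparison to a constant.
bool trapping_functions_have_distinct_addresses() {
    void (*volatile p1)() = &f1_trapping;
    void (*volatile p2)() = &f2_trapping;
    return p1 != p2;
}
```

The sketch assumes no linker-level identical code folding is enabled; with such folding turned on, two functions with identical bodies could again be merged to one address.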
David Blaikie via llvm-dev
2020-Jul-25 01:27 UTC
[llvm-dev] Zero length function pointer equality
On Thu, Jul 23, 2020 at 9:10 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 20:28, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>> >
>> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>> >>
>> >> LLVM can produce zero length functions from cases like this (when
>> >> optimizations are enabled):
>> >>
>> >> void f1() { __builtin_unreachable(); }
>> >> int f2() { /* missing return statement */ }
>> >>
>> >> This code is valid, so long as the functions are never called.
>> >>
>> >> I believe C++ requires that all functions have a distinct address (ie:
>> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> >> gets optimized into an unconditional assertion failure)
>> >>
>> >> But these zero length functions can end up with identical addresses.
>> >>
>> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> >> would indicate that would allow distinct functions to have identical
>> >> addresses - so should we do something about this in the LLVM backend?
>> >> add a little padding? a nop instruction? (if we're adding an
>> >> instruction anyway, perhaps we might as well make it an int3?)
>> >>
>> >> (I came across this due to DWARF issues with zero length functions &
>> >> thinking about if/how this should be supported)
>> >
>> >
>> > Yes, I think at least if the optimizer turns a non-empty function into an empty function,
>>
>> What about functions that are already empty? (well, I guess at the
>> LLVM IR level, no function can be empty, because every basic block
>> must end in some terminator instruction - is that the distinction
>> you're drawing?)
>
>
> Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start out empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
> But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.
>
>>
>> > that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,
>>
>> It seems like other LLVM passes make this assumption too - which is
>> how "f1 == f2" can be folded to a constant false. I haven't checked to
>> see exactly where that constant folding happens. (hmm, looks like it
>> happens in some constant folding utility - happens in the inliner if
>> there's inlining, happens at IR generation if there's no function
>> indirection, etc)
>>
>> > so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.
>>
>> Guess that still leaves the possibility of the last function in an
>> object file as being zero-length? (or I guess not, because otherwise
>> when linked it could still end up with the same address as the
>> function that comes after it)
>
>
> Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global.

Fair point, we have unnamed_addr that helps distinguish the important cases - though that does mean addressing this problem wouldn't coincidentally address my DWARF problem (zero length functions are weird/problematic in DWARF for a few reasons). Doesn't mean it isn't worth fixing, though.