thr3ads.net - llvm dev - [llvm-dev] Zero length function pointer equality [Jul 2020]

David Blaikie via llvm-dev

2020-Jul-24 03:28 UTC

[llvm-dev] Zero length function pointer equality

On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an
> empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My (possibly
> flawed) understanding is that LLVM is obliged to give a different address
> to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function
> body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

Richard Smith via llvm-dev

2020-Jul-24 04:10 UTC

[llvm-dev] Zero length function pointer equality

On Thu, 23 Jul 2020 at 20:28, David Blaikie <dblaikie at gmail.com> wrote:
> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
> >
> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
> >>
> >> LLVM can produce zero length functions from cases like this (when
> >> optimizations are enabled):
> >>
> >> void f1() { __builtin_unreachable(); }
> >> int f2() { /* missing return statement */ }
> >>
> >> This code is valid, so long as the functions are never called.
> >>
> >> I believe C++ requires that all functions have a distinct address (ie:
> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> >> gets optimized into an unconditional assertion failure)
> >>
> >> But these zero length functions can end up with identical addresses.
> >>
> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> >> would indicate that would allow distinct functions to have identical
> >> addresses - so should we do something about this in the LLVM backend?
> >> add a little padding? a nop instruction? (if we're adding an
> >> instruction anyway, perhaps we might as well make it an int3?)
> >>
> >> (I came across this due to DWARF issues with zero length functions &
> >> thinking about if/how this should be supported)
> >
> >
> > Yes, I think at least if the optimizer turns a non-empty function into
> > an empty function,
>
> What about functions that are already empty? (well, I guess at the
> LLVM IR level, no function can be empty, because every basic block
> must end in some terminator instruction - is that the distinction
> you're drawing?)
>
Here's what I was thinking: a case could be made that the frontend is
responsible for making sure that functions don't start out empty, in much
the same way that if the frontend produces a global of zero size, it gets
what it asked for.
But you're right, there really isn't such a thing as an empty function at
the IR level, because there's always an entry block and it always has a
terminator.

> > that's a miscompile for C and C++ source-language programs. My (possibly
> > flawed) understanding is that LLVM is obliged to give a different address
> > to distinct globals if neither of them is marked unnamed_addr,
>
> It seems like other LLVM passes make this assumption too - which is
> how "f1 == f2" can be folded to a constant false. I haven't checked to
> see exactly where that constant folding happens. (hmm, looks like it
> happens in some constant folding utility - happens in the inliner if
> there's inlining, happens at IR generation if there's no function
> indirection, etc)
>
> > so it seems to me that this is a backend bug. Generating a ud2 function
> > body in this case seems ideal to me.
>
> Guess that still leaves the possibility of the last function in an
> object file as being zero-length? (or I guess not, because otherwise
> when linked it could still end up with the same address as the
> function that comes after it)
>
Yes, I think that's right. We should never put a non-unnamed_addr global at
the end of a section because we don't know if it will share an address with
another global.
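
The address-sharing concern can be illustrated with a toy layout model. This is a sketch only, not LLVM's actual backend or linker logic: `Symbol`, `Placed`, `layout`, `has_collision`, and `demo` are hypothetical names, symbols are simply placed back to back from an assumed base address, and the one-byte padding stands in for whatever filler instruction (ud2, int3, nop) a real backend would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a global placed in a section.
struct Symbol {
    std::string name;
    std::size_t size;   // bytes of emitted code or data
    bool unnamed_addr;  // true if the address is not significant
};

struct Placed {
    std::string name;
    std::uint64_t address;
    bool address_significant;
};

// Place symbols back to back. With pad_empty set, a zero-size symbol whose
// address is significant is given one byte of filler.
std::vector<Placed> layout(const std::vector<Symbol>& syms, bool pad_empty,
                           std::uint64_t base = 0x1000) {
    std::vector<Placed> out;
    std::uint64_t addr = base;
    for (const Symbol& s : syms) {
        out.push_back({s.name, addr, !s.unnamed_addr});
        std::size_t size = s.size;
        if (pad_empty && size == 0 && !s.unnamed_addr) size = 1;
        addr += size;
    }
    return out;
}

// True if two symbols that both have significant addresses share an address.
bool has_collision(const std::vector<Placed>& placed) {
    for (std::size_t i = 0; i < placed.size(); ++i)
        for (std::size_t j = i + 1; j < placed.size(); ++j)
            if (placed[i].address_significant && placed[j].address_significant &&
                placed[i].address == placed[j].address)
                return true;
    return false;
}

// Two zero-length functions followed by an ordinary one.
void demo() {
    std::vector<Symbol> syms = {{"f1", 0, false}, {"f2", 0, false}, {"g", 16, false}};
    assert(has_collision(layout(syms, false)));  // f1, f2 and g all share an address
    assert(!has_collision(layout(syms, true)));  // one byte of filler separates them
}
```

In this model a zero-size symbol always takes the address of whatever follows it, which is why a zero-length function at the end of a section is still a problem once another section is placed after it.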

David Blaikie via llvm-dev

2020-Jul-25 01:27 UTC

[llvm-dev] Zero length function pointer equality

On Thu, Jul 23, 2020 at 9:10 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 20:28, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>> >
>> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>> >>
>> >> LLVM can produce zero length functions from cases like this (when
>> >> optimizations are enabled):
>> >>
>> >> void f1() { __builtin_unreachable(); }
>> >> int f2() { /* missing return statement */ }
>> >>
>> >> This code is valid, so long as the functions are never called.
>> >>
>> >> I believe C++ requires that all functions have a distinct address (ie:
>> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> >> gets optimized into an unconditional assertion failure)
>> >>
>> >> But these zero length functions can end up with identical addresses.
>> >>
>> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> >> would indicate that would allow distinct functions to have identical
>> >> addresses - so should we do something about this in the LLVM backend?
>> >> add a little padding? a nop instruction? (if we're adding an
>> >> instruction anyway, perhaps we might as well make it an int3?)
>> >>
>> >> (I came across this due to DWARF issues with zero length functions &
>> >> thinking about if/how this should be supported)
>> >
>> >
>> > Yes, I think at least if the optimizer turns a non-empty function
>> > into an empty function,
>>
>> What about functions that are already empty? (well, I guess at the
>> LLVM IR level, no function can be empty, because every basic block
>> must end in some terminator instruction - is that the distinction
>> you're drawing?)
>
>
> Here's what I was thinking: a case could be made that the frontend is
> responsible for making sure that functions don't start out empty, in much
> the same way that if the frontend produces a global of zero size, it gets
> what it asked for.
> But you're right, there really isn't such a thing as an empty function
> at the IR level, because there's always an entry block and it always has
> a terminator.
>
>>
>> > that's a miscompile for C and C++ source-language programs. My
>> > (possibly flawed) understanding is that LLVM is obliged to give a
>> > different address to distinct globals if neither of them is marked
>> > unnamed_addr,
>>
>> It seems like other LLVM passes make this assumption too - which is
>> how "f1 == f2" can be folded to a constant false. I haven't checked to
>> see exactly where that constant folding happens. (hmm, looks like it
>> happens in some constant folding utility - happens in the inliner if
>> there's inlining, happens at IR generation if there's no function
>> indirection, etc)
>>
>> > so it seems to me that this is a backend bug. Generating a ud2
>> > function body in this case seems ideal to me.
>>
>> Guess that still leaves the possibility of the last function in an
>> object file as being zero-length? (or I guess not, because otherwise
>> when linked it could still end up with the same address as the
>> function that comes after it)
>
>
> Yes, I think that's right. We should never put a non-unnamed_addr global
> at the end of a section because we don't know if it will share an address
> with another global.

Fair point, we have unnamed_addr that helps distinguish the important
cases - though that does mean addressing this problem wouldn't
coincidentally address my DWARF problem (zero length functions are
weird/problematic in DWARF for a few reasons).

Doesn't mean it isn't worth fixing, though.
