thr3ads.net - llvm dev - [llvm-dev] Zero length function pointer equality [Jul 2020]

David Blaikie via llvm-dev

2020-Jul-24 00:46 UTC

[llvm-dev] Zero length function pointer equality

LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would allow distinct functions to have identical addresses - so should
we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)
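
A minimal sketch of the situation described above, assuming a GCC- or
Clang-compatible compiler (__builtin_unreachable and the noinline attribute
are extensions). Both functions use __builtin_unreachable rather than a
missing return so the sketch compiles without warnings. The helper names
folded_equal and runtime_equal are hypothetical, chosen only for
illustration:

```cpp
#include <cassert>

// Two functions that are valid only as long as they are never called.
// With optimizations enabled, a backend may emit no instructions for them.
void f1() { __builtin_unreachable(); }
void f2() { __builtin_unreachable(); }

using Fn = void (*)();

// Both operands are known at compile time, so the compiler may fold this
// comparison on the assumption that distinct functions have distinct
// addresses.
bool folded_equal() { return &f1 == &f2; }

// Hypothetical helper: noinline keeps the comparison at run time, where it
// sees whatever addresses the backend and linker actually assigned.
__attribute__((noinline)) bool runtime_equal(Fn a, Fn b) { return a == b; }
```

The concern in this thread is that folded_equal() and
runtime_equal(&f1, &f2) could disagree if both functions end up with zero
length at the same address; whether they do depends on the target and
optimization level.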

Richard Smith via llvm-dev

2020-Jul-24 02:17 UTC

[llvm-dev] Zero length function pointer equality

On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
> LLVM can produce zero length functions from cases like this (when
> optimizations are enabled):
>
> void f1() { __builtin_unreachable(); }
> int f2() { /* missing return statement */ }
>
> This code is valid, so long as the functions are never called.
>
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)
>
> (I came across this due to DWARF issues with zero length functions &
> thinking about if/how this should be supported)
>
Yes, I think at least if the optimizer turns a non-empty function into an
empty function, that's a miscompile for C and C++ source-language programs.
My (possibly flawed) understanding is that LLVM is obliged to give a
different address to distinct globals if neither of them is marked
unnamed_addr, so it seems to me that this is a backend bug. Generating a
ud2 function body in this case seems ideal to me.

David Blaikie via llvm-dev

2020-Jul-24 03:28 UTC

[llvm-dev] Zero length function pointer equality

On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an
> empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My
> (possibly flawed) understanding is that LLVM is obliged to give a different
> address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function
> body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

David Chisnall via llvm-dev

2020-Jul-24 09:41 UTC

[llvm-dev] Zero length function pointer equality

On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
> 
> But these zero length functions can end up with identical addresses.
> 
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)

This is also a problem with identical function merging in the linker,
which link.exe does quite aggressively.  The special case of zero-length
functions seems less common than the more general case of merging, but in
both cases you will end up with a single implementation in the binary
that has two symbols for the same address.  For example, consider the
following trivial program:

#include <stdio.h>

int a()
{
         return 42;
}

int b()
{
         return 42;
}

int main()
{
         printf("a == b? %d\n", a == b);
         return 0;
}

Compiled with cl.exe /Gy, this prints:

a == b? 1

Given that functions are immutable, it's a somewhat odd decision at the 
abstract machine level to assume that they have identity that is 
distinct from their value (though it can simplify debugging - back 
traces in Windows executables are sometimes quite confusing when you see 
a call into a function that is structurally correct but nominally 
incorrect).

Given that link.exe can happily violate this guarantee in the general 
case, I'm not too concerned that LLVM can violate it in the special 
case.  From the perspective of a programmer, I'm not sure what kind of 
logic would be broken by function equality returning true when two 
functions with different names but identical behaviour are invoked.  I'm 
curious if you have any examples.

David

Hans Wennborg via llvm-dev

2020-Jul-24 13:16 UTC

[llvm-dev] [cfe-dev] Zero length function pointer equality

Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330

On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> LLVM can produce zero length functions from cases like this (when
> optimizations are enabled):
>
> void f1() { __builtin_unreachable(); }
> int f2() { /* missing return statement */ }
>
> This code is valid, so long as the functions are never called.
>
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)
>
> (I came across this due to DWARF issues with zero length functions &
> thinking about if/how this should be supported)

David Blaikie via llvm-dev

2020-Jul-25 01:36 UTC

[llvm-dev] Zero length function pointer equality

On Fri, Jul 24, 2020 at 2:42 AM David Chisnall via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
>
> This is also a problem with identical function merging in the linker,
> which link.exe does quite aggressively.

Yeah, though that's a choice of the Windows linker to be
non-conforming with both the LLVM IR semantics and the C++ semantics
(and it can be disabled) - which doesn't necessarily mean Clang and LLVM
should also be non-conforming.

> The special case of zero-length
> functions seems less common than the more general case of merging,

On Windows, to be sure - on Linux, for instance, not as much.

> in
> both cases you will end up with a single implementation in the binary
> that has two symbols for the same address.  For example, consider the
> following trivial program:
>
> #include <stdio.h>
>
> int a()
> {
>          return 42;
> }
>
> int b()
> {
>          return 42;
> }
>
> int main()
> {
>          printf("a == b? %d\n", a == b);
>          return 0;
> }
>
> Compiled with cl.exe /Gy, this prints:
>
> a == b? 1
>
> Given that functions are immutable, it's a somewhat odd decision at the
> abstract machine level to assume that they have identity that is
> distinct from their value (though it can simplify debugging - back
> traces in Windows executables are sometimes quite confusing when you see
> a call into a function that is structurally correct but nominally
> incorrect).

Yep, when I used to work on Windows, my teammates and I disabled
the linker feature to make backtraces easier to read during
development and debugging.

I think there's value in LLVM's decision here - for debuggability, and
correctly implementing C++ semantics. I don't think it'd be great if
we went the other direction (defining LLVM IR to have no naming
importance - so that merging two LLVM modules could merge function
implementations and redirect function calls to the singular remaining
instance). Opt-in, maybe (I guess you could opt-in by marking all
functions unnamed_addr - indeed that's why unnamed_addr was
introduced, I think, to allow identical code folding to be implemented
in a way that was correct for C++).
> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case.  From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked.  I'm
> curious if you have any examples.

I don't have any concrete examples of C++ code that depends on pointer
inequality between zero-length functions, no. (though we do lots of
work to make Clang conforming in other ways even without code that
requires such conformance)
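
As a hedged illustration of the general kind of code that relies on distinct
function addresses (not an example taken from this thread, and not specific
to zero-length functions), consider a registry keyed by function pointer.
HandlerRegistry, on_open and on_close are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

using Handler = void (*)();

int open_count = 0;
int close_count = 0;

// Two handlers with different bodies, so they cannot be folded together.
void on_open() { ++open_count; }
void on_close() { ++close_count; }

// Hypothetical registry that treats a handler's address as its identity.
class HandlerRegistry {
public:
    // Returns false if a handler with the same address is already present.
    bool add(Handler h, std::string name) {
        return names_.emplace(h, std::move(name)).second;
    }

    // Returns the registered name, or nullptr if the address is unknown.
    const std::string* name_of(Handler h) const {
        auto it = names_.find(h);
        return it == names_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return names_.size(); }

private:
    std::map<Handler, std::string> names_;
};
```

If two differently named functions were given the same address, the second
add() would be rejected and name_of() would report the first name for both,
even though the source registered two distinct handlers.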

Richard Smith via llvm-dev

2020-Jul-25 01:39 UTC

[llvm-dev] Zero length function pointer equality

On Fri, 24 Jul 2020 at 02:42, David Chisnall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
>
> This is also a problem with identical function merging in the linker,
> which link.exe does quite aggressively.  The special case of zero-length
> functions seems less common than the more general case of merging, in
> both cases you will end up with a single implementation in the binary
> that has two symbols for the same address.  For example, consider the
> following trivial program:
>
> #include <stdio.h>
>
> int a()
> {
>          return 42;
> }
>
> int b()
> {
>          return 42;
> }
>
> int main()
> {
>          printf("a == b? %d\n", a == b);
>          return 0;
> }
>
> Compiled with cl.exe /Gy, this prints:
>
> a == b? 1
>
> Given that functions are immutable, it's a somewhat odd decision at the
> abstract machine level to assume that they have identity that is
> distinct from their value (though it can simplify debugging - back
> traces in Windows executables are sometimes quite confusing when you see
> a call into a function that is structurally correct but nominally
> incorrect).
>
> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case.  From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked.  I'm
> curious if you have any examples.
>
This is a well-known conformance-violating bug in link.exe; LLVM should not
be making things worse by introducing a similar bug itself. Smarter linkers
(for example, I think both lld and gold) will do identical function
combining only if all but one of the function symbols is only used as the
target of calls (and not to actually observe the address). And yes, this
non-conforming behavior (rarely) breaks things in practice. See this
research paper:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36912.pdf
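
The folding rule described above can be sketched as a toy model. This is not
how lld or gold are implemented; the Function struct, its
address_significant flag and fold_identical are assumptions made purely for
illustration. Functions with identical bodies share one copy only when at
most one of them has an observable address:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a function symbol seen by a linker.
struct Function {
    std::string name;
    std::string body;          // stand-in for the machine code bytes
    bool address_significant;  // true if the address may be observed
};

// Maps each function name to the name of the function whose code it uses.
// Functions whose address is observed are never folded into another one.
std::map<std::string, std::string>
fold_identical(const std::vector<Function>& fns) {
    // Pick one keeper per distinct body, preferring a function whose
    // address is observed so that it keeps its own identity.
    std::map<std::string, const Function*> keeper;
    for (const Function& f : fns) {
        auto it = keeper.find(f.body);
        if (it == keeper.end()) {
            keeper[f.body] = &f;
        } else if (!it->second->address_significant && f.address_significant) {
            it->second = &f;
        }
    }

    std::map<std::string, std::string> result;
    for (const Function& f : fns) {
        const Function* k = keeper.at(f.body);
        // A second address-significant function with the same body stays
        // separate; everything else shares the keeper's code.
        result[f.name] = (f.address_significant && k != &f) ? f.name : k->name;
    }
    return result;
}
```

In this model, two identical functions whose addresses are both compared
somewhere remain distinct, while call-only duplicates are folded into a
single copy.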


David Blaikie via llvm-dev

2020-Jul-25 01:39 UTC

[llvm-dev] [cfe-dev] Zero length function pointer equality

Looks perfect to me!

Well, a couple of questions: why a nop, rather than int3/ud2/etc?
Might it be worth reusing the existing code that places such an
instruction when building at -O0?

Also, you mention that this causes problems on Windows - but ICF done by
the Windows linker does not cause such problems? (I'd have thought
they'd result in the same situation - two functions described as being
at the same address?) Is there a quick summary of why those two cases
turn out differently?

On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <hans at chromium.org> wrote:
>
> Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
>
> On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > LLVM can produce zero length functions from cases like this (when
> > optimizations are enabled):
> >
> > void f1() { __builtin_unreachable(); }
> > int f2() { /* missing return statement */ }
> >
> > This code is valid, so long as the functions are never called.
> >
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
> >
> > (I came across this due to DWARF issues with zero length functions &
> > thinking about if/how this should be supported)
