thr3ads.net - llvm dev - [llvm-dev] A sufficient test for GV unification via unnamed

Christian Convey via llvm-dev

2015-Nov-09 19:42 UTC

[llvm-dev] A sufficient test for GV unification via unnamed_addr ?

Does anyone know of a good test to estimate whether or not a pair of
GlobalVariables could potentially be unified due to the "unnamed_addr"
flag? This is for an out-of-source AA project, and I need to err on the
side of assuming unification is possible. But for precision reasons, I'd
really like to avoid false positives.

I'm having trouble understanding just how equivalent two GVs' initializers
must be for the linker to be allowed to unify them. Here's what the docs
<http://llvm.org/docs/LangRef.html#global-variables> say:

> Global variables can be marked with unnamed_addr which indicates that the
> address is not significant, only the content. Constants marked like this
> can be merged with other constants if they have the same initializer. Note
> that a constant with significant address *can* be merged with a
> unnamed_addr constant, the result being a constant whose address is
> significant.

From that wording, I'm having trouble figuring out just how similar two
GVs' initializers must be before the linker is considered free to unify the
GVs' storage. I've got a few theories, but would appreciate any
suggestions. I'm hoping for an overall test which is both precise, and not
too computationally intensive on a program with very many globals.

- Theory 1: At the LLVM C++ API level, GV1 and GV2 can only be unified
if their initializer is the very same API object, i.e.,
"GV1->getInitializer() == GV2->getInitializer()". For this to be a
sufficient test, I think there would need to be some strong promises by the
C++ API implementation regarding using a single object to represent equal
or equivalent initial values.

- Theory 2: The linker requires that the initializers for GV1 and GV2
are *syntactically* equivalent compile-time constants, but their
initializers might not be described using the same llvm::Constant object.
For example:

@GV1 = private unnamed_addr constant [4 x i8] c"Foo\00", align 1
@GV2 = private constant [4 x i8] c"Foo\00", align 1

- Theory 3: Type-safe compile-time-constant semantic equivalence, but
unlike Theory 2, allows for syntactically alternative representations. All
that matters is that two initializer objects are equivalently typed,
equivalently shaped, and ultimately have equivalent constituent scalar
values.

@GV1 = unnamed_addr constant [4 x i32] zeroinitializer, align 16
@GV2 = constant [4 x i32] [i32 0, i32 0, i32 0, i32 0], align 16

- Theory 4: Arbitrary compile-time-constant bit-pattern equivalence.
For example:

@X = constant i32 -1, align 4
@Y = unnamed_addr constant [4 x i8] c"\FF\FF\FF\FF", align 1
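
To make the most permissive reading (Theory 4) concrete, here is a minimal standalone C++ sketch of a conservative "may unify" check. It deliberately does not use the LLVM API: ToyGlobal, its fields, and mayUnify are hypothetical names, and the byte image is assumed to have been flattened from the initializer by some other step. It only illustrates the logic of erring toward "unification is possible"; it is not a statement of what the linker actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the facts an analysis would extract from a
// global variable. None of these names come from the LLVM API.
struct ToyGlobal {
    bool isConstant;      // declared "constant" rather than "global"
    bool unnamedAddr;     // carries the unnamed_addr flag
    bool hasInitializer;  // false for external declarations
    std::vector<std::uint8_t> image;  // assumed: flattened initializer bytes
};

// Conservative check: returns true whenever unification cannot be ruled out
// under the bit-pattern reading (Theory 4), the most permissive of the four.
bool mayUnify(const ToyGlobal &a, const ToyGlobal &b) {
    if (!a.isConstant || !b.isConstant) return false;
    if (!a.hasInitializer || !b.hasInitializer) return false;
    // Per the LangRef wording quoted above, a constant with a significant
    // address can still be merged with an unnamed_addr one, so only one of
    // the two needs the flag.
    if (!a.unnamedAddr && !b.unnamedAddr) return false;
    return a.image == b.image;
}
```

With @X and @Y from the Theory 4 example both modelled as four 0xFF bytes, mayUnify reports a possible unification, since @Y is unnamed_addr. If a narrower theory turns out to be the right one, this check would only produce extra false positives, never a missed unification.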

Note: I'm using LLVM 3.7's C++ API. The target program will ultimately
be linked on a modern x86-64 Linux system, *probably* using GNU ld. The
program is compiled with clang or clang++, and in some cases I've used
"llvm-link" to combine the target-program bitcode files into a single
module. My analysis only considers a single bitcode file in isolation.

Thanks,
Christian