thr3ads.net - llvm dev - [llvm-dev] Question: llvm-link type merge behaviour of c++ classes [May 2020]


Björn Fiedler via llvm-dev

2020-May-28 10:33 UTC

[llvm-dev] Question: llvm-link type merge behaviour of c++ classes

Hi LLVM community,

I'd like to ask a question regarding the behavior of llvm-link:

My code contains classes that are structurally equivalent but totally
unrelated and distinct from a C++ point of view.
However, when the compiled IR is processed by llvm-link, these types are merged
together.
My question is: is this expected behavior or a bug?

To explain it in more detail, a reduced example follows:

IR code before llvm-link:
```
...
%class.Bakery = type { i32, i32 }
%class.Container = type { i8 }
%class.Rectangle = type <{ %class.Shape, i8, [3 x i8] }>
%class.Shape = type { i32, i32 }
...
define linkonce_odr dso_local void @_ZN9Container6insertEP5Shape(%class.Container*, %class.Shape*) #1 comdat align 2 { ... }
...
```

IR code after llvm-link:
```
...
%class.Bakery = type { i32, i32 }
%class.Container = type { i8 }
%class.Rectangle = type <{ %class.Bakery, i8, [3 x i8] }>
...
define linkonce_odr dso_local void @_ZN9Container6insertEP5Shape(%class.Container*, %class.Bakery*) #1 comdat align 2 { ... }
...
```

In this example the `Bakery` and `Shape` types get merged. The type definition
of `Rectangle` reflects this change, too, but my intuition says they should
stay distinct. I've found an article by Chris[1] that describes the new type
system. There he states that the name becomes part of the type: "This
means that LLVM 3.0 doesn't exhibit the previous confusing behavior where
two seemingly different structs would be printed with the same name."

So, if the name is part of the type and the "confusing behavior" has been
removed, why are these types merged? Is this the intended behavior?
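
The merge shown above is consistent with a comparison that looks only at the
field layout and ignores the name. The following is a minimal sketch of that
distinction, not llvm-link's implementation: `ToyStructType`, `sameLayout`,
`sameNameAndLayout` and `checkShapeVsBakery` are hypothetical names made up for
this illustration, and field types are modelled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR struct type: a name plus a flat list of field types.
// This is NOT llvm::StructType; it only illustrates two notions of equality.
struct ToyStructType {
    std::string name;                 // e.g. "class.Shape"
    std::vector<std::string> fields;  // e.g. {"i32", "i32"}
};

// Structural view: two types are equal if their field lists match; names are ignored.
bool sameLayout(const ToyStructType& a, const ToyStructType& b) {
    return a.fields == b.fields;
}

// Name-aware view: the name has to match in addition to the field list.
bool sameNameAndLayout(const ToyStructType& a, const ToyStructType& b) {
    return a.name == b.name && a.fields == b.fields;
}

// Usage: mirrors Shape and Bakery from the reduced example.
void checkShapeVsBakery() {
    const ToyStructType shape{"class.Shape", {"i32", "i32"}};
    const ToyStructType bakery{"class.Bakery", {"i32", "i32"}};
    assert(sameLayout(shape, bakery));          // indistinguishable by layout alone
    assert(!sameNameAndLayout(shape, bakery));  // distinct once the name counts
}
```

Under the layout-only view `Shape` and `Bakery` cannot be told apart, which
matches what the linked output shows.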

My use case, and the reason for all this, is an analysis tool for IR code that
I am writing. It gets stuck when finding matching calls for function pointers:
the renaming and merging of these types breaks its current logic for finding
matching candidates.

To reproduce this, use the C++ code below with the following
invocations:
```
clang++-9 mwe.cc -S -emit-llvm -o before_link.ll
llvm-link-9 -S before_link.ll -o after_link.ll
```

mwe.cc
```
// #include "shapes.h"
// shapes.h content follows
class Shape {
  public:
    int width, height;
};

class Rectangle : public Shape {
  public:
    bool is_square;
};

class Container {
  public:
    void insert(Shape* s){};
};

// end shapes.h

// #include "bakery.h"
// bakery.h content follows:
class Bakery {
  public:
    int num_ovens, num_employees;
};
// end bakery.h

// some instances
Bakery b;

Container c;
Rectangle r;

void do_stuff() { c.insert(&r); }

void bake(Bakery* bakery) {}
```

My system:
```
clang++-9 --version
clang version 9.0.1-12
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin


llvm-link-9 --version
LLVM (http://llvm.org/):
  LLVM version 9.0.1
 
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: skylake
```


Thanks in advance
Björn


[1] http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html


-- 
Björn Fiedler, M.Sc. (Scientific Staff)
Leibniz Universität Hannover (LUH)
Fachgebiet System- und Rechnerarchitektur (SRA)
Appelstraße 4
30167 Hannover, Germany

Tel:    +49 511 762-19736
Fax:    +49 511 762-19733
eMail:  fiedler at sra.uni-hannover.de
WWW:    https://www.sra.uni-hannover.de



Alex Denisov via llvm-dev

2020-May-31 12:08 UTC


[llvm-dev] Question: llvm-link type merge behaviour of c++ classes

Hi Björn,

I’m not particularly knowledgeable about the implementation of llvm-link,
but my guess is that the merging is intended there.
I had a very similar problem recently: basically, a loss of information caused
by IRLinker (the implementation behind llvm-link). I tried to dig into it to
find out whether it’s possible to change the behavior, but got stuck pretty
quickly since the code there is not very intuitive (IMHO, of course).

I recall reading somewhere that the intention behind merging structurally
equivalent types was to get rid of “duplicated” types. For example:

You have a module A and a module B, and both include a header with the same
type, resulting in the following bitcode:

; Module A
%struct.Shape = type { … }

; Module B
%struct.Shape = type { … }

However, when the modules are loaded in the same LLVM context you will see
something like this:

; Module A
%struct.Shape = type { … }

; Module B
%struct.Shape.1 = type { … }

So in the end if you link them together you'll get:

; Module A+B merged
%struct.Shape = type { … }

Which is desired.
What is not desired, IMO, is the situation that you describe, i.e.: “different”
types being merged regardless of their name.
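
The renaming described above (`%struct.Shape` becoming `%struct.Shape.1` in a
shared context) suggests one way a tool could recognise "the same" type by
name. This is only a sketch under that assumption: `stripNumericSuffix` is a
hypothetical helper, not an LLVM API, and it is not taken from the approach
linked below.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drop one trailing ".<digits>" suffix from a type name,
// so that "struct.Shape.1" and "struct.Shape" compare equal afterwards.
// Caveat: a name whose last component is genuinely numeric is stripped as well.
std::string stripNumericSuffix(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    // No dot at all, or nothing after the last dot: leave the name alone.
    if (dot == std::string::npos || dot + 1 == name.size()) {
        return name;
    }
    // Only strip when everything after the last dot is a decimal digit.
    for (std::size_t i = dot + 1; i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) {
            return name;
        }
    }
    return name.substr(0, dot);
}

// Usage: the renamed duplicate maps back to the original name,
// while an unrelated name such as "class.Bakery" is left untouched.
void checkSuffixStripping() {
    assert(stripNumericSuffix("struct.Shape.1") == "struct.Shape");
    assert(stripNumericSuffix("class.Bakery") == "class.Bakery");
}
```

Comparing the stripped names in addition to the layout would keep `Shape` and
`Bakery` apart while still unifying the two copies of `Shape`.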


So, in the end, we decided not to use IRLinker at all and to merge the types on
our own. You can read more about the approach here:
https://lowlevelbits.org/type-equality-in-llvm/

I’m not sure if that helps at all, but you are definitely not the only one out
there who is confused by the implementation :)

David Blaikie via llvm-dev

2020-May-31 16:57 UTC


[llvm-dev] Question: llvm-link type merge behaviour of c++ classes

Yeah, this is intentional (see the section 'The Linker "links" types and
retypes IR objects' of the original 3.0 type rewrite post linked
in your email, Björn).

In general, don't rely on type information in LLVM IR for carrying any
semantic information from the source code beyond the practical
semantics of what to load from what offsets, alignment, etc. Those are
the semantics that are part of the IR and must be preserved through
optimizations, etc.
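
Seen from the C++ side, the "practical semantics" of the two classes in the
reduced example are the same even though the types are unrelated. A small
sketch using only the class definitions from mwe.cc; the static assertions are
added here for illustration and assume an ordinary target where both classes
are standard-layout.

```cpp
#include <cstddef>
#include <type_traits>

// Class definitions copied from the reduced example (mwe.cc).
class Shape {
  public:
    int width, height;
};

class Bakery {
  public:
    int num_ovens, num_employees;
};

// What to load from which offset, and with what alignment, is identical.
static_assert(sizeof(Shape) == sizeof(Bakery), "same size");
static_assert(alignof(Shape) == alignof(Bakery), "same alignment");
static_assert(offsetof(Shape, width) == offsetof(Bakery, num_ovens),
              "first field at the same offset");
static_assert(offsetof(Shape, height) == offsetof(Bakery, num_employees),
              "second field at the same offset");

// The distinction exists only in the C++ type system.
static_assert(!std::is_same<Shape, Bakery>::value, "distinct C++ types");
```

Everything the first four assertions check survives in the IR; the last one
has no counterpart there.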

