thr3ads.net - llvm dev - [LLVMdev] Disjoint types after reading several modules [Jan 2012]

Clemens Hammacher

2012-Jan-31 14:16 UTC

[LLVMdev] Disjoint types after reading several modules

Dear community,

we are currently facing a problem related to the new type system in llvm 
3.0.
Our setting is the following: We have two or more modules, all in the 
same LLVMContext. They are sharing some types, meaning that for example 
functions in different modules are referencing the same (meaning pointer 
identical) type.
Now we write the different modules to the disk, and read them back from 
another program (again into the same LLVMContext).
The problem now is that named structs get duplicated for the different 
modules, meaning that when the second module is read, a new named struct 
is created in the context, and its name gets suffixed by a number.
This is because each module contains its own type table with all the 
types used in that module. When reading in the corresponding bitcode, 
the BitcodeReader explicitly calls StructType::create, without looking 
up in the context whether an equivalent type (even with the same name) 
already exists.
So I think that llvm is behaving correctly here, according to the new 
type system. But for us, the problem is that previously identical types 
are not identical any more after deserialization, which leads to 
problems when copying code between the modules.
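To make the effect concrete, here is a minimal self-contained C++ sketch. It does not use LLVM: `ToyContext`, `ToyStructType`, and `readModule` are hypothetical stand-ins, and the `.N` suffix format is an assumption. It only models the behaviour described above, where every read creates a fresh named struct, so a name clash produces a renamed, pointer-distinct type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model, NOT the LLVM API.
struct ToyStructType {
    std::string name;
};

class ToyContext {
public:
    // Always creates a new type object. If the wanted name is already taken,
    // a numeric suffix is appended instead of reusing the existing type.
    ToyStructType *createNamed(const std::string &wanted) {
        std::string name = wanted;
        unsigned suffix = 0;
        while (byName_.count(name))
            name = wanted + "." + std::to_string(suffix++);
        owned_.push_back(std::make_unique<ToyStructType>(ToyStructType{name}));
        byName_[name] = owned_.back().get();
        return owned_.back().get();
    }

private:
    std::map<std::string, ToyStructType *> byName_;
    std::vector<std::unique_ptr<ToyStructType>> owned_;
};

// "Reading a module" here just re-creates every named struct listed in the
// module's own type table, without looking for an existing equivalent.
std::vector<ToyStructType *> readModule(ToyContext &ctx,
                                        const std::vector<std::string> &typeTable) {
    std::vector<ToyStructType *> types;
    for (const std::string &name : typeTable)
        types.push_back(ctx.createNamed(name));
    return types;
}
```

Reading the same one-entry type table twice into one `ToyContext` returns two different pointers, and the second type carries a suffixed name.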

Has anyone already stumbled across this problem and solved it? Or is 
there a known solution to it?

Our idea for solving this is to add a named metadata node to each module 
before serializing it to bitcode, in order to identify previously 
identical types after deserialization. The metadata consists of a list 
of constants, where each even entry is a ConstantAggregateZero of a 
named struct, and the succeeding entry is a constant integer uniquely 
identifying that type. We plan to just use the Type* cast to i64.
Then after reading in all modules, we could find the named metadata, 
iterate over its elements and unify all Types which have the same number 
assigned. This would involve recreating and replacing global values, if 
their type changed.
Does this approach sound reasonable to you?
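The unification step of this idea can be sketched in plain C++ without LLVM. Everything here is hypothetical: `ToyType`, `TagList`, and `unifyByTag` are illustrative names, and the rule "the first type seen for an ID becomes the canonical one" is an assumption, not something stated above. The sketch only builds the duplicate-to-canonical map; rewriting global values that use a duplicate is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type, NOT the LLVM API.
struct ToyType {
    std::string name;
};

// One module's tag list after reading: each entry pairs a type as it exists
// now with the integer ID that was recorded for it before writing.
using TagList = std::vector<std::pair<ToyType *, std::uint64_t>>;

// Returns a map from every tagged type to its canonical representative.
// Assumption: the first type encountered for a given ID is the canonical one.
std::map<ToyType *, ToyType *> unifyByTag(const std::vector<TagList> &modules) {
    std::map<std::uint64_t, ToyType *> canonicalForId;
    std::map<ToyType *, ToyType *> remap;
    for (const TagList &tags : modules) {
        for (const auto &entry : tags) {
            // emplace keeps the existing entry if the ID was already seen.
            auto result = canonicalForId.emplace(entry.second, entry.first);
            remap[entry.first] = result.first->second;
        }
    }
    return remap;
}
```

Pointer values are only unique within the process that produced them, so IDs derived this way are only comparable between modules written by the same process.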

Another option would be to merge all modules together in a new module 
before serialization, prefixing all global values. After deserialization 
of this single module, the types would still be correct, and the module 
could be split up again. But this would require some rearrangements in 
our code since the modules would have to be written out at one single 
point. That's why we discarded that idea for now.

If anything is unclear, I can provide examples.

Thanks for any comments,
Clemens

Chris Lattner

2012-Feb-02 00:15 UTC

On Jan 31, 2012, at 6:16 AM, Clemens Hammacher wrote:
> This is because each module contains its own type table with all the types
> used in that module. When reading in the corresponding bitcode, the
> BitcodeReader explicitly calls StructType::create, without looking up in the
> context whether an equivalent type (even with the same name) already exists.
> So I think that llvm is behaving correctly here, according to the new type
> system. But for us, the problem is that previously identical types are not
> identical any more after deserialization, which leads to problems when copying
> code between the modules.
>
> Has anyone already stumbled across this problem and solved it? Or is
> there a known solution to it?

I'm familiar with the scenario, but haven't heard of anyone trying to do
something quite like this. The linker has to solve the exact same problem (read
multiple .bc files and unify types across them). This is the impetus behind
TypeMapTy in lib/Linker/LinkModules.cpp. You'll probably need to do
something like that.

> Our idea for solving this is to add a named metadata node to each module
> before serializing it to bitcode, in order to identify previously identical
> types after deserialization. The metadata consists of a list of constants, where
> each even entry is a ConstantAggregateZero of a named struct, and the succeeding
> entry is a constant integer uniquely identifying that type. We plan to just use
> the Type* cast to i64.
> Then after reading in all modules, we could find the named metadata,
> iterate over its elements and unify all Types which have the same number
> assigned. This would involve recreating and replacing global values, if their
> type changed.
> Does this approach sound reasonable to you?

I have to ask: why are you writing these modules out as separate bc files? A
more typical approach would be to write out one big .bc file, and then lazily
read in functions as you need them. This avoids problems like you're
seeing, and has the advantage of sharing types and constants as well.

-Chris

Clemens Hammacher

2012-Feb-02 12:30 UTC

Hi Chris,
thanks for your answer!

On 2/2/12 1:15 AM, Chris Lattner wrote:
> The linker has to solve the exact same problem (read multiple .bc files and
> unify types across them). This is the impetus behind TypeMapTy in
> lib/Linker/LinkModules.cpp. You'll probably need to do something like that.

I already looked into that. The linker is using the GlobalValues of both
modules to identify the types to unify.
This leads to interesting effects in some cases, but I'll write another
post about this.

Nevertheless the TypeMapTy is a great piece of code, and we will
definitely reuse it to remap duplicated types (and composed types) to
unique ones (via mutateType(), recursively descending to all uses).
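As a rough illustration of remapping composed types, here is a self-contained C++ sketch. It is not LLVM's TypeMapTy and uses no LLVM API: `ToyType` and `ToyTypeMapper` are hypothetical, and the sketch assumes acyclic types. Given seed entries that map duplicated named types to their unique counterparts, it rebuilds any composite that contains a duplicate and leaves all other types untouched.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type, NOT the LLVM API: a leaf has no elements,
// a composite lists its element types.
struct ToyType {
    std::string name;
    std::vector<ToyType *> elements;
};

class ToyTypeMapper {
public:
    // seeds: duplicate named type -> unique type it should be replaced by.
    explicit ToyTypeMapper(std::map<ToyType *, ToyType *> seeds)
        : mapped_(std::move(seeds)) {}

    // Returns the type to use instead of ty. Assumes the type graph is acyclic.
    ToyType *remap(ToyType *ty) {
        auto found = mapped_.find(ty);
        if (found != mapped_.end())
            return found->second;

        bool changed = false;
        std::vector<ToyType *> newElements;
        for (ToyType *element : ty->elements) {
            ToyType *mappedElement = remap(element);
            changed = changed || (mappedElement != element);
            newElements.push_back(mappedElement);
        }

        ToyType *result = ty;
        if (changed) {
            // At least one element was replaced, so build a new composite.
            owned_.push_back(
                std::make_unique<ToyType>(ToyType{ty->name, newElements}));
            result = owned_.back().get();
        }
        mapped_[ty] = result;  // memoize so repeated queries agree
        return result;
    }

private:
    std::map<ToyType *, ToyType *> mapped_;
    std::vector<std::unique_ptr<ToyType>> owned_;
};
```

Recursive named structs would need extra handling for cycles, which this sketch deliberately omits.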
> I have to ask: why are you writing these modules out as separate bc files?
I knew that someone would ask that ;)
We need to have separate modules at runtime. One of them contains
the code that is actually JIT compiled and executed, while different
optimizations concurrently (in individual threads) build up or
restructure new code in their individual "working modules". Eventually
some code gets copied over to the main module to be executed, and
that's why they need to use the same types.

Cheers,
Clemens
