Clemens Hammacher
2012-Jan-31 14:16 UTC
[LLVMdev] Disjoint types after reading several modules
Dear community,

we are currently facing a problem related to the new type system in LLVM 3.0. Our setting is the following: we have two or more modules, all in the same LLVMContext. They share some types, meaning that, for example, functions in different modules reference the same (pointer-identical) type. We then write the modules to disk and read them back from another program (again into the same LLVMContext).

The problem is that named structs get duplicated across the modules: when the second module is read, a new named struct is created in the context, and its name gets suffixed with a number. This is because each module contains its own type table with all the types used in that module. When reading the corresponding bitcode, the BitcodeReader explicitly calls StructType::create, without checking whether an equivalent type (even one with the same name) already exists in the context.

So I think that LLVM is behaving correctly here, according to the new type system. But for us, the problem is that previously identical types are no longer identical after deserialization, which causes problems when copying code between the modules.

Has anyone already stumbled across this problem and solved it? Or is there a known solution?

Our idea for solving this is to add a named metadata node to each module before serializing it to bitcode, in order to identify previously identical types after deserialization. The metadata consists of a list of constants, where each even entry is a ConstantAggregateZero of a named struct, and the following entry is a constant integer uniquely identifying that type. We plan to simply use the Type* cast to i64. After reading in all modules, we could then find the named metadata, iterate over its elements, and unify all types that have the same number assigned. This would involve recreating and replacing global values if their type changed.
Does this approach sound reasonable to you?

Another option would be to merge all modules together in a new module before serialization, prefixing all global values. After deserialization of this single module, the types would still be correct, and the module could be split up again. But this would require some rearrangements in our code, since the modules would have to be written out at one single point. That's why we discarded that idea for now.

If anything is unclear, I can provide examples.

Thanks for any comments,
Clemens
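The duplication described above, and the proposed ID-based repair, can be illustrated with a small toy model. This is only a sketch and does not use the LLVM API: ToyStruct, ToyContext, createNamed and unifyByTag are hypothetical names, and the exact suffix format is an assumption made for illustration. The integer tag stands in for the Type* value that the writing program would store next to each named struct.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a named struct type; NOT llvm::StructType.
struct ToyStruct {
  std::string name;
};

// Toy stand-in for a context that owns all named structs.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToyStruct>> structs_;
  unsigned nextSuffix_ = 0;

public:
  // Always creates a fresh type. On a name clash a numeric suffix is
  // appended (assumed format), mimicking the renaming described in the post.
  ToyStruct *createNamed(const std::string &name) {
    std::string unique = name;
    while (structs_.count(unique))
      unique = name + "." + std::to_string(nextSuffix_++);
    auto &slot = structs_[unique];
    slot.reset(new ToyStruct{unique});
    return slot.get();
  }
};

// One (type, id) pair as it would be recovered from a module's metadata.
using TaggedType = std::pair<ToyStruct *, std::uint64_t>;

// Maps every tagged type to the first type seen with the same id.
std::map<ToyStruct *, ToyStruct *>
unifyByTag(const std::vector<TaggedType> &tagged) {
  std::map<std::uint64_t, ToyStruct *> canonical;
  std::map<ToyStruct *, ToyStruct *> replacement;
  for (const TaggedType &entry : tagged) {
    // emplace keeps the existing entry if this id was already seen.
    auto inserted = canonical.emplace(entry.second, entry.first);
    replacement[entry.first] = inserted.first->second;
  }
  return replacement;
}
```

Reading the "same" struct twice yields two distinct toy types; feeding both to unifyByTag with the same tag maps the second one back onto the first. Replacing the uses of the duplicate (the part that involves recreating global values) is not modeled here.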
Chris Lattner
2012-Feb-02 00:15 UTC
[LLVMdev] Disjoint types after reading several modules
On Jan 31, 2012, at 6:16 AM, Clemens Hammacher wrote:
> This is because each module contains its own type table with all the types used in that module. When reading in the corresponding bitcode, the BitcodeReader explicitly calls StructType::create, without looking up in the context whether an equivalent type (even with the same name) already exists.
> So I think that llvm is behaving correctly here, according to the new type system. But for us, the problem is that previously identical types are not identical any more after deserialization, which leads to problems when copying code between the modules.
>
> So did anyone already stumble across that problem, and solved it? Or is there a known solution to it?

I'm familiar with the scenario, but haven't heard of anyone trying to do something quite like this. The linker has to solve the exact same problem (read multiple .bc files and unify types across them). This is the impetus behind TypeMapTy in lib/Linker/LinkModules.cpp. You'll probably need to do something like that.

> Our idea for solving this is to add a named metadata node to each module before serializing it to bitcode, in order to identify previously identical types after deserialization. The metadata consists of a list of constants, where each even entry is a ConstantAggregateZero of a named struct, and the succeeding entry is a constant integer uniquely identifying that type. We plan to just use the Type* casted to i64.
> Then after reading in all modules, we could find the named metadata, iterate over its elements and unify all Types which have the same number assigned. This would involve recreating and replacing global values, if their type changed.
> Does this approach sound reasonable to you?

I have to ask: why are you writing these modules out as separate bc files? A more typical approach would be to write out one big .bc file, and then lazily read in functions as you need them. This avoids problems like the one you're seeing, and has the advantage of sharing types and constants as well.

-Chris
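To give a rough idea of what unifying types across separately read modules involves, here is a minimal sketch of structural matching between a type from one module and a candidate type from another. It is not a transcription of TypeMapTy and does not use the LLVM API: ToyType, TypeMapping and areIsomorphic are hypothetical names. The assumption is that two named structs count as the same type when their element types match pairwise, regardless of any numeric suffix in the name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy type node; NOT llvm::Type. A struct has a name and element types,
// a primitive has only a name such as "i32".
struct ToyType {
  bool isStruct;
  std::string name;
  std::vector<ToyType *> elements;
};

// Records which source type was matched to which destination type.
using TypeMapping = std::map<const ToyType *, const ToyType *>;

// Returns true if src and dst have the same shape. A struct pair is recorded
// before its elements are compared, so recursive structs terminate.
// On failure, entries added speculatively are rolled back.
bool areIsomorphic(const ToyType *src, const ToyType *dst,
                   TypeMapping &mapping) {
  auto known = mapping.find(src);
  if (known != mapping.end())
    return known->second == dst;
  if (src->isStruct != dst->isStruct)
    return false;
  if (!src->isStruct)
    return src->name == dst->name;
  if (src->elements.size() != dst->elements.size())
    return false;

  TypeMapping snapshot = mapping;
  mapping[src] = dst;  // speculative: assume the pair matches
  for (std::size_t i = 0; i < src->elements.size(); ++i) {
    if (!areIsomorphic(src->elements[i], dst->elements[i], mapping)) {
      mapping = snapshot;  // undo everything recorded for this attempt
      return false;
    }
  }
  return true;
}
```

Struct names are deliberately ignored for structs in this sketch, since a renamed duplicate would never match by name. A real implementation also has to decide which pairs to try in the first place.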
Clemens Hammacher
2012-Feb-02 12:30 UTC
[LLVMdev] Disjoint types after reading several modules
Hi Chris,

thanks for your answer!

On 2/2/12 1:15 AM, Chris Lattner wrote:
> The linker has to solve the exact same problem (read multiple .bc files and unify types across them). This is the impetus behind TypeMapTy in lib/Linker/LinkModules.cpp. You'll probably need to do something like that.

I already looked into that. The linker uses the GlobalValues of both modules to identify the types to unify. This leads to interesting effects in some cases, but I'll write another post about this. Nevertheless, TypeMapTy is a great piece of code, and we will definitely reuse it to remap duplicated types (and composed types) to unique ones (via mutateType(), recursively descending to all uses).

> I have to ask: why are you writing these modules out as separate bc files?

I knew that someone would ask that ;) We need to have separate modules at runtime. One of them contains the code that is actually JIT compiled and executed. At the same time, different optimizations are building up or restructuring new code concurrently (in individual threads), each in its own "working module". Eventually some code will get copied over to the main module to be executed, and that's why they need to use the same types.

Cheers,
Clemens
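The remapping of composed types mentioned above can be sketched as follows. This is a toy model under stated assumptions, not LLVM code: Kind, ToyType and ToyRemapper are hypothetical names, and unlike a real context the sketch does not unique structurally identical pointer or function types. It only shows the recursive step: once a duplicated named struct is mapped to its canonical twin, every pointer or function type built on top of the duplicate has to be rebuilt.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy type node; NOT llvm::Type.
enum class Kind { Primitive, NamedStruct, Pointer, Function };

struct ToyType {
  Kind kind;
  std::string name;                 // used by Primitive and NamedStruct
  std::vector<ToyType *> operands;  // pointee, or return type + parameters
};

class ToyRemapper {
  std::map<ToyType *, ToyType *> mapped_;
  std::vector<std::unique_ptr<ToyType>> owned_;

public:
  // Declares that `duplicate` must be replaced by `canonical`.
  void addMapping(ToyType *duplicate, ToyType *canonical) {
    mapped_[duplicate] = canonical;
  }

  // Returns the type to use in place of `ty`. Composed types are rebuilt
  // only if one of their operands changed; results are memoized.
  ToyType *remap(ToyType *ty) {
    auto known = mapped_.find(ty);
    if (known != mapped_.end())
      return known->second;
    // Unmapped leaves stay as they are; named structs are not descended
    // into, which also keeps recursive types from looping forever.
    if (ty->kind == Kind::Primitive || ty->kind == Kind::NamedStruct)
      return mapped_[ty] = ty;

    std::vector<ToyType *> operands;
    bool changed = false;
    for (ToyType *op : ty->operands) {
      operands.push_back(remap(op));
      changed |= operands.back() != op;
    }
    ToyType *result = ty;
    if (changed) {
      owned_.emplace_back(new ToyType{ty->kind, ty->name, operands});
      result = owned_.back().get();
    }
    return mapped_[ty] = result;
  }
};
```

With a mapping from a duplicated struct to the original, remapping a function type that takes a pointer to the duplicate yields a new function type whose parameter points to the original, while types that never mention the duplicate are returned unchanged. Updating the values that carry those types is a separate step not shown here.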