DeadMG
2015-Jul-19 10:48 UTC
[LLVMdev] llvm::Linker incorrectly fails to link in all aspects of the source module
I've got some code using the LLVM linker. When I link one module into another, the linker fails to correctly represent all the aspects of the source module. Specifically, I've observed that types which are structurally equivalent get merged together, even though they're explicitly named types and not unnamed structural types.

Here's my reproducing case. I have the source and the output IR.

    #pragma warning(push, 0)
    #include <memory>
    #include <string>
    #include <vector>
    #include <llvm/ExecutionEngine/GenericValue.h>
    #include <llvm/ExecutionEngine/MCJIT.h>
    #include <llvm/ExecutionEngine/ExecutionEngine.h>
    #include <llvm/Support/Program.h>
    #include <llvm/Support/FileSystem.h>
    #include <llvm/Support/DynamicLibrary.h>
    #include <llvm/IR/Verifier.h>
    #include <llvm/IR/Type.h>
    #include <llvm/IR/DerivedTypes.h>
    #include <llvm/IR/IRBuilder.h>
    #include <llvm/Transforms/Utils/Cloning.h>
    #include <llvm/Linker/Linker.h>
    #include <llvm/Support/raw_ostream.h>
    #pragma warning(pop)

    // Print a module's textual IR into a string.
    std::string printModule(llvm::Module& module) {
        std::string mod_ir;
        llvm::raw_string_ostream stream(mod_ir);
        module.print(stream, nullptr);
        stream.flush();
        return mod_ir;
    }

    int main() {
        llvm::LLVMContext con;
        llvm::Module src("in", con);
        llvm::Module dest("out", con);

        // Two distinct named types with the same body, plus a type using both.
        auto srcb1 = llvm::StructType::create(con, std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) }, "srcb1");
        auto srcb2 = llvm::StructType::create(con, std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) }, "srcb2");
        auto srcty = llvm::StructType::create(con, std::vector<llvm::Type*>{ srcb1, srcb2 }, "srcty");

        auto func = llvm::Function::Create(llvm::FunctionType::get(srcty, {}, false), llvm::GlobalValue::LinkageTypes::ExternalLinkage, "srcfunc", &src);
        llvm::BasicBlock* entries = llvm::BasicBlock::Create(func->getParent()->getContext(), "entry", func);
        llvm::IRBuilder<> allocabuilder(entries);
        auto insert = allocabuilder.CreateInsertValue(llvm::ConstantAggregateZero::get(srcty), llvm::ConstantAggregateZero::get(srcb1), { 0 });
        allocabuilder.CreateRet(insert);

        // Link a clone of src into the empty dest and compare the printed IR.
        auto before = printModule(src);
        auto clone = std::unique_ptr<llvm::Module>(llvm::CloneModule(&src));
        llvm::Linker::LinkModules(&dest, clone.get());
        auto after = printModule(dest);
        if (before != after)
            __debugbreak();
    }

    // Before:

    ; ModuleID = 'in'

    %srcty = type { %srcb1, %srcb2 }
    %srcb1 = type { i8* }
    %srcb2 = type { i8* }

    define %srcty @srcfunc() {
    entry:
      ret %srcty zeroinitializer
    }

    // After:

    ; ModuleID = 'out'

    %srcty = type { %srcb1, %srcb1 }
    %srcb1 = type { i8* }

    define %srcty @srcfunc() {
    entry:
      ret %srcty zeroinitializer
    }

You can see in before and after that the two structurally equivalent but distinct named types, srcb1 and srcb2, were merged. After a bit of discussion on #llvm, it was suggested that this is intended behaviour. If so, this is terribly broken.

For one thing, my code depends on looking up types from the module by name. So far it just so happens that I don't have any test cases that look up structurally equivalent types by name after linking, but it certainly could occur for some user inputs for my compiler.

Secondly, it's much more difficult for me to determine what is going on in this IR. My compiler strictly generates one LLVM type for each of various types in the source code. If the compiler is broken for any reason, and I look at the IR output, then I expect to see this. If I don't see this, then I think the compiler is broken. I just spent three days trying to figure out why on earth my compiler was not generating the types correctly, when it was all along. And it's much more difficult to interpret the outcome when the IR no longer distinguishes between two logically completely distinct types that just happen to have the same IR representation.

Fundamentally, LLVM should never mutate the contents of the module unless it's explicitly requested, because the programmer depends on properties of the IR that are more than just binary equivalence. Moving the contents of one module into another module is no exception.
Jason Koenig
2015-Jul-20 17:25 UTC
[LLVMdev] llvm::Linker incorrectly fails to link in all aspects of the source module
From http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html:

> As in LLVM 2.9, type names are not really designed to be used as semantic
> information in IR: we expect everything to continue working if the -strip
> pass is used to remove all extraneous names from the IR. However, for
> research and other purposes, it can sometimes be a convenient hack to
> propagate information from a front-end into LLVM IR by using type names.
> This will work reliably in LLVM 3.0 (so long as you don't run the strip
> pass or something equivalent) because identified types aren't uniqued.
> However, be aware that the suffix can be added and write your code to
> tolerate it.
>
> A more robust way to be able to identify a specific type in the optimizer
> (or some other point after the frontend has run) is to use a named metadata
> node to find the type. For example, if you want to find the %foo type, you
> could generate IR that looks like this:
>
> %foo = type { ... }
> ...
> !magic.types = !{ %foo zeroinitializer }
>
> Then to find the "foo" type, you'd just look up the "magic.types" named
> metadata, and get the type of the first element. Even if type names are
> stripped or types get auto-renamed, the type of the first element will
> always be correct and stable.

-Jason Koenig
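
For readers who want the idea in the quoted blog passage spelled out, here is a toy sketch in plain C++ with no LLVM dependency. ToyType, ToyModule and every member name are hypothetical stand-ins, not LLVM API; the registry vector only plays the role of the "magic.types" named metadata node. It shows why a positional handle keeps working after a rename while a lookup by the old name does not. It does not model the linker and says nothing about whether two structurally equivalent types get merged.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an identified struct type (NOT llvm::StructType).
struct ToyType {
    std::string name;
};

// Toy stand-in for a module: a by-name table plus a positional registry
// that plays the role of the "magic.types" named metadata node.
class ToyModule {
public:
    // Create a type and register it under its name.
    ToyType* create(const std::string& name) {
        owned_.push_back(std::unique_ptr<ToyType>(new ToyType{name}));
        ToyType* ty = owned_.back().get();
        by_name_[name] = ty;
        return ty;
    }

    // Record a type in the registry and return its slot index.
    std::size_t record(ToyType* ty) {
        assert(ty != nullptr);
        registry_.push_back(ty);
        return registry_.size() - 1;
    }

    // Name-based lookup: returns nullptr once the name has changed.
    ToyType* lookupByName(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second;
    }

    // Slot-based lookup: independent of whatever the type is called.
    ToyType* lookupBySlot(std::size_t slot) const {
        return slot < registry_.size() ? registry_[slot] : nullptr;
    }

    // Simulates a strip or auto-rename step changing a type's name.
    void rename(ToyType* ty, const std::string& new_name) {
        by_name_.erase(ty->name);
        ty->name = new_name;
        by_name_[new_name] = ty;
    }

private:
    std::vector<std::unique_ptr<ToyType>> owned_;
    std::map<std::string, ToyType*> by_name_;
    std::vector<ToyType*> registry_;
};
```

In this model, a front end would call create("foo"), keep the slot returned by record(), and later use lookupBySlot() instead of lookupByName("foo"). After rename(foo, "foo.0") the old name no longer resolves, but the slot still yields the same object.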