Khilan Gudka
2015-Mar-20 18:30 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
Hello all,

llvm-link merges together the metadata from the IR files being linked together. This means that when linking different libraries together (i.e. multiple source files that have been compiled into a single LLVM IR file) it can be hard or impossible to identify the library boundaries.

We're using LLVM to do static analysis of applications (together with their dependent libraries) and have found it useful to be able to determine which library a particular Instruction* or GlobalVariable* came from (e.g. so that we can ignore some of them, or focus analysis on particular ones).

To preserve this information across linking, I've implemented a new kind of metadata node MDLLVMModule that records:

1) Which LLVM modules (i.e. LLVM IR file) have been linked into this LLVM module
2) Which compilation units directly contribute to this LLVM module (i.e. that are not part of an LLVM submodule)

The format of the metadata looks like this:

!llvm.module = !{!0}

!0 = !MDLLVMModule(name: "test123.bc", modules: !1, cus: !24)
!1 = !{!2}
!2 = !MDLLVMModule(name: "test12.bc", cus: !3)
!3 = !{!4, !18}
!4 = !MDCompileUnit(... filename: "test1.c" ...)
!18 = !MDCompileUnit(... filename: "test2.c" ...)
!24 = !{!25}
!25 = !MDCompileUnit(... filename: "test3.c" ...)

Each linked LLVM module has the named metadata node "llvm.module" that points to its own MDLLVMModule node. In this example, we see that this is the metadata for llvm module "test123.bc" that is built up from linking module "test12.bc" and the compilation unit "test3.c." Module "test12.bc" itself is built up by linking the compilation units "test1.c" and "test2.c".

The name of a module defaults to the base filename of the output file, but this can be overridden with the (also new) command-line flag -module-name to llvm-link, as in:

llvm-link -module-name=mytest -o test.bc <files>

I thought this might be useful to the wider LLVM community and would like to see this added to LLVM.

I have attached a patch that I produced against r232466. I've also added a corresponding DILLVMModule class.

I would be very grateful if someone could review this.

Thanks

--
Khilan Gudka
Research Associate
Security Group
Computer Laboratory
University of Cambridge
http://www.cl.cam.ac.uk/~kg365/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm.module.metadata.patch
Type: application/octet-stream
Size: 18503 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150320/29a8a5d6/attachment.obj>
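As a rough illustration of the nesting described in the message above, the example hierarchy can be modelled with plain C++ structs. This is only a sketch and not the patch's implementation: ModuleNode, addSubmodule, owningModule and buildExample are hypothetical names invented here, and none of them is an LLVM API or a class from the attached patch. It shows how a tool might answer "which linked module does this compilation unit belong to?" once the structure is available.

```cpp
// Hypothetical model of the proposed hierarchy using plain structs.
// None of these names come from LLVM or from the attached patch.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ModuleNode {
  std::string name;                                  // e.g. "test12.bc"
  std::vector<std::unique_ptr<ModuleNode>> modules;  // linked-in submodules
  std::vector<std::string> cus;                      // files of direct compile units
};

// Attaches `sub` as a linked-in submodule of `parent`.
void addSubmodule(ModuleNode &parent, std::unique_ptr<ModuleNode> sub) {
  assert(sub && "submodule must not be null");
  parent.modules.push_back(std::move(sub));
}

// Returns the module that directly lists the compile unit `cuFile`,
// searching submodules depth-first, or nullptr if no module lists it.
const ModuleNode *owningModule(const ModuleNode &m, const std::string &cuFile) {
  for (const std::string &cu : m.cus)
    if (cu == cuFile)
      return &m;
  for (const auto &sub : m.modules)
    if (const ModuleNode *found = owningModule(*sub, cuFile))
      return found;
  return nullptr;
}

// Builds the example from the message: "test123.bc" links in "test12.bc"
// and directly contains "test3.c"; "test12.bc" contains "test1.c" and
// "test2.c".
ModuleNode buildExample() {
  std::unique_ptr<ModuleNode> test12(new ModuleNode());
  test12->name = "test12.bc";
  test12->cus.push_back("test1.c");
  test12->cus.push_back("test2.c");

  ModuleNode top;
  top.name = "test123.bc";
  top.cus.push_back("test3.c");
  addSubmodule(top, std::move(test12));
  return top;
}
```

With this model, looking up "test1.c" in the tree returned by buildExample() yields the node named "test12.bc", while "test3.c" resolves to the top-level "test123.bc".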
David Blaikie
2015-Mar-20 18:39 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
On Fri, Mar 20, 2015 at 11:30 AM, Khilan Gudka <Khilan.Gudka at cl.cam.ac.uk> wrote:

> Hello all
>
> llvm-link merges together the metadata from the IR files being linked
> together. This means that when linking different libraries together (i.e.
> multiple source files that have been compiled into a single LLVM IR file)
> it can be hard or impossible to identify the library boundaries.
>
> We're using LLVM to do static analysis of applications (together with
> their dependent libraries) and have found it useful to be able to determine
> which library a particular Instruction* or GlobalVariable* came from (e.g.
> so that we can ignore some of them, or focus analysis on particular ones).
>
> To preserve this information across linking, I've implemented a new kind
> of metadata node MDLLVMModule that records:
>
> 1) Which LLVM modules (i.e. LLVM IR file) have been linked into this LLVM
> module
> 2) Which compilation units directly contribute to this LLVM module (i.e.
> that are not part of an LLVM submodule)
>
> The format of the metadata looks like this:
>
> !llvm.module = !{!0}
>
> !0 = !MDLLVMModule(name: "test123.bc", modules: !1, cus: !24)
> !1 = !{!2}
> !2 = !MDLLVMModule(name: "test12.bc", cus: !3)
> !3 = !{!4, !18}
> !4 = !MDCompileUnit(... filename: "test1.c" ...)
> !18 = !MDCompileUnit(... filename: "test2.c" ...)
> !24 = !{!25}
> !25 = !MDCompileUnit(... filename: "test3.c" ...)
>
> Each linked LLVM module has the named metadata node "llvm.module" that
> points to its own MDLLVMModule node. In this example, we see that this is
> the metadata for llvm module "test123.bc" that is built up from linking
> module "test12.bc" and the compilation unit "test3.c." Module "test12.bc"
> itself is built up by linking the compilation units "test1.c" and "test2.c"
>
> The name of a module defaults to the base filename of the output file, but
> this can be overridden with the (also new) command-line flag -module-name
> to llvm-link, as in:
>
> llvm-link -module-name=mytest -o test.bc <files>
>
> I thought this might be useful to the wider LLVM community and would like
> to see this added to LLVM.
>
> I have attached a patch that I produced against r232466. I've also added a
> corresponding DILLVMModule class.

What's the benefit/purpose of the MDLLVMModule over just having the MDCompileUnits themselves? I would imagine the user cares about which source file the problem was in (obtained from the MDCompileUnit), not the sequence of BC modules that may've been built into?

> I would be very grateful if someone could review this.
>
> Thanks
Khilan Gudka
2015-Mar-23 15:15 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
Hi David,

Thanks for your email.

> What's the benefit/purpose of the MDLLVMModule over just having the
> MDCompileUnits themselves? I would imagine the user cares about which
> source file the problem was in (obtained from the MDCompileUnit), not the
> sequence of BC modules that may've been built into?

We envisage it being useful when an analysis tool built using LLVM needs to know which MDCompileUnits were part of a particular library that has been linked in. For instance, we're currently analysing the sandboxing behaviour within the Chromium web browser, which comprises hundreds of internal libraries and many external ones. To be able to perform this analysis, we have to link them all together into a single .bc/.ll file.

Having the module structure allows us to model interactions between different modules without having to work out manually (and sometimes unreliably) which source file corresponds to which library (e.g. libssl, libpci, libpolicy, librenderer, etc.). It also allows an analysis tool to support turning output warnings on or off for particular libraries, since some libraries can produce a lot of analysis output.
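The per-library warning control mentioned in the reply above could look roughly like the following. This is a minimal sketch under stated assumptions, not code from the patch: it assumes the analysis tool has already attributed each warning to a library name recovered from the module structure, and Warning and filterWarnings are hypothetical names made up for the illustration.

```cpp
// Hypothetical sketch: suppress analysis warnings from selected libraries.
// Assumes each warning already carries the name of the library (linked
// module) that the flagged code came from.
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Warning {
  std::string library;  // module the flagged code was linked in from
  std::string message;  // text the analysis tool would print
};

// Returns the warnings whose library is not in `muted`, preserving order.
std::vector<Warning> filterWarnings(const std::vector<Warning> &all,
                                    const std::set<std::string> &muted) {
  std::vector<Warning> kept;
  for (const Warning &w : all)
    if (muted.count(w.library) == 0)
      kept.push_back(w);
  assert(kept.size() <= all.size());  // filtering can only drop entries
  return kept;
}
```

A tool could populate `muted` from a command-line option naming the libraries to silence, for example libssl, and report only what filterWarnings returns.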