Eric Christopher
2015-Mar-23 16:52 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
On Mon, Mar 23, 2015 at 9:50 AM David Blaikie <dblaikie at gmail.com> wrote:
> On Mon, Mar 23, 2015 at 8:15 AM, Khilan Gudka <Khilan.Gudka at cl.cam.ac.uk> wrote:
>> Hi David
>>
>> Thanks for your email.
>>
>>> What's the benefit/purpose of the MDLLVMModule over just having the
>>> MDCompileUnits themselves? I would imagine the user cares about which
>>> source file the problem was in (obtained from the MDCompileUnit), not the
>>> sequence of BC modules that may've been built into?
>>
>> We envisage it to be useful when an analysis tool built using LLVM needs
>> to know which MDCompileUnits were part of a particular library that has
>> been linked in. For instance, we're currently analysing the sandboxing
>> behaviour within the Chromium web browser, which comprises hundreds of
>> internal libraries and many external ones. To be able to perform this
>> analysis we have to link them all together into a single .bc/.ll file.
>>
>> Having the module structure allows us to model interactions between
>> different modules (without manually (and sometimes unreliably) having to
>> work out which source file corresponds to which library (e.g. libssl,
>> libpci, libpolicy, librenderer, etc)). It also allows an analysis tool to
>> support turning on/off output warnings for particular libraries (as they
>> can lead to a lot of analysis output).
>
> Fair enough - I've no idea/opinion on whether that's the right abstraction
> (other people with more domain knowledge of analysis infrastructure might
> chime in with some thoughts).
>
> Practically speaking: would directory paths be sufficient? The
> MDCompileUnits already have information about where the source file was.

I agree, this seems very weird. You have very good source location information down to directory/file/line/column for individual instructions in the existing metadata scheme, I'm not sure what this is getting you over that?

-eric

> - David
>
>>>> I would be very grateful if someone could review this.
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Khilan Gudka
>>>> Research Associate
>>>> Security Group
>>>> Computer Laboratory
>>>> University of Cambridge
>>>> http://www.cl.cam.ac.uk/~kg365/
Khilan Gudka
2015-Mar-23 17:24 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
Yes, we did consider using directory paths to identify libraries; however, there are cases where this doesn't work. For example, Chromium builds a libcommon which mostly consists of source files from the folder chrome/common/..., but it also contains the file components/nacl/common/pnacl_types.cc (although other files in that folder are not part of libcommon).

--
Khilan Gudka
Research Associate
Security Group
Computer Laboratory
University of Cambridge
http://www.cl.cam.ac.uk/~kg365/
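To make the failure mode concrete, here is a minimal sketch of a directory-prefix heuristic and the case where it goes wrong. Only libcommon and the path components/nacl/common/pnacl_types.cc come from the message above; the library name libnacl_common, the other file names, and the function names are hypothetical and purely for illustration.

```python
# Hypothetical directory-prefix heuristic: each directory is assumed to
# belong to exactly one library.
PREFIX_TO_LIBRARY = {
    "chrome/common/": "libcommon",
    "components/nacl/common/": "libnacl_common",  # hypothetical library name
}

# What the build actually does. Only the pnacl_types.cc entry reflects the
# message above; the other two files are made up for the example.
ACTUAL_LIBRARY = {
    "chrome/common/example.cc": "libcommon",
    "components/nacl/common/pnacl_types.cc": "libcommon",
    "components/nacl/common/other_file.cc": "libnacl_common",
}


def guess_library(source_path):
    """Return the library whose directory prefix is the longest match, or None."""
    best_prefix = None
    for prefix in PREFIX_TO_LIBRARY:
        if source_path.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return PREFIX_TO_LIBRARY[best_prefix] if best_prefix is not None else None


def misclassified(actual):
    """List source files for which the prefix heuristic disagrees with the build."""
    return sorted(path for path, lib in actual.items() if guess_library(path) != lib)


if __name__ == "__main__":
    for path in misclassified(ACTUAL_LIBRARY):
        print(path, "guessed as", guess_library(path), "but built into", ACTUAL_LIBRARY[path])
```

No choice of prefix table fixes this: any rule keyed on components/nacl/common/ alone must get either pnacl_types.cc or its sibling files wrong, because the directory is split across libraries.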
Khilan Gudka
2015-Mar-23 17:33 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
Another example is libbrowser (from Chromium), which includes source files from chrome/browser but also from chrome/third_party/mozilla_security_manager.

It would be nice to have a way of reliably identifying which compilation units were part of a library, which is what we were trying to achieve with MDLLVMModule, but if there are better abstractions then I am all for that.
Duncan P. N. Exon Smith
2015-Mar-23 20:14 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
> On 2015-Mar-23, at 09:52, Eric Christopher <echristo at gmail.com> wrote:
>
> I agree, this seems very weird. You have very good source location information down to directory/file/line/column for individual instructions in the existing metadata scheme, I'm not sure what this is getting you over that?

Seems weird to me too.

Moreover, this isn't really debug info, and it's not clear that it's generally useful, so adding first-class support for it via specialized metadata nodes seems premature.
Khilan Gudka
2015-Mar-23 20:50 UTC
[LLVMdev] New kind of metadata to capture LLVM IR linking structure
Hi all

I appreciate the feedback, and it looks like recording the information using MD nodes may not have been the right choice. I quite like the idea of having the build system dump the library information.

Thanks
Khilan
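As a rough illustration of the build-system approach, the sketch below assumes the build emits a JSON manifest mapping each library to its source files, and that an analysis tool inverts it to look up the library for a compile unit's source path. The manifest format, the example file names, and the function names are all assumptions made for this sketch; nothing here is an existing Chromium or LLVM facility.

```python
import json

# Hypothetical manifest a build system might dump: library -> source files.
# Library names and directories follow the thread; example.cc names are made up.
MANIFEST_JSON = """
{
  "libcommon": [
    "chrome/common/example.cc",
    "components/nacl/common/pnacl_types.cc"
  ],
  "libbrowser": [
    "chrome/browser/example.cc",
    "chrome/third_party/mozilla_security_manager/example.cc"
  ]
}
"""


def load_source_to_libraries(manifest_text):
    """Invert the manifest into a map from source path to the set of libraries using it."""
    manifest = json.loads(manifest_text)
    source_to_libraries = {}
    for library, sources in manifest.items():
        for source in sources:
            source_to_libraries.setdefault(source, set()).add(library)
    return source_to_libraries


def filter_warnings(warnings, source_to_libraries, muted_libraries):
    """Drop (source, message) warnings whose source belongs to any muted library."""
    kept = []
    for source, message in warnings:
        libraries = source_to_libraries.get(source, set())
        if libraries.isdisjoint(muted_libraries):
            kept.append((source, message))
    return kept


if __name__ == "__main__":
    mapping = load_source_to_libraries(MANIFEST_JSON)
    warnings = [
        ("chrome/browser/example.cc", "possible sandbox escape"),
        ("components/nacl/common/pnacl_types.cc", "unchecked IPC message"),
    ]
    for source, message in filter_warnings(warnings, mapping, {"libbrowser"}):
        print(source + ": " + message)
```

Keeping the mapping outside the IR means the analysis tool only needs the source path already recorded for each compile unit, and turning output on or off per library becomes a lookup rather than a new kind of metadata node.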