Peter Collingbourne via llvm-dev
2016-Oct-26  19:04 UTC
[llvm-dev] RFC: APIs for bitcode files containing multiple modules
On Tue, Oct 25, 2016 at 8:36 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

> On Oct 25, 2016, at 6:28 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
>> Hi all,
>>
>> As mentioned in my recent RFC entitled "RFC: a more detailed design for
>> ThinLTO + vcall CFI" I would like to introduce the ability for bitcode
>> files to contain multiple modules. In https://reviews.llvm.org/D24786 I
>> took a step towards that by proposing a change to the module format so that
>> the block info block is stored at the top level. The next step is to think
>> about what the API would look like for reading and writing multiple modules.
>>
>> Here's what I have in mind. To create a multi-module bitcode file, you
>> would create a BitcodeWriter object and add modules to it:
>>
>>     BitcodeWriter W(OS);
>>     W.addModule(M1);
>>     W.addModule(M2);
>>     W.write();
>
> That requires the two modules to live longer than the bitcode write. The
> API could be:
>
>     BitcodeWriter W(OS);
>     W.writeModule(M1);
>     // delete M1
>     // ...
>     // create M2
>     W.writeModule(M2);
>
> (Maybe you had this in mind, but the API naming didn’t reflect it so I’m
> not sure).

In the API I prototyped, I took the maximum BitsRequiredForTypeIndices value from all the modules, and used it to produce the abbreviations for the top level block info block (without this I was seeing "Unexpected abbrev ordering!" errors in the bitcode writer as a result of emitting the "same" abbreviation multiple times). That would have required us to keep the modules around until the call to write(). However, let me revisit this, because it does not seem necessary (i.e. we can just continue to emit block info blocks within the module block except with different abbreviation numbers for each module).

>> Reading a multi-module bitcode file would be supported with a
>> BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
>> would have a member function on BitcodeReader. We would also have a next()
>> member function which would move to the next module in the file. For
>> example:
>>
>>     BitcodeReader R(MBRef);
>>     Expected<bool> B = R.hasGlobalValueSummary();
>>     std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
>>     R.next();
>>     std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module
>
> That makes the API quite stateful. You may have good implementation reasons
> for this, but they’re not clear to me.
> I would rather see the bitcode reader as a random access container,
> iterating over modules.

Random access seems reasonable to me as well. I will see how feasible that is.

Thanks,
--
Peter
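The two suggestions in this exchange (a writer that serializes each module as soon as it is handed over, and a reader that behaves like a random-access container) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyModule`, `ToyWriter`, `ToyReader` and `writeTwoModules` are hypothetical names, the "file" is a plain length-prefixed byte string rather than LLVM bitcode, and none of this is the API that was eventually proposed or committed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for llvm::Module (hypothetical): just a name and a body.
struct ToyModule {
  std::string Name;
  std::string Body;
};

// Hypothetical streaming writer: writeModule() serializes immediately,
// so the module may be destroyed as soon as the call returns.
class ToyWriter {
  std::string &Out;
  // Append a 4-byte little-endian length followed by the bytes of S.
  static void putString(std::string &Out, const std::string &S) {
    std::uint32_t N = static_cast<std::uint32_t>(S.size());
    for (int I = 0; I < 4; ++I)
      Out.push_back(static_cast<char>((N >> (8 * I)) & 0xff));
    Out += S;
  }

public:
  explicit ToyWriter(std::string &Out) : Out(Out) {}
  void writeModule(const ToyModule &M) {
    putString(Out, M.Name);
    putString(Out, M.Body);
  }
};

// Hypothetical random-access reader: it records the offset of every module
// once, after which any module can be parsed by position, in any order.
// It assumes well-formed input and does no validation of truncated data.
class ToyReader {
  const std::string &Buf;
  std::vector<std::size_t> Offsets;
  // Read one length-prefixed string at Pos and advance Pos past it.
  std::string getString(std::size_t &Pos) const {
    std::uint32_t N = 0;
    for (int I = 0; I < 4; ++I)
      N |= static_cast<std::uint32_t>(
               static_cast<unsigned char>(Buf[Pos + I]))
           << (8 * I);
    Pos += 4;
    std::string S = Buf.substr(Pos, N);
    Pos += N;
    return S;
  }

public:
  explicit ToyReader(const std::string &Buf) : Buf(Buf) {
    std::size_t Pos = 0;
    while (Pos < Buf.size()) {
      Offsets.push_back(Pos);
      getString(Pos); // skip the name
      getString(Pos); // skip the body
    }
  }
  std::size_t size() const { return Offsets.size(); }
  ToyModule operator[](std::size_t I) const {
    assert(I < Offsets.size() && "module index out of range");
    std::size_t Pos = Offsets[I];
    ToyModule M;
    M.Name = getString(Pos);
    M.Body = getString(Pos);
    return M;
  }
};

// Writes two modules; the first is destroyed before the second is created.
std::string writeTwoModules() {
  std::string File;
  ToyWriter W(File);
  {
    auto M1 = std::make_unique<ToyModule>(ToyModule{"m1", "regular LTO part"});
    W.writeModule(*M1);
  } // M1 is gone here; the writer kept no reference to it.
  ToyModule M2{"m2", "ThinLTO part"};
  W.writeModule(M2);
  return File;
}
```

With this shape the caller can parse module 1 without touching module 0, and there is no hidden cursor of the kind that `next()` would introduce. The real difficulty raised above, shared abbreviations in a top-level block info block, is not modeled here.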
Will Dietz via llvm-dev
2016-Oct-28  13:11 UTC
[llvm-dev] RFC: APIs for bitcode files containing multiple modules
On Wed, Oct 26, 2016 at 2:04 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Tue, Oct 25, 2016 at 8:36 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> On Oct 25, 2016, at 6:28 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>> Reading a multi-module bitcode file would be supported with a
>> BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
>> would have a member function on BitcodeReader. We would also have a next()
>> member function which would move to the next module in the file. For
>> example:
>>
>>     BitcodeReader R(MBRef);
>>     Expected<bool> B = R.hasGlobalValueSummary();

What's this used for? Would there be a "readGlobalValueSummary()" similar to function summaries?

>>     std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
>>     R.next();
>>     std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

I'm very excited about the idea of storing multiple modules in a bitcode file, and the (thin)LTO and CFI goodness you're building using it.

I have a few questions about where you're going if you don't mind--and it's related to the API in that it's awfully hard to judge an API without knowing what it's expected to be used for or what the underlying data represents.

On that-- I'm sorry if I've missed this information, but reading through your RFC's and posts I'm not finding the answer. Is there a definition/explanation of what it means to have a bitcode file containing multiple modules?

Is this a storage optimization where each module is what today is an "llvm::Module" but we're encoding them into a single file for efficiency/convenience reasons?

If so, can these modules have different triples? Different ("conflicting") definitions for a global?

There are also multiple tools that take bitcode as input, and currently expect a single module. Will these be made to reject multiple-module bitcode, and if not is the plan to extend tools to handle multiple-module files?

Beyond the random access suggestion (+1) and lifetime comments, it seems like there should be a way to reason about the contents of these modules--names, identifiers, flags, *something* so that "load the first module lazily and the second eagerly" can become "load the module containing my CFI information eagerly but the rest lazily" or something, or at least to check that this file was created using -fsanitize=cfi and not something else.

Anyway sorry for all the questions and thanks for your efforts, looking forward to using this in the near future! :)
Peter Collingbourne via llvm-dev
2016-Oct-28  19:06 UTC
[llvm-dev] RFC: APIs for bitcode files containing multiple modules
On Fri, Oct 28, 2016 at 6:11 AM, Will Dietz <willdtz at gmail.com> wrote:

> >>     BitcodeReader R(MBRef);
> >>     Expected<bool> B = R.hasGlobalValueSummary();
>
> What's this used for?

This would be the equivalent of the existing llvm::hasGlobalValueSummary() function, which currently controls whether we compile a module with regular LTO or with ThinLTO.

> Would there be a "readGlobalValueSummary()" similar to function summaries?

There would be a getModuleSummaryIndex() which again would be similar to llvm::getModuleSummaryIndex(). Note that the module summary already covers all global values, not just functions.

> Is there a definition/explanation of what it means to have a bitcode
> file containing multiple modules?
>
> Is this a storage optimization where each module is what today is an
> "llvm::Module" but we're encoding them into a single file for
> efficiency/convenience reasons?

Yes, each module would be an llvm::Module. This is more for convenience reasons -- it's the simplest way to split modules that use CFI into a regular LTO part and a ThinLTO part (as described in the RFC entitled "RFC: a more detailed design for ThinLTO + vcall CFI") while storing the entire compiled translation unit in a single file.

> If so, can these modules have different triples?

That would certainly be possible in principle, but it's not part of my use case. I'd imagine that another potential use case for this could be to allow for LTO when targeting heterogeneous architectures (e.g. CUDA/OpenMP), but I'm not sure about the specifics of how that could work.

> Different ("conflicting") definitions for a global?

In principle such inputs would be rejected by the linker with a duplicate symbol error. That might not be the appropriate thing to do in the heterogeneous case though.

> There are also multiple tools that take bitcode as input, and
> currently expect a single module.
> Will these be made to reject multiple-module bitcode, and if not is
> the plan to extend tools to handle multiple-module files?

For testing purposes I was planning to extend llvm-dis (and possibly opt) to take a flag specifying a module index, and introduce an llvm-join tool which could be used to create a bitcode file from multiple inputs. The other tools probably don't need to know about this and could just read the first module.

> Beyond the random access suggestion (+1) and lifetime comments, it
> seems like there should be a way to reason about the contents of these
> modules--names, identifiers, flags, *something* so that "load the
> first module lazily and the second eagerly" can become "load the
> module containing my CFI information eagerly but the rest lazily" or
> something, or at least to check that this file was created using
> -fsanitize=cfi and not something else.

Right, this is the sort of functionality that would be provided by functions such as hasGlobalValueSummary().

Thanks,
--
Peter
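The idea that clients pick modules by property rather than by position, using a per-module query such as hasGlobalValueSummary(), might look roughly like the following. This is a hedged sketch and not LLVM code: `ModuleEntry`, `LTOKind`, `classify`, `indicesOfKind` and `exampleFile` are hypothetical names, the `HasSummary` flag merely stands in for whatever the real query would report, and the ordering of the two parts in the example is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy description of one module inside a multi-module file (hypothetical).
// HasSummary stands in for the answer hasGlobalValueSummary() would give.
struct ModuleEntry {
  std::string Name;
  bool HasSummary;
};

enum class LTOKind { Regular, Thin };

// Following the thread: a module with a global value summary is compiled
// with ThinLTO, a module without one with regular LTO.
LTOKind classify(const ModuleEntry &M) {
  return M.HasSummary ? LTOKind::Thin : LTOKind::Regular;
}

// Return the positions of all modules of the requested kind, so a client
// can decide per module how to load it (for example lazily or eagerly).
std::vector<std::size_t> indicesOfKind(const std::vector<ModuleEntry> &File,
                                       LTOKind Kind) {
  std::vector<std::size_t> Result;
  for (std::size_t I = 0; I < File.size(); ++I)
    if (classify(File[I]) == Kind)
      Result.push_back(I);
  return Result;
}

// Example input: a translation unit split into a regular LTO part and a
// ThinLTO part. The order of the two parts is an assumption of this sketch.
std::vector<ModuleEntry> exampleFile() {
  return {{"regular-lto-part", false}, {"thinlto-part", true}};
}

// Small self-check showing the intended use.
void checkExample() {
  std::vector<ModuleEntry> File = exampleFile();
  std::vector<std::size_t> Thin = indicesOfKind(File, LTOKind::Thin);
  assert(Thin.size() == 1 && Thin[0] == 1);
}
```

A tool that takes a module index flag, as suggested for llvm-dis, would correspond to indexing directly into such a list, while tools unaware of multiple modules would simply use entry 0.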