Robinson, Paul
2014-Nov-21 21:31 UTC
[LLVMdev] Using the unused "version" field in the bitcode wrapper (redux)
> Reading the bitcode reader while working on another issue I found
> that we already have a version in the bitcode itself (not the darwin
> wrapper) and it is used! It is stored with the
> bitc::MODULE_CODE_VERSION. It is used to select relative ids, which
> impacts the entire bitcode, and so it makes sense to be based on a
> version.
>
> If we ever have a new feature that could not be otherwise detected,
> bumping the number is a reasonable way of making sure old versions of
> llvm will reject new bitcode instead of misinterpreting it.

Right, that version number is used to resolve *ambiguities* in how to interpret some chunk of bitcode. It is not a generic bitcode version scheme, because most bitcode format changes involve things like adding new operands or opcodes, which are easily identified without needing an explicit version number.

The scenario I am most concerned about is this:

- We as a vendor publish toolchain #12 based on SVN r250000.
- During subsequent LLVM development, changes happen (!). For example, a new key letter 'g' in the Data Layout. This is not a bitcode ambiguity so MODULE_CODE_VERSION is unchanged.
- We as a vendor publish toolchain #13 based on SVN r300000.
- Some middleware provider publishes libIncrediblyUseful.bc built using spiffy new toolchain #13.
- Some hapless game developer tries to use libIncrediblyUseful.bc but is still on toolchain #12. This causes an error during some LTO build phase, of course; the question is, what kind of error and how does Hapless Game Developer know what to do?

We as compiler developers want to see something along the lines of "unknown data layout specifier." That kind of diagnostic is seriously helpful to the LLVM community, because it describes the actual problem.

This does *nothing* for Hapless Game Developer. HGD wants to see "this bitcode file was generated by a newer version, I don't understand how to interpret it" because _that's_ the actual problem.

The "actual problem" is context dependent.
How can we account for that?

Proposed solution:

Whether to emit a bitcode wrapper becomes a target-dependent predicate. Bitcode is written by Module, which already has target info attached, so it's a matter of picking some convenient place to keep that info. Initially only Darwin would do this, but it would be a step up from the current explicit triple check.

The wrapper has a standard header, same as the current header:
- Magic
- Version
- BitcodeOffset
- BitcodeSize

The target can supply additional data to put after the header (and before the actual bitcode starts). Darwin would supply the CPUType field like it does now. This is 100% compatible with what exists today, but will be easy to extend for (ahem) other vendors who want wrappers.

Any vendor who supports bitcode as a long-lived on-disk format should specify that it wants a wrapper. It is the vendor's responsibility to provide sensible version numbers for successive toolchain releases. The LLVM project does not specify how to come up with version numbers. We default to zero (so Darwin automatically gets its historical value).

NOTE: This solution explicitly does NOT solve the "bitcode must be understandable to older toolchains" problem. What it DOES solve is the "older toolchains must provide an easily understood diagnostic when presented with newer bitcode files" problem.

Vendor toolchain release scenarios:

1) Releasing based on arbitrary trunk revisions. The vendor's toolchain release number, encoded into 32 bits, is likely to serve well as the bitcode wrapper version number. If you release strictly from trunk (not release branches) then the SVN revision number from the LLVM repo can also serve this purpose.

2) Releasing strictly based on LLVM releases. Using the LLVM version number, encoded into 32 bits, is a pretty reasonable alternative. Even if you release multiple toolchains from the same LLVM release, the bitcode formats will be the same, so the bitcode wrapper version number can also be the same.
--paulr

P.S. I think the illustrative example of a new DataLayout specifier would reach an llvm_unreachable, and not emit a proper diagnostic at all. This is part of the generic diagnostics-from-LLVM problem.
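
The wrapper layout and version check proposed in the message above can be sketched roughly as follows. This is a minimal Python illustration, not LLVM code. The four header fields come from the message; the magic value 0x0B17C0DE and the 32-bit little-endian field width follow the existing bitcode wrapper as documented. Everything else is an assumption made for the example: the names `pack_wrapper`, `check_wrapper`, `encode_version`, and `TOOLCHAIN_VERSION`, the diagnostic wording, and the major/minor/patch packing, which is only one possible way to encode a version into 32 bits.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE          # magic of the existing bitcode wrapper
HEADER = struct.Struct("<IIII")     # Magic, Version, BitcodeOffset, BitcodeSize

# Hypothetical: newest wrapper version this (older) toolchain understands.
TOOLCHAIN_VERSION = 12


def encode_version(major, minor, patch):
    """One possible 32-bit encoding of a release number (an assumption)."""
    return (major << 16) | (minor << 8) | patch


def pack_wrapper(version, bitcode, target_data=b""):
    """Build header + optional target data (e.g. Darwin's CPUType) + bitcode."""
    offset = HEADER.size + len(target_data)
    header = HEADER.pack(WRAPPER_MAGIC, version, offset, len(bitcode))
    return header + target_data + bitcode


def check_wrapper(blob, supported=TOOLCHAIN_VERSION):
    """Return a diagnostic string, or None if the wrapper is acceptable."""
    if len(blob) < HEADER.size:
        return "error: file too short for a bitcode wrapper"
    magic, version, offset, size = HEADER.unpack_from(blob)
    if magic != WRAPPER_MAGIC:
        return "error: not a wrapped bitcode file"
    if version > supported:
        # The diagnostic the end user needs, emitted before any bitcode
        # is parsed.
        return ("error: bitcode was generated by a newer toolchain "
                f"(version {version}; this toolchain supports up to {supported})")
    if offset + size > len(blob):
        return "error: truncated or corrupt bitcode"
    return None
```

With this shape, a toolchain #12 reader handed a file stamped by toolchain #13 rejects it on the version field alone, without ever reaching the unknown data layout specifier. A default version of zero, as Darwin uses today, is always accepted.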
Rafael Espíndola
2014-Nov-24 15:53 UTC
[LLVMdev] Using the unused "version" field in the bitcode wrapper (redux)
> Right, that version number is used to resolve *ambiguities* in how to
> interpret some chunk of bitcode. It is not a generic bitcode version
> scheme, because most bitcode format changes involve things like adding
> new operands or opcodes, which are easily identified without needing
> an explicit version number.

That is what it is used for at the moment. It is just a number; we can increment it as often as we want.

> The scenario I am most concerned about is this:
>
> - We as a vendor publish toolchain #12 based on SVN r250000.
> - During subsequent LLVM development, changes happen (!).
>   For example, a new key letter 'g' in the Data Layout. This is
>   not a bitcode ambiguity so MODULE_CODE_VERSION is unchanged.
> - We as a vendor publish toolchain #13 based on SVN r300000.
> - Some middleware provider publishes libIncrediblyUseful.bc built
>   using spiffy new toolchain #13.
> - Some hapless game developer tries to use libIncrediblyUseful.bc
>   but is still on toolchain #12. This causes an error during some
>   LTO build phase, of course; the question is, what kind of error
>   and how does Hapless Game Developer know what to do?

In summary: an old tool reading new bitcode. It should report that the particular new feature is not supported. In this case, something like "unknown 'g' flag in datalayout".

> This does *nothing* for Hapless Game Developer. HGD wants to see
> "this bitcode file was generated by a newer version, I don't understand
> how to interpret it" because _that's_ the actual problem.

We can probably add a note saying "newer or corrupt BC".

> Proposed solution:
>
> Whether to emit a bitcode wrapper becomes a target-dependent predicate.

Again, *any* solution involving the bitcode wrapper is not interesting to the open source project since it is not required.

Cheers,
Rafael
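
The alternative suggested in the message above, a feature-specific error plus a generic note, might look something like the sketch below. It is a toy Python check, not LLVM's data layout parser. `KNOWN_SPECIFIERS` is an illustrative subset of key letters rather than an authoritative list, `check_datalayout` is a hypothetical name, and the exact diagnostic wording is assumed.

```python
# Illustrative subset of data layout key letters (an assumption,
# not LLVM's authoritative list).
KNOWN_SPECIFIERS = set("eEpivfamnS")


def check_datalayout(layout):
    """Return a list of diagnostic lines; empty if every spec is recognized."""
    for spec in layout.split("-"):
        if spec and spec[0] not in KNOWN_SPECIFIERS:
            return [
                # Specific error for compiler developers.
                f"error: unknown '{spec[0]}' flag in datalayout",
                # Generic hint for the end user.
                "note: the bitcode file is newer than this tool, or corrupt",
            ]
    return []


for line in check_datalayout("e-m:e-i64:64-g32"):
    print(line)
```

The error line keeps the detail that compiler developers want, while the note gives the end user a likely cause without requiring a wrapper or a separate version field.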
Sean Silva
2014-Nov-25 00:19 UTC
[LLVMdev] Using the unused "version" field in the bitcode wrapper (redux)
On Mon, Nov 24, 2014 at 7:53 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> > Right, that version number is used to resolve *ambiguities* in how to
> > interpret some chunk of bitcode. It is not a generic bitcode version
> > scheme, because most bitcode format changes involve things like adding
> > new operands or opcodes, which are easily identified without needing
> > an explicit version number.
>
> That is what it is used for at the moment. It is just a number; we
> can increment it as often as we want.
>
> > The scenario I am most concerned about is this:
> >
> > - We as a vendor publish toolchain #12 based on SVN r250000.
> > - During subsequent LLVM development, changes happen (!).
> >   For example, a new key letter 'g' in the Data Layout. This is
> >   not a bitcode ambiguity so MODULE_CODE_VERSION is unchanged.
> > - We as a vendor publish toolchain #13 based on SVN r300000.
> > - Some middleware provider publishes libIncrediblyUseful.bc built
> >   using spiffy new toolchain #13.
> > - Some hapless game developer tries to use libIncrediblyUseful.bc
> >   but is still on toolchain #12. This causes an error during some
> >   LTO build phase, of course; the question is, what kind of error
> >   and how does Hapless Game Developer know what to do?
>
> In summary: an old tool reading new bitcode. It should report that the
> particular new feature is not supported. In this case, something like
> "unknown 'g' flag in datalayout".
>
> > This does *nothing* for Hapless Game Developer. HGD wants to see
> > "this bitcode file was generated by a newer version, I don't understand
> > how to interpret it" because _that's_ the actual problem.
>
> We can probably add a note saying "newer or corrupt BC".

This makes sense to me. I think it is pretty safe to assume that any "invalid bitcode" that an end-user would get their hands on is just because the bitcode is from a newer version.
-- Sean Silva

> > Proposed solution:
> >
> > Whether to emit a bitcode wrapper becomes a target-dependent predicate.
>
> Again, *any* solution involving the bitcode wrapper is not interesting
> to the open source project since it is not required.
>
> Cheers,
> Rafael