Peter Collingbourne via llvm-dev
2017-Apr-04 20:11 UTC
[llvm-dev] RFC: Adding a string table to the bitcode format
On Tue, Apr 4, 2017 at 12:36 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > On 2017-Apr-04, at 12:12, Peter Collingbourne <peter at pcc.me.uk> wrote: > > On Mon, Apr 3, 2017 at 8:13 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> >> On Apr 3, 2017, at 7:08 PM, Peter Collingbourne <peter at pcc.me.uk> wrote: >> >> Hi, >> >> As part of PR27551 I want to add a string table to the bitcode format to >> allow global value and comdat names to be shared with the proposed symbol >> table (and, as side effects, allow comdat names to be shared with value >> names, make bitcode files more compressible and make bitcode easier to >> parse). The format of the string table would be a top-level block >> containing a blob containing null-terminated strings [0] similar to the >> string table format used in most object files. >> >> >> >> I’m in favor of this, but note that currently string can be encoded with >> less than 8 bits / char in some cases (there might some size increase >> because of this). >> > > Sure, but I think we need to make the right tradeoff between making data > more efficient to read and using fewer bits. In this case I think the right > tradeoff is clearly in favour of being efficient to read, because accessing > it is in the critical path of a consumer (i.e. a linker), and the part that > needs to be efficient to read is a relatively small part of the data in the > bitcode file. The same logic applies to the symbol table (note that we use > support::ulittle32_t instead of a bit encoding). > > > The same logic might suggest storing string sizes (as a prefix) to avoid > scanning for string lengths. >It might do, keeping in mind that reading pretty much every existing object file format already requires scanning for string lengths. Certainly something to try and evaluate, at least. Peter> > That said we already paid this with the metadata table in the recent past >> for example. >> > >> The format of MODULE_CODE_{FUNCTION,GLOBALVAR,ALIAS,IFUNC,COMDAT} >> records would change so that their first operand would specify their names >> with a byte offset into the string table. (To allow for backwards >> compatibility, I would increment the bitcode version.) >> >> >> I assume you mean the EPOCH? >> > > No, the MODULE_CODE_VERSION. > http://llvm-cs.pcc.me.uk/lib/Bitcode/Writer/BitcodeWriter.cpp#3822 > It isn't clear to me why we have both. > > >> Here is what it would look like as bcanalyzer output: >> >> <MODULE_BLOCK> >> <VERSION op0=2> >> <COMDAT op0=0 ...> ; name = foo >> <FUNCTION op0=0 ...> ; name = foo >> <GLOBALVAR op0=4 ...> ; name = bar >> <ALIAS op0=8 ...> ; name = baz >> ; function bodies, etc. >> </MODULE_BLOCK> >> <STRTAB_BLOCK> >> <STRTAB_BLOB blob="foo\0bar\0baz\0"> >> </STRTAB_BLOCK> >> >> >> Why is the string table after the module instead of before? >> > > For implementation simplicity. The idea is that the BitcodeWriter would > have a member of type StringTableBuilder which would accumulate strings > while writing the bitcode module(s) (and symtab in the future). At the end, > the client would call something like BitcodeWriter::writeStrtab() which > would write out the string table. > > >> >> Each STRTAB_BLOCK would apply to all preceding MODULE_BLOCKs. This means >> that bitcode files can continue to be concatenated with "llvm-cat -b". >> (Normally bitcode files would contain a single string table, which in >> multi-module bitcode files would be shared between modules.) >> >> This *almost* allows us to remove the global (top-level) VST entirely, if >> not for the function offset in the FNENTRY record. However, this offset is >> not actually required because we can scan the module's FUNCTION_BLOCK_IDs >> as we were doing before http://reviews.llvm.org/D12536 (this may have a >> performance impact, so I'll measure it first). >> >> Assuming that performance looks good, does this seem reasonable to folks? >> >> >> >> I rather seek to have a symbol table that entirely replace the VST, kee. >> If there is a perf impact with the FNENTRY offset, why can’t it be >> replicated in the symbol table? >> > > Sure, we could in principle store function offsets in the symbol table as > well, if that helps with performance. But I want to measure the impact and > find out whether that is actually the case first. > > Thanks, > -- > -- > Peter > > >-- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170404/1a566d64/attachment-0001.html>
Rafael Espíndola via llvm-dev
2017-Apr-04 21:35 UTC
[llvm-dev] RFC: Adding a string table to the bitcode format
> It might do, keeping in mind that reading pretty much every existing object > file format already requires scanning for string lengths. Certainly > something to try and evaluate, at least.In lld strlen does show up in the profile. I haven't benchmarked it recently enough to remember exactly how much. The string table builder code currently supports tail merging. On ELF at least that is a very modest size saving. If the size is written as a prefix to the string, that doesn't work. If the size is stored in the reference (like a stringef) we would be able to merge any substring (not sure if it is profitable). Cheers, Rafael
Rui Ueyama via llvm-dev
2017-Apr-04 22:39 UTC
[llvm-dev] RFC: Adding a string table to the bitcode format
On Tue, Apr 4, 2017 at 2:35 PM, Rafael Espíndola via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > It might do, keeping in mind that reading pretty much every existing > object > > file format already requires scanning for string lengths. Certainly > > something to try and evaluate, at least. > > In lld strlen does show up in the profile. I haven't benchmarked it > recently enough to remember exactly how much. >It used to take substantial share of the total execution time (IIRC ~10%), but since we read only global symbol names, that isn't taking that much time now. The string table builder code currently supports tail merging. On ELF> at least that is a very modest size saving. If the size is written as > a prefix to the string, that doesn't work. If the size is stored in > the reference (like a stringef) we would be able to merge any > substring (not sure if it is profitable). > > Cheers, > Rafael > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170404/53b61c34/attachment.html>
Peter Collingbourne via llvm-dev
2017-Apr-06 21:49 UTC
[llvm-dev] RFC: Adding a string table to the bitcode format
On Tue, Apr 4, 2017 at 2:35 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:> > It might do, keeping in mind that reading pretty much every existing > object > > file format already requires scanning for string lengths. Certainly > > something to try and evaluate, at least. > > In lld strlen does show up in the profile. I haven't benchmarked it > recently enough to remember exactly how much. >So I have spent some time evaluating the various options that have been discussed on this thread. They are: 1) adding the string table itself (this involves moving all data in the VST to either the string table or elsewhere in the bitcode, with the exception of function offsets) 2) representing string references as offset/size 3) removing VST function offsets (and the VST itself) I am attaching three patches with my prototypes, where t1 implements the string table, t2 is string table + remove function offsets and t3 is string table + remove function offsets + offset/size. (They are all against r299588.) I used these patches to evaluate the perf impact when linking Chromium with ThinLTO. Median time elapsed for a no-op incremental link (#runs = 100): - trunk: 51.71s - string table: 48.66s (-5.9%) - string table + remove function offsets: 49.12s (-5.0%) - string table + remove function offsets + offset/size: 48.10s (-7.0%) And for a full link (#runs = 13): - trunk: 605.24s - string table: 597.88s (-1.3%) - string table + remove function offsets: 598.99s (-1.1%) - string table + remove function offsets + offset/size: 598.97s (-1.1%) It is hard to draw any conclusions from the full ThinLTO links (partly due to the small #runs) other than that adding the string table is a clear win. But I think it is clear enough from the incremental results that offset/size is a win. And as Duncan mentioned it should allow us to more easily share strings with the metadata string table. Function offsets do appear to be a win, so it seems likely that we will want something like them, either in the symbol table or elsewhere. My conclusion is that there should be no great harm in storing just the function offsets in the VST for now because, after we add the string table, the data in the VST will no longer be needed for correctness and any potential future change that moves the function offsets elsewhere can simply start ignoring the VST in the reader. I also measured the impact on file size by taking the total sum of object file sizes for each of the variants. They are: - trunk: 1183510504 bytes - string table: 1151533664 bytes (-2.7%) - string table + remove function offsets: 1147981736 bytes (-3.0%) - string table + remove function offsets + offset/size: 1149361632 bytes (-2.9%) The decrease in size vs trunk is expected because of sharing of strings between comdats and globals as well as between modules. But the differences in sizes between the different options does not seem as important. So here is what I plan to do: - Create a new patch that implements string table + offset/size and take further measurements, to make sure that removing function offsets is not a confounding factor for performance. - Clean up the patch and send it for review. The string table builder code currently supports tail merging. On ELF> at least that is a very modest size saving. If the size is written as > a prefix to the string, that doesn't work. If the size is stored in > the reference (like a stringef) we would be able to merge any > substring (not sure if it is profitable). >I agree that we should store the size in the reference. Tail merging should help on targets that require IR name mangling because the mangled name should almost always be a suffix of the unmangled name or vice versa. Thanks, -- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170406/26de42dd/attachment-0001.html> -------------- next part -------------- diff --git a/llvm/include/llvm/Bitcode/BitcodeReader.h b/llvm/include/llvm/Bitcode/BitcodeReader.h index 9e042b17241..0701ddbb7f1 100644 --- a/llvm/include/llvm/Bitcode/BitcodeReader.h +++ b/llvm/include/llvm/Bitcode/BitcodeReader.h @@ -46,6 +46,9 @@ namespace llvm { ArrayRef<uint8_t> Buffer; StringRef ModuleIdentifier; + // The string table used to interpret this module. + StringRef Strtab; + // The bitstream location of the IDENTIFICATION_BLOCK. uint64_t IdentificationBit; @@ -70,6 +73,7 @@ namespace llvm { StringRef getBuffer() const { return StringRef((const char *)Buffer.begin(), Buffer.size()); } + StringRef getStrtab() const { return Strtab; } StringRef getModuleIdentifier() const { return ModuleIdentifier; } diff --git a/llvm/include/llvm/Bitcode/BitcodeWriter.h b/llvm/include/llvm/Bitcode/BitcodeWriter.h index 271cb2d81bb..c7ddb8b333c 100644 --- a/llvm/include/llvm/Bitcode/BitcodeWriter.h +++ b/llvm/include/llvm/Bitcode/BitcodeWriter.h @@ -15,6 +15,7 @@ #define LLVM_BITCODE_BITCODEWRITER_H #include "llvm/IR/ModuleSummaryIndex.h" +#include "llvm/MC/StringTableBuilder.h" #include <string> namespace llvm { @@ -26,12 +27,25 @@ namespace llvm { SmallVectorImpl<char> &Buffer; std::unique_ptr<BitstreamWriter> Stream; + StringTableBuilder StrtabBuilder{StringTableBuilder::ELF}; + bool WroteStrtab = false; + + void writeBlob(unsigned Block, unsigned Record, StringRef Blob); + public: /// Create a BitcodeWriter that writes to Buffer. BitcodeWriter(SmallVectorImpl<char> &Buffer); ~BitcodeWriter(); + /// Write the bitcode file's string table. This must be called exactly once + /// after all modules have been written. + void writeStrtab(); + + /// Copy the string table for another module into this bitcode file. This + /// should be called after copying the module itself into the bitcode file. + void copyStrtab(StringRef Strtab); + /// Write the specified module to the buffer specified at construction time. /// /// If \c ShouldPreserveUseListOrder, encode the use-list order for each \a diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index e2d2fbb0f44..9ab4b6ab143 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -22,7 +22,7 @@ namespace llvm { namespace bitc { -// The only top-level block type defined is for a module. +// The only top-level block types are MODULE, IDENTIFICATION and STRTAB. enum BlockIDs { // Blocks MODULE_BLOCK_ID = FIRST_APPLICATION_BLOCKID, @@ -52,7 +52,9 @@ enum BlockIDs { OPERAND_BUNDLE_TAGS_BLOCK_ID, - METADATA_KIND_BLOCK_ID + METADATA_KIND_BLOCK_ID, + + STRTAB_BLOCK_ID, }; /// Identification block contains a string that describes the producer details, @@ -171,8 +173,9 @@ enum ValueSymtabCodes { VST_CODE_ENTRY = 1, // VST_ENTRY: [valueid, namechar x N] VST_CODE_BBENTRY = 2, // VST_BBENTRY: [bbid, namechar x N] VST_CODE_FNENTRY = 3, // VST_FNENTRY: [valueid, offset, namechar x N] + // 4 is available for reuse because it was never used in a release. // VST_COMBINED_ENTRY: [valueid, refguid] - VST_CODE_COMBINED_ENTRY = 5 + VST_CODE_COMBINED_ENTRY = 5, }; // The module path symbol table only has one code (MST_CODE_ENTRY). @@ -232,6 +235,7 @@ enum GlobalValueSummarySymtabCodes { // llvm.type.checked.load intrinsic with all constant integer arguments. // [typeid, offset, n x arg] FS_TYPE_CHECKED_LOAD_CONST_VCALL = 15, + FS_VALUE_GUID = 16, }; enum MetadataCodes { @@ -550,6 +554,10 @@ enum ComdatSelectionKindCodes { COMDAT_SELECTION_KIND_SAME_SIZE = 5, }; +enum StrtabCodes { + STRTAB_BLOB = 1, +}; + } // End bitc namespace } // End llvm namespace diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 23649e811ed..1556a02b259 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -372,12 +372,28 @@ Expected<std::string> readTriple(BitstreamCursor &Stream) { class BitcodeReaderBase { protected: - BitcodeReaderBase(BitstreamCursor Stream) : Stream(std::move(Stream)) { + BitcodeReaderBase(BitstreamCursor Stream, StringRef Strtab) + : Stream(std::move(Stream)), Strtab(Strtab) { this->Stream.setBlockInfo(&BlockInfo); } BitstreamBlockInfo BlockInfo; BitstreamCursor Stream; + StringRef Strtab; + + /// Indicates that we are using a new encoding for instruction operands where + /// most operands in the current FUNCTION_BLOCK are encoded relative to the + /// instruction number, for a more compact encoding. Some instruction + /// operands are not relative to the instruction ID: basic block numbers, and + /// types. Once the old style function blocks have been phased out, we would + /// not need this flag. + bool UseRelativeIDs = false; + + bool UseStrtab = false; + + Error readVersionRecord(ArrayRef<uint64_t> Record); + std::pair<StringRef, ArrayRef<uint64_t>> + readNameFromStrtab(ArrayRef<uint64_t> Record); bool readBlockInfo(); @@ -405,6 +421,9 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { bool SeenValueSymbolTable = false; uint64_t VSTOffset = 0; + std::vector<std::string> SectionTable; + std::vector<std::string> GCTable; + std::vector<Type*> TypeList; BitcodeReaderValueList ValueList; Optional<MetadataLoader> MDLoader; @@ -459,14 +478,6 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { DenseMap<Function *, std::vector<BasicBlock *>> BasicBlockFwdRefs; std::deque<Function *> BasicBlockFwdRefQueue; - /// Indicates that we are using a new encoding for instruction operands where - /// most operands in the current FUNCTION_BLOCK are encoded relative to the - /// instruction number, for a more compact encoding. Some instruction - /// operands are not relative to the instruction ID: basic block numbers, and - /// types. Once the old style function blocks have been phased out, we would - /// not need this flag. - bool UseRelativeIDs = false; - /// True if all functions will be materialized, negating the need to process /// (e.g.) blockaddress forward references. bool WillMaterializeAllForwardRefs = false; @@ -477,8 +488,8 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { std::vector<std::string> BundleTags; public: - BitcodeReader(BitstreamCursor Stream, StringRef ProducerIdentification, - LLVMContext &Context); + BitcodeReader(BitstreamCursor Stream, StringRef Strtab, + StringRef ProducerIdentification, LLVMContext &Context); Error materializeForwardReferencedFunctions(); @@ -598,6 +609,12 @@ private: Error parseAlignmentValue(uint64_t Exponent, unsigned &Alignment); Error parseAttrKind(uint64_t Code, Attribute::AttrKind *Kind); Error parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata = false); + + Error readComdatRecord(ArrayRef<uint64_t> Record); + Error readGlobalVarRecord(ArrayRef<uint64_t> Record); + Error readFunctionRecord(ArrayRef<uint64_t> Record); + Error readGlobalIndirectSymbolRecord(unsigned BitCode, ArrayRef<uint64_t> Record); + Error parseAttributeBlock(); Error parseAttributeGroupBlock(); Error parseTypeTable(); @@ -607,6 +624,7 @@ private: Expected<Value *> recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT); Error parseValueSymbolTable(uint64_t Offset = 0); + Error parseGlobalValueSymbolTable(); Error parseConstants(); Error rememberAndSkipFunctionBodies(); Error rememberAndSkipFunctionBody(); @@ -659,8 +677,8 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase { std::string SourceFileName; public: - ModuleSummaryIndexBitcodeReader( - BitstreamCursor Stream, ModuleSummaryIndex &TheIndex); + ModuleSummaryIndexBitcodeReader(BitstreamCursor Stream, StringRef Strtab, + ModuleSummaryIndex &TheIndex); Error parseModule(StringRef ModulePath); @@ -694,10 +712,10 @@ std::error_code llvm::errorToErrorCodeAndEmitErrors(LLVMContext &Ctx, return std::error_code(); } -BitcodeReader::BitcodeReader(BitstreamCursor Stream, +BitcodeReader::BitcodeReader(BitstreamCursor Stream, StringRef Strtab, StringRef ProducerIdentification, LLVMContext &Context) - : BitcodeReaderBase(std::move(Stream)), Context(Context), + : BitcodeReaderBase(std::move(Stream), Strtab), Context(Context), ValueList(Context) { this->ProducerIdentification = ProducerIdentification; } @@ -1682,14 +1700,15 @@ Error BitcodeReader::parseOperandBundleTags() { /// Associate a value with its name from the given index in the provided record. Expected<Value *> BitcodeReader::recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT) { - SmallString<128> ValueName; - if (convertToString(Record, NameIndex, ValueName)) - return error("Invalid record"); unsigned ValueID = Record[0]; if (ValueID >= ValueList.size() || !ValueList[ValueID]) return error("Invalid record"); Value *V = ValueList[ValueID]; + SmallString<128> ValueName; + if (convertToString(Record, NameIndex, ValueName)) + return error("Invalid record"); + StringRef NameStr(ValueName.data(), ValueName.size()); if (NameStr.find_first_of(0) != StringRef::npos) return error("Invalid value name"); @@ -1727,6 +1746,48 @@ static uint64_t jumpToValueSymbolTable(uint64_t Offset, return CurrentBit; } +Error BitcodeReader::parseGlobalValueSymbolTable() { + unsigned FuncBitcodeOffsetDelta + Stream.getAbbrevIDWidth() + bitc::BlockIDWidth; + + if (Stream.EnterSubBlock(bitc::VALUE_SYMTAB_BLOCK_ID)) + return error("Invalid record"); + + SmallVector<uint64_t, 64> Record; + while (true) { + BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); + + switch (Entry.Kind) { + case BitstreamEntry::SubBlock: + case BitstreamEntry::Error: + return error("Malformed block"); + case BitstreamEntry::EndBlock: + return Error::success(); + case BitstreamEntry::Record: + break; + } + + Record.clear(); + switch (Stream.readRecord(Entry.ID, Record)) { + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + // Note that we subtract 1 here because the offset is relative to one word + // before the start of the identification or module block, which was + // historically always the start of the regular bitcode header. + uint64_t FuncWordOffset = Record[1] - 1; + uint64_t FuncBitOffset = FuncWordOffset * 32; + DeferredFunctionInfo[cast<Function>(ValueList[Record[0]])] + FuncBitOffset + FuncBitcodeOffsetDelta; + // Set the LastFunctionBlockBit to point to the last function block. + // Later when parsing is resumed after function materialization, + // we can simply skip that last function block. + if (FuncBitOffset > LastFunctionBlockBit) + LastFunctionBlockBit = FuncBitOffset; + break; + } + } + } +} + /// Parse the value symbol table at either the current parsing location or /// at the given bit offset if provided. Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { @@ -1734,8 +1795,15 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Pass in the Offset to distinguish between calling for the module-level // VST (where we want to jump to the VST offset) and the function-level // VST (where we don't). - if (Offset > 0) + if (Offset > 0) { CurrentBit = jumpToValueSymbolTable(Offset, Stream); + if (UseStrtab) { + if (Error Err = parseGlobalValueSymbolTable()) + return Err; + Stream.JumpToBit(CurrentBit); + return Error::success(); + } + } // Compute the delta between the bitcode indices in the VST (the word offset // to the word-aligned ENTER_SUBBLOCK for the function block, and that @@ -1779,18 +1847,17 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: unknown type. break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 1, TT); if (Error Err = ValOrErr.takeError()) return Err; ValOrErr.get(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 2, TT); if (Error Err = ValOrErr.takeError()) return Err; @@ -2609,6 +2676,271 @@ bool BitcodeReaderBase::readBlockInfo() { return false; } +std::pair<StringRef, ArrayRef<uint64_t>> +BitcodeReaderBase::readNameFromStrtab(ArrayRef<uint64_t> Record) { + if (!UseStrtab) + return {"", Record}; + if (Record[0] >= Strtab.size()) + return {"", {}}; + return {Strtab.data() + Record[0], Record.slice(1)}; +} + +Error BitcodeReader::readComdatRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 1) + return error("Invalid record"); + Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); + std::string OldFormatName; + if (!UseStrtab) { + if (Record.size() < 2) + return error("Invalid record"); + unsigned ComdatNameSize = Record[1]; + OldFormatName.reserve(ComdatNameSize); + for (unsigned i = 0; i != ComdatNameSize; ++i) + OldFormatName += (char)Record[2 + i]; + Name = OldFormatName; + } + Comdat *C = TheModule->getOrInsertComdat(Name); + C->setSelectionKind(SK); + ComdatList.push_back(C); + return Error::success(); +} + +Error BitcodeReader::readGlobalVarRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 6) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + bool isConstant = Record[1] & 1; + bool explicitType = Record[1] & 2; + unsigned AddressSpace; + if (explicitType) { + AddressSpace = Record[1] >> 2; + } else { + if (!Ty->isPointerTy()) + return error("Invalid type for value"); + AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); + Ty = cast<PointerType>(Ty)->getElementType(); + } + + uint64_t RawLinkage = Record[3]; + GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[4], Alignment)) + return Err; + std::string Section; + if (Record[5]) { + if (Record[5] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Section = SectionTable[Record[5] - 1]; + } + GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; + // Local linkage must have default visibility. + if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) + // FIXME: Change to an error if non-default in 4.0. + Visibility = getDecodedVisibility(Record[6]); + + GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; + if (Record.size() > 7) + TLM = getDecodedThreadLocalMode(Record[7]); + + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 8) + UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); + + bool ExternallyInitialized = false; + if (Record.size() > 9) + ExternallyInitialized = Record[9]; + + GlobalVariable *NewGV + new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, Name, + nullptr, TLM, AddressSpace, ExternallyInitialized); + NewGV->setAlignment(Alignment); + if (!Section.empty()) + NewGV->setSection(Section); + NewGV->setVisibility(Visibility); + NewGV->setUnnamedAddr(UnnamedAddr); + + if (Record.size() > 10) + NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); + else + upgradeDLLImportExportLinkage(NewGV, RawLinkage); + + ValueList.push_back(NewGV); + + // Remember which value to use for the global initializer. + if (unsigned InitID = Record[2]) + GlobalInits.push_back(std::make_pair(NewGV, InitID - 1)); + + if (Record.size() > 11) { + if (unsigned ComdatID = Record[11]) { + if (ComdatID > ComdatList.size()) + return error("Invalid global variable comdat ID"); + NewGV->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + NewGV->setComdat(reinterpret_cast<Comdat *>(1)); + } + return Error::success(); +} + +Error BitcodeReader::readFunctionRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 8) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + if (auto *PTy = dyn_cast<PointerType>(Ty)) + Ty = PTy->getElementType(); + auto *FTy = dyn_cast<FunctionType>(Ty); + if (!FTy) + return error("Invalid type for value"); + auto CC = static_cast<CallingConv::ID>(Record[1]); + if (CC & ~CallingConv::MaxID) + return error("Invalid calling convention ID"); + + Function *Func + Function::Create(FTy, GlobalValue::ExternalLinkage, Name, TheModule); + + Func->setCallingConv(CC); + bool isProto = Record[2]; + uint64_t RawLinkage = Record[3]; + Func->setLinkage(getDecodedLinkage(RawLinkage)); + Func->setAttributes(getAttributes(Record[4])); + + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[5], Alignment)) + return Err; + Func->setAlignment(Alignment); + if (Record[6]) { + if (Record[6] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Func->setSection(SectionTable[Record[6] - 1]); + } + // Local linkage must have default visibility. + if (!Func->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + Func->setVisibility(getDecodedVisibility(Record[7])); + if (Record.size() > 8 && Record[8]) { + if (Record[8] - 1 >= GCTable.size()) + return error("Invalid ID"); + Func->setGC(GCTable[Record[8] - 1]); + } + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 9) + UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); + Func->setUnnamedAddr(UnnamedAddr); + if (Record.size() > 10 && Record[10] != 0) + FunctionPrologues.push_back(std::make_pair(Func, Record[10] - 1)); + + if (Record.size() > 11) + Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); + else + upgradeDLLImportExportLinkage(Func, RawLinkage); + + if (Record.size() > 12) { + if (unsigned ComdatID = Record[12]) { + if (ComdatID > ComdatList.size()) + return error("Invalid function comdat ID"); + Func->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + Func->setComdat(reinterpret_cast<Comdat *>(1)); + } + + if (Record.size() > 13 && Record[13] != 0) + FunctionPrefixes.push_back(std::make_pair(Func, Record[13] - 1)); + + if (Record.size() > 14 && Record[14] != 0) + FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); + + ValueList.push_back(Func); + + // If this is a function with a body, remember the prototype we are + // creating now, so that we can match up the body with them later. + if (!isProto) { + Func->setIsMaterializable(true); + FunctionsWithBodies.push_back(Func); + DeferredFunctionInfo[Func] = 0; + } + return Error::success(); +} + +Error BitcodeReader::readGlobalIndirectSymbolRecord(unsigned BitCode, + ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; + if (Record.size() < (3 + (unsigned)NewRecord)) + return error("Invalid record"); + unsigned OpNum = 0; + Type *Ty = getTypeByID(Record[OpNum++]); + if (!Ty) + return error("Invalid record"); + + unsigned AddrSpace; + if (!NewRecord) { + auto *PTy = dyn_cast<PointerType>(Ty); + if (!PTy) + return error("Invalid type for value"); + Ty = PTy->getElementType(); + AddrSpace = PTy->getAddressSpace(); + } else { + AddrSpace = Record[OpNum++]; + } + + auto Val = Record[OpNum++]; + auto Linkage = Record[OpNum++]; + GlobalIndirectSymbol *NewGA; + if (BitCode == bitc::MODULE_CODE_ALIAS || + BitCode == bitc::MODULE_CODE_ALIAS_OLD) + NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + TheModule); + else + NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + nullptr, TheModule); + // Old bitcode files didn't have visibility field. + // Local linkage must have default visibility. + if (OpNum != Record.size()) { + auto VisInd = OpNum++; + if (!NewGA->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); + } + if (OpNum != Record.size()) + NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); + else + upgradeDLLImportExportLinkage(NewGA, Linkage); + if (OpNum != Record.size()) + NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); + if (OpNum != Record.size()) + NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); + ValueList.push_back(NewGA); + IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + return Error::success(); +} + +Error BitcodeReaderBase::readVersionRecord(ArrayRef<uint64_t> Record) { + if (Record.size() < 1) + return error("Invalid record"); + unsigned ModuleVersion = Record[0]; + if (ModuleVersion > 2) + return error("Invalid value"); + UseRelativeIDs = ModuleVersion >= 1; + UseStrtab = ModuleVersion >= 2; + return Error::success(); +} + Error BitcodeReader::parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata) { if (ResumeBit) @@ -2617,8 +2949,6 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, return error("Invalid record"); SmallVector<uint64_t, 64> Record; - std::vector<std::string> SectionTable; - std::vector<std::string> GCTable; // Read all the records for this module. while (true) { @@ -2765,20 +3095,8 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, switch (BitCode) { default: break; // Default behavior, ignore unknown content. case bitc::MODULE_CODE_VERSION: { // VERSION: [version#] - if (Record.size() < 1) - return error("Invalid record"); - // Only version #0 and #1 are supported so far. - unsigned module_version = Record[0]; - switch (module_version) { - default: - return error("Invalid value"); - case 0: - UseRelativeIDs = false; - break; - case 1: - UseRelativeIDs = true; - break; - } + if (Error Err = readVersionRecord(Record)) + return Err; break; } case bitc::MODULE_CODE_TRIPLE: { // TRIPLE: [strchr x N] @@ -2825,184 +3143,25 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, break; } case bitc::MODULE_CODE_COMDAT: { // COMDAT: [selection_kind, name] - if (Record.size() < 2) - return error("Invalid record"); - Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); - unsigned ComdatNameSize = Record[1]; - std::string ComdatName; - ComdatName.reserve(ComdatNameSize); - for (unsigned i = 0; i != ComdatNameSize; ++i) - ComdatName += (char)Record[2 + i]; - Comdat *C = TheModule->getOrInsertComdat(ComdatName); - C->setSelectionKind(SK); - ComdatList.push_back(C); - break; - } - // GLOBALVAR: [pointer type, isconst, initid, + if (Error Err = readComdatRecord(Record)) + return Err; + break; + } + // GLOBALVAR: [(name_offset, )pointer type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - bool isConstant = Record[1] & 1; - bool explicitType = Record[1] & 2; - unsigned AddressSpace; - if (explicitType) { - AddressSpace = Record[1] >> 2; - } else { - if (!Ty->isPointerTy()) - return error("Invalid type for value"); - AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); - Ty = cast<PointerType>(Ty)->getElementType(); - } - - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[4], Alignment)) + if (Error Err = readGlobalVarRecord(Record)) return Err; - std::string Section; - if (Record[5]) { - if (Record[5]-1 >= SectionTable.size()) - return error("Invalid ID"); - Section = SectionTable[Record[5]-1]; - } - GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; - // Local linkage must have default visibility. - if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) - // FIXME: Change to an error if non-default in 4.0. - Visibility = getDecodedVisibility(Record[6]); - - GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; - if (Record.size() > 7) - TLM = getDecodedThreadLocalMode(Record[7]); - - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 8) - UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); - - bool ExternallyInitialized = false; - if (Record.size() > 9) - ExternallyInitialized = Record[9]; - - GlobalVariable *NewGV - new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, "", nullptr, - TLM, AddressSpace, ExternallyInitialized); - NewGV->setAlignment(Alignment); - if (!Section.empty()) - NewGV->setSection(Section); - NewGV->setVisibility(Visibility); - NewGV->setUnnamedAddr(UnnamedAddr); - - if (Record.size() > 10) - NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); - else - upgradeDLLImportExportLinkage(NewGV, RawLinkage); - - ValueList.push_back(NewGV); - - // Remember which value to use for the global initializer. - if (unsigned InitID = Record[2]) - GlobalInits.push_back(std::make_pair(NewGV, InitID-1)); - - if (Record.size() > 11) { - if (unsigned ComdatID = Record[11]) { - if (ComdatID > ComdatList.size()) - return error("Invalid global variable comdat ID"); - NewGV->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - NewGV->setComdat(reinterpret_cast<Comdat *>(1)); - } - break; } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, + // FUNCTION: [(name_offset, )type, callingconv, isproto, linkage, paramattr, // alignment, section, visibility, gc, unnamed_addr, // prologuedata, dllstorageclass, comdat, prefixdata] case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - if (auto *PTy = dyn_cast<PointerType>(Ty)) - Ty = PTy->getElementType(); - auto *FTy = dyn_cast<FunctionType>(Ty); - if (!FTy) - return error("Invalid type for value"); - auto CC = static_cast<CallingConv::ID>(Record[1]); - if (CC & ~CallingConv::MaxID) - return error("Invalid calling convention ID"); - - Function *Func = Function::Create(FTy, GlobalValue::ExternalLinkage, - "", TheModule); - - Func->setCallingConv(CC); - bool isProto = Record[2]; - uint64_t RawLinkage = Record[3]; - Func->setLinkage(getDecodedLinkage(RawLinkage)); - Func->setAttributes(getAttributes(Record[4])); - - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[5], Alignment)) + if (Error Err = readFunctionRecord(Record)) return Err; - Func->setAlignment(Alignment); - if (Record[6]) { - if (Record[6]-1 >= SectionTable.size()) - return error("Invalid ID"); - Func->setSection(SectionTable[Record[6]-1]); - } - // Local linkage must have default visibility. - if (!Func->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - Func->setVisibility(getDecodedVisibility(Record[7])); - if (Record.size() > 8 && Record[8]) { - if (Record[8]-1 >= GCTable.size()) - return error("Invalid ID"); - Func->setGC(GCTable[Record[8] - 1]); - } - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 9) - UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); - Func->setUnnamedAddr(UnnamedAddr); - if (Record.size() > 10 && Record[10] != 0) - FunctionPrologues.push_back(std::make_pair(Func, Record[10]-1)); - - if (Record.size() > 11) - Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); - else - upgradeDLLImportExportLinkage(Func, RawLinkage); - - if (Record.size() > 12) { - if (unsigned ComdatID = Record[12]) { - if (ComdatID > ComdatList.size()) - return error("Invalid function comdat ID"); - Func->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - Func->setComdat(reinterpret_cast<Comdat *>(1)); - } - - if (Record.size() > 13 && Record[13] != 0) - FunctionPrefixes.push_back(std::make_pair(Func, Record[13]-1)); - - if (Record.size() > 14 && Record[14] != 0) - FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); - - ValueList.push_back(Func); - - // If this is a function with a body, remember the prototype we are - // creating now, so that we can match up the body with them later. - if (!isProto) { - Func->setIsMaterializable(true); - FunctionsWithBodies.push_back(Func); - DeferredFunctionInfo[Func] = 0; - } break; } // ALIAS: [alias type, addrspace, aliasee val#, linkage] @@ -3011,54 +3170,10 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, case bitc::MODULE_CODE_IFUNC: case bitc::MODULE_CODE_ALIAS: case bitc::MODULE_CODE_ALIAS_OLD: { - bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; - if (Record.size() < (3 + (unsigned)NewRecord)) - return error("Invalid record"); - unsigned OpNum = 0; - Type *Ty = getTypeByID(Record[OpNum++]); - if (!Ty) - return error("Invalid record"); - - unsigned AddrSpace; - if (!NewRecord) { - auto *PTy = dyn_cast<PointerType>(Ty); - if (!PTy) - return error("Invalid type for value"); - Ty = PTy->getElementType(); - AddrSpace = PTy->getAddressSpace(); - } else { - AddrSpace = Record[OpNum++]; - } - - auto Val = Record[OpNum++]; - auto Linkage = Record[OpNum++]; - GlobalIndirectSymbol *NewGA; - if (BitCode == bitc::MODULE_CODE_ALIAS || - BitCode == bitc::MODULE_CODE_ALIAS_OLD) - NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", TheModule); - else - NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", nullptr, TheModule); - // Old bitcode files didn't have visibility field. - // Local linkage must have default visibility. - if (OpNum != Record.size()) { - auto VisInd = OpNum++; - if (!NewGA->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); - } - if (OpNum != Record.size()) - NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); - else - upgradeDLLImportExportLinkage(NewGA, Linkage); - if (OpNum != Record.size()) - NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); - if (OpNum != Record.size()) - NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); - ValueList.push_back(NewGA); - IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + if (Error Err = readGlobalIndirectSymbolRecord(BitCode, Record)) + return Err; break; + } /// MODULE_CODE_VSTOFFSET: [offset] case bitc::MODULE_CODE_VSTOFFSET: @@ -4535,8 +4650,8 @@ std::vector<StructType *> BitcodeReader::getIdentifiedStructTypes() const { } ModuleSummaryIndexBitcodeReader::ModuleSummaryIndexBitcodeReader( - BitstreamCursor Cursor, ModuleSummaryIndex &TheIndex) - : BitcodeReaderBase(std::move(Cursor)), TheIndex(TheIndex) {} + BitstreamCursor Cursor, StringRef Strtab, ModuleSummaryIndex &TheIndex) + : BitcodeReaderBase(std::move(Cursor), Strtab), TheIndex(TheIndex) {} std::pair<GlobalValue::GUID, GlobalValue::GUID> ModuleSummaryIndexBitcodeReader::getGUIDFromValueId(unsigned ValueId) { @@ -4560,7 +4675,7 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( SmallVector<uint64_t, 64> Record; // Read all the records for this value table. - SmallString<128> ValueName; + SmallString<128> Buf; while (true) { BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); @@ -4580,12 +4695,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: ignore (e.g. VST_CODE_BBENTRY records). break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] - if (convertToString(Record, 1, ValueName)) + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 1, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4603,13 +4723,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(ValueGUID, OriginalNameID); - ValueName.clear(); + Buf.clear(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] - if (convertToString(Record, 2, ValueName)) + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 2, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4627,8 +4751,6 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(FunctionGUID, OriginalNameID); - - ValueName.clear(); break; } case bitc::VST_CODE_COMBINED_ENTRY: { @@ -4715,6 +4837,11 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { default: break; // Default behavior, ignore unknown content. /// MODULE_CODE_SOURCE_FILENAME: [namechar x N] + case bitc::MODULE_CODE_VERSION: { + if (Error Err = readVersionRecord(Record)) + return Err; + break; + } case bitc::MODULE_CODE_SOURCE_FILENAME: { SmallString<128> ValueName; if (convertToString(Record, 0, ValueName)) @@ -4748,37 +4875,35 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { // was historically always the start of the regular bitcode header. VSTOffset = Record[0] - 1; break; - // GLOBALVAR: [pointer type, isconst, initid, - // linkage, alignment, section, visibility, threadlocal, - // unnamed_addr, externally_initialized, dllstorageclass, - // comdat] - case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, - // alignment, section, visibility, gc, unnamed_addr, - // prologuedata, dllstorageclass, comdat, prefixdata] - case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // ALIAS: [alias type, addrspace, aliasee val#, linkage, visibility, - // dllstorageclass] + // GLOBALVAR: [pointer type, isconst, initid, linkage, ...] + // FUNCTION: [type, callingconv, isproto, linkage, ...] + // ALIAS: [alias type, addrspace, aliasee val#, linkage, ...] + case bitc::MODULE_CODE_GLOBALVAR: + case bitc::MODULE_CODE_FUNCTION: case bitc::MODULE_CODE_ALIAS: { - if (Record.size() < 6) + StringRef Name; + ArrayRef<uint64_t> GVRecord; + std::tie(Name, GVRecord) = readNameFromStrtab(Record); + if (GVRecord.size() <= 3) return error("Invalid record"); - uint64_t RawLinkage = Record[3]; + uint64_t RawLinkage = GVRecord[3]; GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; + if (!UseStrtab) { + ValueIdToLinkageMap[ValueId++] = Linkage; + break; + } + + std::string GlobalId + GlobalValue::getGlobalIdentifier(Name, Linkage, SourceFileName); + auto ValueGUID = GlobalValue::getGUID(GlobalId); + auto OriginalNameID = ValueGUID; + if (GlobalValue::isLocalLinkage(Linkage)) + OriginalNameID = GlobalValue::getGUID(Name); + if (PrintSummaryGUIDs) + dbgs() << "GUID " << ValueGUID << "(" << OriginalNameID << ") is " + << Name << "\n"; + ValueIdToCallGraphGUIDMap[ValueId++] + std::make_pair(ValueGUID, OriginalNameID); break; } } @@ -4889,6 +5014,13 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary( switch (BitCode) { default: // Default behavior: ignore. break; + + case bitc::FS_VALUE_GUID: { + uint64_t ValueID = Record[0]; + GlobalValue::GUID RefGUID = Record[1]; + ValueIdToCallGraphGUIDMap[ValueID] = std::make_pair(RefGUID, RefGUID); + break; + } // FS_PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid, // n x (valueid)] // FS_PERMODULE_PROFILE: [valueid, flags, instcount, numrefs, @@ -5193,6 +5325,35 @@ const std::error_category &llvm::BitcodeErrorCategory() { return *ErrorCategory; } +static Expected<StringRef> readStrtab(BitstreamCursor &Stream) { + if (Stream.EnterSubBlock(bitc::STRTAB_BLOCK_ID)) + return error("Invalid record"); + + StringRef Strtab; + while (1) { + BitstreamEntry Entry = Stream.advance(); + switch (Entry.Kind) { + case BitstreamEntry::EndBlock: + return Strtab; + + case BitstreamEntry::Error: + return error("Malformed block"); + + case BitstreamEntry::SubBlock: + if (Stream.SkipBlock()) + return error("Malformed block"); + break; + + case BitstreamEntry::Record: + StringRef Blob; + SmallVector<uint64_t, 1> Record; + if (Stream.readRecord(Entry.ID, Record, &Blob) == bitc::STRTAB_BLOB) + Strtab = Blob; + break; + } + } +} + //===----------------------------------------------------------------------===// // External interface //===----------------------------------------------------------------------===// @@ -5245,6 +5406,22 @@ llvm::getBitcodeModuleList(MemoryBufferRef Buffer) { continue; } + if (Entry.ID == bitc::STRTAB_BLOCK_ID) { + Expected<StringRef> Strtab = readStrtab(Stream); + if (!Strtab) + return Strtab.takeError(); + // This string table is used by every preceding bitcode module that does + // not have its own string table. A bitcode file may have multiple + // string tables if it was created by binary concatenation, for example + // with "llvm-cat -b". + for (auto I = Modules.rbegin(), E = Modules.rend(); I != E; ++I) { + if (!I->Strtab.empty()) + break; + I->Strtab = *Strtab; + } + continue; + } + if (Stream.SkipBlock()) return error("Malformed block"); continue; @@ -5281,8 +5458,8 @@ BitcodeModule::getModuleImpl(LLVMContext &Context, bool MaterializeAll, } Stream.JumpToBit(ModuleBit); - auto *R - new BitcodeReader(std::move(Stream), ProducerIdentification, Context); + auto *R = new BitcodeReader(std::move(Stream), Strtab, ProducerIdentification, + Context); std::unique_ptr<Module> M llvm::make_unique<Module>(ModuleIdentifier, Context); @@ -5317,7 +5494,7 @@ Expected<std::unique_ptr<ModuleSummaryIndex>> BitcodeModule::getSummary() { Stream.JumpToBit(ModuleBit); auto Index = llvm::make_unique<ModuleSummaryIndex>(); - ModuleSummaryIndexBitcodeReader R(std::move(Stream), *Index); + ModuleSummaryIndexBitcodeReader R(std::move(Stream), Strtab, *Index); if (Error Err = R.parseModule(ModuleIdentifier)) return std::move(Err); diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index 1bdb439086a..fd271a364fc 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -28,6 +28,7 @@ #include "llvm/IR/Operator.h" #include "llvm/IR/UseListOrder.h" #include "llvm/IR/ValueSymbolTable.h" +#include "llvm/MC/StringTableBuilder.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/Program.h" @@ -96,6 +97,8 @@ class ModuleBitcodeWriter : public BitcodeWriterBase { /// Pointer to the buffer allocated by caller for bitcode writing. const SmallVectorImpl<char> &Buffer; + StringTableBuilder &StrtabBuilder; + /// The Module to write to bitcode. const Module &M; @@ -131,11 +134,12 @@ public: /// Constructs a ModuleBitcodeWriter object for the given Module, /// writing to the provided \p Buffer. ModuleBitcodeWriter(const Module *M, SmallVectorImpl<char> &Buffer, + StringTableBuilder &StrtabBuilder, BitstreamWriter &Stream, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash = nullptr) - : BitcodeWriterBase(Stream), Buffer(Buffer), M(*M), - VE(*M, ShouldPreserveUseListOrder), Index(Index), + : BitcodeWriterBase(Stream), Buffer(Buffer), StrtabBuilder(StrtabBuilder), + M(*M), VE(*M, ShouldPreserveUseListOrder), Index(Index), GenerateHash(GenerateHash), ModHash(ModHash), BitcodeStartBit(Stream.GetCurrentBitNo()) { // Assign ValueIds to any callee values in the index that came from @@ -261,9 +265,9 @@ private: SmallVectorImpl<uint64_t> &Vals); void writeInstruction(const Instruction &I, unsigned InstID, SmallVectorImpl<unsigned> &Vals); - void writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel = false, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex = nullptr); + void writeValueSymbolTable(const ValueSymbolTable &VST); + void writeGlobalValueSymbolTable( + DenseMap<const Function *, uint64_t> &FunctionToBitcodeIndex); void writeUseList(UseListOrder &&Order); void writeUseListBlock(const Function *F); void @@ -478,7 +482,6 @@ public: private: void writeIndex(); void writeModStrings(); - void writeCombinedValueSymbolTable(); void writeCombinedGlobalValueSummary(); /// Indicates whether the provided \p ModulePath should be written into @@ -1049,12 +1052,8 @@ void ModuleBitcodeWriter::writeComdats() { SmallVector<unsigned, 64> Vals; for (const Comdat *C : VE.getComdats()) { // COMDAT: [selection_kind, name] + Vals.push_back(StrtabBuilder.add(C->getName())); Vals.push_back(getEncodedComdatSelectionKind(*C)); - size_t Size = C->getName().size(); - assert(isUInt<32>(Size)); - Vals.push_back(Size); - for (char Chr : C->getName()) - Vals.push_back((unsigned char)Chr); Stream.EmitRecord(bitc::MODULE_CODE_COMDAT, Vals, /*AbbrevToUse=*/0); Vals.clear(); } @@ -1166,6 +1165,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Add an abbrev for common globals with no visibility or thread localness. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_GLOBALVAR)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, Log2_32_Ceil(MaxGlobalType+1))); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // AddrSpace << 2 @@ -1189,15 +1189,41 @@ void ModuleBitcodeWriter::writeModuleInfo() { SimpleGVarAbbrev = Stream.EmitAbbrev(std::move(Abbv)); } - // Emit the global variable information. SmallVector<unsigned, 64> Vals; + // Emit the module's source file name. + { + StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), + M.getSourceFileName().size()); + BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); + if (Bits == SE_Char6) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); + else if (Bits == SE_Fixed7) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); + + // MODULE_CODE_SOURCE_FILENAME: [namechar x N] + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); + Abbv->Add(AbbrevOpToUse); + unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + + for (const auto P : M.getSourceFileName()) + Vals.push_back((unsigned char)P); + + // Emit the finished record. + Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); + Vals.clear(); + } + + // Emit the global variable information. for (const GlobalVariable &GV : M.globals()) { unsigned AbbrevToUse = 0; - // GLOBALVAR: [type, isconst, initid, + // GLOBALVAR: [(name, )type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] + Vals.push_back(StrtabBuilder.add(GV.getName())); Vals.push_back(VE.getTypeID(GV.getValueType())); Vals.push_back(GV.getType()->getAddressSpace() << 2 | 2 | GV.isConstant()); Vals.push_back(GV.isDeclaration() ? 0 : @@ -1230,6 +1256,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // FUNCTION: [type, callingconv, isproto, linkage, paramattrs, alignment, // section, visibility, gc, unnamed_addr, prologuedata, // dllstorageclass, comdat, prefixdata, personalityfn] + Vals.push_back(StrtabBuilder.add(F.getName())); Vals.push_back(VE.getTypeID(F.getFunctionType())); Vals.push_back(F.getCallingConv()); Vals.push_back(F.isDeclaration()); @@ -1258,6 +1285,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { for (const GlobalAlias &A : M.aliases()) { // ALIAS: [alias type, aliasee val#, linkage, visibility, dllstorageclass, // threadlocal, unnamed_addr] + Vals.push_back(StrtabBuilder.add(A.getName())); Vals.push_back(VE.getTypeID(A.getValueType())); Vals.push_back(A.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(A.getAliasee())); @@ -1274,6 +1302,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Emit the ifunc information. for (const GlobalIFunc &I : M.ifuncs()) { // IFUNC: [ifunc type, address space, resolver val#, linkage, visibility] + Vals.push_back(StrtabBuilder.add(I.getName())); Vals.push_back(VE.getTypeID(I.getValueType())); Vals.push_back(I.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(I.getResolver())); @@ -1283,34 +1312,6 @@ void ModuleBitcodeWriter::writeModuleInfo() { Vals.clear(); } - // Emit the module's source file name. - { - StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), - M.getSourceFileName().size()); - BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); - if (Bits == SE_Char6) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); - else if (Bits == SE_Fixed7) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); - - // MODULE_CODE_SOURCE_FILENAME: [namechar x N] - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(AbbrevOpToUse); - unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - for (const auto P : M.getSourceFileName()) - Vals.push_back((unsigned char)P); - - // Emit the finished record. - Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); - Vals.clear(); - } - - // If we have a VST, write the VSTOFFSET record placeholder. - if (M.getValueSymbolTable().empty()) - return; writeValueSymbolTableForwardDecl(); } @@ -2840,21 +2841,9 @@ void ModuleBitcodeWriter::writeInstruction(const Instruction &I, Vals.clear(); } -/// Emit names for globals/functions etc. \p IsModuleLevel is true when -/// we are writing the module-level VST, where we are including a function -/// bitcode index and need to backpatch the VST forward declaration record. -void ModuleBitcodeWriter::writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex) { - if (VST.empty()) { - // writeValueSymbolTableForwardDecl should have returned early as - // well. Ensure this handling remains in sync by asserting that - // the placeholder offset is not set. - assert(!IsModuleLevel || !hasVSTOffsetPlaceholder()); - return; - } - - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { +void ModuleBitcodeWriter::writeGlobalValueSymbolTable( + DenseMap<const Function *, uint64_t> &FunctionToBitcodeIndex) { + if (hasVSTOffsetPlaceholder()) { // Get the offset of the VST we are writing, and backpatch it into // the VST forward declaration record. uint64_t VSTOffset = Stream.GetCurrentBitNo(); @@ -2869,49 +2858,47 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY - // records, which are not used in the per-function VSTs. - unsigned FnEntry8BitAbbrev; - unsigned FnEntry7BitAbbrev; - unsigned FnEntry6BitAbbrev; - unsigned GUIDEntryAbbrev; - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { - // 8-bit fixed-width VST_CODE_FNENTRY function strings. + unsigned FnEntryAbbrev; + if (hasVSTOffsetPlaceholder()) { auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8)); - FnEntry8BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + FnEntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + } - // 7-bit fixed width VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7)); - FnEntry7BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + for (const Function &F : M) { + uint64_t Record[2]; - // 6-bit char6 VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6)); - FnEntry6BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + if (F.isDeclaration()) + continue; - // FIXME: Change the name of this record as it is now used by - // the per-module index as well. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - GUIDEntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + Record[0] = VE.getValueID(&F); + + // Save the word offset of the function (from the start of the + // actual bitcode written to the stream). + uint64_t BitcodeIndex = FunctionToBitcodeIndex[&F] - bitcodeStartBit(); + assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned"); + // Note that we add 1 here because the offset is relative to one word + // before the start of the identification block, which was historically + // always the start of the regular bitcode header. + Record[1] = BitcodeIndex / 32 + 1; + + Stream.EmitRecord(bitc::VST_CODE_FNENTRY, Record, FnEntryAbbrev); } + Stream.ExitBlock(); +} + +/// Emit names for globals/functions etc. \p IsModuleLevel is true when +/// we are writing the module-level VST, where we are including a function +/// bitcode index and need to backpatch the VST forward declaration record. +void ModuleBitcodeWriter::writeValueSymbolTable(const ValueSymbolTable &VST) { + if (VST.empty()) + return; + + Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); + // FIXME: Set up the abbrev, we know how many values there are! // FIXME: We know if the type names can use 7-bit ascii. SmallVector<uint64_t, 64> NameVals; @@ -2924,15 +2911,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( unsigned AbbrevToUse = VST_ENTRY_8_ABBREV; NameVals.push_back(VE.getValueID(Name.getValue())); - Function *F = dyn_cast<Function>(Name.getValue()); - if (!F) { - // If value is an alias, need to get the aliased base object to - // see if it is a function. - auto *GA = dyn_cast<GlobalAlias>(Name.getValue()); - if (GA && GA->getBaseObject()) - F = dyn_cast<Function>(GA->getBaseObject()); - } - // VST_CODE_ENTRY: [valueid, namechar x N] // VST_CODE_FNENTRY: [valueid, funcoffset, namechar x N] // VST_CODE_BBENTRY: [bbid, namechar x N] @@ -2941,28 +2919,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Code = bitc::VST_CODE_BBENTRY; if (Bits == SE_Char6) AbbrevToUse = VST_BBENTRY_6_ABBREV; - } else if (F && !F->isDeclaration()) { - // Must be the module-level VST, where we pass in the Index and - // have a VSTOffsetPlaceholder. The function-level VST should not - // contain any Function symbols. - assert(FunctionToBitcodeIndex); - assert(hasVSTOffsetPlaceholder()); - - // Save the word offset of the function (from the start of the - // actual bitcode written to the stream). - uint64_t BitcodeIndex = (*FunctionToBitcodeIndex)[F] - bitcodeStartBit(); - assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned"); - // Note that we add 1 here because the offset is relative to one word - // before the start of the identification block, which was historically - // always the start of the regular bitcode header. - NameVals.push_back(BitcodeIndex / 32 + 1); - - Code = bitc::VST_CODE_FNENTRY; - AbbrevToUse = FnEntry8BitAbbrev; - if (Bits == SE_Char6) - AbbrevToUse = FnEntry6BitAbbrev; - else if (Bits == SE_Fixed7) - AbbrevToUse = FnEntry7BitAbbrev; } else { Code = bitc::VST_CODE_ENTRY; if (Bits == SE_Char6) @@ -2978,47 +2934,7 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Stream.EmitRecord(Code, NameVals, AbbrevToUse); NameVals.clear(); } - // Emit any GUID valueIDs created for indirect call edges into the - // module-level VST. - if (IsModuleLevel && hasVSTOffsetPlaceholder()) - for (const auto &GI : valueIds()) { - NameVals.push_back(GI.second); - NameVals.push_back(GI.first); - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, - GUIDEntryAbbrev); - NameVals.clear(); - } - Stream.ExitBlock(); -} - -/// Emit function names and summary offsets for the combined index -/// used by ThinLTO. -void IndexBitcodeWriter::writeCombinedValueSymbolTable() { - assert(hasVSTOffsetPlaceholder() && "Expected non-zero VSTOffsetPlaceholder"); - // Get the offset of the VST we are writing, and backpatch it into - // the VST forward declaration record. - uint64_t VSTOffset = Stream.GetCurrentBitNo(); - assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned"); - Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32); - - Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - unsigned EntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - SmallVector<uint64_t, 64> NameVals; - for (const auto &GVI : valueIds()) { - // VST_CODE_COMBINED_ENTRY: [valueid, refguid] - NameVals.push_back(GVI.second); - NameVals.push_back(GVI.first); - - // Emit the finished record. - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, EntryAbbrev); - NameVals.clear(); - } Stream.ExitBlock(); } @@ -3510,6 +3426,11 @@ void ModuleBitcodeWriter::writePerModuleGlobalValueSummary() { return; } + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_PERMODULE. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE)); @@ -3602,6 +3523,39 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() { Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 3); Stream.EmitRecord(bitc::FS_VERSION, ArrayRef<uint64_t>{INDEX_VERSION}); + // Create value IDs for undefined references. + for (const auto &I : *this) { + if (auto *VS = dyn_cast<GlobalVarSummary>(I.second)) { + for (auto &RI : VS->refs()) + getValueId(RI.getGUID()); + continue; + } + + auto *FS = dyn_cast<FunctionSummary>(I.second); + if (!FS) + continue; + for (auto &RI : FS->refs()) + getValueId(RI.getGUID()); + + for (auto &EI : FS->calls()) { + GlobalValue::GUID GUID = EI.first.getGUID(); + if (!hasValueId(GUID)) { + // For SamplePGO, the indirect call targets for local functions will + // have its original name annotated in profile. We try to find the + // corresponding PGOFuncName as the GUID. + GUID = Index.getGUIDFromOriginalID(GUID); + if (GUID == 0 || !hasValueId(GUID)) + continue; + } + getValueId(GUID); + } + } + + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_COMBINED. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED)); @@ -3816,10 +3770,7 @@ void ModuleBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); size_t BlockStartPos = Buffer.size(); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Emit blockinfo, which defines the standard abbreviations etc. writeBlockInfo(); @@ -3865,8 +3816,7 @@ void ModuleBitcodeWriter::write() { if (Index) writePerModuleGlobalValueSummary(); - writeValueSymbolTable(M.getValueSymbolTable(), - /* IsModuleLevel */ true, &FunctionToBitcodeIndex); + writeGlobalValueSymbolTable(FunctionToBitcodeIndex); writeModuleHash(BlockStartPos); @@ -3954,13 +3904,46 @@ BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer) writeBitcodeHeader(*Stream); } -BitcodeWriter::~BitcodeWriter() = default; +BitcodeWriter::~BitcodeWriter() { assert(WroteStrtab); } + +void BitcodeWriter::writeBlob(unsigned Block, unsigned Record, StringRef Blob) { + Stream->EnterSubblock(Block, 3); + + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(Record)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); + auto AbbrevNo = Stream->EmitAbbrev(std::move(Abbv)); + + Stream->EmitRecordWithBlob(AbbrevNo, ArrayRef<uint64_t>{Record}, Blob); + + Stream->ExitBlock(); +} + +void BitcodeWriter::writeStrtab() { + assert(!WroteStrtab); + + std::string Strtab; + raw_string_ostream OS(Strtab); + + StrtabBuilder.finalizeInOrder(); + StrtabBuilder.write(OS); + + OS.flush(); + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, Strtab); + + WroteStrtab = true; +} + +void BitcodeWriter::copyStrtab(StringRef Strtab) { + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, Strtab); + WroteStrtab = true; +} void BitcodeWriter::writeModule(const Module *M, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash) { - ModuleBitcodeWriter ModuleWriter(M, Buffer, *Stream, + ModuleBitcodeWriter ModuleWriter(M, Buffer, StrtabBuilder, *Stream, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); ModuleWriter.write(); @@ -3984,6 +3967,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, BitcodeWriter Writer(Buffer); Writer.writeModule(M, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); + Writer.writeStrtab(); if (TT.isOSDarwin() || TT.isOSBinFormatMachO()) emitDarwinBCHeaderAndTrailer(Buffer, TT); @@ -3995,13 +3979,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, void IndexBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); - - // If we have a VST, write the VSTOFFSET record placeholder. - writeValueSymbolTableForwardDecl(); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Write the module paths in the combined index. writeModStrings(); @@ -4009,10 +3987,6 @@ void IndexBitcodeWriter::write() { // Write the summary combined index records. writeCombinedGlobalValueSummary(); - // Need a special VST writer for the combined index (we don't have a - // real VST and real values when this is invoked). - writeCombinedValueSymbolTable(); - Stream.ExitBlock(); } diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp index 393213f0810..416f0565abd 100644 --- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp +++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp @@ -336,6 +336,7 @@ void splitAndWriteThinLTOBitcode( W.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/true, &ModHash); W.writeModule(MergedM.get()); + W.writeStrtab(); OS << Buffer; // If a minimized bitcode module was requested for the thin link, @@ -348,6 +349,7 @@ void splitAndWriteThinLTOBitcode( W2.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/false, &ModHash); W2.writeModule(MergedM.get()); + W2.writeStrtab(); *ThinLinkOS << Buffer; } } diff --git a/llvm/tools/llvm-cat/llvm-cat.cpp b/llvm/tools/llvm-cat/llvm-cat.cpp index d884970309b..c9a65db3229 100644 --- a/llvm/tools/llvm-cat/llvm-cat.cpp +++ b/llvm/tools/llvm-cat/llvm-cat.cpp @@ -44,11 +44,14 @@ int main(int argc, char **argv) { std::unique_ptr<MemoryBuffer> MB = ExitOnErr( errorOrToExpected(MemoryBuffer::getFileOrSTDIN(InputFilename))); std::vector<BitcodeModule> Mods = ExitOnErr(getBitcodeModuleList(*MB)); - for (auto &BitcodeMod : Mods) + for (auto &BitcodeMod : Mods) { Buffer.insert(Buffer.end(), BitcodeMod.getBuffer().begin(), BitcodeMod.getBuffer().end()); + Writer.copyStrtab(BitcodeMod.getStrtab()); + } } } else { + std::vector<std::unique_ptr<Module>> OwnedMods; for (std::string InputFilename : InputFilenames) { SMDiagnostic Err; std::unique_ptr<Module> M = parseIRFile(InputFilename, Err, Context); @@ -57,7 +60,9 @@ int main(int argc, char **argv) { return 1; } Writer.writeModule(M.get()); + OwnedMods.push_back(std::move(M)); } + Writer.writeStrtab(); } std::error_code EC; diff --git a/llvm/tools/llvm-modextract/llvm-modextract.cpp b/llvm/tools/llvm-modextract/llvm-modextract.cpp index 6c2e364be44..375e7baf968 100644 --- a/llvm/tools/llvm-modextract/llvm-modextract.cpp +++ b/llvm/tools/llvm-modextract/llvm-modextract.cpp @@ -61,7 +61,10 @@ int main(int argc, char **argv) { if (BinaryExtract) { SmallVector<char, 0> Header; BitcodeWriter Writer(Header); - Out->os() << Header << Ms[ModuleIndex].getBuffer(); + Header.append(Ms[ModuleIndex].getBuffer().begin(), + Ms[ModuleIndex].getBuffer().end()); + Writer.copyStrtab(Ms[ModuleIndex].getStrtab()); + Out->os() << Header; Out->keep(); return 0; } -------------- next part -------------- diff --git a/llvm/include/llvm/Bitcode/BitcodeReader.h b/llvm/include/llvm/Bitcode/BitcodeReader.h index 9e042b17241..0701ddbb7f1 100644 --- a/llvm/include/llvm/Bitcode/BitcodeReader.h +++ b/llvm/include/llvm/Bitcode/BitcodeReader.h @@ -46,6 +46,9 @@ namespace llvm { ArrayRef<uint8_t> Buffer; StringRef ModuleIdentifier; + // The string table used to interpret this module. + StringRef Strtab; + // The bitstream location of the IDENTIFICATION_BLOCK. uint64_t IdentificationBit; @@ -70,6 +73,7 @@ namespace llvm { StringRef getBuffer() const { return StringRef((const char *)Buffer.begin(), Buffer.size()); } + StringRef getStrtab() const { return Strtab; } StringRef getModuleIdentifier() const { return ModuleIdentifier; } diff --git a/llvm/include/llvm/Bitcode/BitcodeWriter.h b/llvm/include/llvm/Bitcode/BitcodeWriter.h index 271cb2d81bb..c7ddb8b333c 100644 --- a/llvm/include/llvm/Bitcode/BitcodeWriter.h +++ b/llvm/include/llvm/Bitcode/BitcodeWriter.h @@ -15,6 +15,7 @@ #define LLVM_BITCODE_BITCODEWRITER_H #include "llvm/IR/ModuleSummaryIndex.h" +#include "llvm/MC/StringTableBuilder.h" #include <string> namespace llvm { @@ -26,12 +27,25 @@ namespace llvm { SmallVectorImpl<char> &Buffer; std::unique_ptr<BitstreamWriter> Stream; + StringTableBuilder StrtabBuilder{StringTableBuilder::ELF}; + bool WroteStrtab = false; + + void writeBlob(unsigned Block, unsigned Record, StringRef Blob); + public: /// Create a BitcodeWriter that writes to Buffer. BitcodeWriter(SmallVectorImpl<char> &Buffer); ~BitcodeWriter(); + /// Write the bitcode file's string table. This must be called exactly once + /// after all modules have been written. + void writeStrtab(); + + /// Copy the string table for another module into this bitcode file. This + /// should be called after copying the module itself into the bitcode file. + void copyStrtab(StringRef Strtab); + /// Write the specified module to the buffer specified at construction time. /// /// If \c ShouldPreserveUseListOrder, encode the use-list order for each \a diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index e2d2fbb0f44..9ab4b6ab143 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -22,7 +22,7 @@ namespace llvm { namespace bitc { -// The only top-level block type defined is for a module. +// The only top-level block types are MODULE, IDENTIFICATION and STRTAB. enum BlockIDs { // Blocks MODULE_BLOCK_ID = FIRST_APPLICATION_BLOCKID, @@ -52,7 +52,9 @@ enum BlockIDs { OPERAND_BUNDLE_TAGS_BLOCK_ID, - METADATA_KIND_BLOCK_ID + METADATA_KIND_BLOCK_ID, + + STRTAB_BLOCK_ID, }; /// Identification block contains a string that describes the producer details, @@ -171,8 +173,9 @@ enum ValueSymtabCodes { VST_CODE_ENTRY = 1, // VST_ENTRY: [valueid, namechar x N] VST_CODE_BBENTRY = 2, // VST_BBENTRY: [bbid, namechar x N] VST_CODE_FNENTRY = 3, // VST_FNENTRY: [valueid, offset, namechar x N] + // 4 is available for reuse because it was never used in a release. // VST_COMBINED_ENTRY: [valueid, refguid] - VST_CODE_COMBINED_ENTRY = 5 + VST_CODE_COMBINED_ENTRY = 5, }; // The module path symbol table only has one code (MST_CODE_ENTRY). @@ -232,6 +235,7 @@ enum GlobalValueSummarySymtabCodes { // llvm.type.checked.load intrinsic with all constant integer arguments. // [typeid, offset, n x arg] FS_TYPE_CHECKED_LOAD_CONST_VCALL = 15, + FS_VALUE_GUID = 16, }; enum MetadataCodes { @@ -550,6 +554,10 @@ enum ComdatSelectionKindCodes { COMDAT_SELECTION_KIND_SAME_SIZE = 5, }; +enum StrtabCodes { + STRTAB_BLOB = 1, +}; + } // End bitc namespace } // End llvm namespace diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 23649e811ed..1556a02b259 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -372,12 +372,28 @@ Expected<std::string> readTriple(BitstreamCursor &Stream) { class BitcodeReaderBase { protected: - BitcodeReaderBase(BitstreamCursor Stream) : Stream(std::move(Stream)) { + BitcodeReaderBase(BitstreamCursor Stream, StringRef Strtab) + : Stream(std::move(Stream)), Strtab(Strtab) { this->Stream.setBlockInfo(&BlockInfo); } BitstreamBlockInfo BlockInfo; BitstreamCursor Stream; + StringRef Strtab; + + /// Indicates that we are using a new encoding for instruction operands where + /// most operands in the current FUNCTION_BLOCK are encoded relative to the + /// instruction number, for a more compact encoding. Some instruction + /// operands are not relative to the instruction ID: basic block numbers, and + /// types. Once the old style function blocks have been phased out, we would + /// not need this flag. + bool UseRelativeIDs = false; + + bool UseStrtab = false; + + Error readVersionRecord(ArrayRef<uint64_t> Record); + std::pair<StringRef, ArrayRef<uint64_t>> + readNameFromStrtab(ArrayRef<uint64_t> Record); bool readBlockInfo(); @@ -405,6 +421,9 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { bool SeenValueSymbolTable = false; uint64_t VSTOffset = 0; + std::vector<std::string> SectionTable; + std::vector<std::string> GCTable; + std::vector<Type*> TypeList; BitcodeReaderValueList ValueList; Optional<MetadataLoader> MDLoader; @@ -459,14 +478,6 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { DenseMap<Function *, std::vector<BasicBlock *>> BasicBlockFwdRefs; std::deque<Function *> BasicBlockFwdRefQueue; - /// Indicates that we are using a new encoding for instruction operands where - /// most operands in the current FUNCTION_BLOCK are encoded relative to the - /// instruction number, for a more compact encoding. Some instruction - /// operands are not relative to the instruction ID: basic block numbers, and - /// types. Once the old style function blocks have been phased out, we would - /// not need this flag. - bool UseRelativeIDs = false; - /// True if all functions will be materialized, negating the need to process /// (e.g.) blockaddress forward references. bool WillMaterializeAllForwardRefs = false; @@ -477,8 +488,8 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { std::vector<std::string> BundleTags; public: - BitcodeReader(BitstreamCursor Stream, StringRef ProducerIdentification, - LLVMContext &Context); + BitcodeReader(BitstreamCursor Stream, StringRef Strtab, + StringRef ProducerIdentification, LLVMContext &Context); Error materializeForwardReferencedFunctions(); @@ -598,6 +609,12 @@ private: Error parseAlignmentValue(uint64_t Exponent, unsigned &Alignment); Error parseAttrKind(uint64_t Code, Attribute::AttrKind *Kind); Error parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata = false); + + Error readComdatRecord(ArrayRef<uint64_t> Record); + Error readGlobalVarRecord(ArrayRef<uint64_t> Record); + Error readFunctionRecord(ArrayRef<uint64_t> Record); + Error readGlobalIndirectSymbolRecord(unsigned BitCode, ArrayRef<uint64_t> Record); + Error parseAttributeBlock(); Error parseAttributeGroupBlock(); Error parseTypeTable(); @@ -607,6 +624,7 @@ private: Expected<Value *> recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT); Error parseValueSymbolTable(uint64_t Offset = 0); + Error parseGlobalValueSymbolTable(); Error parseConstants(); Error rememberAndSkipFunctionBodies(); Error rememberAndSkipFunctionBody(); @@ -659,8 +677,8 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase { std::string SourceFileName; public: - ModuleSummaryIndexBitcodeReader( - BitstreamCursor Stream, ModuleSummaryIndex &TheIndex); + ModuleSummaryIndexBitcodeReader(BitstreamCursor Stream, StringRef Strtab, + ModuleSummaryIndex &TheIndex); Error parseModule(StringRef ModulePath); @@ -694,10 +712,10 @@ std::error_code llvm::errorToErrorCodeAndEmitErrors(LLVMContext &Ctx, return std::error_code(); } -BitcodeReader::BitcodeReader(BitstreamCursor Stream, +BitcodeReader::BitcodeReader(BitstreamCursor Stream, StringRef Strtab, StringRef ProducerIdentification, LLVMContext &Context) - : BitcodeReaderBase(std::move(Stream)), Context(Context), + : BitcodeReaderBase(std::move(Stream), Strtab), Context(Context), ValueList(Context) { this->ProducerIdentification = ProducerIdentification; } @@ -1682,14 +1700,15 @@ Error BitcodeReader::parseOperandBundleTags() { /// Associate a value with its name from the given index in the provided record. Expected<Value *> BitcodeReader::recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT) { - SmallString<128> ValueName; - if (convertToString(Record, NameIndex, ValueName)) - return error("Invalid record"); unsigned ValueID = Record[0]; if (ValueID >= ValueList.size() || !ValueList[ValueID]) return error("Invalid record"); Value *V = ValueList[ValueID]; + SmallString<128> ValueName; + if (convertToString(Record, NameIndex, ValueName)) + return error("Invalid record"); + StringRef NameStr(ValueName.data(), ValueName.size()); if (NameStr.find_first_of(0) != StringRef::npos) return error("Invalid value name"); @@ -1727,6 +1746,48 @@ static uint64_t jumpToValueSymbolTable(uint64_t Offset, return CurrentBit; } +Error BitcodeReader::parseGlobalValueSymbolTable() { + unsigned FuncBitcodeOffsetDelta + Stream.getAbbrevIDWidth() + bitc::BlockIDWidth; + + if (Stream.EnterSubBlock(bitc::VALUE_SYMTAB_BLOCK_ID)) + return error("Invalid record"); + + SmallVector<uint64_t, 64> Record; + while (true) { + BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); + + switch (Entry.Kind) { + case BitstreamEntry::SubBlock: + case BitstreamEntry::Error: + return error("Malformed block"); + case BitstreamEntry::EndBlock: + return Error::success(); + case BitstreamEntry::Record: + break; + } + + Record.clear(); + switch (Stream.readRecord(Entry.ID, Record)) { + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + // Note that we subtract 1 here because the offset is relative to one word + // before the start of the identification or module block, which was + // historically always the start of the regular bitcode header. + uint64_t FuncWordOffset = Record[1] - 1; + uint64_t FuncBitOffset = FuncWordOffset * 32; + DeferredFunctionInfo[cast<Function>(ValueList[Record[0]])] + FuncBitOffset + FuncBitcodeOffsetDelta; + // Set the LastFunctionBlockBit to point to the last function block. + // Later when parsing is resumed after function materialization, + // we can simply skip that last function block. + if (FuncBitOffset > LastFunctionBlockBit) + LastFunctionBlockBit = FuncBitOffset; + break; + } + } + } +} + /// Parse the value symbol table at either the current parsing location or /// at the given bit offset if provided. Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { @@ -1734,8 +1795,15 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Pass in the Offset to distinguish between calling for the module-level // VST (where we want to jump to the VST offset) and the function-level // VST (where we don't). - if (Offset > 0) + if (Offset > 0) { CurrentBit = jumpToValueSymbolTable(Offset, Stream); + if (UseStrtab) { + if (Error Err = parseGlobalValueSymbolTable()) + return Err; + Stream.JumpToBit(CurrentBit); + return Error::success(); + } + } // Compute the delta between the bitcode indices in the VST (the word offset // to the word-aligned ENTER_SUBBLOCK for the function block, and that @@ -1779,18 +1847,17 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: unknown type. break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 1, TT); if (Error Err = ValOrErr.takeError()) return Err; ValOrErr.get(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 2, TT); if (Error Err = ValOrErr.takeError()) return Err; @@ -2609,6 +2676,271 @@ bool BitcodeReaderBase::readBlockInfo() { return false; } +std::pair<StringRef, ArrayRef<uint64_t>> +BitcodeReaderBase::readNameFromStrtab(ArrayRef<uint64_t> Record) { + if (!UseStrtab) + return {"", Record}; + if (Record[0] >= Strtab.size()) + return {"", {}}; + return {Strtab.data() + Record[0], Record.slice(1)}; +} + +Error BitcodeReader::readComdatRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 1) + return error("Invalid record"); + Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); + std::string OldFormatName; + if (!UseStrtab) { + if (Record.size() < 2) + return error("Invalid record"); + unsigned ComdatNameSize = Record[1]; + OldFormatName.reserve(ComdatNameSize); + for (unsigned i = 0; i != ComdatNameSize; ++i) + OldFormatName += (char)Record[2 + i]; + Name = OldFormatName; + } + Comdat *C = TheModule->getOrInsertComdat(Name); + C->setSelectionKind(SK); + ComdatList.push_back(C); + return Error::success(); +} + +Error BitcodeReader::readGlobalVarRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 6) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + bool isConstant = Record[1] & 1; + bool explicitType = Record[1] & 2; + unsigned AddressSpace; + if (explicitType) { + AddressSpace = Record[1] >> 2; + } else { + if (!Ty->isPointerTy()) + return error("Invalid type for value"); + AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); + Ty = cast<PointerType>(Ty)->getElementType(); + } + + uint64_t RawLinkage = Record[3]; + GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[4], Alignment)) + return Err; + std::string Section; + if (Record[5]) { + if (Record[5] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Section = SectionTable[Record[5] - 1]; + } + GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; + // Local linkage must have default visibility. + if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) + // FIXME: Change to an error if non-default in 4.0. + Visibility = getDecodedVisibility(Record[6]); + + GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; + if (Record.size() > 7) + TLM = getDecodedThreadLocalMode(Record[7]); + + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 8) + UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); + + bool ExternallyInitialized = false; + if (Record.size() > 9) + ExternallyInitialized = Record[9]; + + GlobalVariable *NewGV + new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, Name, + nullptr, TLM, AddressSpace, ExternallyInitialized); + NewGV->setAlignment(Alignment); + if (!Section.empty()) + NewGV->setSection(Section); + NewGV->setVisibility(Visibility); + NewGV->setUnnamedAddr(UnnamedAddr); + + if (Record.size() > 10) + NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); + else + upgradeDLLImportExportLinkage(NewGV, RawLinkage); + + ValueList.push_back(NewGV); + + // Remember which value to use for the global initializer. + if (unsigned InitID = Record[2]) + GlobalInits.push_back(std::make_pair(NewGV, InitID - 1)); + + if (Record.size() > 11) { + if (unsigned ComdatID = Record[11]) { + if (ComdatID > ComdatList.size()) + return error("Invalid global variable comdat ID"); + NewGV->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + NewGV->setComdat(reinterpret_cast<Comdat *>(1)); + } + return Error::success(); +} + +Error BitcodeReader::readFunctionRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 8) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + if (auto *PTy = dyn_cast<PointerType>(Ty)) + Ty = PTy->getElementType(); + auto *FTy = dyn_cast<FunctionType>(Ty); + if (!FTy) + return error("Invalid type for value"); + auto CC = static_cast<CallingConv::ID>(Record[1]); + if (CC & ~CallingConv::MaxID) + return error("Invalid calling convention ID"); + + Function *Func + Function::Create(FTy, GlobalValue::ExternalLinkage, Name, TheModule); + + Func->setCallingConv(CC); + bool isProto = Record[2]; + uint64_t RawLinkage = Record[3]; + Func->setLinkage(getDecodedLinkage(RawLinkage)); + Func->setAttributes(getAttributes(Record[4])); + + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[5], Alignment)) + return Err; + Func->setAlignment(Alignment); + if (Record[6]) { + if (Record[6] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Func->setSection(SectionTable[Record[6] - 1]); + } + // Local linkage must have default visibility. + if (!Func->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + Func->setVisibility(getDecodedVisibility(Record[7])); + if (Record.size() > 8 && Record[8]) { + if (Record[8] - 1 >= GCTable.size()) + return error("Invalid ID"); + Func->setGC(GCTable[Record[8] - 1]); + } + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 9) + UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); + Func->setUnnamedAddr(UnnamedAddr); + if (Record.size() > 10 && Record[10] != 0) + FunctionPrologues.push_back(std::make_pair(Func, Record[10] - 1)); + + if (Record.size() > 11) + Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); + else + upgradeDLLImportExportLinkage(Func, RawLinkage); + + if (Record.size() > 12) { + if (unsigned ComdatID = Record[12]) { + if (ComdatID > ComdatList.size()) + return error("Invalid function comdat ID"); + Func->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + Func->setComdat(reinterpret_cast<Comdat *>(1)); + } + + if (Record.size() > 13 && Record[13] != 0) + FunctionPrefixes.push_back(std::make_pair(Func, Record[13] - 1)); + + if (Record.size() > 14 && Record[14] != 0) + FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); + + ValueList.push_back(Func); + + // If this is a function with a body, remember the prototype we are + // creating now, so that we can match up the body with them later. + if (!isProto) { + Func->setIsMaterializable(true); + FunctionsWithBodies.push_back(Func); + DeferredFunctionInfo[Func] = 0; + } + return Error::success(); +} + +Error BitcodeReader::readGlobalIndirectSymbolRecord(unsigned BitCode, + ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; + if (Record.size() < (3 + (unsigned)NewRecord)) + return error("Invalid record"); + unsigned OpNum = 0; + Type *Ty = getTypeByID(Record[OpNum++]); + if (!Ty) + return error("Invalid record"); + + unsigned AddrSpace; + if (!NewRecord) { + auto *PTy = dyn_cast<PointerType>(Ty); + if (!PTy) + return error("Invalid type for value"); + Ty = PTy->getElementType(); + AddrSpace = PTy->getAddressSpace(); + } else { + AddrSpace = Record[OpNum++]; + } + + auto Val = Record[OpNum++]; + auto Linkage = Record[OpNum++]; + GlobalIndirectSymbol *NewGA; + if (BitCode == bitc::MODULE_CODE_ALIAS || + BitCode == bitc::MODULE_CODE_ALIAS_OLD) + NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + TheModule); + else + NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + nullptr, TheModule); + // Old bitcode files didn't have visibility field. + // Local linkage must have default visibility. + if (OpNum != Record.size()) { + auto VisInd = OpNum++; + if (!NewGA->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); + } + if (OpNum != Record.size()) + NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); + else + upgradeDLLImportExportLinkage(NewGA, Linkage); + if (OpNum != Record.size()) + NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); + if (OpNum != Record.size()) + NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); + ValueList.push_back(NewGA); + IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + return Error::success(); +} + +Error BitcodeReaderBase::readVersionRecord(ArrayRef<uint64_t> Record) { + if (Record.size() < 1) + return error("Invalid record"); + unsigned ModuleVersion = Record[0]; + if (ModuleVersion > 2) + return error("Invalid value"); + UseRelativeIDs = ModuleVersion >= 1; + UseStrtab = ModuleVersion >= 2; + return Error::success(); +} + Error BitcodeReader::parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata) { if (ResumeBit) @@ -2617,8 +2949,6 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, return error("Invalid record"); SmallVector<uint64_t, 64> Record; - std::vector<std::string> SectionTable; - std::vector<std::string> GCTable; // Read all the records for this module. while (true) { @@ -2765,20 +3095,8 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, switch (BitCode) { default: break; // Default behavior, ignore unknown content. case bitc::MODULE_CODE_VERSION: { // VERSION: [version#] - if (Record.size() < 1) - return error("Invalid record"); - // Only version #0 and #1 are supported so far. - unsigned module_version = Record[0]; - switch (module_version) { - default: - return error("Invalid value"); - case 0: - UseRelativeIDs = false; - break; - case 1: - UseRelativeIDs = true; - break; - } + if (Error Err = readVersionRecord(Record)) + return Err; break; } case bitc::MODULE_CODE_TRIPLE: { // TRIPLE: [strchr x N] @@ -2825,184 +3143,25 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, break; } case bitc::MODULE_CODE_COMDAT: { // COMDAT: [selection_kind, name] - if (Record.size() < 2) - return error("Invalid record"); - Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); - unsigned ComdatNameSize = Record[1]; - std::string ComdatName; - ComdatName.reserve(ComdatNameSize); - for (unsigned i = 0; i != ComdatNameSize; ++i) - ComdatName += (char)Record[2 + i]; - Comdat *C = TheModule->getOrInsertComdat(ComdatName); - C->setSelectionKind(SK); - ComdatList.push_back(C); - break; - } - // GLOBALVAR: [pointer type, isconst, initid, + if (Error Err = readComdatRecord(Record)) + return Err; + break; + } + // GLOBALVAR: [(name_offset, )pointer type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - bool isConstant = Record[1] & 1; - bool explicitType = Record[1] & 2; - unsigned AddressSpace; - if (explicitType) { - AddressSpace = Record[1] >> 2; - } else { - if (!Ty->isPointerTy()) - return error("Invalid type for value"); - AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); - Ty = cast<PointerType>(Ty)->getElementType(); - } - - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[4], Alignment)) + if (Error Err = readGlobalVarRecord(Record)) return Err; - std::string Section; - if (Record[5]) { - if (Record[5]-1 >= SectionTable.size()) - return error("Invalid ID"); - Section = SectionTable[Record[5]-1]; - } - GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; - // Local linkage must have default visibility. - if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) - // FIXME: Change to an error if non-default in 4.0. - Visibility = getDecodedVisibility(Record[6]); - - GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; - if (Record.size() > 7) - TLM = getDecodedThreadLocalMode(Record[7]); - - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 8) - UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); - - bool ExternallyInitialized = false; - if (Record.size() > 9) - ExternallyInitialized = Record[9]; - - GlobalVariable *NewGV - new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, "", nullptr, - TLM, AddressSpace, ExternallyInitialized); - NewGV->setAlignment(Alignment); - if (!Section.empty()) - NewGV->setSection(Section); - NewGV->setVisibility(Visibility); - NewGV->setUnnamedAddr(UnnamedAddr); - - if (Record.size() > 10) - NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); - else - upgradeDLLImportExportLinkage(NewGV, RawLinkage); - - ValueList.push_back(NewGV); - - // Remember which value to use for the global initializer. - if (unsigned InitID = Record[2]) - GlobalInits.push_back(std::make_pair(NewGV, InitID-1)); - - if (Record.size() > 11) { - if (unsigned ComdatID = Record[11]) { - if (ComdatID > ComdatList.size()) - return error("Invalid global variable comdat ID"); - NewGV->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - NewGV->setComdat(reinterpret_cast<Comdat *>(1)); - } - break; } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, + // FUNCTION: [(name_offset, )type, callingconv, isproto, linkage, paramattr, // alignment, section, visibility, gc, unnamed_addr, // prologuedata, dllstorageclass, comdat, prefixdata] case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - if (auto *PTy = dyn_cast<PointerType>(Ty)) - Ty = PTy->getElementType(); - auto *FTy = dyn_cast<FunctionType>(Ty); - if (!FTy) - return error("Invalid type for value"); - auto CC = static_cast<CallingConv::ID>(Record[1]); - if (CC & ~CallingConv::MaxID) - return error("Invalid calling convention ID"); - - Function *Func = Function::Create(FTy, GlobalValue::ExternalLinkage, - "", TheModule); - - Func->setCallingConv(CC); - bool isProto = Record[2]; - uint64_t RawLinkage = Record[3]; - Func->setLinkage(getDecodedLinkage(RawLinkage)); - Func->setAttributes(getAttributes(Record[4])); - - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[5], Alignment)) + if (Error Err = readFunctionRecord(Record)) return Err; - Func->setAlignment(Alignment); - if (Record[6]) { - if (Record[6]-1 >= SectionTable.size()) - return error("Invalid ID"); - Func->setSection(SectionTable[Record[6]-1]); - } - // Local linkage must have default visibility. - if (!Func->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - Func->setVisibility(getDecodedVisibility(Record[7])); - if (Record.size() > 8 && Record[8]) { - if (Record[8]-1 >= GCTable.size()) - return error("Invalid ID"); - Func->setGC(GCTable[Record[8] - 1]); - } - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 9) - UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); - Func->setUnnamedAddr(UnnamedAddr); - if (Record.size() > 10 && Record[10] != 0) - FunctionPrologues.push_back(std::make_pair(Func, Record[10]-1)); - - if (Record.size() > 11) - Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); - else - upgradeDLLImportExportLinkage(Func, RawLinkage); - - if (Record.size() > 12) { - if (unsigned ComdatID = Record[12]) { - if (ComdatID > ComdatList.size()) - return error("Invalid function comdat ID"); - Func->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - Func->setComdat(reinterpret_cast<Comdat *>(1)); - } - - if (Record.size() > 13 && Record[13] != 0) - FunctionPrefixes.push_back(std::make_pair(Func, Record[13]-1)); - - if (Record.size() > 14 && Record[14] != 0) - FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); - - ValueList.push_back(Func); - - // If this is a function with a body, remember the prototype we are - // creating now, so that we can match up the body with them later. - if (!isProto) { - Func->setIsMaterializable(true); - FunctionsWithBodies.push_back(Func); - DeferredFunctionInfo[Func] = 0; - } break; } // ALIAS: [alias type, addrspace, aliasee val#, linkage] @@ -3011,54 +3170,10 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, case bitc::MODULE_CODE_IFUNC: case bitc::MODULE_CODE_ALIAS: case bitc::MODULE_CODE_ALIAS_OLD: { - bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; - if (Record.size() < (3 + (unsigned)NewRecord)) - return error("Invalid record"); - unsigned OpNum = 0; - Type *Ty = getTypeByID(Record[OpNum++]); - if (!Ty) - return error("Invalid record"); - - unsigned AddrSpace; - if (!NewRecord) { - auto *PTy = dyn_cast<PointerType>(Ty); - if (!PTy) - return error("Invalid type for value"); - Ty = PTy->getElementType(); - AddrSpace = PTy->getAddressSpace(); - } else { - AddrSpace = Record[OpNum++]; - } - - auto Val = Record[OpNum++]; - auto Linkage = Record[OpNum++]; - GlobalIndirectSymbol *NewGA; - if (BitCode == bitc::MODULE_CODE_ALIAS || - BitCode == bitc::MODULE_CODE_ALIAS_OLD) - NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", TheModule); - else - NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", nullptr, TheModule); - // Old bitcode files didn't have visibility field. - // Local linkage must have default visibility. - if (OpNum != Record.size()) { - auto VisInd = OpNum++; - if (!NewGA->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); - } - if (OpNum != Record.size()) - NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); - else - upgradeDLLImportExportLinkage(NewGA, Linkage); - if (OpNum != Record.size()) - NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); - if (OpNum != Record.size()) - NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); - ValueList.push_back(NewGA); - IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + if (Error Err = readGlobalIndirectSymbolRecord(BitCode, Record)) + return Err; break; + } /// MODULE_CODE_VSTOFFSET: [offset] case bitc::MODULE_CODE_VSTOFFSET: @@ -4535,8 +4650,8 @@ std::vector<StructType *> BitcodeReader::getIdentifiedStructTypes() const { } ModuleSummaryIndexBitcodeReader::ModuleSummaryIndexBitcodeReader( - BitstreamCursor Cursor, ModuleSummaryIndex &TheIndex) - : BitcodeReaderBase(std::move(Cursor)), TheIndex(TheIndex) {} + BitstreamCursor Cursor, StringRef Strtab, ModuleSummaryIndex &TheIndex) + : BitcodeReaderBase(std::move(Cursor), Strtab), TheIndex(TheIndex) {} std::pair<GlobalValue::GUID, GlobalValue::GUID> ModuleSummaryIndexBitcodeReader::getGUIDFromValueId(unsigned ValueId) { @@ -4560,7 +4675,7 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( SmallVector<uint64_t, 64> Record; // Read all the records for this value table. - SmallString<128> ValueName; + SmallString<128> Buf; while (true) { BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); @@ -4580,12 +4695,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: ignore (e.g. VST_CODE_BBENTRY records). break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] - if (convertToString(Record, 1, ValueName)) + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 1, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4603,13 +4723,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(ValueGUID, OriginalNameID); - ValueName.clear(); + Buf.clear(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] - if (convertToString(Record, 2, ValueName)) + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 2, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4627,8 +4751,6 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(FunctionGUID, OriginalNameID); - - ValueName.clear(); break; } case bitc::VST_CODE_COMBINED_ENTRY: { @@ -4715,6 +4837,11 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { default: break; // Default behavior, ignore unknown content. /// MODULE_CODE_SOURCE_FILENAME: [namechar x N] + case bitc::MODULE_CODE_VERSION: { + if (Error Err = readVersionRecord(Record)) + return Err; + break; + } case bitc::MODULE_CODE_SOURCE_FILENAME: { SmallString<128> ValueName; if (convertToString(Record, 0, ValueName)) @@ -4748,37 +4875,35 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { // was historically always the start of the regular bitcode header. VSTOffset = Record[0] - 1; break; - // GLOBALVAR: [pointer type, isconst, initid, - // linkage, alignment, section, visibility, threadlocal, - // unnamed_addr, externally_initialized, dllstorageclass, - // comdat] - case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, - // alignment, section, visibility, gc, unnamed_addr, - // prologuedata, dllstorageclass, comdat, prefixdata] - case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // ALIAS: [alias type, addrspace, aliasee val#, linkage, visibility, - // dllstorageclass] + // GLOBALVAR: [pointer type, isconst, initid, linkage, ...] + // FUNCTION: [type, callingconv, isproto, linkage, ...] + // ALIAS: [alias type, addrspace, aliasee val#, linkage, ...] + case bitc::MODULE_CODE_GLOBALVAR: + case bitc::MODULE_CODE_FUNCTION: case bitc::MODULE_CODE_ALIAS: { - if (Record.size() < 6) + StringRef Name; + ArrayRef<uint64_t> GVRecord; + std::tie(Name, GVRecord) = readNameFromStrtab(Record); + if (GVRecord.size() <= 3) return error("Invalid record"); - uint64_t RawLinkage = Record[3]; + uint64_t RawLinkage = GVRecord[3]; GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; + if (!UseStrtab) { + ValueIdToLinkageMap[ValueId++] = Linkage; + break; + } + + std::string GlobalId + GlobalValue::getGlobalIdentifier(Name, Linkage, SourceFileName); + auto ValueGUID = GlobalValue::getGUID(GlobalId); + auto OriginalNameID = ValueGUID; + if (GlobalValue::isLocalLinkage(Linkage)) + OriginalNameID = GlobalValue::getGUID(Name); + if (PrintSummaryGUIDs) + dbgs() << "GUID " << ValueGUID << "(" << OriginalNameID << ") is " + << Name << "\n"; + ValueIdToCallGraphGUIDMap[ValueId++] + std::make_pair(ValueGUID, OriginalNameID); break; } } @@ -4889,6 +5014,13 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary( switch (BitCode) { default: // Default behavior: ignore. break; + + case bitc::FS_VALUE_GUID: { + uint64_t ValueID = Record[0]; + GlobalValue::GUID RefGUID = Record[1]; + ValueIdToCallGraphGUIDMap[ValueID] = std::make_pair(RefGUID, RefGUID); + break; + } // FS_PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid, // n x (valueid)] // FS_PERMODULE_PROFILE: [valueid, flags, instcount, numrefs, @@ -5193,6 +5325,35 @@ const std::error_category &llvm::BitcodeErrorCategory() { return *ErrorCategory; } +static Expected<StringRef> readStrtab(BitstreamCursor &Stream) { + if (Stream.EnterSubBlock(bitc::STRTAB_BLOCK_ID)) + return error("Invalid record"); + + StringRef Strtab; + while (1) { + BitstreamEntry Entry = Stream.advance(); + switch (Entry.Kind) { + case BitstreamEntry::EndBlock: + return Strtab; + + case BitstreamEntry::Error: + return error("Malformed block"); + + case BitstreamEntry::SubBlock: + if (Stream.SkipBlock()) + return error("Malformed block"); + break; + + case BitstreamEntry::Record: + StringRef Blob; + SmallVector<uint64_t, 1> Record; + if (Stream.readRecord(Entry.ID, Record, &Blob) == bitc::STRTAB_BLOB) + Strtab = Blob; + break; + } + } +} + //===----------------------------------------------------------------------===// // External interface //===----------------------------------------------------------------------===// @@ -5245,6 +5406,22 @@ llvm::getBitcodeModuleList(MemoryBufferRef Buffer) { continue; } + if (Entry.ID == bitc::STRTAB_BLOCK_ID) { + Expected<StringRef> Strtab = readStrtab(Stream); + if (!Strtab) + return Strtab.takeError(); + // This string table is used by every preceding bitcode module that does + // not have its own string table. A bitcode file may have multiple + // string tables if it was created by binary concatenation, for example + // with "llvm-cat -b". + for (auto I = Modules.rbegin(), E = Modules.rend(); I != E; ++I) { + if (!I->Strtab.empty()) + break; + I->Strtab = *Strtab; + } + continue; + } + if (Stream.SkipBlock()) return error("Malformed block"); continue; @@ -5281,8 +5458,8 @@ BitcodeModule::getModuleImpl(LLVMContext &Context, bool MaterializeAll, } Stream.JumpToBit(ModuleBit); - auto *R - new BitcodeReader(std::move(Stream), ProducerIdentification, Context); + auto *R = new BitcodeReader(std::move(Stream), Strtab, ProducerIdentification, + Context); std::unique_ptr<Module> M llvm::make_unique<Module>(ModuleIdentifier, Context); @@ -5317,7 +5494,7 @@ Expected<std::unique_ptr<ModuleSummaryIndex>> BitcodeModule::getSummary() { Stream.JumpToBit(ModuleBit); auto Index = llvm::make_unique<ModuleSummaryIndex>(); - ModuleSummaryIndexBitcodeReader R(std::move(Stream), *Index); + ModuleSummaryIndexBitcodeReader R(std::move(Stream), Strtab, *Index); if (Error Err = R.parseModule(ModuleIdentifier)) return std::move(Err); diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index 1bdb439086a..262e514d3cd 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -28,6 +28,7 @@ #include "llvm/IR/Operator.h" #include "llvm/IR/UseListOrder.h" #include "llvm/IR/ValueSymbolTable.h" +#include "llvm/MC/StringTableBuilder.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/Program.h" @@ -96,6 +97,8 @@ class ModuleBitcodeWriter : public BitcodeWriterBase { /// Pointer to the buffer allocated by caller for bitcode writing. const SmallVectorImpl<char> &Buffer; + StringTableBuilder &StrtabBuilder; + /// The Module to write to bitcode. const Module &M; @@ -131,11 +134,12 @@ public: /// Constructs a ModuleBitcodeWriter object for the given Module, /// writing to the provided \p Buffer. ModuleBitcodeWriter(const Module *M, SmallVectorImpl<char> &Buffer, + StringTableBuilder &StrtabBuilder, BitstreamWriter &Stream, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash = nullptr) - : BitcodeWriterBase(Stream), Buffer(Buffer), M(*M), - VE(*M, ShouldPreserveUseListOrder), Index(Index), + : BitcodeWriterBase(Stream), Buffer(Buffer), StrtabBuilder(StrtabBuilder), + M(*M), VE(*M, ShouldPreserveUseListOrder), Index(Index), GenerateHash(GenerateHash), ModHash(ModHash), BitcodeStartBit(Stream.GetCurrentBitNo()) { // Assign ValueIds to any callee values in the index that came from @@ -261,9 +265,7 @@ private: SmallVectorImpl<uint64_t> &Vals); void writeInstruction(const Instruction &I, unsigned InstID, SmallVectorImpl<unsigned> &Vals); - void writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel = false, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex = nullptr); + void writeValueSymbolTable(const ValueSymbolTable &VST); void writeUseList(UseListOrder &&Order); void writeUseListBlock(const Function *F); void @@ -478,7 +480,6 @@ public: private: void writeIndex(); void writeModStrings(); - void writeCombinedValueSymbolTable(); void writeCombinedGlobalValueSummary(); /// Indicates whether the provided \p ModulePath should be written into @@ -1049,12 +1050,8 @@ void ModuleBitcodeWriter::writeComdats() { SmallVector<unsigned, 64> Vals; for (const Comdat *C : VE.getComdats()) { // COMDAT: [selection_kind, name] + Vals.push_back(StrtabBuilder.add(C->getName())); Vals.push_back(getEncodedComdatSelectionKind(*C)); - size_t Size = C->getName().size(); - assert(isUInt<32>(Size)); - Vals.push_back(Size); - for (char Chr : C->getName()) - Vals.push_back((unsigned char)Chr); Stream.EmitRecord(bitc::MODULE_CODE_COMDAT, Vals, /*AbbrevToUse=*/0); Vals.clear(); } @@ -1166,6 +1163,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Add an abbrev for common globals with no visibility or thread localness. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_GLOBALVAR)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, Log2_32_Ceil(MaxGlobalType+1))); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // AddrSpace << 2 @@ -1189,15 +1187,41 @@ void ModuleBitcodeWriter::writeModuleInfo() { SimpleGVarAbbrev = Stream.EmitAbbrev(std::move(Abbv)); } - // Emit the global variable information. SmallVector<unsigned, 64> Vals; + // Emit the module's source file name. + { + StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), + M.getSourceFileName().size()); + BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); + if (Bits == SE_Char6) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); + else if (Bits == SE_Fixed7) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); + + // MODULE_CODE_SOURCE_FILENAME: [namechar x N] + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); + Abbv->Add(AbbrevOpToUse); + unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + + for (const auto P : M.getSourceFileName()) + Vals.push_back((unsigned char)P); + + // Emit the finished record. + Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); + Vals.clear(); + } + + // Emit the global variable information. for (const GlobalVariable &GV : M.globals()) { unsigned AbbrevToUse = 0; - // GLOBALVAR: [type, isconst, initid, + // GLOBALVAR: [(name, )type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] + Vals.push_back(StrtabBuilder.add(GV.getName())); Vals.push_back(VE.getTypeID(GV.getValueType())); Vals.push_back(GV.getType()->getAddressSpace() << 2 | 2 | GV.isConstant()); Vals.push_back(GV.isDeclaration() ? 0 : @@ -1230,6 +1254,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // FUNCTION: [type, callingconv, isproto, linkage, paramattrs, alignment, // section, visibility, gc, unnamed_addr, prologuedata, // dllstorageclass, comdat, prefixdata, personalityfn] + Vals.push_back(StrtabBuilder.add(F.getName())); Vals.push_back(VE.getTypeID(F.getFunctionType())); Vals.push_back(F.getCallingConv()); Vals.push_back(F.isDeclaration()); @@ -1258,6 +1283,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { for (const GlobalAlias &A : M.aliases()) { // ALIAS: [alias type, aliasee val#, linkage, visibility, dllstorageclass, // threadlocal, unnamed_addr] + Vals.push_back(StrtabBuilder.add(A.getName())); Vals.push_back(VE.getTypeID(A.getValueType())); Vals.push_back(A.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(A.getAliasee())); @@ -1274,6 +1300,7 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Emit the ifunc information. for (const GlobalIFunc &I : M.ifuncs()) { // IFUNC: [ifunc type, address space, resolver val#, linkage, visibility] + Vals.push_back(StrtabBuilder.add(I.getName())); Vals.push_back(VE.getTypeID(I.getValueType())); Vals.push_back(I.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(I.getResolver())); @@ -1282,36 +1309,6 @@ void ModuleBitcodeWriter::writeModuleInfo() { Stream.EmitRecord(bitc::MODULE_CODE_IFUNC, Vals); Vals.clear(); } - - // Emit the module's source file name. - { - StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), - M.getSourceFileName().size()); - BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); - if (Bits == SE_Char6) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); - else if (Bits == SE_Fixed7) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); - - // MODULE_CODE_SOURCE_FILENAME: [namechar x N] - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(AbbrevOpToUse); - unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - for (const auto P : M.getSourceFileName()) - Vals.push_back((unsigned char)P); - - // Emit the finished record. - Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); - Vals.clear(); - } - - // If we have a VST, write the VSTOFFSET record placeholder. - if (M.getValueSymbolTable().empty()) - return; - writeValueSymbolTableForwardDecl(); } static uint64_t getOptimizationFlags(const Value *V) { @@ -2843,75 +2840,12 @@ void ModuleBitcodeWriter::writeInstruction(const Instruction &I, /// Emit names for globals/functions etc. \p IsModuleLevel is true when /// we are writing the module-level VST, where we are including a function /// bitcode index and need to backpatch the VST forward declaration record. -void ModuleBitcodeWriter::writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex) { - if (VST.empty()) { - // writeValueSymbolTableForwardDecl should have returned early as - // well. Ensure this handling remains in sync by asserting that - // the placeholder offset is not set. - assert(!IsModuleLevel || !hasVSTOffsetPlaceholder()); +void ModuleBitcodeWriter::writeValueSymbolTable(const ValueSymbolTable &VST) { + if (VST.empty()) return; - } - - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { - // Get the offset of the VST we are writing, and backpatch it into - // the VST forward declaration record. - uint64_t VSTOffset = Stream.GetCurrentBitNo(); - // The BitcodeStartBit was the stream offset of the identification block. - VSTOffset -= bitcodeStartBit(); - assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned"); - // Note that we add 1 here because the offset is relative to one word - // before the start of the identification block, which was historically - // always the start of the regular bitcode header. - Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32 + 1); - } Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY - // records, which are not used in the per-function VSTs. - unsigned FnEntry8BitAbbrev; - unsigned FnEntry7BitAbbrev; - unsigned FnEntry6BitAbbrev; - unsigned GUIDEntryAbbrev; - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { - // 8-bit fixed-width VST_CODE_FNENTRY function strings. - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8)); - FnEntry8BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // 7-bit fixed width VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7)); - FnEntry7BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // 6-bit char6 VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6)); - FnEntry6BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // FIXME: Change the name of this record as it is now used by - // the per-module index as well. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - GUIDEntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - } - // FIXME: Set up the abbrev, we know how many values there are! // FIXME: We know if the type names can use 7-bit ascii. SmallVector<uint64_t, 64> NameVals; @@ -2924,15 +2858,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( unsigned AbbrevToUse = VST_ENTRY_8_ABBREV; NameVals.push_back(VE.getValueID(Name.getValue())); - Function *F = dyn_cast<Function>(Name.getValue()); - if (!F) { - // If value is an alias, need to get the aliased base object to - // see if it is a function. - auto *GA = dyn_cast<GlobalAlias>(Name.getValue()); - if (GA && GA->getBaseObject()) - F = dyn_cast<Function>(GA->getBaseObject()); - } - // VST_CODE_ENTRY: [valueid, namechar x N] // VST_CODE_FNENTRY: [valueid, funcoffset, namechar x N] // VST_CODE_BBENTRY: [bbid, namechar x N] @@ -2941,28 +2866,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Code = bitc::VST_CODE_BBENTRY; if (Bits == SE_Char6) AbbrevToUse = VST_BBENTRY_6_ABBREV; - } else if (F && !F->isDeclaration()) { - // Must be the module-level VST, where we pass in the Index and - // have a VSTOffsetPlaceholder. The function-level VST should not - // contain any Function symbols. - assert(FunctionToBitcodeIndex); - assert(hasVSTOffsetPlaceholder()); - - // Save the word offset of the function (from the start of the - // actual bitcode written to the stream). - uint64_t BitcodeIndex = (*FunctionToBitcodeIndex)[F] - bitcodeStartBit(); - assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned"); - // Note that we add 1 here because the offset is relative to one word - // before the start of the identification block, which was historically - // always the start of the regular bitcode header. - NameVals.push_back(BitcodeIndex / 32 + 1); - - Code = bitc::VST_CODE_FNENTRY; - AbbrevToUse = FnEntry8BitAbbrev; - if (Bits == SE_Char6) - AbbrevToUse = FnEntry6BitAbbrev; - else if (Bits == SE_Fixed7) - AbbrevToUse = FnEntry7BitAbbrev; } else { Code = bitc::VST_CODE_ENTRY; if (Bits == SE_Char6) @@ -2978,47 +2881,7 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Stream.EmitRecord(Code, NameVals, AbbrevToUse); NameVals.clear(); } - // Emit any GUID valueIDs created for indirect call edges into the - // module-level VST. - if (IsModuleLevel && hasVSTOffsetPlaceholder()) - for (const auto &GI : valueIds()) { - NameVals.push_back(GI.second); - NameVals.push_back(GI.first); - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, - GUIDEntryAbbrev); - NameVals.clear(); - } - Stream.ExitBlock(); -} - -/// Emit function names and summary offsets for the combined index -/// used by ThinLTO. -void IndexBitcodeWriter::writeCombinedValueSymbolTable() { - assert(hasVSTOffsetPlaceholder() && "Expected non-zero VSTOffsetPlaceholder"); - // Get the offset of the VST we are writing, and backpatch it into - // the VST forward declaration record. - uint64_t VSTOffset = Stream.GetCurrentBitNo(); - assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned"); - Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32); - - Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - unsigned EntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - SmallVector<uint64_t, 64> NameVals; - for (const auto &GVI : valueIds()) { - // VST_CODE_COMBINED_ENTRY: [valueid, refguid] - NameVals.push_back(GVI.second); - NameVals.push_back(GVI.first); - // Emit the finished record. - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, EntryAbbrev); - NameVals.clear(); - } Stream.ExitBlock(); } @@ -3510,6 +3373,11 @@ void ModuleBitcodeWriter::writePerModuleGlobalValueSummary() { return; } + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_PERMODULE. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE)); @@ -3602,6 +3470,39 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() { Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 3); Stream.EmitRecord(bitc::FS_VERSION, ArrayRef<uint64_t>{INDEX_VERSION}); + // Create value IDs for undefined references. + for (const auto &I : *this) { + if (auto *VS = dyn_cast<GlobalVarSummary>(I.second)) { + for (auto &RI : VS->refs()) + getValueId(RI.getGUID()); + continue; + } + + auto *FS = dyn_cast<FunctionSummary>(I.second); + if (!FS) + continue; + for (auto &RI : FS->refs()) + getValueId(RI.getGUID()); + + for (auto &EI : FS->calls()) { + GlobalValue::GUID GUID = EI.first.getGUID(); + if (!hasValueId(GUID)) { + // For SamplePGO, the indirect call targets for local functions will + // have its original name annotated in profile. We try to find the + // corresponding PGOFuncName as the GUID. + GUID = Index.getGUIDFromOriginalID(GUID); + if (GUID == 0 || !hasValueId(GUID)) + continue; + } + getValueId(GUID); + } + } + + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_COMBINED. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED)); @@ -3816,10 +3717,7 @@ void ModuleBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); size_t BlockStartPos = Buffer.size(); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Emit blockinfo, which defines the standard abbreviations etc. writeBlockInfo(); @@ -3865,9 +3763,6 @@ void ModuleBitcodeWriter::write() { if (Index) writePerModuleGlobalValueSummary(); - writeValueSymbolTable(M.getValueSymbolTable(), - /* IsModuleLevel */ true, &FunctionToBitcodeIndex); - writeModuleHash(BlockStartPos); Stream.ExitBlock(); @@ -3954,13 +3849,46 @@ BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer) writeBitcodeHeader(*Stream); } -BitcodeWriter::~BitcodeWriter() = default; +BitcodeWriter::~BitcodeWriter() { assert(WroteStrtab); } + +void BitcodeWriter::writeBlob(unsigned Block, unsigned Record, StringRef Blob) { + Stream->EnterSubblock(Block, 3); + + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(Record)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); + auto AbbrevNo = Stream->EmitAbbrev(std::move(Abbv)); + + Stream->EmitRecordWithBlob(AbbrevNo, ArrayRef<uint64_t>{Record}, Blob); + + Stream->ExitBlock(); +} + +void BitcodeWriter::writeStrtab() { + assert(!WroteStrtab); + + std::string Strtab; + raw_string_ostream OS(Strtab); + + StrtabBuilder.finalizeInOrder(); + StrtabBuilder.write(OS); + + OS.flush(); + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, Strtab); + + WroteStrtab = true; +} + +void BitcodeWriter::copyStrtab(StringRef Strtab) { + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, Strtab); + WroteStrtab = true; +} void BitcodeWriter::writeModule(const Module *M, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash) { - ModuleBitcodeWriter ModuleWriter(M, Buffer, *Stream, + ModuleBitcodeWriter ModuleWriter(M, Buffer, StrtabBuilder, *Stream, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); ModuleWriter.write(); @@ -3984,6 +3912,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, BitcodeWriter Writer(Buffer); Writer.writeModule(M, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); + Writer.writeStrtab(); if (TT.isOSDarwin() || TT.isOSBinFormatMachO()) emitDarwinBCHeaderAndTrailer(Buffer, TT); @@ -3995,13 +3924,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, void IndexBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); - - // If we have a VST, write the VSTOFFSET record placeholder. - writeValueSymbolTableForwardDecl(); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Write the module paths in the combined index. writeModStrings(); @@ -4009,10 +3932,6 @@ void IndexBitcodeWriter::write() { // Write the summary combined index records. writeCombinedGlobalValueSummary(); - // Need a special VST writer for the combined index (we don't have a - // real VST and real values when this is invoked). - writeCombinedValueSymbolTable(); - Stream.ExitBlock(); } diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp index 393213f0810..416f0565abd 100644 --- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp +++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp @@ -336,6 +336,7 @@ void splitAndWriteThinLTOBitcode( W.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/true, &ModHash); W.writeModule(MergedM.get()); + W.writeStrtab(); OS << Buffer; // If a minimized bitcode module was requested for the thin link, @@ -348,6 +349,7 @@ void splitAndWriteThinLTOBitcode( W2.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/false, &ModHash); W2.writeModule(MergedM.get()); + W2.writeStrtab(); *ThinLinkOS << Buffer; } } diff --git a/llvm/tools/llvm-cat/llvm-cat.cpp b/llvm/tools/llvm-cat/llvm-cat.cpp index d884970309b..c9a65db3229 100644 --- a/llvm/tools/llvm-cat/llvm-cat.cpp +++ b/llvm/tools/llvm-cat/llvm-cat.cpp @@ -44,11 +44,14 @@ int main(int argc, char **argv) { std::unique_ptr<MemoryBuffer> MB = ExitOnErr( errorOrToExpected(MemoryBuffer::getFileOrSTDIN(InputFilename))); std::vector<BitcodeModule> Mods = ExitOnErr(getBitcodeModuleList(*MB)); - for (auto &BitcodeMod : Mods) + for (auto &BitcodeMod : Mods) { Buffer.insert(Buffer.end(), BitcodeMod.getBuffer().begin(), BitcodeMod.getBuffer().end()); + Writer.copyStrtab(BitcodeMod.getStrtab()); + } } } else { + std::vector<std::unique_ptr<Module>> OwnedMods; for (std::string InputFilename : InputFilenames) { SMDiagnostic Err; std::unique_ptr<Module> M = parseIRFile(InputFilename, Err, Context); @@ -57,7 +60,9 @@ int main(int argc, char **argv) { return 1; } Writer.writeModule(M.get()); + OwnedMods.push_back(std::move(M)); } + Writer.writeStrtab(); } std::error_code EC; diff --git a/llvm/tools/llvm-modextract/llvm-modextract.cpp b/llvm/tools/llvm-modextract/llvm-modextract.cpp index 6c2e364be44..375e7baf968 100644 --- a/llvm/tools/llvm-modextract/llvm-modextract.cpp +++ b/llvm/tools/llvm-modextract/llvm-modextract.cpp @@ -61,7 +61,10 @@ int main(int argc, char **argv) { if (BinaryExtract) { SmallVector<char, 0> Header; BitcodeWriter Writer(Header); - Out->os() << Header << Ms[ModuleIndex].getBuffer(); + Header.append(Ms[ModuleIndex].getBuffer().begin(), + Ms[ModuleIndex].getBuffer().end()); + Writer.copyStrtab(Ms[ModuleIndex].getStrtab()); + Out->os() << Header; Out->keep(); return 0; } -------------- next part -------------- diff --git a/llvm/include/llvm/Bitcode/BitcodeReader.h b/llvm/include/llvm/Bitcode/BitcodeReader.h index 9e042b17241..0701ddbb7f1 100644 --- a/llvm/include/llvm/Bitcode/BitcodeReader.h +++ b/llvm/include/llvm/Bitcode/BitcodeReader.h @@ -46,6 +46,9 @@ namespace llvm { ArrayRef<uint8_t> Buffer; StringRef ModuleIdentifier; + // The string table used to interpret this module. + StringRef Strtab; + // The bitstream location of the IDENTIFICATION_BLOCK. uint64_t IdentificationBit; @@ -70,6 +73,7 @@ namespace llvm { StringRef getBuffer() const { return StringRef((const char *)Buffer.begin(), Buffer.size()); } + StringRef getStrtab() const { return Strtab; } StringRef getModuleIdentifier() const { return ModuleIdentifier; } diff --git a/llvm/include/llvm/Bitcode/BitcodeWriter.h b/llvm/include/llvm/Bitcode/BitcodeWriter.h index 271cb2d81bb..23b5ae87b27 100644 --- a/llvm/include/llvm/Bitcode/BitcodeWriter.h +++ b/llvm/include/llvm/Bitcode/BitcodeWriter.h @@ -15,6 +15,7 @@ #define LLVM_BITCODE_BITCODEWRITER_H #include "llvm/IR/ModuleSummaryIndex.h" +#include "llvm/MC/StringTableBuilder.h" #include <string> namespace llvm { @@ -26,12 +27,25 @@ namespace llvm { SmallVectorImpl<char> &Buffer; std::unique_ptr<BitstreamWriter> Stream; + StringTableBuilder StrtabBuilder{StringTableBuilder::RAW}; + bool WroteStrtab = false; + + void writeBlob(unsigned Block, unsigned Record, StringRef Blob); + public: /// Create a BitcodeWriter that writes to Buffer. BitcodeWriter(SmallVectorImpl<char> &Buffer); ~BitcodeWriter(); + /// Write the bitcode file's string table. This must be called exactly once + /// after all modules have been written. + void writeStrtab(); + + /// Copy the string table for another module into this bitcode file. This + /// should be called after copying the module itself into the bitcode file. + void copyStrtab(StringRef Strtab); + /// Write the specified module to the buffer specified at construction time. /// /// If \c ShouldPreserveUseListOrder, encode the use-list order for each \a diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index e2d2fbb0f44..9ab4b6ab143 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -22,7 +22,7 @@ namespace llvm { namespace bitc { -// The only top-level block type defined is for a module. +// The only top-level block types are MODULE, IDENTIFICATION and STRTAB. enum BlockIDs { // Blocks MODULE_BLOCK_ID = FIRST_APPLICATION_BLOCKID, @@ -52,7 +52,9 @@ enum BlockIDs { OPERAND_BUNDLE_TAGS_BLOCK_ID, - METADATA_KIND_BLOCK_ID + METADATA_KIND_BLOCK_ID, + + STRTAB_BLOCK_ID, }; /// Identification block contains a string that describes the producer details, @@ -171,8 +173,9 @@ enum ValueSymtabCodes { VST_CODE_ENTRY = 1, // VST_ENTRY: [valueid, namechar x N] VST_CODE_BBENTRY = 2, // VST_BBENTRY: [bbid, namechar x N] VST_CODE_FNENTRY = 3, // VST_FNENTRY: [valueid, offset, namechar x N] + // 4 is available for reuse because it was never used in a release. // VST_COMBINED_ENTRY: [valueid, refguid] - VST_CODE_COMBINED_ENTRY = 5 + VST_CODE_COMBINED_ENTRY = 5, }; // The module path symbol table only has one code (MST_CODE_ENTRY). @@ -232,6 +235,7 @@ enum GlobalValueSummarySymtabCodes { // llvm.type.checked.load intrinsic with all constant integer arguments. // [typeid, offset, n x arg] FS_TYPE_CHECKED_LOAD_CONST_VCALL = 15, + FS_VALUE_GUID = 16, }; enum MetadataCodes { @@ -550,6 +554,10 @@ enum ComdatSelectionKindCodes { COMDAT_SELECTION_KIND_SAME_SIZE = 5, }; +enum StrtabCodes { + STRTAB_BLOB = 1, +}; + } // End bitc namespace } // End llvm namespace diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 23649e811ed..bdc32156ae6 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -372,12 +372,28 @@ Expected<std::string> readTriple(BitstreamCursor &Stream) { class BitcodeReaderBase { protected: - BitcodeReaderBase(BitstreamCursor Stream) : Stream(std::move(Stream)) { + BitcodeReaderBase(BitstreamCursor Stream, StringRef Strtab) + : Stream(std::move(Stream)), Strtab(Strtab) { this->Stream.setBlockInfo(&BlockInfo); } BitstreamBlockInfo BlockInfo; BitstreamCursor Stream; + StringRef Strtab; + + /// Indicates that we are using a new encoding for instruction operands where + /// most operands in the current FUNCTION_BLOCK are encoded relative to the + /// instruction number, for a more compact encoding. Some instruction + /// operands are not relative to the instruction ID: basic block numbers, and + /// types. Once the old style function blocks have been phased out, we would + /// not need this flag. + bool UseRelativeIDs = false; + + bool UseStrtab = false; + + Error readVersionRecord(ArrayRef<uint64_t> Record); + std::pair<StringRef, ArrayRef<uint64_t>> + readNameFromStrtab(ArrayRef<uint64_t> Record); bool readBlockInfo(); @@ -405,6 +421,9 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { bool SeenValueSymbolTable = false; uint64_t VSTOffset = 0; + std::vector<std::string> SectionTable; + std::vector<std::string> GCTable; + std::vector<Type*> TypeList; BitcodeReaderValueList ValueList; Optional<MetadataLoader> MDLoader; @@ -459,14 +478,6 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { DenseMap<Function *, std::vector<BasicBlock *>> BasicBlockFwdRefs; std::deque<Function *> BasicBlockFwdRefQueue; - /// Indicates that we are using a new encoding for instruction operands where - /// most operands in the current FUNCTION_BLOCK are encoded relative to the - /// instruction number, for a more compact encoding. Some instruction - /// operands are not relative to the instruction ID: basic block numbers, and - /// types. Once the old style function blocks have been phased out, we would - /// not need this flag. - bool UseRelativeIDs = false; - /// True if all functions will be materialized, negating the need to process /// (e.g.) blockaddress forward references. bool WillMaterializeAllForwardRefs = false; @@ -477,8 +488,8 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer { std::vector<std::string> BundleTags; public: - BitcodeReader(BitstreamCursor Stream, StringRef ProducerIdentification, - LLVMContext &Context); + BitcodeReader(BitstreamCursor Stream, StringRef Strtab, + StringRef ProducerIdentification, LLVMContext &Context); Error materializeForwardReferencedFunctions(); @@ -598,6 +609,12 @@ private: Error parseAlignmentValue(uint64_t Exponent, unsigned &Alignment); Error parseAttrKind(uint64_t Code, Attribute::AttrKind *Kind); Error parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata = false); + + Error readComdatRecord(ArrayRef<uint64_t> Record); + Error readGlobalVarRecord(ArrayRef<uint64_t> Record); + Error readFunctionRecord(ArrayRef<uint64_t> Record); + Error readGlobalIndirectSymbolRecord(unsigned BitCode, ArrayRef<uint64_t> Record); + Error parseAttributeBlock(); Error parseAttributeGroupBlock(); Error parseTypeTable(); @@ -607,6 +624,7 @@ private: Expected<Value *> recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT); Error parseValueSymbolTable(uint64_t Offset = 0); + Error parseGlobalValueSymbolTable(); Error parseConstants(); Error rememberAndSkipFunctionBodies(); Error rememberAndSkipFunctionBody(); @@ -659,8 +677,8 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase { std::string SourceFileName; public: - ModuleSummaryIndexBitcodeReader( - BitstreamCursor Stream, ModuleSummaryIndex &TheIndex); + ModuleSummaryIndexBitcodeReader(BitstreamCursor Stream, StringRef Strtab, + ModuleSummaryIndex &TheIndex); Error parseModule(StringRef ModulePath); @@ -694,10 +712,10 @@ std::error_code llvm::errorToErrorCodeAndEmitErrors(LLVMContext &Ctx, return std::error_code(); } -BitcodeReader::BitcodeReader(BitstreamCursor Stream, +BitcodeReader::BitcodeReader(BitstreamCursor Stream, StringRef Strtab, StringRef ProducerIdentification, LLVMContext &Context) - : BitcodeReaderBase(std::move(Stream)), Context(Context), + : BitcodeReaderBase(std::move(Stream), Strtab), Context(Context), ValueList(Context) { this->ProducerIdentification = ProducerIdentification; } @@ -1682,14 +1700,15 @@ Error BitcodeReader::parseOperandBundleTags() { /// Associate a value with its name from the given index in the provided record. Expected<Value *> BitcodeReader::recordValue(SmallVectorImpl<uint64_t> &Record, unsigned NameIndex, Triple &TT) { - SmallString<128> ValueName; - if (convertToString(Record, NameIndex, ValueName)) - return error("Invalid record"); unsigned ValueID = Record[0]; if (ValueID >= ValueList.size() || !ValueList[ValueID]) return error("Invalid record"); Value *V = ValueList[ValueID]; + SmallString<128> ValueName; + if (convertToString(Record, NameIndex, ValueName)) + return error("Invalid record"); + StringRef NameStr(ValueName.data(), ValueName.size()); if (NameStr.find_first_of(0) != StringRef::npos) return error("Invalid value name"); @@ -1727,6 +1746,48 @@ static uint64_t jumpToValueSymbolTable(uint64_t Offset, return CurrentBit; } +Error BitcodeReader::parseGlobalValueSymbolTable() { + unsigned FuncBitcodeOffsetDelta + Stream.getAbbrevIDWidth() + bitc::BlockIDWidth; + + if (Stream.EnterSubBlock(bitc::VALUE_SYMTAB_BLOCK_ID)) + return error("Invalid record"); + + SmallVector<uint64_t, 64> Record; + while (true) { + BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); + + switch (Entry.Kind) { + case BitstreamEntry::SubBlock: + case BitstreamEntry::Error: + return error("Malformed block"); + case BitstreamEntry::EndBlock: + return Error::success(); + case BitstreamEntry::Record: + break; + } + + Record.clear(); + switch (Stream.readRecord(Entry.ID, Record)) { + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + // Note that we subtract 1 here because the offset is relative to one word + // before the start of the identification or module block, which was + // historically always the start of the regular bitcode header. + uint64_t FuncWordOffset = Record[1] - 1; + uint64_t FuncBitOffset = FuncWordOffset * 32; + DeferredFunctionInfo[cast<Function>(ValueList[Record[0]])] + FuncBitOffset + FuncBitcodeOffsetDelta; + // Set the LastFunctionBlockBit to point to the last function block. + // Later when parsing is resumed after function materialization, + // we can simply skip that last function block. + if (FuncBitOffset > LastFunctionBlockBit) + LastFunctionBlockBit = FuncBitOffset; + break; + } + } + } +} + /// Parse the value symbol table at either the current parsing location or /// at the given bit offset if provided. Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { @@ -1734,8 +1795,15 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Pass in the Offset to distinguish between calling for the module-level // VST (where we want to jump to the VST offset) and the function-level // VST (where we don't). - if (Offset > 0) + if (Offset > 0) { CurrentBit = jumpToValueSymbolTable(Offset, Stream); + if (UseStrtab) { + if (Error Err = parseGlobalValueSymbolTable()) + return Err; + Stream.JumpToBit(CurrentBit); + return Error::success(); + } + } // Compute the delta between the bitcode indices in the VST (the word offset // to the word-aligned ENTER_SUBBLOCK for the function block, and that @@ -1779,18 +1847,17 @@ Error BitcodeReader::parseValueSymbolTable(uint64_t Offset) { // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: unknown type. break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 1, TT); if (Error Err = ValOrErr.takeError()) return Err; ValOrErr.get(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] Expected<Value *> ValOrErr = recordValue(Record, 2, TT); if (Error Err = ValOrErr.takeError()) return Err; @@ -2609,6 +2676,271 @@ bool BitcodeReaderBase::readBlockInfo() { return false; } +std::pair<StringRef, ArrayRef<uint64_t>> +BitcodeReaderBase::readNameFromStrtab(ArrayRef<uint64_t> Record) { + if (!UseStrtab) + return {"", Record}; + if (Record[0] + Record[1] > Strtab.size()) + return {"", {}}; + return {StringRef(Strtab.data() + Record[0], Record[1]), Record.slice(2)}; +} + +Error BitcodeReader::readComdatRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 1) + return error("Invalid record"); + Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); + std::string OldFormatName; + if (!UseStrtab) { + if (Record.size() < 2) + return error("Invalid record"); + unsigned ComdatNameSize = Record[1]; + OldFormatName.reserve(ComdatNameSize); + for (unsigned i = 0; i != ComdatNameSize; ++i) + OldFormatName += (char)Record[2 + i]; + Name = OldFormatName; + } + Comdat *C = TheModule->getOrInsertComdat(Name); + C->setSelectionKind(SK); + ComdatList.push_back(C); + return Error::success(); +} + +Error BitcodeReader::readGlobalVarRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 6) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + bool isConstant = Record[1] & 1; + bool explicitType = Record[1] & 2; + unsigned AddressSpace; + if (explicitType) { + AddressSpace = Record[1] >> 2; + } else { + if (!Ty->isPointerTy()) + return error("Invalid type for value"); + AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); + Ty = cast<PointerType>(Ty)->getElementType(); + } + + uint64_t RawLinkage = Record[3]; + GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[4], Alignment)) + return Err; + std::string Section; + if (Record[5]) { + if (Record[5] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Section = SectionTable[Record[5] - 1]; + } + GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; + // Local linkage must have default visibility. + if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) + // FIXME: Change to an error if non-default in 4.0. + Visibility = getDecodedVisibility(Record[6]); + + GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; + if (Record.size() > 7) + TLM = getDecodedThreadLocalMode(Record[7]); + + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 8) + UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); + + bool ExternallyInitialized = false; + if (Record.size() > 9) + ExternallyInitialized = Record[9]; + + GlobalVariable *NewGV + new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, Name, + nullptr, TLM, AddressSpace, ExternallyInitialized); + NewGV->setAlignment(Alignment); + if (!Section.empty()) + NewGV->setSection(Section); + NewGV->setVisibility(Visibility); + NewGV->setUnnamedAddr(UnnamedAddr); + + if (Record.size() > 10) + NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); + else + upgradeDLLImportExportLinkage(NewGV, RawLinkage); + + ValueList.push_back(NewGV); + + // Remember which value to use for the global initializer. + if (unsigned InitID = Record[2]) + GlobalInits.push_back(std::make_pair(NewGV, InitID - 1)); + + if (Record.size() > 11) { + if (unsigned ComdatID = Record[11]) { + if (ComdatID > ComdatList.size()) + return error("Invalid global variable comdat ID"); + NewGV->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + NewGV->setComdat(reinterpret_cast<Comdat *>(1)); + } + return Error::success(); +} + +Error BitcodeReader::readFunctionRecord(ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + if (Record.size() < 8) + return error("Invalid record"); + Type *Ty = getTypeByID(Record[0]); + if (!Ty) + return error("Invalid record"); + if (auto *PTy = dyn_cast<PointerType>(Ty)) + Ty = PTy->getElementType(); + auto *FTy = dyn_cast<FunctionType>(Ty); + if (!FTy) + return error("Invalid type for value"); + auto CC = static_cast<CallingConv::ID>(Record[1]); + if (CC & ~CallingConv::MaxID) + return error("Invalid calling convention ID"); + + Function *Func + Function::Create(FTy, GlobalValue::ExternalLinkage, Name, TheModule); + + Func->setCallingConv(CC); + bool isProto = Record[2]; + uint64_t RawLinkage = Record[3]; + Func->setLinkage(getDecodedLinkage(RawLinkage)); + Func->setAttributes(getAttributes(Record[4])); + + unsigned Alignment; + if (Error Err = parseAlignmentValue(Record[5], Alignment)) + return Err; + Func->setAlignment(Alignment); + if (Record[6]) { + if (Record[6] - 1 >= SectionTable.size()) + return error("Invalid ID"); + Func->setSection(SectionTable[Record[6] - 1]); + } + // Local linkage must have default visibility. + if (!Func->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + Func->setVisibility(getDecodedVisibility(Record[7])); + if (Record.size() > 8 && Record[8]) { + if (Record[8] - 1 >= GCTable.size()) + return error("Invalid ID"); + Func->setGC(GCTable[Record[8] - 1]); + } + GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; + if (Record.size() > 9) + UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); + Func->setUnnamedAddr(UnnamedAddr); + if (Record.size() > 10 && Record[10] != 0) + FunctionPrologues.push_back(std::make_pair(Func, Record[10] - 1)); + + if (Record.size() > 11) + Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); + else + upgradeDLLImportExportLinkage(Func, RawLinkage); + + if (Record.size() > 12) { + if (unsigned ComdatID = Record[12]) { + if (ComdatID > ComdatList.size()) + return error("Invalid function comdat ID"); + Func->setComdat(ComdatList[ComdatID - 1]); + } + } else if (hasImplicitComdat(RawLinkage)) { + Func->setComdat(reinterpret_cast<Comdat *>(1)); + } + + if (Record.size() > 13 && Record[13] != 0) + FunctionPrefixes.push_back(std::make_pair(Func, Record[13] - 1)); + + if (Record.size() > 14 && Record[14] != 0) + FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); + + ValueList.push_back(Func); + + // If this is a function with a body, remember the prototype we are + // creating now, so that we can match up the body with them later. + if (!isProto) { + Func->setIsMaterializable(true); + FunctionsWithBodies.push_back(Func); + DeferredFunctionInfo[Func] = 0; + } + return Error::success(); +} + +Error BitcodeReader::readGlobalIndirectSymbolRecord(unsigned BitCode, + ArrayRef<uint64_t> Record) { + StringRef Name; + std::tie(Name, Record) = readNameFromStrtab(Record); + + bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; + if (Record.size() < (3 + (unsigned)NewRecord)) + return error("Invalid record"); + unsigned OpNum = 0; + Type *Ty = getTypeByID(Record[OpNum++]); + if (!Ty) + return error("Invalid record"); + + unsigned AddrSpace; + if (!NewRecord) { + auto *PTy = dyn_cast<PointerType>(Ty); + if (!PTy) + return error("Invalid type for value"); + Ty = PTy->getElementType(); + AddrSpace = PTy->getAddressSpace(); + } else { + AddrSpace = Record[OpNum++]; + } + + auto Val = Record[OpNum++]; + auto Linkage = Record[OpNum++]; + GlobalIndirectSymbol *NewGA; + if (BitCode == bitc::MODULE_CODE_ALIAS || + BitCode == bitc::MODULE_CODE_ALIAS_OLD) + NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + TheModule); + else + NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), Name, + nullptr, TheModule); + // Old bitcode files didn't have visibility field. + // Local linkage must have default visibility. + if (OpNum != Record.size()) { + auto VisInd = OpNum++; + if (!NewGA->hasLocalLinkage()) + // FIXME: Change to an error if non-default in 4.0. + NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); + } + if (OpNum != Record.size()) + NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); + else + upgradeDLLImportExportLinkage(NewGA, Linkage); + if (OpNum != Record.size()) + NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); + if (OpNum != Record.size()) + NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); + ValueList.push_back(NewGA); + IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + return Error::success(); +} + +Error BitcodeReaderBase::readVersionRecord(ArrayRef<uint64_t> Record) { + if (Record.size() < 1) + return error("Invalid record"); + unsigned ModuleVersion = Record[0]; + if (ModuleVersion > 2) + return error("Invalid value"); + UseRelativeIDs = ModuleVersion >= 1; + UseStrtab = ModuleVersion >= 2; + return Error::success(); +} + Error BitcodeReader::parseModule(uint64_t ResumeBit, bool ShouldLazyLoadMetadata) { if (ResumeBit) @@ -2617,8 +2949,6 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, return error("Invalid record"); SmallVector<uint64_t, 64> Record; - std::vector<std::string> SectionTable; - std::vector<std::string> GCTable; // Read all the records for this module. while (true) { @@ -2765,20 +3095,8 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, switch (BitCode) { default: break; // Default behavior, ignore unknown content. case bitc::MODULE_CODE_VERSION: { // VERSION: [version#] - if (Record.size() < 1) - return error("Invalid record"); - // Only version #0 and #1 are supported so far. - unsigned module_version = Record[0]; - switch (module_version) { - default: - return error("Invalid value"); - case 0: - UseRelativeIDs = false; - break; - case 1: - UseRelativeIDs = true; - break; - } + if (Error Err = readVersionRecord(Record)) + return Err; break; } case bitc::MODULE_CODE_TRIPLE: { // TRIPLE: [strchr x N] @@ -2825,184 +3143,25 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, break; } case bitc::MODULE_CODE_COMDAT: { // COMDAT: [selection_kind, name] - if (Record.size() < 2) - return error("Invalid record"); - Comdat::SelectionKind SK = getDecodedComdatSelectionKind(Record[0]); - unsigned ComdatNameSize = Record[1]; - std::string ComdatName; - ComdatName.reserve(ComdatNameSize); - for (unsigned i = 0; i != ComdatNameSize; ++i) - ComdatName += (char)Record[2 + i]; - Comdat *C = TheModule->getOrInsertComdat(ComdatName); - C->setSelectionKind(SK); - ComdatList.push_back(C); - break; - } - // GLOBALVAR: [pointer type, isconst, initid, + if (Error Err = readComdatRecord(Record)) + return Err; + break; + } + // GLOBALVAR: [(name_offset, )pointer type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - bool isConstant = Record[1] & 1; - bool explicitType = Record[1] & 2; - unsigned AddressSpace; - if (explicitType) { - AddressSpace = Record[1] >> 2; - } else { - if (!Ty->isPointerTy()) - return error("Invalid type for value"); - AddressSpace = cast<PointerType>(Ty)->getAddressSpace(); - Ty = cast<PointerType>(Ty)->getElementType(); - } - - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[4], Alignment)) + if (Error Err = readGlobalVarRecord(Record)) return Err; - std::string Section; - if (Record[5]) { - if (Record[5]-1 >= SectionTable.size()) - return error("Invalid ID"); - Section = SectionTable[Record[5]-1]; - } - GlobalValue::VisibilityTypes Visibility = GlobalValue::DefaultVisibility; - // Local linkage must have default visibility. - if (Record.size() > 6 && !GlobalValue::isLocalLinkage(Linkage)) - // FIXME: Change to an error if non-default in 4.0. - Visibility = getDecodedVisibility(Record[6]); - - GlobalVariable::ThreadLocalMode TLM = GlobalVariable::NotThreadLocal; - if (Record.size() > 7) - TLM = getDecodedThreadLocalMode(Record[7]); - - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 8) - UnnamedAddr = getDecodedUnnamedAddrType(Record[8]); - - bool ExternallyInitialized = false; - if (Record.size() > 9) - ExternallyInitialized = Record[9]; - - GlobalVariable *NewGV - new GlobalVariable(*TheModule, Ty, isConstant, Linkage, nullptr, "", nullptr, - TLM, AddressSpace, ExternallyInitialized); - NewGV->setAlignment(Alignment); - if (!Section.empty()) - NewGV->setSection(Section); - NewGV->setVisibility(Visibility); - NewGV->setUnnamedAddr(UnnamedAddr); - - if (Record.size() > 10) - NewGV->setDLLStorageClass(getDecodedDLLStorageClass(Record[10])); - else - upgradeDLLImportExportLinkage(NewGV, RawLinkage); - - ValueList.push_back(NewGV); - - // Remember which value to use for the global initializer. - if (unsigned InitID = Record[2]) - GlobalInits.push_back(std::make_pair(NewGV, InitID-1)); - - if (Record.size() > 11) { - if (unsigned ComdatID = Record[11]) { - if (ComdatID > ComdatList.size()) - return error("Invalid global variable comdat ID"); - NewGV->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - NewGV->setComdat(reinterpret_cast<Comdat *>(1)); - } - break; } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, + // FUNCTION: [(name_offset, )type, callingconv, isproto, linkage, paramattr, // alignment, section, visibility, gc, unnamed_addr, // prologuedata, dllstorageclass, comdat, prefixdata] case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - Type *Ty = getTypeByID(Record[0]); - if (!Ty) - return error("Invalid record"); - if (auto *PTy = dyn_cast<PointerType>(Ty)) - Ty = PTy->getElementType(); - auto *FTy = dyn_cast<FunctionType>(Ty); - if (!FTy) - return error("Invalid type for value"); - auto CC = static_cast<CallingConv::ID>(Record[1]); - if (CC & ~CallingConv::MaxID) - return error("Invalid calling convention ID"); - - Function *Func = Function::Create(FTy, GlobalValue::ExternalLinkage, - "", TheModule); - - Func->setCallingConv(CC); - bool isProto = Record[2]; - uint64_t RawLinkage = Record[3]; - Func->setLinkage(getDecodedLinkage(RawLinkage)); - Func->setAttributes(getAttributes(Record[4])); - - unsigned Alignment; - if (Error Err = parseAlignmentValue(Record[5], Alignment)) + if (Error Err = readFunctionRecord(Record)) return Err; - Func->setAlignment(Alignment); - if (Record[6]) { - if (Record[6]-1 >= SectionTable.size()) - return error("Invalid ID"); - Func->setSection(SectionTable[Record[6]-1]); - } - // Local linkage must have default visibility. - if (!Func->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - Func->setVisibility(getDecodedVisibility(Record[7])); - if (Record.size() > 8 && Record[8]) { - if (Record[8]-1 >= GCTable.size()) - return error("Invalid ID"); - Func->setGC(GCTable[Record[8] - 1]); - } - GlobalValue::UnnamedAddr UnnamedAddr = GlobalValue::UnnamedAddr::None; - if (Record.size() > 9) - UnnamedAddr = getDecodedUnnamedAddrType(Record[9]); - Func->setUnnamedAddr(UnnamedAddr); - if (Record.size() > 10 && Record[10] != 0) - FunctionPrologues.push_back(std::make_pair(Func, Record[10]-1)); - - if (Record.size() > 11) - Func->setDLLStorageClass(getDecodedDLLStorageClass(Record[11])); - else - upgradeDLLImportExportLinkage(Func, RawLinkage); - - if (Record.size() > 12) { - if (unsigned ComdatID = Record[12]) { - if (ComdatID > ComdatList.size()) - return error("Invalid function comdat ID"); - Func->setComdat(ComdatList[ComdatID - 1]); - } - } else if (hasImplicitComdat(RawLinkage)) { - Func->setComdat(reinterpret_cast<Comdat *>(1)); - } - - if (Record.size() > 13 && Record[13] != 0) - FunctionPrefixes.push_back(std::make_pair(Func, Record[13]-1)); - - if (Record.size() > 14 && Record[14] != 0) - FunctionPersonalityFns.push_back(std::make_pair(Func, Record[14] - 1)); - - ValueList.push_back(Func); - - // If this is a function with a body, remember the prototype we are - // creating now, so that we can match up the body with them later. - if (!isProto) { - Func->setIsMaterializable(true); - FunctionsWithBodies.push_back(Func); - DeferredFunctionInfo[Func] = 0; - } break; } // ALIAS: [alias type, addrspace, aliasee val#, linkage] @@ -3011,54 +3170,10 @@ Error BitcodeReader::parseModule(uint64_t ResumeBit, case bitc::MODULE_CODE_IFUNC: case bitc::MODULE_CODE_ALIAS: case bitc::MODULE_CODE_ALIAS_OLD: { - bool NewRecord = BitCode != bitc::MODULE_CODE_ALIAS_OLD; - if (Record.size() < (3 + (unsigned)NewRecord)) - return error("Invalid record"); - unsigned OpNum = 0; - Type *Ty = getTypeByID(Record[OpNum++]); - if (!Ty) - return error("Invalid record"); - - unsigned AddrSpace; - if (!NewRecord) { - auto *PTy = dyn_cast<PointerType>(Ty); - if (!PTy) - return error("Invalid type for value"); - Ty = PTy->getElementType(); - AddrSpace = PTy->getAddressSpace(); - } else { - AddrSpace = Record[OpNum++]; - } - - auto Val = Record[OpNum++]; - auto Linkage = Record[OpNum++]; - GlobalIndirectSymbol *NewGA; - if (BitCode == bitc::MODULE_CODE_ALIAS || - BitCode == bitc::MODULE_CODE_ALIAS_OLD) - NewGA = GlobalAlias::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", TheModule); - else - NewGA = GlobalIFunc::create(Ty, AddrSpace, getDecodedLinkage(Linkage), - "", nullptr, TheModule); - // Old bitcode files didn't have visibility field. - // Local linkage must have default visibility. - if (OpNum != Record.size()) { - auto VisInd = OpNum++; - if (!NewGA->hasLocalLinkage()) - // FIXME: Change to an error if non-default in 4.0. - NewGA->setVisibility(getDecodedVisibility(Record[VisInd])); - } - if (OpNum != Record.size()) - NewGA->setDLLStorageClass(getDecodedDLLStorageClass(Record[OpNum++])); - else - upgradeDLLImportExportLinkage(NewGA, Linkage); - if (OpNum != Record.size()) - NewGA->setThreadLocalMode(getDecodedThreadLocalMode(Record[OpNum++])); - if (OpNum != Record.size()) - NewGA->setUnnamedAddr(getDecodedUnnamedAddrType(Record[OpNum++])); - ValueList.push_back(NewGA); - IndirectSymbolInits.push_back(std::make_pair(NewGA, Val)); + if (Error Err = readGlobalIndirectSymbolRecord(BitCode, Record)) + return Err; break; + } /// MODULE_CODE_VSTOFFSET: [offset] case bitc::MODULE_CODE_VSTOFFSET: @@ -4535,8 +4650,8 @@ std::vector<StructType *> BitcodeReader::getIdentifiedStructTypes() const { } ModuleSummaryIndexBitcodeReader::ModuleSummaryIndexBitcodeReader( - BitstreamCursor Cursor, ModuleSummaryIndex &TheIndex) - : BitcodeReaderBase(std::move(Cursor)), TheIndex(TheIndex) {} + BitstreamCursor Cursor, StringRef Strtab, ModuleSummaryIndex &TheIndex) + : BitcodeReaderBase(std::move(Cursor), Strtab), TheIndex(TheIndex) {} std::pair<GlobalValue::GUID, GlobalValue::GUID> ModuleSummaryIndexBitcodeReader::getGUIDFromValueId(unsigned ValueId) { @@ -4560,7 +4675,7 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( SmallVector<uint64_t, 64> Record; // Read all the records for this value table. - SmallString<128> ValueName; + SmallString<128> Buf; while (true) { BitstreamEntry Entry = Stream.advanceSkippingSubblocks(); @@ -4580,12 +4695,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( // Read a record. Record.clear(); - switch (Stream.readRecord(Entry.ID, Record)) { + switch (unsigned Code = Stream.readRecord(Entry.ID, Record)) { default: // Default behavior: ignore (e.g. VST_CODE_BBENTRY records). break; - case bitc::VST_CODE_ENTRY: { // VST_CODE_ENTRY: [valueid, namechar x N] - if (convertToString(Record, 1, ValueName)) + case bitc::VST_CODE_ENTRY: { // [valueid, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 1, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4603,13 +4723,17 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(ValueGUID, OriginalNameID); - ValueName.clear(); + Buf.clear(); break; } - case bitc::VST_CODE_FNENTRY: { - // VST_CODE_FNENTRY: [valueid, offset, namechar x N] - if (convertToString(Record, 2, ValueName)) + case bitc::VST_CODE_FNENTRY: { // [valueid, offset, namechar x N] + if (UseStrtab) + break; + StringRef ValueName; + Buf.clear(); + if (convertToString(Record, 2, Buf)) return error("Invalid record"); + ValueName = Buf; unsigned ValueID = Record[0]; assert(!SourceFileName.empty()); auto VLI = ValueIdToLinkageMap.find(ValueID); @@ -4627,8 +4751,6 @@ Error ModuleSummaryIndexBitcodeReader::parseValueSymbolTable( << ValueName << "\n"; ValueIdToCallGraphGUIDMap[ValueID] std::make_pair(FunctionGUID, OriginalNameID); - - ValueName.clear(); break; } case bitc::VST_CODE_COMBINED_ENTRY: { @@ -4715,6 +4837,11 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { default: break; // Default behavior, ignore unknown content. /// MODULE_CODE_SOURCE_FILENAME: [namechar x N] + case bitc::MODULE_CODE_VERSION: { + if (Error Err = readVersionRecord(Record)) + return Err; + break; + } case bitc::MODULE_CODE_SOURCE_FILENAME: { SmallString<128> ValueName; if (convertToString(Record, 0, ValueName)) @@ -4748,37 +4875,35 @@ Error ModuleSummaryIndexBitcodeReader::parseModule(StringRef ModulePath) { // was historically always the start of the regular bitcode header. VSTOffset = Record[0] - 1; break; - // GLOBALVAR: [pointer type, isconst, initid, - // linkage, alignment, section, visibility, threadlocal, - // unnamed_addr, externally_initialized, dllstorageclass, - // comdat] - case bitc::MODULE_CODE_GLOBALVAR: { - if (Record.size() < 6) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // FUNCTION: [type, callingconv, isproto, linkage, paramattr, - // alignment, section, visibility, gc, unnamed_addr, - // prologuedata, dllstorageclass, comdat, prefixdata] - case bitc::MODULE_CODE_FUNCTION: { - if (Record.size() < 8) - return error("Invalid record"); - uint64_t RawLinkage = Record[3]; - GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; - break; - } - // ALIAS: [alias type, addrspace, aliasee val#, linkage, visibility, - // dllstorageclass] + // GLOBALVAR: [pointer type, isconst, initid, linkage, ...] + // FUNCTION: [type, callingconv, isproto, linkage, ...] + // ALIAS: [alias type, addrspace, aliasee val#, linkage, ...] + case bitc::MODULE_CODE_GLOBALVAR: + case bitc::MODULE_CODE_FUNCTION: case bitc::MODULE_CODE_ALIAS: { - if (Record.size() < 6) + StringRef Name; + ArrayRef<uint64_t> GVRecord; + std::tie(Name, GVRecord) = readNameFromStrtab(Record); + if (GVRecord.size() <= 3) return error("Invalid record"); - uint64_t RawLinkage = Record[3]; + uint64_t RawLinkage = GVRecord[3]; GlobalValue::LinkageTypes Linkage = getDecodedLinkage(RawLinkage); - ValueIdToLinkageMap[ValueId++] = Linkage; + if (!UseStrtab) { + ValueIdToLinkageMap[ValueId++] = Linkage; + break; + } + + std::string GlobalId + GlobalValue::getGlobalIdentifier(Name, Linkage, SourceFileName); + auto ValueGUID = GlobalValue::getGUID(GlobalId); + auto OriginalNameID = ValueGUID; + if (GlobalValue::isLocalLinkage(Linkage)) + OriginalNameID = GlobalValue::getGUID(Name); + if (PrintSummaryGUIDs) + dbgs() << "GUID " << ValueGUID << "(" << OriginalNameID << ") is " + << Name << "\n"; + ValueIdToCallGraphGUIDMap[ValueId++] + std::make_pair(ValueGUID, OriginalNameID); break; } } @@ -4889,6 +5014,13 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary( switch (BitCode) { default: // Default behavior: ignore. break; + + case bitc::FS_VALUE_GUID: { + uint64_t ValueID = Record[0]; + GlobalValue::GUID RefGUID = Record[1]; + ValueIdToCallGraphGUIDMap[ValueID] = std::make_pair(RefGUID, RefGUID); + break; + } // FS_PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid, // n x (valueid)] // FS_PERMODULE_PROFILE: [valueid, flags, instcount, numrefs, @@ -5193,6 +5325,35 @@ const std::error_category &llvm::BitcodeErrorCategory() { return *ErrorCategory; } +static Expected<StringRef> readStrtab(BitstreamCursor &Stream) { + if (Stream.EnterSubBlock(bitc::STRTAB_BLOCK_ID)) + return error("Invalid record"); + + StringRef Strtab; + while (1) { + BitstreamEntry Entry = Stream.advance(); + switch (Entry.Kind) { + case BitstreamEntry::EndBlock: + return Strtab; + + case BitstreamEntry::Error: + return error("Malformed block"); + + case BitstreamEntry::SubBlock: + if (Stream.SkipBlock()) + return error("Malformed block"); + break; + + case BitstreamEntry::Record: + StringRef Blob; + SmallVector<uint64_t, 1> Record; + if (Stream.readRecord(Entry.ID, Record, &Blob) == bitc::STRTAB_BLOB) + Strtab = Blob; + break; + } + } +} + //===----------------------------------------------------------------------===// // External interface //===----------------------------------------------------------------------===// @@ -5245,6 +5406,22 @@ llvm::getBitcodeModuleList(MemoryBufferRef Buffer) { continue; } + if (Entry.ID == bitc::STRTAB_BLOCK_ID) { + Expected<StringRef> Strtab = readStrtab(Stream); + if (!Strtab) + return Strtab.takeError(); + // This string table is used by every preceding bitcode module that does + // not have its own string table. A bitcode file may have multiple + // string tables if it was created by binary concatenation, for example + // with "llvm-cat -b". + for (auto I = Modules.rbegin(), E = Modules.rend(); I != E; ++I) { + if (!I->Strtab.empty()) + break; + I->Strtab = *Strtab; + } + continue; + } + if (Stream.SkipBlock()) return error("Malformed block"); continue; @@ -5281,8 +5458,8 @@ BitcodeModule::getModuleImpl(LLVMContext &Context, bool MaterializeAll, } Stream.JumpToBit(ModuleBit); - auto *R - new BitcodeReader(std::move(Stream), ProducerIdentification, Context); + auto *R = new BitcodeReader(std::move(Stream), Strtab, ProducerIdentification, + Context); std::unique_ptr<Module> M llvm::make_unique<Module>(ModuleIdentifier, Context); @@ -5317,7 +5494,7 @@ Expected<std::unique_ptr<ModuleSummaryIndex>> BitcodeModule::getSummary() { Stream.JumpToBit(ModuleBit); auto Index = llvm::make_unique<ModuleSummaryIndex>(); - ModuleSummaryIndexBitcodeReader R(std::move(Stream), *Index); + ModuleSummaryIndexBitcodeReader R(std::move(Stream), Strtab, *Index); if (Error Err = R.parseModule(ModuleIdentifier)) return std::move(Err); diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index 1bdb439086a..55105aa099d 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -28,6 +28,7 @@ #include "llvm/IR/Operator.h" #include "llvm/IR/UseListOrder.h" #include "llvm/IR/ValueSymbolTable.h" +#include "llvm/MC/StringTableBuilder.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/Program.h" @@ -96,6 +97,8 @@ class ModuleBitcodeWriter : public BitcodeWriterBase { /// Pointer to the buffer allocated by caller for bitcode writing. const SmallVectorImpl<char> &Buffer; + StringTableBuilder &StrtabBuilder; + /// The Module to write to bitcode. const Module &M; @@ -131,11 +134,12 @@ public: /// Constructs a ModuleBitcodeWriter object for the given Module, /// writing to the provided \p Buffer. ModuleBitcodeWriter(const Module *M, SmallVectorImpl<char> &Buffer, + StringTableBuilder &StrtabBuilder, BitstreamWriter &Stream, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash = nullptr) - : BitcodeWriterBase(Stream), Buffer(Buffer), M(*M), - VE(*M, ShouldPreserveUseListOrder), Index(Index), + : BitcodeWriterBase(Stream), Buffer(Buffer), StrtabBuilder(StrtabBuilder), + M(*M), VE(*M, ShouldPreserveUseListOrder), Index(Index), GenerateHash(GenerateHash), ModHash(ModHash), BitcodeStartBit(Stream.GetCurrentBitNo()) { // Assign ValueIds to any callee values in the index that came from @@ -261,9 +265,7 @@ private: SmallVectorImpl<uint64_t> &Vals); void writeInstruction(const Instruction &I, unsigned InstID, SmallVectorImpl<unsigned> &Vals); - void writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel = false, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex = nullptr); + void writeValueSymbolTable(const ValueSymbolTable &VST); void writeUseList(UseListOrder &&Order); void writeUseListBlock(const Function *F); void @@ -478,7 +480,6 @@ public: private: void writeIndex(); void writeModStrings(); - void writeCombinedValueSymbolTable(); void writeCombinedGlobalValueSummary(); /// Indicates whether the provided \p ModulePath should be written into @@ -1049,12 +1050,9 @@ void ModuleBitcodeWriter::writeComdats() { SmallVector<unsigned, 64> Vals; for (const Comdat *C : VE.getComdats()) { // COMDAT: [selection_kind, name] + Vals.push_back(StrtabBuilder.add(C->getName())); + Vals.push_back(C->getName().size()); Vals.push_back(getEncodedComdatSelectionKind(*C)); - size_t Size = C->getName().size(); - assert(isUInt<32>(Size)); - Vals.push_back(Size); - for (char Chr : C->getName()) - Vals.push_back((unsigned char)Chr); Stream.EmitRecord(bitc::MODULE_CODE_COMDAT, Vals, /*AbbrevToUse=*/0); Vals.clear(); } @@ -1166,6 +1164,8 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Add an abbrev for common globals with no visibility or thread localness. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_GLOBALVAR)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, Log2_32_Ceil(MaxGlobalType+1))); Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // AddrSpace << 2 @@ -1189,15 +1189,42 @@ void ModuleBitcodeWriter::writeModuleInfo() { SimpleGVarAbbrev = Stream.EmitAbbrev(std::move(Abbv)); } - // Emit the global variable information. SmallVector<unsigned, 64> Vals; + // Emit the module's source file name. + { + StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), + M.getSourceFileName().size()); + BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); + if (Bits == SE_Char6) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); + else if (Bits == SE_Fixed7) + AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); + + // MODULE_CODE_SOURCE_FILENAME: [namechar x N] + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); + Abbv->Add(AbbrevOpToUse); + unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); + + for (const auto P : M.getSourceFileName()) + Vals.push_back((unsigned char)P); + + // Emit the finished record. + Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); + Vals.clear(); + } + + // Emit the global variable information. for (const GlobalVariable &GV : M.globals()) { unsigned AbbrevToUse = 0; - // GLOBALVAR: [type, isconst, initid, + // GLOBALVAR: [(name, )type, isconst, initid, // linkage, alignment, section, visibility, threadlocal, // unnamed_addr, externally_initialized, dllstorageclass, // comdat] + Vals.push_back(StrtabBuilder.add(GV.getName())); + Vals.push_back(GV.getName().size()); Vals.push_back(VE.getTypeID(GV.getValueType())); Vals.push_back(GV.getType()->getAddressSpace() << 2 | 2 | GV.isConstant()); Vals.push_back(GV.isDeclaration() ? 0 : @@ -1230,6 +1257,8 @@ void ModuleBitcodeWriter::writeModuleInfo() { // FUNCTION: [type, callingconv, isproto, linkage, paramattrs, alignment, // section, visibility, gc, unnamed_addr, prologuedata, // dllstorageclass, comdat, prefixdata, personalityfn] + Vals.push_back(StrtabBuilder.add(F.getName())); + Vals.push_back(F.getName().size()); Vals.push_back(VE.getTypeID(F.getFunctionType())); Vals.push_back(F.getCallingConv()); Vals.push_back(F.isDeclaration()); @@ -1258,6 +1287,8 @@ void ModuleBitcodeWriter::writeModuleInfo() { for (const GlobalAlias &A : M.aliases()) { // ALIAS: [alias type, aliasee val#, linkage, visibility, dllstorageclass, // threadlocal, unnamed_addr] + Vals.push_back(StrtabBuilder.add(A.getName())); + Vals.push_back(A.getName().size()); Vals.push_back(VE.getTypeID(A.getValueType())); Vals.push_back(A.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(A.getAliasee())); @@ -1274,6 +1305,8 @@ void ModuleBitcodeWriter::writeModuleInfo() { // Emit the ifunc information. for (const GlobalIFunc &I : M.ifuncs()) { // IFUNC: [ifunc type, address space, resolver val#, linkage, visibility] + Vals.push_back(StrtabBuilder.add(I.getName())); + Vals.push_back(I.getName().size()); Vals.push_back(VE.getTypeID(I.getValueType())); Vals.push_back(I.getType()->getAddressSpace()); Vals.push_back(VE.getValueID(I.getResolver())); @@ -1282,36 +1315,6 @@ void ModuleBitcodeWriter::writeModuleInfo() { Stream.EmitRecord(bitc::MODULE_CODE_IFUNC, Vals); Vals.clear(); } - - // Emit the module's source file name. - { - StringEncoding Bits = getStringEncoding(M.getSourceFileName().data(), - M.getSourceFileName().size()); - BitCodeAbbrevOp AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8); - if (Bits == SE_Char6) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Char6); - else if (Bits == SE_Fixed7) - AbbrevOpToUse = BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7); - - // MODULE_CODE_SOURCE_FILENAME: [namechar x N] - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_SOURCE_FILENAME)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(AbbrevOpToUse); - unsigned FilenameAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - for (const auto P : M.getSourceFileName()) - Vals.push_back((unsigned char)P); - - // Emit the finished record. - Stream.EmitRecord(bitc::MODULE_CODE_SOURCE_FILENAME, Vals, FilenameAbbrev); - Vals.clear(); - } - - // If we have a VST, write the VSTOFFSET record placeholder. - if (M.getValueSymbolTable().empty()) - return; - writeValueSymbolTableForwardDecl(); } static uint64_t getOptimizationFlags(const Value *V) { @@ -2843,75 +2846,12 @@ void ModuleBitcodeWriter::writeInstruction(const Instruction &I, /// Emit names for globals/functions etc. \p IsModuleLevel is true when /// we are writing the module-level VST, where we are including a function /// bitcode index and need to backpatch the VST forward declaration record. -void ModuleBitcodeWriter::writeValueSymbolTable( - const ValueSymbolTable &VST, bool IsModuleLevel, - DenseMap<const Function *, uint64_t> *FunctionToBitcodeIndex) { - if (VST.empty()) { - // writeValueSymbolTableForwardDecl should have returned early as - // well. Ensure this handling remains in sync by asserting that - // the placeholder offset is not set. - assert(!IsModuleLevel || !hasVSTOffsetPlaceholder()); +void ModuleBitcodeWriter::writeValueSymbolTable(const ValueSymbolTable &VST) { + if (VST.empty()) return; - } - - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { - // Get the offset of the VST we are writing, and backpatch it into - // the VST forward declaration record. - uint64_t VSTOffset = Stream.GetCurrentBitNo(); - // The BitcodeStartBit was the stream offset of the identification block. - VSTOffset -= bitcodeStartBit(); - assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned"); - // Note that we add 1 here because the offset is relative to one word - // before the start of the identification block, which was historically - // always the start of the regular bitcode header. - Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32 + 1); - } Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY - // records, which are not used in the per-function VSTs. - unsigned FnEntry8BitAbbrev; - unsigned FnEntry7BitAbbrev; - unsigned FnEntry6BitAbbrev; - unsigned GUIDEntryAbbrev; - if (IsModuleLevel && hasVSTOffsetPlaceholder()) { - // 8-bit fixed-width VST_CODE_FNENTRY function strings. - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8)); - FnEntry8BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // 7-bit fixed width VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7)); - FnEntry7BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // 6-bit char6 VST_CODE_FNENTRY function strings. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6)); - FnEntry6BitAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - // FIXME: Change the name of this record as it is now used by - // the per-module index as well. - Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - GUIDEntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - } - // FIXME: Set up the abbrev, we know how many values there are! // FIXME: We know if the type names can use 7-bit ascii. SmallVector<uint64_t, 64> NameVals; @@ -2924,15 +2864,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( unsigned AbbrevToUse = VST_ENTRY_8_ABBREV; NameVals.push_back(VE.getValueID(Name.getValue())); - Function *F = dyn_cast<Function>(Name.getValue()); - if (!F) { - // If value is an alias, need to get the aliased base object to - // see if it is a function. - auto *GA = dyn_cast<GlobalAlias>(Name.getValue()); - if (GA && GA->getBaseObject()) - F = dyn_cast<Function>(GA->getBaseObject()); - } - // VST_CODE_ENTRY: [valueid, namechar x N] // VST_CODE_FNENTRY: [valueid, funcoffset, namechar x N] // VST_CODE_BBENTRY: [bbid, namechar x N] @@ -2941,28 +2872,6 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Code = bitc::VST_CODE_BBENTRY; if (Bits == SE_Char6) AbbrevToUse = VST_BBENTRY_6_ABBREV; - } else if (F && !F->isDeclaration()) { - // Must be the module-level VST, where we pass in the Index and - // have a VSTOffsetPlaceholder. The function-level VST should not - // contain any Function symbols. - assert(FunctionToBitcodeIndex); - assert(hasVSTOffsetPlaceholder()); - - // Save the word offset of the function (from the start of the - // actual bitcode written to the stream). - uint64_t BitcodeIndex = (*FunctionToBitcodeIndex)[F] - bitcodeStartBit(); - assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned"); - // Note that we add 1 here because the offset is relative to one word - // before the start of the identification block, which was historically - // always the start of the regular bitcode header. - NameVals.push_back(BitcodeIndex / 32 + 1); - - Code = bitc::VST_CODE_FNENTRY; - AbbrevToUse = FnEntry8BitAbbrev; - if (Bits == SE_Char6) - AbbrevToUse = FnEntry6BitAbbrev; - else if (Bits == SE_Fixed7) - AbbrevToUse = FnEntry7BitAbbrev; } else { Code = bitc::VST_CODE_ENTRY; if (Bits == SE_Char6) @@ -2978,47 +2887,7 @@ void ModuleBitcodeWriter::writeValueSymbolTable( Stream.EmitRecord(Code, NameVals, AbbrevToUse); NameVals.clear(); } - // Emit any GUID valueIDs created for indirect call edges into the - // module-level VST. - if (IsModuleLevel && hasVSTOffsetPlaceholder()) - for (const auto &GI : valueIds()) { - NameVals.push_back(GI.second); - NameVals.push_back(GI.first); - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, - GUIDEntryAbbrev); - NameVals.clear(); - } - Stream.ExitBlock(); -} - -/// Emit function names and summary offsets for the combined index -/// used by ThinLTO. -void IndexBitcodeWriter::writeCombinedValueSymbolTable() { - assert(hasVSTOffsetPlaceholder() && "Expected non-zero VSTOffsetPlaceholder"); - // Get the offset of the VST we are writing, and backpatch it into - // the VST forward declaration record. - uint64_t VSTOffset = Stream.GetCurrentBitNo(); - assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned"); - Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32); - - Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4); - auto Abbv = std::make_shared<BitCodeAbbrev>(); - Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_COMBINED_ENTRY)); - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // refguid - unsigned EntryAbbrev = Stream.EmitAbbrev(std::move(Abbv)); - - SmallVector<uint64_t, 64> NameVals; - for (const auto &GVI : valueIds()) { - // VST_CODE_COMBINED_ENTRY: [valueid, refguid] - NameVals.push_back(GVI.second); - NameVals.push_back(GVI.first); - - // Emit the finished record. - Stream.EmitRecord(bitc::VST_CODE_COMBINED_ENTRY, NameVals, EntryAbbrev); - NameVals.clear(); - } Stream.ExitBlock(); } @@ -3510,6 +3379,11 @@ void ModuleBitcodeWriter::writePerModuleGlobalValueSummary() { return; } + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_PERMODULE. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE)); @@ -3602,6 +3476,39 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() { Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 3); Stream.EmitRecord(bitc::FS_VERSION, ArrayRef<uint64_t>{INDEX_VERSION}); + // Create value IDs for undefined references. + for (const auto &I : *this) { + if (auto *VS = dyn_cast<GlobalVarSummary>(I.second)) { + for (auto &RI : VS->refs()) + getValueId(RI.getGUID()); + continue; + } + + auto *FS = dyn_cast<FunctionSummary>(I.second); + if (!FS) + continue; + for (auto &RI : FS->refs()) + getValueId(RI.getGUID()); + + for (auto &EI : FS->calls()) { + GlobalValue::GUID GUID = EI.first.getGUID(); + if (!hasValueId(GUID)) { + // For SamplePGO, the indirect call targets for local functions will + // have its original name annotated in profile. We try to find the + // corresponding PGOFuncName as the GUID. + GUID = Index.getGUIDFromOriginalID(GUID); + if (GUID == 0 || !hasValueId(GUID)) + continue; + } + getValueId(GUID); + } + } + + for (const auto &GVI : valueIds()) { + Stream.EmitRecord(bitc::FS_VALUE_GUID, + ArrayRef<uint64_t>{GVI.second, GVI.first}); + } + // Abbrev for FS_COMBINED. auto Abbv = std::make_shared<BitCodeAbbrev>(); Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED)); @@ -3816,10 +3723,7 @@ void ModuleBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); size_t BlockStartPos = Buffer.size(); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Emit blockinfo, which defines the standard abbreviations etc. writeBlockInfo(); @@ -3865,9 +3769,6 @@ void ModuleBitcodeWriter::write() { if (Index) writePerModuleGlobalValueSummary(); - writeValueSymbolTable(M.getValueSymbolTable(), - /* IsModuleLevel */ true, &FunctionToBitcodeIndex); - writeModuleHash(BlockStartPos); Stream.ExitBlock(); @@ -3954,13 +3855,45 @@ BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer) writeBitcodeHeader(*Stream); } -BitcodeWriter::~BitcodeWriter() = default; +BitcodeWriter::~BitcodeWriter() { assert(WroteStrtab); } + +void BitcodeWriter::writeBlob(unsigned Block, unsigned Record, StringRef Blob) { + Stream->EnterSubblock(Block, 3); + + auto Abbv = std::make_shared<BitCodeAbbrev>(); + Abbv->Add(BitCodeAbbrevOp(Record)); + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); + auto AbbrevNo = Stream->EmitAbbrev(std::move(Abbv)); + + Stream->EmitRecordWithBlob(AbbrevNo, ArrayRef<uint64_t>{Record}, Blob); + + Stream->ExitBlock(); +} + +void BitcodeWriter::writeStrtab() { + assert(!WroteStrtab); + + std::vector<char> Strtab; + StrtabBuilder.finalizeInOrder(); + Strtab.resize(StrtabBuilder.getSize()); + StrtabBuilder.write((uint8_t *)Strtab.data()); + + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, + {Strtab.data(), Strtab.size()}); + + WroteStrtab = true; +} + +void BitcodeWriter::copyStrtab(StringRef Strtab) { + writeBlob(bitc::STRTAB_BLOCK_ID, bitc::STRTAB_BLOB, Strtab); + WroteStrtab = true; +} void BitcodeWriter::writeModule(const Module *M, bool ShouldPreserveUseListOrder, const ModuleSummaryIndex *Index, bool GenerateHash, ModuleHash *ModHash) { - ModuleBitcodeWriter ModuleWriter(M, Buffer, *Stream, + ModuleBitcodeWriter ModuleWriter(M, Buffer, StrtabBuilder, *Stream, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); ModuleWriter.write(); @@ -3984,6 +3917,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, BitcodeWriter Writer(Buffer); Writer.writeModule(M, ShouldPreserveUseListOrder, Index, GenerateHash, ModHash); + Writer.writeStrtab(); if (TT.isOSDarwin() || TT.isOSBinFormatMachO()) emitDarwinBCHeaderAndTrailer(Buffer, TT); @@ -3995,13 +3929,7 @@ void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out, void IndexBitcodeWriter::write() { Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3); - SmallVector<unsigned, 1> Vals; - unsigned CurVersion = 1; - Vals.push_back(CurVersion); - Stream.EmitRecord(bitc::MODULE_CODE_VERSION, Vals); - - // If we have a VST, write the VSTOFFSET record placeholder. - writeValueSymbolTableForwardDecl(); + Stream.EmitRecord(bitc::MODULE_CODE_VERSION, ArrayRef<uint64_t>{2}); // Write the module paths in the combined index. writeModStrings(); @@ -4009,10 +3937,6 @@ void IndexBitcodeWriter::write() { // Write the summary combined index records. writeCombinedGlobalValueSummary(); - // Need a special VST writer for the combined index (we don't have a - // real VST and real values when this is invoked). - writeCombinedValueSymbolTable(); - Stream.ExitBlock(); } diff --git a/llvm/lib/MC/StringTableBuilder.cpp b/llvm/lib/MC/StringTableBuilder.cpp index dffa26f481d..fbd7ba60bc9 100644 --- a/llvm/lib/MC/StringTableBuilder.cpp +++ b/llvm/lib/MC/StringTableBuilder.cpp @@ -54,7 +54,7 @@ void StringTableBuilder::write(raw_ostream &OS) const { assert(isFinalized()); SmallString<0> Data; Data.resize(getSize()); - write((uint8_t *)&Data[0]); + write((uint8_t *)Data.data()); OS << Data; } diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp index 393213f0810..416f0565abd 100644 --- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp +++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp @@ -336,6 +336,7 @@ void splitAndWriteThinLTOBitcode( W.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/true, &ModHash); W.writeModule(MergedM.get()); + W.writeStrtab(); OS << Buffer; // If a minimized bitcode module was requested for the thin link, @@ -348,6 +349,7 @@ void splitAndWriteThinLTOBitcode( W2.writeModule(&M, /*ShouldPreserveUseListOrder=*/false, &Index, /*GenerateHash=*/false, &ModHash); W2.writeModule(MergedM.get()); + W2.writeStrtab(); *ThinLinkOS << Buffer; } } diff --git a/llvm/tools/llvm-cat/llvm-cat.cpp b/llvm/tools/llvm-cat/llvm-cat.cpp index d884970309b..c9a65db3229 100644 --- a/llvm/tools/llvm-cat/llvm-cat.cpp +++ b/llvm/tools/llvm-cat/llvm-cat.cpp @@ -44,11 +44,14 @@ int main(int argc, char **argv) { std::unique_ptr<MemoryBuffer> MB = ExitOnErr( errorOrToExpected(MemoryBuffer::getFileOrSTDIN(InputFilename))); std::vector<BitcodeModule> Mods = ExitOnErr(getBitcodeModuleList(*MB)); - for (auto &BitcodeMod : Mods) + for (auto &BitcodeMod : Mods) { Buffer.insert(Buffer.end(), BitcodeMod.getBuffer().begin(), BitcodeMod.getBuffer().end()); + Writer.copyStrtab(BitcodeMod.getStrtab()); + } } } else { + std::vector<std::unique_ptr<Module>> OwnedMods; for (std::string InputFilename : InputFilenames) { SMDiagnostic Err; std::unique_ptr<Module> M = parseIRFile(InputFilename, Err, Context); @@ -57,7 +60,9 @@ int main(int argc, char **argv) { return 1; } Writer.writeModule(M.get()); + OwnedMods.push_back(std::move(M)); } + Writer.writeStrtab(); } std::error_code EC; diff --git a/llvm/tools/llvm-modextract/llvm-modextract.cpp b/llvm/tools/llvm-modextract/llvm-modextract.cpp index 6c2e364be44..375e7baf968 100644 --- a/llvm/tools/llvm-modextract/llvm-modextract.cpp +++ b/llvm/tools/llvm-modextract/llvm-modextract.cpp @@ -61,7 +61,10 @@ int main(int argc, char **argv) { if (BinaryExtract) { SmallVector<char, 0> Header; BitcodeWriter Writer(Header); - Out->os() << Header << Ms[ModuleIndex].getBuffer(); + Header.append(Ms[ModuleIndex].getBuffer().begin(), + Ms[ModuleIndex].getBuffer().end()); + Writer.copyStrtab(Ms[ModuleIndex].getStrtab()); + Out->os() << Header; Out->keep(); return 0; }