Teresa Johnson via llvm-dev
2015-Oct-20 23:02 UTC
[llvm-dev] Question about METADATA_BLOCKs in bitcode
I have a question about module-level METADATA_BLOCKs in the bitcode. There are currently two blocks with the METADATA_BLOCK id at module scope. The first has the module-level metadata values (consisting of some combination of METADATA_* record codes except for METADATA_KIND). The second consists only of METADATA_KIND records. The latter is used only in the METADATA_ATTACHMENT block within function blocks (for metadata attached to instructions). For ThinLTO we want to delay the parsing of module level metadata until all functions have been imported from that module (there is some bookkeeping used to suture it up when we read it during a post-pass). However, I do need the METADATA_KIND records since that is used when parsing the function body being imported. In my ThinLTO prototype I was simply skipping the first METADATA_BLOCK and always parsing the second, since I knew from inspecting the BitcodeWriter that the second was the one with the METADATA_KIND records. When working on the implementation of this handling for the upstream patches, I am reassessing how to do this perhaps more robustly. A few options: 1) Should the two module-level blocks use different block ids since they hold different types of records? I.e. a METADATA_BLOCK_ID (holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for the metadata kind records). That way it is more obvious what the two separate blocks are for and it is also much more trivial to identify and skip just the former. 2) Should I simply continue to assume that we will have two METADATA_BLOCKs at module level and that the kinds will always be in the latter? 3) Another option is to peek inside the METADATA_BLOCK to check the first record's type, and if it isn't METADATA_KIND, skip the rest of the block. Looks like EnterSubBlock already optionally returns the size of the block in words, so I could use that to compute the bitcode offset to jump to before reading the first record, in case it is the wrong METADATA_BLOCK and I want to skip it. Thoughts? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Duncan P. N. Exon Smith via llvm-dev
2015-Oct-20 23:51 UTC
[llvm-dev] Question about METADATA_BLOCKs in bitcode
> On 2015-Oct-20, at 16:02, Teresa Johnson <tejohnson at google.com> wrote: > > I have a question about module-level METADATA_BLOCKs in the bitcode. > There are currently two blocks with the METADATA_BLOCK id at module > scope. The first has the module-level metadata values (consisting of > some combination of METADATA_* record codes except for METADATA_KIND). > The second consists only of METADATA_KIND records. The latter is used > only in the METADATA_ATTACHMENT block within function blocks (for > metadata attached to instructions). > > For ThinLTO we want to delay the parsing of module level metadata > until all functions have been imported from that module (there is some > bookkeeping used to suture it up when we read it during a post-pass). > However, I do need the METADATA_KIND records since that is used when > parsing the function body being imported. In my ThinLTO prototype I > was simply skipping the first METADATA_BLOCK and always parsing the > second, since I knew from inspecting the BitcodeWriter that the second > was the one with the METADATA_KIND records. > > When working on the implementation of this handling for the upstream > patches, I am reassessing how to do this perhaps more robustly. A few > options: > > 1) Should the two module-level blocks use different block ids since > they hold different types of records? I.e. a METADATA_BLOCK_ID > (holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for > the metadata kind records). That way it is more obvious what the two > separate blocks are for and it is also much more trivial to identify > and skip just the former.This SGTM. The other options sound fragile (not generally correct). Note that the reader will still have to understand the old schema.> 2) Should I simply continue to assume that we will have two > METADATA_BLOCKs at module level and that the kinds will always be in > the latter? > > 3) Another option is to peek inside the METADATA_BLOCK to check the > first record's type, and if it isn't METADATA_KIND, skip the rest of > the block. Looks like EnterSubBlock already optionally returns the > size of the block in words, so I could use that to compute the bitcode > offset to jump to before reading the first record, in case it is the > wrong METADATA_BLOCK and I want to skip it. > > Thoughts? > > Thanks, > Teresa > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Teresa Johnson via llvm-dev
2015-Oct-21 01:18 UTC
[llvm-dev] Question about METADATA_BLOCKs in bitcode
On Tue, Oct 20, 2015 at 4:51 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> >> On 2015-Oct-20, at 16:02, Teresa Johnson <tejohnson at google.com> wrote: >> >> I have a question about module-level METADATA_BLOCKs in the bitcode. >> There are currently two blocks with the METADATA_BLOCK id at module >> scope. The first has the module-level metadata values (consisting of >> some combination of METADATA_* record codes except for METADATA_KIND). >> The second consists only of METADATA_KIND records. The latter is used >> only in the METADATA_ATTACHMENT block within function blocks (for >> metadata attached to instructions). >> >> For ThinLTO we want to delay the parsing of module level metadata >> until all functions have been imported from that module (there is some >> bookkeeping used to suture it up when we read it during a post-pass). >> However, I do need the METADATA_KIND records since that is used when >> parsing the function body being imported. In my ThinLTO prototype I >> was simply skipping the first METADATA_BLOCK and always parsing the >> second, since I knew from inspecting the BitcodeWriter that the second >> was the one with the METADATA_KIND records. >> >> When working on the implementation of this handling for the upstream >> patches, I am reassessing how to do this perhaps more robustly. A few >> options: >> >> 1) Should the two module-level blocks use different block ids since >> they hold different types of records? I.e. a METADATA_BLOCK_ID >> (holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for >> the metadata kind records). That way it is more obvious what the two >> separate blocks are for and it is also much more trivial to identify >> and skip just the former. > > This SGTM. The other options sound fragile (not generally correct).Ok good, I think this is cleaner too.> > Note that the reader will still have to understand the old schema.I will keep support for reading the old format (ThinLTO won't support it, but it needs recent bitcode anyway). Thanks, Teresa> >> 2) Should I simply continue to assume that we will have two >> METADATA_BLOCKs at module level and that the kinds will always be in >> the latter? >> >> 3) Another option is to peek inside the METADATA_BLOCK to check the >> first record's type, and if it isn't METADATA_KIND, skip the rest of >> the block. Looks like EnterSubBlock already optionally returns the >> size of the block in words, so I could use that to compute the bitcode >> offset to jump to before reading the first record, in case it is the >> wrong METADATA_BLOCK and I want to skip it. >> >> Thoughts? >> >> Thanks, >> Teresa >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 >-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Reasonably Related Threads
- [LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
- [LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
- Storing FLAC in Matroska
- METADATA_BLOCK_PICTURE question
- RFC: Adding a string table to the bitcode format