Jan Voung
2012-Sep-26 00:08 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
Hi all,

I've been looking into how to make llvm bitcode files smaller. There is one simple change that appears to shrink linked bitcode files by about 15%. See this spreadsheet for some rough data:

https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E

The change is in how operand ids are encoded in bitcode files. Rather than use an "absolute number" given by the ValueEnumerator, which counts up within a function, we can have the id be relative to the current instruction. I.e., instead of having:

    ... = icmp eq i32 n-1, n-2
    br i1 ..., label %bb1, label %bb2

you have:

    ... = icmp eq i32 1, 2
    br i1 1, label %bb1, label %bb2

Thus the ids remain relatively small and can be encoded in fewer bits. This counters how ValueEnumerator starts assigning ids within functions at some large N (where N is the number of module-level values?). It also makes repeated sequences of bytes more likely, since many instructions in a function may now have similar operands. The format of the ".ll" files from llvm-dis is not changed, just the binary format (run llvm-bcanalyzer -dump to see the difference).

Caveats:

- Forward references will create negative-valued ids (which end up being written out as large 32-bit integers, as far as I could tell). The common case for this is PHI nodes, but in larger tests fewer bits *overall* are used for INST_PHI.

- It doesn't help with constant operands. Their ids will now constantly change...

- To retain backward compatibility with old bitcode files, I ended up using a new bitc value "bitc::FUNCTION_BLOCK_REL_ID" alongside the existing "bitc::FUNCTION_BLOCK_ID".

Are there known problems with this scheme? Are there other ideas that have been floated around for reducing the size of bitcode files?

In any case, the patch is attached if there is interest... If you want to try out the patch, you can toggle between the new and old encoding by using the flag "-enable-old-style-functions" vs no flags, with llvm-as.

- Jan

[Attachment: relative_ids.patch]
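To make the relative encoding concrete, here is a tiny standalone sketch of the mapping (illustration only; the function names are invented and this is not the actual BitcodeWriter/BitcodeReader code):

    #include <cassert>
    #include <cstdint>

    // Writer side: both the current instruction's ID and the operand's
    // absolute value ID (from the ValueEnumerator) are known, so emit the
    // difference.  Operands defined just before the instruction become small
    // numbers like 1, 2, 3 no matter where the function's IDs start counting.
    uint64_t encodeOperand(uint64_t InstID, uint64_t AbsValID) {
      return InstID - AbsValID;  // forward references wrap to huge values
    }

    // Reader side: the same subtraction recovers the absolute ID.
    uint64_t decodeOperand(uint64_t InstID, uint64_t RelValID) {
      return InstID - RelValID;
    }

    int main() {
      // An operand defined two instructions earlier is encoded as 2.
      assert(encodeOperand(/*InstID=*/100, /*AbsValID=*/98) == 2);
      assert(decodeOperand(100, 2) == 98);
      return 0;
    }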
Duncan Sands
2012-Sep-26 08:31 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
Hi Jan,

> I've been looking into how to make llvm bitcode files smaller. There is one
> simple change that appears to shrink linked bitcode files by about 15%. See
> this spreadsheet for some rough data:
>
> https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E

the improvement is wonderful!

...

> In any case, the patch is attached if there is interest... If you want to try
> out the patch, you can toggle between the new and old encoding by using the flag
> "-enable-old-style-functions" vs no flags, with llvm-as.

I don't know anything much about the bitcode representation, but anyway:

> diff --git a/include/llvm/Bitcode/BitstreamReader.h b/include/llvm/Bitcode/BitstreamReader.h
> index 3daef78..840f57e 100644
> --- a/include/llvm/Bitcode/BitstreamReader.h
> +++ b/include/llvm/Bitcode/BitstreamReader.h
> @@ -409,7 +409,7 @@ public:
>    }
>
>    /// EnterSubBlock - Having read the ENTER_SUBBLOCK abbrevid, enter
> -  /// the block, and return true if the block is valid.
> +  /// the block, and return true if the block has an error.
>    bool EnterSubBlock(unsigned BlockID, unsigned *NumWordsP = 0) {
>      // Save the current block's state on BlockScope.
>      BlockScope.push_back(Block(CurCodeSize));
> diff --git a/include/llvm/Bitcode/LLVMBitCodes.h b/include/llvm/Bitcode/LLVMBitCodes.h
> index c1dc190..6c81739 100644
> --- a/include/llvm/Bitcode/LLVMBitCodes.h
> +++ b/include/llvm/Bitcode/LLVMBitCodes.h
> @@ -33,9 +33,9 @@ namespace bitc {
>      UNUSED_ID1,
>
>      CONSTANTS_BLOCK_ID,
> -    FUNCTION_BLOCK_ID,
> +    FUNCTION_BLOCK_ID, // Deprecated in favor of FUNCTION_BLOCK_ID_REL

FUNCTION_BLOCK_ID_REL -> FUNCTION_BLOCK_REL_ID

Is there any point in having this on a per-function basis, why not have it be
per-module?

>
> -    UNUSED_ID2,
> +    FUNCTION_BLOCK_REL_ID,
>
>      VALUE_SYMTAB_BLOCK_ID,
>      METADATA_BLOCK_ID,
> @@ -257,8 +257,9 @@ namespace bitc {
>      SYNCHSCOPE_CROSSTHREAD = 1
>    };
>
> -  // The function body block (FUNCTION_BLOCK_ID) describes function bodies.  It
> -  // can contain a constant block (CONSTANTS_BLOCK_ID).
> +  // The function body block (FUNCTION_BLOCK_ID and FUNCTION_BLOCK_REL_ID)
> +  // describes function bodies.  It can contain a constant block
> +  // (CONSTANTS_BLOCK_ID).
>    enum FunctionCodes {
>      FUNC_CODE_DECLAREBLOCKS = 1, // DECLAREBLOCKS: [n]
>
> diff --git a/lib/Bitcode/Reader/BitcodeReader.cpp b/lib/Bitcode/Reader/BitcodeReader.cpp
> index b69a362..5fd3dfd 100644
> --- a/lib/Bitcode/Reader/BitcodeReader.cpp
> +++ b/lib/Bitcode/Reader/BitcodeReader.cpp

...

> @@ -1495,6 +1497,12 @@ bool BitcodeReader::ParseModule(bool Resume) {
>        SeenFirstFunctionBody = true;
>      }
>
> +    // Remember the encoding style for the function.
> +    if (FunctionsWithBodies.empty())
> +      return Error("Insufficient function protos");

Why this new error?  Or would it have occurred anyway, later?  Maybe it
should be an assertion, if it should be impossible.  I think "prototypes"
would be better than "protos".

> +    FuncUseRelativeIDs[FunctionsWithBodies.back()] =
> +        (SubBlockID == bitc::FUNCTION_BLOCK_REL_ID);

If the use of relative ids was on a per module basis then this wouldn't be
needed.

> +
>      if (RememberAndSkipFunctionBody())
>        return true;
>      // For streaming bitcode, suspend parsing when we reach the function
> @@ -1901,8 +1909,15 @@ bool BitcodeReader::ParseMetadataAttachment() {
>
>  /// ParseFunctionBody - Lazily parse the specified function body block.
>  bool BitcodeReader::ParseFunctionBody(Function *F) {
> -  if (Stream.EnterSubBlock(bitc::FUNCTION_BLOCK_ID))
> -    return Error("Malformed block record");
> +  if (FuncUseRelativeIDs[F]) {
> +    if (Stream.EnterSubBlock(bitc::FUNCTION_BLOCK_REL_ID))
> +      return Error("Malformed block record");
> +    UseRelativeIDs = true;
> +  } else {
> +    if (Stream.EnterSubBlock(bitc::FUNCTION_BLOCK_ID))
> +      return Error("Malformed block record");
> +    UseRelativeIDs = false;
> +  }

Likewise.

>
>    InstructionList.clear();
>    unsigned ModuleValueListSize = ValueList.size();

...

> @@ -2260,8 +2275,10 @@ bool BitcodeReader::ParseFunctionBody(Function *F) {
>        }
>        else {
>          BasicBlock *FalseDest = getBasicBlock(Record[1]);
> -        Value *Cond = getFnValueByID(Record[2], Type::getInt1Ty(Context));
> -        if (FalseDest == 0 || Cond == 0)
> +        Value *Cond;
> +        if (getValueConst(Record, 2,
> +                          NextValueNo, Type::getInt1Ty(Context), Cond) ||
> +            FalseDest == 0 || Cond == 0)

This seems rather ugly and complicated...  Can't your new method just return
"Cond", and return null in case of an error?  Not sure the name is the best
either, it doesn't suggest at all to me that it doesn't increment the slot
number, it sounds like it's for handling constants.

>            return Error("Invalid BR record");
>          I = BranchInst::Create(TrueDest, FalseDest, Cond);
>          InstructionList.push_back(I);
> @@ -2276,9 +2293,10 @@ bool BitcodeReader::ParseFunctionBody(Function *F) {
>        Type *OpTy = getTypeByID(Record[1]);
>        unsigned ValueBitWidth = cast<IntegerType>(OpTy)->getBitWidth();
>
> -      Value *Cond = getFnValueByID(Record[2], OpTy);
> +      Value *Cond;
>        BasicBlock *Default = getBasicBlock(Record[3]);
> -      if (OpTy == 0 || Cond == 0 || Default == 0)
> +      if (getValueConst(Record, 2, NextValueNo, OpTy, Cond) ||
> +          OpTy == 0 || Cond == 0 || Default == 0)
>          return Error("Invalid SWITCH record");

Likewise (more instances follow).

...

> @@ -2444,7 +2467,8 @@ bool BitcodeReader::ParseFunctionBody(Function *F) {
>        InstructionList.push_back(PN);
>
>        for (unsigned i = 0, e = Record.size()-1; i != e; i += 2) {
> -        Value *V = getFnValueByID(Record[1+i], Ty);
> +        Value *V = 0;
> +        getValueConst(Record, 1+i, NextValueNo, Ty, V);

You didn't check the return value.

>          BasicBlock *BB = getBasicBlock(Record[2+i]);
>          if (!V || !BB) return Error("Invalid PHI record");
>          PN->addIncoming(V, BB);

...

> diff --git a/lib/Bitcode/Reader/BitcodeReader.h b/lib/Bitcode/Reader/BitcodeReader.h
> index e7c4e94..249d723 100644
> --- a/lib/Bitcode/Reader/BitcodeReader.h
> +++ b/lib/Bitcode/Reader/BitcodeReader.h

...

> @@ -260,14 +274,30 @@ private:
>      ResVal = getFnValueByID(ValNo, getTypeByID(TypeNo));
>      return ResVal == 0;
>    }
> +  /// getValue - Read a value out of the specified record from slot 'Slot'.
> +  /// Increment Slot past the number of slots used in the record.
> +  /// Return true on failure.
>    bool getValue(SmallVector<uint64_t, 64> &Record, unsigned &Slot,
> -                Type *Ty, Value *&ResVal) {
> +                unsigned InstNum, Type *Ty, Value *&ResVal) {
>      if (Slot == Record.size()) return true;
>      unsigned ValNo = (unsigned)Record[Slot++];
> +    // Adjust the ValNo, if it was encoded relative to the InstNum.
> +    if (UseRelativeIDs)
> +      ValNo = InstNum - ValNo;
> +    ResVal = getFnValueByID(ValNo, Ty);
> +    return ResVal == 0;
> +  }
> +  /// getValueConst -- Like getValue, but does not increment the Slot number.
> +  bool getValueConst(SmallVector<uint64_t, 64> &Record, unsigned Slot,
> +                     unsigned InstNum, Type *Ty, Value *&ResVal) {
> +    if (Slot == Record.size()) return true;
> +    unsigned ValNo = (unsigned)Record[Slot];
> +    // Adjust the ValNo, if it was encoded relative to the InstNum.
> +    if (UseRelativeIDs)
> +      ValNo = InstNum - ValNo;
>      ResVal = getFnValueByID(ValNo, Ty);
>      return ResVal == 0;
>    }

How about having just one method, eg getValue, and simply decrement the slot
number by hand in those places that need it.  If you really want two, implement
getValue in terms of getValueConst, by calling it then incrementing the slot
number for example.

> -
>
>    bool ParseModule(bool Resume);
>    bool ParseAttributeBlock();
> diff --git a/lib/Bitcode/Writer/BitcodeWriter.cpp b/lib/Bitcode/Writer/BitcodeWriter.cpp
> index b3f1bb1..e3ebe38 100644
> --- a/lib/Bitcode/Writer/BitcodeWriter.cpp
> +++ b/lib/Bitcode/Writer/BitcodeWriter.cpp

...

> @@ -55,7 +62,7 @@ enum {
>    CONSTANTS_CE_CAST_Abbrev,
>    CONSTANTS_NULL_Abbrev,
>
> -  // FUNCTION_BLOCK abbrev id's.
> +  // FUNCTION_BLOCK_REL abbrev id's.

Should this comment have been changed?

>    FUNCTION_INST_LOAD_ABBREV = bitc::FIRST_APPLICATION_ABBREV,
>    FUNCTION_INST_BINOP_ABBREV,
>    FUNCTION_INST_BINOP_FLAGS_ABBREV,

...

> @@ -1025,12 +1037,15 @@ static void WriteModuleConstants(const ValueEnumerator &VE,
>  ///
>  /// This function adds V's value ID to Vals.  If the value ID is higher than the
>  /// instruction ID, then it is a forward reference, and it also includes the
> -/// type ID.
> +/// type ID.  The value ID that is writtne is encoded as relative to the InstID.

writtne -> written
is encoded as relative to -> is encoded relative to

>  static bool PushValueAndType(const Value *V, unsigned InstID,
>                               SmallVector<unsigned, 64> &Vals,
>                               ValueEnumerator &VE) {
>    unsigned ValID = VE.getValueID(V);
> -  Vals.push_back(ValID);
> +  if (EnableOldStyleFunctions)
> +    Vals.push_back(ValID);
> +  else
> +    Vals.push_back(InstID - ValID);
>    if (ValID >= InstID) {
>      Vals.push_back(VE.getTypeID(V->getType()));
>      return true;

...

> @@ -1164,7 +1191,12 @@ static void WriteInstruction(const Instruction &I, unsigned InstID,
>        Vals64.push_back(SwitchRecordHeader);
>
>        Vals64.push_back(VE.getTypeID(SI.getCondition()->getType()));
> -      Vals64.push_back(VE.getValueID(SI.getCondition()));
> +      // ValueID for non-BB / literal operands are relative to the InstID.
> +      // (e.g., the condition value).
> +      if (EnableOldStyleFunctions)
> +        Vals64.push_back(VE.getValueID(SI.getCondition()));
> +      else
> +        Vals64.push_back(InstID - VE.getValueID(SI.getCondition()));

Shouldn't you use a "push" method like PushValue for this too?

>        Vals64.push_back(VE.getValueID(SI.getDefaultDest()));
>        Vals64.push_back(SI.getNumCases());
>        for (SwitchInst::CaseIt i = SI.case_begin(), e = SI.case_end();

...

> @@ -1358,8 +1392,13 @@ static void WriteInstruction(const Instruction &I, unsigned InstID,
>      PushValueAndType(CI.getCalledValue(), InstID, Vals, VE);  // Callee
>
>      // Emit value #'s for the fixed parameters.
> -    for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i)
> -      Vals.push_back(VE.getValueID(CI.getArgOperand(i)));  // fixed param.
> +    for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i) {
> +      if (FTy->getParamType(i)->isLabelTy()) {

Is this actually possible?

> +        Vals.push_back(VE.getValueID(CI.getArgOperand(i)));
> +      } else {
> +        PushValue(CI.getArgOperand(i), InstID, Vals, VE);  // fixed param.
> +      }
> +    }
>
>      // Emit type/value pairs for varargs params.
>      if (FTy->isVarArg()) {

...

> @@ -1514,8 +1553,8 @@ static void WriteFunction(const Function &F, ValueEnumerator &VE,
>  // Emit blockinfo, which defines the standard abbreviations etc.
>  static void WriteBlockInfo(const ValueEnumerator &VE, BitstreamWriter &Stream) {
>    // We only want to emit block info records for blocks that have multiple
> -  // instances: CONSTANTS_BLOCK, FUNCTION_BLOCK and VALUE_SYMTAB_BLOCK.  Other
> -  // blocks can defined their abbrevs inline.
> +  // instances: CONSTANTS_BLOCK, FUNCTION_BLOCK_REL and VALUE_SYMTAB_BLOCK.
> +  // Other blocks can defined their abbrevs inline.

Should this comment really have changed?

>    Stream.EnterBlockInfoBlock(2);
>
>    { // 8-bit fixed-width VST_ENTRY/VST_BBENTRY strings.
> @@ -1603,68 +1642,68 @@ static void WriteBlockInfo(const ValueEnumerator &VE, BitstreamWriter &Stream) {
>
>    // FIXME: This should only use space for first class types!
>
> -  { // INST_LOAD abbrev for FUNCTION_BLOCK.
> +  { // INST_LOAD abbrev for FUNCTION_BLOCK_REL.

Likewise (occurs many more times).

Ciao, Duncan.
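To illustrate the suggestion above about building getValue on top of the non-incrementing accessor, here is a standalone sketch with simplified types (plain vectors and integer value numbers instead of the real BitcodeReader members; the names are only loosely based on the patch):

    #include <cstdint>
    #include <vector>

    struct RecordCursor {
      bool UseRelativeIDs;

      // Non-incrementing accessor: decode the operand at 'Slot'.
      // Returns true on failure, mirroring the patch's convention.
      bool getValueAt(const std::vector<uint64_t> &Record, unsigned Slot,
                      unsigned InstNum, uint64_t &ValNo) const {
        if (Slot == Record.size()) return true;
        ValNo = Record[Slot];
        if (UseRelativeIDs)
          ValNo = InstNum - ValNo;  // undo the relative encoding
        return false;
      }

      // Incrementing accessor implemented in terms of the one above.
      bool getValue(const std::vector<uint64_t> &Record, unsigned &Slot,
                    unsigned InstNum, uint64_t &ValNo) const {
        if (getValueAt(Record, Slot, InstNum, ValNo)) return true;
        ++Slot;  // the only difference: consume the slot
        return false;
      }
    };

    int main() {
      RecordCursor C{true};
      std::vector<uint64_t> Record = {2, 7};
      unsigned Slot = 0;
      uint64_t ValNo = 0;
      C.getValue(Record, Slot, /*InstNum=*/10, ValNo);  // ValNo == 8, Slot == 1
      return (ValNo == 8 && Slot == 1) ? 0 : 1;
    }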
David Chisnall
2012-Sep-26 08:47 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
On 26 Sep 2012, at 01:08, Jan Voung wrote:

> I've been looking into how to make llvm bitcode files smaller. There is one simple change that appears to shrink linked bitcode files by about 15%.

Whenever anyone proposes a custom compression scheme for a data format, the first question that should always be asked is how it compares to using a generic off-the-shelf compression algorithm.

For comparison, if we just use the DEFLATE algorithm (via gzip) on bitcode files, we seem to see a saving of about 35% in a quick-and-dirty test. If you apply your compression scheme (effectively a form of LZW, but only applied to a subset of the values), does this reduce the efficiency of a general-purpose solution? Does it make more sense than just applying DEFLATE to the bitcode when it is written to disk?

The other advantage of separating the compression from the encoding, of course, is that it's easier to parallelise, as a fairly coarse-grained dataflow model can be used when streaming to and from compressed bitcode.

David
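For anyone who wants to reproduce that kind of quick-and-dirty measurement, a sketch along these lines (using zlib's compress2 at level 6, roughly gzip's default; link with -lz) prints the raw vs. DEFLATE-compressed size of a bitcode file; the exact percentages will of course depend on the input:

    #include <cstdio>
    #include <vector>
    #include <zlib.h>

    // Read a file (e.g. a linked .bc module) into memory, deflate it, and
    // report how much smaller the compressed form is.  This ignores gzip's
    // small header/trailer overhead.
    int main(int argc, char **argv) {
      if (argc != 2) {
        std::fprintf(stderr, "usage: %s file.bc\n", argv[0]);
        return 1;
      }
      std::FILE *F = std::fopen(argv[1], "rb");
      if (!F) {
        std::perror("fopen");
        return 1;
      }
      std::vector<unsigned char> In;
      unsigned char Buf[65536];
      size_t N;
      while ((N = std::fread(Buf, 1, sizeof(Buf), F)) > 0)
        In.insert(In.end(), Buf, Buf + N);
      std::fclose(F);
      if (In.empty()) {
        std::fprintf(stderr, "empty input\n");
        return 1;
      }

      uLongf OutLen = compressBound(In.size());
      std::vector<unsigned char> Out(OutLen);
      if (compress2(Out.data(), &OutLen, In.data(), In.size(), 6) != Z_OK) {
        std::fprintf(stderr, "compress2 failed\n");
        return 1;
      }
      std::printf("raw: %zu bytes, deflated: %lu bytes (%.1f%% of original)\n",
                  In.size(), (unsigned long)OutLen,
                  100.0 * (double)OutLen / (double)In.size());
      return 0;
    }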
Renato Golin
2012-Sep-26 09:07 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
On 26 September 2012 01:08, Jan Voung <jvoung at chromium.org> wrote:

> - Forward references will create negative-valued ids (which end up being
>   written out as large 32-bit integers, as far as I could tell).

Can you use an SLEB-like representation?

It's probably not going to be backwards compatible, but if there isn't an SLEB/ULEB representation in bitcode, I'd say it's a good improvement. ;)

--
cheers,
--renato

http://systemcall.org/
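For reference, a small standalone sketch of what an SLEB128-style signed encoding looks like (the scheme DWARF and many other formats use); whether it actually beats the existing VBR abbreviations for these operands would need measuring:

    #include <cstdint>
    #include <vector>

    // Encode a signed value in SLEB128: 7 payload bits per byte, high bit set
    // while more bytes follow.  Sign extension falls out of the termination
    // test, so small negatives stay small.
    void encodeSLEB128(int64_t Value, std::vector<uint8_t> &Out) {
      bool More;
      do {
        uint8_t Byte = Value & 0x7f;
        Value >>= 7;  // arithmetic shift keeps the sign
        More = !((Value == 0 && !(Byte & 0x40)) ||
                 (Value == -1 && (Byte & 0x40)));
        if (More)
          Byte |= 0x80;  // continuation bit
        Out.push_back(Byte);
      } while (More);
    }

    int main() {
      std::vector<uint8_t> Out;
      encodeSLEB128(-3, Out);  // a small negative fits in one byte: 0x7d
      return (Out.size() == 1 && Out[0] == 0x7d) ? 0 : 1;
    }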
Sean Silva
2012-Sep-26 19:44 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
If you look at his spreadsheet [1] in the OP, he did test this case. He even has a column showing the improvement difference uncompressed vs. compressed. Even after compression with gzip, there is still a noticeable difference.

--Sean Silva

[1] https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E

On Wed, Sep 26, 2012 at 4:47 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:

> Whenever anyone proposes a custom compression scheme for a data format, the
> first question that should always be asked is how it compares to using a
> generic off-the-shelf compression algorithm.
>
> For comparison, if we just use the DEFLATE algorithm (via gzip) on bitcode
> files, we seem to see a saving of about 35% in a quick-and-dirty test.

...
Jan Voung
2012-Sep-26 20:44 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
Thanks for the review Duncan!

Yes, on second thought, it would be simpler (less bookkeeping) if this were toggled at the module level (but I haven't looked into how that would be done). I had done it at the function level to begin with because it affected only the function bodies, so it would be more obvious what changed. Does anyone have a stronger feeling (or a reason!) why it would be better to do this on a per-function vs per-module basis?

I'll look into how to do it at the module level and send out a new patch later, with your comments addressed.

On Wed, Sep 26, 2012 at 1:31 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Jan,
>
> the improvement is wonderful!
>
> I don't know anything much about the bitcode representation, but anyway:

...

> Why this new error?  Or would it have occurred anyway, later?  Maybe it
> should be an assertion, if it should be impossible.  I think "prototypes"
> would be better than "protos".

Yes, it would have occurred later (originally). I had copied this check from the "RememberAndSkipFunctionBody()" function, as a quick way to test this out.

...

> This seems rather ugly and complicated...  Can't your new method just return
> "Cond", and return null in case of an error?  Not sure the name is the best
> either, it doesn't suggest at all to me that it doesn't increment the slot
> number, it sounds like it's for handling constants.

Yes, I agree it's pretty ugly as it is now. It was modeled after the "getValueTypePair" function, which would have signaled an error when the result was null, so your suggestion should work.

...

> How about having just one method, eg getValue, and simply decrement the slot
> number by hand in those places that need it.  If you really want two,
> implement getValue in terms of getValueConst, by calling it then incrementing
> the slot number for example.

Good suggestions, will try it out.

...

> Shouldn't you use a "push" method like PushValue for this too?

These were different because they used uint64_t with a "Vals64" vector. I could make a PushValue64.

...

> > +    for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i) {
> > +      if (FTy->getParamType(i)->isLabelTy()) {
>
> Is this actually possible?

I'm not sure =)  I had originally added it because I noticed that the code in BitcodeReader for FUNC_CODE_INST_CALL had the check:

    if (FTy->getParamType(i)->isLabelTy())
      Args.push_back(getBasicBlock(Record[OpNum]));

I'll take this modification out, since it's orthogonal, and I'm not sure what the original motivation was.

...

Thanks!

- Jan
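A rough idea of what the PushValue64 mentioned above could look like, sketched with a plain std::vector and a raw value ID instead of the real SmallVector/ValueEnumerator types (the helper name and the EnableOldStyleFunctions toggle follow the patch; everything else is simplified):

    #include <cstdint>
    #include <vector>

    static bool EnableOldStyleFunctions = false;  // stand-in for the cl::opt

    // Same shape as PushValue in the patch, but for the 64-bit record vector
    // used by the SWITCH writer: push either the absolute value ID or the
    // instruction-relative one.
    static void PushValue64(uint64_t ValID, uint64_t InstID,
                            std::vector<uint64_t> &Vals64) {
      if (EnableOldStyleFunctions)
        Vals64.push_back(ValID);
      else
        Vals64.push_back(InstID - ValID);
    }

    int main() {
      std::vector<uint64_t> Vals64;
      PushValue64(/*ValID=*/97, /*InstID=*/100, Vals64);  // emits 3
      return (Vals64.size() == 1 && Vals64[0] == 3) ? 0 : 1;
    }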
Jan Voung
2012-Sep-26 21:08 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
On Wed, Sep 26, 2012 at 2:07 AM, Renato Golin <rengolin at systemcall.org> wrote:

> On 26 September 2012 01:08, Jan Voung <jvoung at chromium.org> wrote:
> > - Forward references will create negative-valued ids (which end up being
> >   written out as large 32-bit integers, as far as I could tell).
>
> Can you use an SLEB-like representation?
>
> It's probably not going to be backwards compatible, but if there isn't
> an SLEB/ULEB representation in bitcode, I'd say it's a good
> improvement. ;)

I think most of the instruction operands are encoded as VBR(N) for some number of bits N, and it seems to me that VBR(8) is like ULEB. I think the default N is 6 bits, but I haven't tried tweaking these parameters.
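For readers who have not looked at the bitstream format, here is a rough standalone sketch of how a VBR(N) field is chunked, which is why VBR(8) looks essentially like ULEB128 (N-1 payload bits plus one continuation bit per chunk). The real emitter packs the chunks directly into the bit stream rather than returning them:

    #include <cstdint>
    #include <vector>

    // Split Value into (N-1)-bit payload chunks; the top bit of each N-bit
    // chunk says "another chunk follows".  With N == 8 this is byte-oriented
    // ULEB128; the function-block abbreviations commonly use N == 6.
    std::vector<uint32_t> encodeVBR(uint64_t Value, unsigned N) {
      const uint64_t HiBit = 1ull << (N - 1);
      const uint64_t Mask = HiBit - 1;
      std::vector<uint32_t> Chunks;
      do {
        uint64_t Chunk = Value & Mask;
        Value >>= (N - 1);
        if (Value != 0)
          Chunk |= HiBit;  // continuation
        Chunks.push_back((uint32_t)Chunk);
      } while (Value != 0);
      return Chunks;
    }

    int main() {
      // 40 in VBR(6): low five bits (8) with the continuation bit set, then 1.
      std::vector<uint32_t> C = encodeVBR(40, 6);
      return (C.size() == 2 && C[0] == (8 | 32) && C[1] == 1) ? 0 : 1;
    }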
Chris Lattner
2012-Sep-30 16:09 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
On Sep 25, 2012, at 5:08 PM, Jan Voung <jvoung at chromium.org> wrote:

> Hi all,
>
> I've been looking into how to make llvm bitcode files smaller. There is one simple change that appears to shrink linked bitcode files by about 15%. See this spreadsheet for some rough data:
>
> https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E
>
> The change is in how operand ids are encoded in bitcode files. Rather than use an "absolute number" given by the ValueEnumerator, which counts up within a function, we can have the id be relative to the current instruction.

Very cool. This is the "obviously right" way to do it, and I'm glad you've implemented it. It looks like Duncan gave you a lot of feedback already; I'm looking forward to seeing the updated patch. Thanks for tackling this, Jan!

-Chris
Jan Voung
2012-Oct-02 00:09 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
On Sun, Sep 30, 2012 at 9:09 AM, Chris Lattner <clattner at apple.com> wrote:

> On Sep 25, 2012, at 5:08 PM, Jan Voung <jvoung at chromium.org> wrote:
>
> > The change is in how operand ids are encoded in bitcode files. Rather
> > than use an "absolute number" given by the ValueEnumerator, which counts up
> > within a function, we can have the id be relative to the current
> > instruction.
>
> Very cool. This is the "obviously right" way to do it, and I'm glad you've
> implemented it. It looks like Duncan gave you a lot of feedback already;
> I'm looking forward to seeing the updated patch. Thanks for tackling this,
> Jan!
>
> -Chris

Thanks for taking a look!

I wasn't sure of a better way to toggle this behavior at the module level vs the function level. The options I toyed with were:

(a) bump MODULE_CODE_VERSION from 0 to 1
(b) add a new optional module code to identify the change
(c) stick with the approach in the first patch, which used a new function block ID value, then check that all the function blocks uniformly have the newer ID instead of a mix of the old and new

For option (a), there hasn't been a version bump before(?) and I wasn't sure when that is appropriate.

For option (b), it seems like it's really indicating a version change and not doing anything else, so it could have been done more directly with (a).

For option (c), it ends up creating a new block ID that overlaps with the existing function one, and that would be the first block ID to do that.

Thanks again Duncan for the review, and Renato for the suggestion of looking into SLEB. Attached is an updated patch which toggles the change at the module level (done with option (a)) and uses signed VBRs for PHIs.

- Jan

[Attachment: relative_ids.patch4]
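A sketch of the usual sign-folding trick behind a "signed VBR": fold the sign into the low bit so that small negative deltas (forward references) become small unsigned values before the normal VBR chunking. The helper names here are illustrative; the exact encoding used is in the attached patch:

    #include <cstdint>

    // Fold the sign into bit 0 so small-magnitude negatives become small
    // unsigned values instead of huge 64-bit ones; the result then goes
    // through the ordinary unsigned VBR chunking.
    uint64_t encodeSigned(int64_t V) {
      return V >= 0 ? (uint64_t)V << 1 : ((0 - (uint64_t)V) << 1) | 1;
    }

    int64_t decodeSigned(uint64_t U) {
      return (U & 1) ? -(int64_t)(U >> 1) : (int64_t)(U >> 1);
    }

    int main() {
      // A forward reference of -3 encodes as 7 instead of 0xFFFF...FFFD.
      return (encodeSigned(-3) == 7 && decodeSigned(7) == -3) ? 0 : 1;
    }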
Robert Muth
2012-Oct-09 14:01 UTC
[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...
Another option to deal with negatives is to use zig-zag encoding:

https://developers.google.com/protocol-buffers/docs/encoding

This "penalizes" positive ints with an extra bit and requires more surgery - ymmv.

On Wed, Sep 26, 2012 at 5:07 AM, Renato Golin <rengolin at systemcall.org> wrote:

> On 26 September 2012 01:08, Jan Voung <jvoung at chromium.org> wrote:
> > - Forward references will create negative-valued ids (which end up being
> >   written out as large 32-bit integers, as far as I could tell).
>
> Can you use an SLEB-like representation?
>
> It's probably not going to be backwards compatible, but if there isn't
> an SLEB/ULEB representation in bitcode, I'd say it's a good
> improvement. ;)
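For completeness, the zig-zag mapping from the protocol-buffers document, as a tiny sketch; it plays the same role as the sign-folding mapping discussed earlier, just computed without a branch:

    #include <cstdint>

    // Interleave negatives and non-negatives: 0, -1, 1, -2, 2, ... map to
    // 0, 1, 2, 3, 4, ...  The shift of a negative value relies on the usual
    // arithmetic right-shift behavior.
    uint64_t zigzagEncode(int64_t V) {
      return ((uint64_t)V << 1) ^ (uint64_t)(V >> 63);
    }

    int64_t zigzagDecode(uint64_t U) {
      return (int64_t)(U >> 1) ^ -(int64_t)(U & 1);
    }

    int main() {
      return (zigzagEncode(-2) == 3 && zigzagDecode(3) == -2) ? 0 : 1;
    }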