thr3ads.net - llvm dev - [LLVMdev] Bitcode format [Sep 2007]

If this information is useful, please help other people find it:
Share via:

Joshua Haberman

2007-Sep-03 21:34 UTC

[LLVMdev] Bitcode format

Greetings,

I am working on a project (unrelated to LLVM) that needed a 
bytecode-like format.  I found Bitcode and it seems to fit the bill
really nicely.

I am writing an independent implementation of Bitcode in C (I really 
want to keep my runtime pure C).  In the process of doing this, I've
discovered a few things I want to ask about.

I notice the Bitcode format documentation [0] is somewhat incomplete --
there have been a few questions I had to resolve by looking at LLVM
source code.  This is totally understandable (I'm actually impressed
that it's documented so well for something so new), but I wonder if you
would accept a patch that clarified some things.  For example, the
document doesn't mention endianness at all, but from the source code I
discover that bytes are read in order and bits are read 
least-significant first.

I also have a few questions about the format:

- it appears that the only magic number in the file is 
  application-specific.  This seems unfortunate, because it means that
  application-neutral tools cannot be built that process bitcode files,
  since they could not reliably detect that the file is a bitcode file.
  It might seem like there is little room for application- neutral tools
  since almost all the data in the file is application-specific, but off
  the top of my head I can think of a few, like a tool to suggest
  abbreviations that would give a file better compression.

- the LLVM code assumes that several VBR fields can be at most 32 bits 
  (block ids, number of elements in an array, etc).  These assumptions
  seem quite reasonable: can they be considered part of the format and
  added to the document?

Cheers,
Josh

[0] http://llvm.org/releases/2.0/docs/BitCodeFormat.html

Reid Spencer

2007-Sep-04 00:40 UTC

head link

[LLVMdev] Bitcode format

Hi Joshua,

On Mon, 2007-09-03 at 21:34 +0000, Joshua Haberman
wrote:> Greetings,
> 
> I am working on a project (unrelated to LLVM) that needed a 
> bytecode-like format.  I found Bitcode and it seems to fit the bill
> really nicely.
> 
> I am writing an independent implementation of Bitcode in C (I really 
> want to keep my runtime pure C).  In the process of doing this, I've
> discovered a few things I want to ask about.
Okay, cool :)
> 
> I notice the Bitcode format documentation [0] is somewhat incomplete --
> there have been a few questions I had to resolve by looking at LLVM
> source code.  This is totally understandable (I'm actually impressed
> that it's documented so well for something so new), but I wonder if you
> would accept a patch that clarified some things.  For example, the
> document doesn't mention endianness at all, but from the source code I
> discover that bytes are read in order and bits are read 
> least-significant first.
Absolutely, patches are always welcome.
> 
> I also have a few questions about the format:
> 
> - it appears that the only magic number in the file is 
>   application-specific.  This seems unfortunate, because it means that
>   application-neutral tools cannot be built that process bitcode files,
>   since they could not reliably detect that the file is a bitcode file.
>   It might seem like there is little room for application- neutral tools
>   since almost all the data in the file is application-specific, but off
>   the top of my head I can think of a few, like a tool to suggest
>   abbreviations that would give a file better compression.
Wouldn't those abbreviations imply some application-specific knowledge
of the format? I can't think of any situation where a generic tool could
do anything other than outline the contents of the file by looking at
the blocks, e.g. llvm-bcanalyzer.> 
> - the LLVM code assumes that several VBR fields can be at most 32 bits 
>   (block ids, number of elements in an array, etc).  These assumptions
>   seem quite reasonable: can they be considered part of the format and
>   added to the document?
I don't see why not. The llvm bitcode documentation is specific to llvm
anyway so the limits should be defined. 

Thanks for your interest, Joshua.

Reid.
> 
> Cheers,
> Josh
> 
> [0] http://llvm.org/releases/2.0/docs/BitCodeFormat.html
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Joshua Haberman

2007-Sep-04 05:26 UTC

head link

[LLVMdev] Bitcode format

Reid Spencer <rspencer <at> reidspencer.com> writes:
> Hi Joshua,
Hi, thanks for the reply.
> On Mon, 2007-09-03 at 21:34 +0000, Joshua Haberman wrote:
> > I also have a few questions about the format:
> > 
> > - it appears that the only magic number in the file is 
> >   application-specific.  This seems unfortunate, because it means that
> >   application-neutral tools cannot be built that process bitcode
files,
> >   since they could not reliably detect that the file is a bitcode
file.
> >   It might seem like there is little room for application- neutral
tools
> >   since almost all the data in the file is application-specific, but
off
> >   the top of my head I can think of a few, like a tool to suggest
> >   abbreviations that would give a file better compression.
> 
> Wouldn't those abbreviations imply some application-specific knowledge
> of the format?
No -- unless I am mistaken about bitcode, abbreviations have no semantic
meaning.  They are nothing but a space optimization, and needn't even be
exposed to the application (at least for reading).
> I can't think of any situation where a generic tool could
> do anything other than outline the contents of the file by looking at
> the blocks, e.g. llvm-bcanalyzer.
I can think of several tools that could be application-neutral and very
useful:

- a tool that prints the hierarchy of blocks and their sizes in the file

- a tool that does more sophisticated space usage analysis.  For
  example, how much space are the abbreviations saving?  Could the file
  be more efficiently packed with different VBR choices?  Or with
  different abbreviations?  It could even spit out a file that is 
  semantically identical but smaller.

- a tool that dumps selected blocks from the file (possibly giving you 
  fine-grained control over what blocks, how many records from each 
  block, etc).

- a tool for creating arbitrary bitcode files by reading textual input 
  on stdin written in some language that is isomorphic to bitcode.  This
  could be useful for things like constructing bitcode files that don't
  follow the expectations of your application, so you can test that your
  application handles such corruption gracefully.

I know there must already be bitcode files flying around, and so I can 
see the difficulty of making any changes to the format.  What if we just
said that BC (2 bytes) is the magic number for the Bitcode format, and
that 0xC0DE (2 bytes) is the application-specific magic number for LLVM?
> > - the LLVM code assumes that several VBR fields can be at most 32 bits
> >   (block ids, number of elements in an array, etc).  These assumptions
> >   seem quite reasonable: can they be considered part of the format and
> >   added to the document?
> 
> I don't see why not. The llvm bitcode documentation is specific to llvm
> anyway so the limits should be defined. 
Hmm, the Bitcode documentation [0] seems to contain both application-neutral
and LLVM-specific parts.  Specifically, only section 4 of that document
seems specific to LLVM.

In any case, when I submit my patch to that document, I'll put any
of these assumptions that seem supremely reasonable in there and see
what the feedback is.  I'm especially interested to hear from Chris,
since he wrote that document (and I assume designed the format).
> Thanks for your interest, Joshua.
Thanks for encouraging my participation.  And you can call me Josh.  :)

Josh

[0] http://llvm.org/releases/2.0/docs/BitCodeFormat.html

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Sep 2007 - [LLVMdev] Bitcode format

[LLVMdev] Bitcode format

[LLVMdev] Bitcode format

[LLVMdev] Bitcode format

Possibly Parallel Threads