thr3ads.net - llvm dev - [LLVMdev] More Encoding Ideas [Aug 2004]

Chris Lattner

2004-Aug-21 01:43 UTC

[LLVMdev] More Encoding Ideas

On Fri, 20 Aug 2004, Robert Mykland wrote:
> >In any case, both signed and unsigned 8-bit constants can be written out
> >in a single byte.  Again, do you think it's worth special casing this
> >though?  Considering that we handle 8-bit strings specially already, there
> >are not a ton of 8-bit constants with value >= 128.
>
> I'd rather that they not be treated specially.  If char defaulted to
> unsigned char, there would be little reason to create this special case.

I don't understand what you're getting at here.  You can change char to
default to unsigned right now with llvm-gcc -funsigned-char.  I don't
understand how that would change anything to be more useful though.

> >This is a very interesting idea, particularly for languages like C++ that
> >have a ton of types.  Before making this change, I would want to see some
> >numbers though.  In particular, I don't think that types typically take up
> >a large amount of the .bc file size: most of it is instructions.
> >
> >Are you seeing other cases?
>
> No.  This would only save a bit less than two bytes per primitive and
> defined type.  Maybe a few hundred bytes in a large LLVM file.  Not a
> big savings, but a savings.  The thing I like is that along with the
> size savings it appears to make the encode/decode simpler and quicker if
> anything.  So good news all around.

Okay, that's fine.  When implementing that, we should take care to create
the pointer types lazily instead of eagerly to avoid creating pointer
types that are not used.

> > > I think the original plan was to have multiple modules in them but this
> > > seems to have gone by the wayside. The result of linking two (or more)
> > > modules is a single module so except in some really bizarre corner cases
> > > the need for multiple modules would go away. I suppose we could get rid
> > > of the block id field for the file. I'll give this some thought and see
> > > if Chris has any objections.
> >
> >I don't have any problem with removing it.
>
> Cool. Before you chop remember debug libraries.

I think that debug libraries should be handled in other ways.  The
original idea was to have .bc files hold lots of other random cruft with
them.  With more experience, this seems like a bad idea.

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/

Reid Spencer

2004-Aug-21 01:48 UTC

On Fri, 2004-08-20 at 18:43, Chris Lattner wrote:
> I don't understand what you're getting at here.  You can change char to
> default to unsigned right now with llvm-gcc -funsigned-char.  I don't
> understand how that would change anything to be more useful though.

The only thing it would change is that character constants with values >
63 would get encoded in 1 byte instead of 2 (with current
implementation). I'm making a change that will ALWAYS encode UByteTyID
and SByteTyID constants in 1 byte which would then render
-funsigned-char useless (as far as bytecode goes).

> Okay, that's fine.  When implementing that, we should take care to create
> the pointer types lazily instead of eagerly to avoid creating pointer
> types that are not used.

Eww .. you just raised a really good point. I was planning on doubling
the referent type's slot number to get the pointer type. But, if all the
pointer types are not used in the program then this just serves to
increase the numerical values of the slot numbers and it will actually
bloat the size of the file because the vbr_uint written slot numbers
could take more bytes to write. 
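
To make the size concern concrete, here is a rough sketch. It assumes slot numbers are written as unsigned VBR integers with 7 payload bits per byte, and that the implied-pointer scheme simply doubles a slot number. The helper names `vbr_uint_size` and `total_bytes` and the range of 100 slots are made up for illustration.

```python
def vbr_uint_size(value: int) -> int:
    """Bytes needed for a non-negative integer at 7 payload bits per byte."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def total_bytes(slots) -> int:
    """Total bytes needed to write each slot number once."""
    return sum(vbr_uint_size(s) for s in slots)


slots = range(100)                           # hypothetical type slots 0..99
plain = total_bytes(slots)                   # slot numbers written as-is
doubled = total_bytes(s * 2 for s in slots)  # every slot number doubled

print(plain, doubled)  # 100 136
```

Under these assumptions, slots 64 through 99 cross the one-byte limit once doubled, so every reference to them costs an extra byte whether or not the pointer types in between are ever used.
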

Contrary to previous assertions, I'm not going to implement this unless
we can prove that it's beneficial.

> I think that debug libraries should be handled in other ways.  The
> original idea was to have .bc files hold lots of other random cruft with
> them.  With more experience, this seems like a bad idea.

By "random cruft" you're referring to the current lib/Debugger things
intermixed with the instructions?

Chris Lattner

2004-Aug-21 02:10 UTC

On Fri, 20 Aug 2004, Reid Spencer wrote:
> On Fri, 2004-08-20 at 18:43, Chris Lattner wrote:
>
> > I don't understand what you're getting at here.  You can change char to
> > default to unsigned right now with llvm-gcc -funsigned-char.  I don't
> > understand how that would change anything to be more useful though.
>
> The only thing it would change is that character constants with values >
> 63 would get encoded in 1 byte instead of 2 (with current
> implementation). I'm making a change that will ALWAYS encode UByteTyID
> and SByteTyID constants in 1 byte which would then render
> -funsigned-char useless (as far as bytecode goes).

It's 127 right, not 63?  Also, what does this have to do with sbyte vs
ubyte?
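
The two thresholds, 63 and 127, can both come out of one variable-bit-rate (VBR) layout. The sketch below is a minimal illustration, assuming 7 payload bits per byte with the high bit as a continuation flag, and assuming signed values are stored as the magnitude shifted left by one with the sign in the low bit. The names `vbr_uint` and `vbr_int` are illustrative, not the bytecode writer's actual API.

```python
def vbr_uint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first.

    The high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("vbr_uint expects a non-negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # payload plus continuation flag
        value >>= 7
    out.append(value)  # final byte, continuation flag clear
    return bytes(out)


def vbr_int(value: int) -> bytes:
    """Encode a signed integer (assumed layout: sign stored in bit 0)."""
    if value < 0:
        return vbr_uint(((-value) << 1) | 1)
    return vbr_uint(value << 1)


# Unsigned values fit in one byte up to 127.
print(len(vbr_uint(127)), len(vbr_uint(128)))  # 1 2
# Signed values give one payload bit to the sign, so the limit is 63.
print(len(vbr_int(63)), len(vbr_int(64)))      # 1 2
```

Under those assumptions, a ubyte constant stays in one byte up to 127, while an sbyte constant spills into a second byte past 63.
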

> > Okay, that's fine.  When implementing that, we should take care to create
> > the pointer types lazily instead of eagerly to avoid creating pointer
> > types that are not used.
>
> Eww .. you just raised a really good point. I was planning on doubling
> the referent type's slot number to get the pointer type. But, if all the
> pointer types are not used in the program then this just serves to
> increase the numerical values of the slot numbers and it will actually
> bloat the size of the file because the vbr_uint written slot numbers
> could take more bytes to write.

Yes.

> Contrary to previous assertions, I'm not going to implement this unless
> we can prove that it's beneficial.

ok.

> > I think that debug libraries should be handled in other ways.  The
> > original idea was to have .bc files hold lots of other random cruft with
> > them.  With more experience, this seems like a bad idea.
>
> By "random cruft" you're referring to the current lib/Debugger things
> intermixed with the instructions?

Hrm, actually, random cruft still might be useful in the future.  In
particular, for large scale IPA (millions of LOC programs), you want to be
able to do analysis at compile time, then read just the analysis results
in at link time instead of holding the whole program in memory.  Being
able to define additional section ID's later could be useful.

In any case, shrinking it to one vbr can't hurt.

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/

Robert Mykland

2004-Aug-24 02:46 UTC

At 06:43 PM 8/20/2004, Chris Lattner wrote:
>On Fri, 20 Aug 2004, Robert Mykland wrote:
> > >In any case, both signed and unsigned 8-bit constants can be written out
> > >in a single byte.  Again, do you think it's worth special casing this
> > >though?  Considering that we handle 8-bit strings specially already, there
> > >are not a ton of 8-bit constants with value >= 128.
> >
> > I'd rather that they not be treated specially.  If char defaulted to
> > unsigned char, there would be little reason to create this special case.
>
>I don't understand what you're getting at here.  You can change char to
>default to unsigned right now with llvm-gcc -funsigned-char.  I don't
>understand how that would change anything to be more useful though.

Well, in the old days, char strings were handled just like any other kind
of array of primitive types.  In that world, when char defaulted to signed
char, most of the heavily used ASCII symbols took two bytes to
encode.  Thus, (and I'm guessing here), you guys decided to treat char
strings as a special case to save space in the bytecode file.

>Okay, that's fine.  When implementing that, we should take care to create
>the pointer types lazily instead of eagerly to avoid creating pointer
>types that are not used.

If all pointer types are implied, not a problem to create them.  However, 
in larger files it may cost a little due to slightly larger type 
numbers.  I'm not sure about the tradeoff here, but I expect that implied 
pointers would still save more just because of pointers to function types.

Regards,

-- Robert.


Robert Mykland               Voice: (831) 462-6725
Founder/CTO                   Ascenium Corporation

Reid Spencer

2004-Aug-24 04:37 UTC

On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:
> At 06:43 PM 8/20/2004, Chris Lattner wrote:
> >I don't understand what you're getting at here.  You can change char to
> >default to unsigned right now with llvm-gcc -funsigned-char.  I don't
> >understand how that would change anything to be more useful though.
>
> Well, in the old days, char strings were handled just like any other kind
> of array of primitive types.

And, they still are :)

> In that world, when char defaulted to signed 
> char, most of the heavily used ASCII symbols took two bytes to 
> encode.  
Um. What? ASCII is a 7-bit encoding. It defines values 0-127 which, even
with a sign bit is encoded into one byte. Recall that in the "old days"
computers had a parity bit as the 8th-bit because the memory failure
rates were so high (think vacuum tubes).
> Thus, (and I'm guessing here), you guys decided to treat char 
> strings as a special case to save space in the bytecode file.
Actually, LLVM doesn't really treat character strings specially EXCEPT
in the bcwriter and bcreader. There is no notion in LLVM of a "string",
just primitive types and arrays of them. It is up to the front end
compiler to define what it means by a "string". In the bytecode
libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
as "strings" for reading and writing efficiency. They are, however,
still just arrays of one of the two primitive single-byte types.
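
As a rough illustration of the space side of that choice, the sketch below compares writing each array element as its own signed VBR constant with writing the whole array as raw bytes. The signed layout (7 payload bits per byte, sign in the low bit) is an assumption, and `signed_vbr_size`, `per_element_size`, and `raw_size` are hypothetical helpers, not functions from the bytecode libraries.

```python
def signed_vbr_size(value: int) -> int:
    """Bytes for a signed value: magnitude shifted left one, sign in bit 0."""
    encoded = ((-value) << 1) | 1 if value < 0 else value << 1
    size = 1
    while encoded >= 0x80:
        encoded >>= 7
        size += 1
    return size


def per_element_size(text: str) -> int:
    """Size if each character were written as its own signed VBR constant."""
    return sum(signed_vbr_size(b) for b in text.encode("ascii"))


def raw_size(text: str) -> int:
    """Size if the array is written as one raw byte per element."""
    return len(text.encode("ascii"))


sample = "hello world"
print(per_element_size(sample), raw_size(sample))  # 21 11
```

Under that assumed layout, every lowercase letter has a value above 63 and takes two bytes when written as a separate signed constant, while the raw form always costs one byte per element.
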

> If all pointer types are implied, not a problem to create them.  However,
> in larger files it may cost a little due to slightly larger type
> numbers.  I'm not sure about the tradeoff here, but I expect that implied
> pointers would still save more just because of pointers to function types.

Pointers are used heavily in almost all languages. I can almost
guarantee that the "tradeoff" would be larger bytecode files. The use of
pointers to function types is not all that frequent so I wouldn't expect
it to save much.  In any event, we're not going to do anything with this
until there are solid numbers. I'm working on improving llvm-bcanalyzer
to provide them.

Reid