On Fri, 20 Aug 2004, Robert Mykland wrote:

> >In any case, both signed and unsigned 8-bit constants can be written out
> >in a single byte. Again, do you think it's worth special casing this
> >though? Considering that we handle 8-bit strings specially already, there
> >are not a ton of 8-bit constants with value >= 128.
>
> I'd rather that they not be treated specially. If char defaulted to
> unsigned char, there would be little reason to create this special case.

I don't understand what you're getting at here. You can change char to
default to unsigned right now with llvm-gcc -funsigned-char. I don't
understand how that would change anything to be more useful though.

> >This is a very interesting idea, particularly for languages like C++ that
> >have a ton of types. Before making this change, I would want to see some
> >numbers though. In particular, I don't think that types typically take up
> >a large amount of the .bc file size: most of it is instructions.
> >
> >Are you seeing other cases?
>
> No. This would only save a bit less than two bytes per primitive and
> defined type. Maybe a few hundred bytes in a large LLVM file. Not a
> big savings, but a savings. The thing I like is that along with the
> size savings it appears to make the encode/decode simpler and quicker if
> anything. So good news all around.

Okay, that's fine. When implementing that, we should take care to create
the pointer types lazily instead of eagerly to avoid creating pointer
types that are not used.

> > > I think the original plan was to have multiple modules in them but
> > > this seems to have gone by the wayside. The result of linking two
> > > (or more) modules is a single module so except in some really
> > > bizarre corner cases the need for multiple modules would go away.
> > > I suppose we could get rid of the block id field for the file.
> > > I'll give this some thought and see if Chris has any objections.
> >
> >I don't have any problem with removing it.
>
> Cool. Before you chop remember debug libraries.

I think that debug libraries should be handled in other ways. The
original idea was to have .bc files hold lots of other random cruft with
them. With more experience, this seems like a bad idea.

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
On Fri, 2004-08-20 at 18:43, Chris Lattner wrote:

> I don't understand what you're getting at here. You can change char to
> default to unsigned right now with llvm-gcc -funsigned-char. I don't
> understand how that would change anything to be more useful though.

The only thing it would change is that character constants with
values > 63 would get encoded in 1 byte instead of 2 (with current
implementation). I'm making a change that will ALWAYS encode UByteTyID
and SByteTyID constants in 1 byte which would then render
-funsigned-char useless (as far as bytecode goes).

> Okay, that's fine. When implementing that, we should take care to create
> the pointer types lazily instead of eagerly to avoid creating pointer
> types that are not used.

Eww .. you just raised a really good point. I was planning on doubling
the referent type's slot number to get the pointer type. But, if all the
pointer types are not used in the program then this just serves to
increase the numerical values of the slot numbers and it will actually
bloat the size of the file because the vbr_uint written slot numbers
could take more bytes to write.

Contrary to previous assertions, I'm not going to implement this unless
we can prove that it's beneficial.

> I think that debug libraries should be handled in other ways. The
> original idea was to have .bc files hold lots of other random cruft with
> them. With more experience, this seems like a bad idea.

By "random cruft" you're referring to the current lib/Debugger things
intermixed with the instructions?
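The slot-number concern above can be sketched roughly as follows. This is
only an illustration under stated assumptions: slot numbers are written as
unsigned VBR values with 7 payload bits per byte, and the k-th type gets
slot 2k with its implied pointer type at 2k+1. That numbering and the
names vbr_len and implied_slots are hypothetical, not taken from the
bytecode writer.

```python
def vbr_len(value: int) -> int:
    """Bytes needed to write a non-negative integer at 7 payload bits per byte."""
    if value < 0:
        raise ValueError("slot numbers are non-negative")
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def implied_slots(type_index: int) -> tuple[int, int]:
    """Hypothetical numbering: (slot of the type, slot of its implied pointer)."""
    return 2 * type_index, 2 * type_index + 1


# Compare the bytes per reference with and without the doubled numbering.
for index in (10, 63, 64, 127, 8191, 8192):
    type_slot, pointer_slot = implied_slots(index)
    print(index, vbr_len(index), vbr_len(type_slot), vbr_len(pointer_slot))
```

With these assumptions, every type whose plain index lies between 64 and
127 would need two bytes per reference instead of one, whether or not its
pointer type is ever used. Whether that outweighs the bytes saved by not
writing pointer types explicitly depends on the module, which is why
measured numbers are needed.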
On Fri, 20 Aug 2004, Reid Spencer wrote:

> On Fri, 2004-08-20 at 18:43, Chris Lattner wrote:
>
> > I don't understand what you're getting at here. You can change char to
> > default to unsigned right now with llvm-gcc -funsigned-char. I don't
> > understand how that would change anything to be more useful though.
>
> The only thing it would change is that character constants with
> values > 63 would get encoded in 1 byte instead of 2 (with current
> implementation). I'm making a change that will ALWAYS encode UByteTyID
> and SByteTyID constants in 1 byte which would then render
> -funsigned-char useless (as far as bytecode goes).

It's 127, right, not 63? Also, what does this have to do with sbyte vs
ubyte?

> > Okay, that's fine. When implementing that, we should take care to create
> > the pointer types lazily instead of eagerly to avoid creating pointer
> > types that are not used.
>
> Eww .. you just raised a really good point. I was planning on doubling
> the referent type's slot number to get the pointer type. But, if all the
> pointer types are not used in the program then this just serves to
> increase the numerical values of the slot numbers and it will actually
> bloat the size of the file because the vbr_uint written slot numbers
> could take more bytes to write.

Yes.

> Contrary to previous assertions, I'm not going to implement this unless
> we can prove that it's beneficial.

ok.

> > I think that debug libraries should be handled in other ways. The
> > original idea was to have .bc files hold lots of other random cruft with
> > them. With more experience, this seems like a bad idea.
>
> By "random cruft" you're referring to the current lib/Debugger things
> intermixed with the instructions?

Hrm, actually, random cruft still might be useful in the future. In
particular, for large scale IPA (millions of LOC programs), you want to
be able to do analysis at compile time, then read just the analysis
results in at link time instead of holding the whole program in memory.
Being able to define additional section IDs later could be useful. In
any case, shrinking it to one vbr can't hurt.

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
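The 63 versus 127 question can be checked with a small sketch. It assumes
a variable-bit-rate encoding with 7 payload bits per byte (high bit set
when more bytes follow) and a signed variant that stores the magnitude
shifted left by one with the sign in bit 0. Both the layout and the
function names are assumptions made for illustration, not a statement of
what the bytecode writer does.

```python
def vbr_unsigned(value: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte, low bits first."""
    if value < 0:
        raise ValueError("vbr_unsigned expects a non-negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def vbr_signed(value: int) -> bytes:
    """Assumed signed form: magnitude shifted left by one, sign kept in bit 0."""
    if value < 0:
        return vbr_unsigned(((-value) << 1) | 1)
    return vbr_unsigned(value << 1)


# Print the encoded length of each value in the signed and unsigned forms.
for v in (63, 64, 127, 128):
    print(v, len(vbr_signed(v)), len(vbr_unsigned(v)))
```

Under these assumptions both numbers are right, for different types: a
signed 8-bit constant spills into a second byte above 63 because the sign
takes one payload bit, while an unsigned one spills above 127.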
At 06:43 PM 8/20/2004, Chris Lattner wrote:

>On Fri, 20 Aug 2004, Robert Mykland wrote:
>
> > >In any case, both signed and unsigned 8-bit constants can be written out
> > >in a single byte. Again, do you think it's worth special casing this
> > >though? Considering that we handle 8-bit strings specially already, there
> > >are not a ton of 8-bit constants with value >= 128.
> >
> > I'd rather that they not be treated specially. If char defaulted to
> > unsigned char, there would be little reason to create this special case.
>
>I don't understand what you're getting at here. You can change char to
>default to unsigned right now with llvm-gcc -funsigned-char. I don't
>understand how that would change anything to be more useful though.

Well, in the old days, char strings were handled just like any other kind
of array of primitive types. In that world, when char defaulted to signed
char, most of the heavily used ASCII symbols took two bytes to
encode. Thus, (and I'm guessing here), you guys decided to treat char
strings as a special case to save space in the bytecode file.

>Okay, that's fine. When implementing that, we should take care to create
>the pointer types lazily instead of eagerly to avoid creating pointer
>types that are not used.

If all pointer types are implied, not a problem to create them. However,
in larger files it may cost a little due to slightly larger type
numbers. I'm not sure about the tradeoff here, but I expect that implied
pointers would still save more just because of pointers to function types.

Regards,
-- Robert.

Robert Mykland               Voice: (831) 462-6725
Founder/CTO                  Ascenium Corporation
On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:

> At 06:43 PM 8/20/2004, Chris Lattner wrote:
> >I don't understand what you're getting at here. You can change char to
> >default to unsigned right now with llvm-gcc -funsigned-char. I don't
> >understand how that would change anything to be more useful though.
>
> Well, in the old days, char strings were handled just like any other kind
> of array of primitive types.

And, they still are :)

> In that world, when char defaulted to signed
> char, most of the heavily used ASCII symbols took two bytes to
> encode.

Um. What? ASCII is a 7-bit encoding. It defines values 0-127 which, even
with a sign bit, is encoded into one byte. Recall that in the "old days"
computers had a parity bit as the 8th-bit because the memory failure
rates were so high (think vacuum tubes).

> Thus, (and I'm guessing here), you guys decided to treat char
> strings as a special case to save space in the bytecode file.

Actually, LLVM doesn't really treat character strings specially EXCEPT
in the bcwriter and bcreader. There is no notion in LLVM of a "string",
just primitive types and arrays of them. It is up to the front end
compiler to define what it means by a "string". In the bytecode
libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
as "strings" for reading and writing efficiency. They are, however,
still just arrays of one of the two primitive single-byte types.

> If all pointer types are implied, not a problem to create them. However,
> in larger files it may cost a little due to slightly larger type
> numbers. I'm not sure about the tradeoff here, but I expect that implied
> pointers would still save more just because of pointers to function types.

Pointers are used heavily in almost all languages. I can almost
guarantee that the "tradeoff" would be larger bytecode files. The use of
pointers to function types is not all that frequent so I wouldn't expect
it to save much.

In any event, we're not going to do anything with this until there are
solid numbers. I'm working on improving llvm-bcanalyzer to provide them.

Reid
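One way to reconcile the two views on ASCII above is to separate raw bytes
from a per-element variable-length encoding. The sketch below assumes that
each element of a signed char array, when written individually, uses a
7-bits-per-byte encoding with the sign stored in bit 0. That layout and
the function names are assumptions for illustration only; the special
"string" handling is modeled simply as one raw byte per character.

```python
def signed_vbr_len(value: int) -> int:
    """Bytes for one signed element: magnitude << 1 with the sign in bit 0."""
    encoded = ((-value) << 1) | 1 if value < 0 else value << 1
    length = 1
    while encoded >= 0x80:
        encoded >>= 7
        length += 1
    return length


def per_element_size(text: str) -> int:
    """Size if every character is written as its own signed element."""
    return sum(signed_vbr_len(code) for code in text.encode("ascii"))


def raw_size(text: str) -> int:
    """Size if the array is written as raw bytes, one per character."""
    return len(text.encode("ascii"))


sample = "Hello, world"
print(per_element_size(sample), raw_size(sample))
```

Under these assumptions every ASCII value does fit in a single raw byte,
but codes of 64 and above (which include all letters) would take two bytes
when written element by element as signed values.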