On Fri, 20 Aug 2004, Reid Spencer wrote:

> > defined would be almost always stored in one byte instead of the present
> > usual two.
>
> So, if I get you correctly, you're advocating the creation of a Type::CharTyID
> in the TypeID enumeration that is always written as a single byte? Note that
> right now all ASCII values ( <128 ) will be written as a single byte for
> UByteTyID but for SByteTyID (often the default from FE compilers like GCC),
> you're right, they'll take two bytes if the value > 63. Or are you saying that
> we should always write UByteTyID and SByteTyID as a single byte?
>
> Long term, LLVM's distinction between signed and unsigned will go away. Talk to
> Chris about that. :)

If you're interested in the plans, they are described in some detail here:
http://nondot.org/sabre/LLVMNotes/TypeSystemChanges.txt

Note that there is no concrete timeline for this to happen, it basically
depends on when someone is ambitious enough to start working on it.

In any case, both signed and unsigned 8-bit constants can be written out
in a single byte. Again, do you think it's worth special casing this
though? Considering that we handle 8-bit strings specially already, there
are not a ton of 8-bit constants with value >= 128.

> > 2) I think it would be a big file size and processing speed win to have
> > implied pointer types for every literal type. This would save a
> > tremendous amount of space in the global type table and other places
> > where pointer types are constantly being defined. So the primitive
> > types list would change to:
> >
> > 0 void
> > 1 void* (implied)

This is a very interesting idea, particularly for languages like C++ that
have a ton of types. Before making this change, I would want to see some
numbers though. In particular, I don't think that types typically take up
a large amount of the .bc file size: most of it is instructions.

Are you seeing other cases?

> > This approach would have the added advantage of being able to check to
> > see whether anything is a pointer type by checking bit 0 (1 = yes) and
> > deriving its dereferenced type (just subtract 1).

I don't think this is a big win, the .bc reader doesn't have to do much of
this.

> > 3) Have the value index for labels start at 1, just like nonzero values
> > of everything else does. This just makes the encode/decode algorithm
> > simpler and I doubt it would cost anything in file size. I made this
> > suggestion a few emails back, hopefully in a clearer form here.
>
> Like I replied, we don't store labels as values in LLVM. Labels are just the
> names of basic blocks. Those names are stored in the function level symbol

I think that Robert's point is that this would remove a special case from
the code (which is good). I'm indifferent about the change: if some other
changes are made to the .bc file format, this could go in as well.

> > 4) Can files have multiple 0x01 headers? I've never seen more than
> > one. If not, ditch these four bytes of unnecessary space per file.
>
> I think the original plan was to have multiple modules in them but this seems
> to have gone by the wayside. The result of linking two (or more) modules is a
> single module so except in some really bizarre corner cases the need for
> multiple modules would go away. I suppose we could get rid of the block id
> field for the file. I'll give this some thought and see if Chris has any
> objections.

I don't have any problem with removing it.

> Long term, I intend to write some kind of bytecode archive utility similar to
> JAR files that contains multiple bytecode files, an index, and the whole thing

Sounds like a cool thing.
If you did this, make sure that llvm-nm could read the files (of course),
and, if/when you do this, you could make the interface be llvm-ar (which
was never finished).

> > I'm committed to making LLVM
> > bytecode as compact and as quick to encode/decode as possible.
>
> Thanks, we appreciate that a lot. It's high on our agenda too.

I totally agree as well. :)

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
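
To make the byte counts quoted above concrete (one byte for unsigned values below 128, two bytes for signed values above 63), here is a minimal sketch of a variable-length integer encoding that reproduces them. It assumes 7 payload bits per byte with the high bit as a continuation flag, and a signed form that shifts the value left by one and keeps the sign in bit 0. The function names `vbr_unsigned` and `vbr_signed` are hypothetical and are not taken from the LLVM source.

```python
def vbr_unsigned(value: int) -> bytes:
    """Encode a non-negative integer with 7 payload bits per byte, low bits first."""
    if value < 0:
        raise ValueError("expected a non-negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def vbr_signed(value: int) -> bytes:
    """Encode a signed integer by moving the sign into bit 0 (assumed layout)."""
    if value < 0:
        return vbr_unsigned((-value << 1) | 1)
    return vbr_unsigned(value << 1)


# Unsigned values fit in one byte up to 127; the shifted signed form only up to 63.
sizes = {
    "unsigned 127": len(vbr_unsigned(127)),
    "unsigned 128": len(vbr_unsigned(128)),
    "signed 63": len(vbr_signed(63)),
    "signed 64": len(vbr_signed(64)),
}
```

Under these assumptions the extra byte for signed characters comes purely from the sign bit taking one of the seven payload bits, which is why special-casing 8-bit constants to a raw single byte would remove it.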

At 05:09 PM 8/20/2004, you wrote:

>On Fri, 20 Aug 2004, Reid Spencer wrote:
>
> > > defined would be almost always stored in one byte instead of the present
> > > usual two.
> >
> > So, if I get you correctly, you're advocating the creation of a Type::CharTyID
> > in the TypeID enumeration that is always written as a single byte? Note that
> > right now all ASCII values ( <128 ) will be written as a single byte for
> > UByteTyID but for SByteTyID (often the default from FE compilers like GCC),
> > you're right, they'll take two bytes if the value > 63. Or are you saying that
> > we should always write UByteTyID and SByteTyID as a single byte?
> >
> > Long term, LLVM's distinction between signed and unsigned will go away. Talk to
> > Chris about that. :)
>
>If you're interested in the plans, they are described in some detail here:
>http://nondot.org/sabre/LLVMNotes/TypeSystemChanges.txt
>
>Note that there is no concrete timeline for this to happen, it basically
>depends on when someone is ambitious enough to start working on it.
>
>In any case, both signed and unsigned 8-bit constants can be written out
>in a single byte. Again, do you think it's worth special casing this
>though? Considering that we handle 8-bit strings specially already, there
>are not a ton of 8-bit constants with value >= 128.

I'd rather that they not be treated specially. If char defaulted to
unsigned char, there would be little reason to create this special case.

> > > 2) I think it would be a big file size and processing speed win to have
> > > implied pointer types for every literal type. This would save a
> > > tremendous amount of space in the global type table and other places
> > > where pointer types are constantly being defined. So the primitive
> > > types list would change to:
> > >
> > > 0 void
> > > 1 void* (implied)
>
>This is a very interesting idea, particularly for languages like C++ that
>have a ton of types.
>Before making this change, I would want to see some
>numbers though. In particular, I don't think that types typically take up
>a large amount of the .bc file size: most of it is instructions.
>
>Are you seeing other cases?

No. This would only save a bit less than two bytes per primitive and
defined type. Maybe a few hundred bytes in a large LLVM file. Not a big
savings, but a savings. The thing I like is that along with the size
savings it appears to make the encode/decode simpler and quicker if
anything. So good news all around.

> > > This approach would have the added advantage of being able to check to
> > > see whether anything is a pointer type by checking bit 0 (1 = yes) and
> > > deriving its dereferenced type (just subtract 1).
>
>I don't think this is a big win, the .bc reader doesn't have to do much of
>this.

I know my reader does this. I'm not really sure how much time it spends
doing it. My little code generator spends a lot of time going back and
forth between pointers and literal values when turning certain kinds of
memory operations into data movement in the Ascenium array.

> > > 3) Have the value index for labels start at 1, just like nonzero values
> > > of everything else does. This just makes the encode/decode algorithm
> > > simpler and I doubt it would cost anything in file size. I made this
> > > suggestion a few emails back, hopefully in a clearer form here.
> >
> > Like I replied, we don't store labels as values in LLVM. Labels are just the
> > names of basic blocks. Those names are stored in the function level symbol
>
>I think that Robert's point is that this would remove a special case from
>the code (which is good). I'm indifferent about the change: if some other
>changes are made to the .bc file format, this could go in as well.

Cool.

> > > 4) Can files have multiple 0x01 headers? I've never seen more than
> > > one. If not, ditch these four bytes of unnecessary space per file.
> >
> > I think the original plan was to have multiple modules in them but this seems
> > to have gone by the wayside. The result of linking two (or more) modules is a
> > single module so except in some really bizarre corner cases the need for
> > multiple modules would go away. I suppose we could get rid of the block id
> > field for the file. I'll give this some thought and see if Chris has any
> > objections.
>
>I don't have any problem with removing it.

Cool. Before you chop, remember debug libraries.

> > Long term, I intend to write some kind of bytecode archive utility similar to
> > JAR files that contains multiple bytecode files, an index, and the whole thing
>
>Sounds like a cool thing. If you did this, make sure that llvm-nm could
>read the files (of course), and, if/when you do this, you could make the
>interface be llvm-ar (which was never finished).

Seconded!

Regards,
-- Robert.

Robert Mykland                  Voice: (831) 462-6725
Founder/CTO Ascenium Corporation
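
The implied-pointer numbering from point 2 can be sketched in a few lines. This is only an illustration of the proposal as described in the thread (even slot = base type, the following odd slot = its implied pointer, starting with 0 void and 1 void*); the helper names are hypothetical, and the sketch only covers one level of indirection, on the assumption that a pointer to a pointer would need its own explicitly defined type.

```python
VOID = 0       # from the proposal: slot 0 is void
VOID_PTR = 1   # from the proposal: slot 1 is the implied void*


def is_pointer(type_id: int) -> bool:
    """Under the proposed numbering, bit 0 set means an implied pointer type."""
    return (type_id & 1) == 1


def pointee(type_id: int) -> int:
    """Dereferenced type of an implied pointer: just subtract 1."""
    if not is_pointer(type_id):
        raise ValueError("type id %d is not an implied pointer" % type_id)
    return type_id - 1


def pointer_to(type_id: int) -> int:
    """Implied pointer for a base type: the next (odd) slot."""
    if is_pointer(type_id):
        # Assumption of this sketch: pointer-to-pointer is not implied.
        raise ValueError("pointer-to-pointer needs an explicitly defined type")
    return type_id + 1
```

With this layout a reader never has to look a pointer type up in the type table to find what it points to, which is the encode/decode simplification being argued for.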

On Fri, 2004-08-20 at 17:55, Robert Mykland wrote:

> At 05:09 PM 8/20/2004, Chris Lattner wrote:
> >
> >If you're interested in the plans, they are described in some detail here:
> >http://nondot.org/sabre/LLVMNotes/TypeSystemChanges.txt
> >
> >Note that there is no concrete timeline for this to happen, it basically
> >depends on when someone is ambitious enough to start working on it.
> >
> >In any case, both signed and unsigned 8-bit constants can be written out
> >in a single byte. Again, do you think it's worth special casing this
> >though? Considering that we handle 8-bit strings specially already, there
> >are not a ton of 8-bit constants with value >= 128.
>
> I'd rather that they not be treated specially. If char defaulted to
> unsigned char, there would be little reason to create this special case.

Actually, this isn't a very big deal. It's just handled in a switch()
statement now so I just make a couple more cases that handle the UByteTyID
and SByteTyID separately. I'll probably include this in 1.4.

> > > > This approach would have the added advantage of being able to check to
> > > > see whether anything is a pointer type by checking bit 0 (1 = yes) and
> > > > deriving its dereferenced type (just subtract 1).
> >
> >I don't think this is a big win, the .bc reader doesn't have to do much of
> >this.
>
> I know my reader does this. I'm not really sure how much time it spends
> doing it. My little code generator spends a lot of time going back and
> forth between pointers and literal values when turning certain kinds of
> memory operations into data movement in the Ascenium array.

I will probably make this change in 1.4 to eke out a few more bytes of
savings from the file and since it will help Robert.

> > > > 4) Can files have multiple 0x01 headers? I've never seen more than
> > > > one. If not, ditch these four bytes of unnecessary space per file.
> > >
> > > I think the original plan was to have multiple modules in them but this seems
> > > to have gone by the wayside. The result of linking two (or more) modules is a
> > > single module so except in some really bizarre corner cases the need for
> > > multiple modules would go away. I suppose we could get rid of the block id
> > > field for the file. I'll give this some thought and see if Chris has any
> > > objections.
> >
> >I don't have any problem with removing it.
>
> Cool. Before you chop, remember debug libraries.

Sorry, I'm missing the context here. Why would this affect debug libraries?

Reid

On Fri, 20 Aug 2004, Robert Mykland wrote:

> >In any case, both signed and unsigned 8-bit constants can be written out
> >in a single byte. Again, do you think it's worth special casing this
> >though? Considering that we handle 8-bit strings specially already, there
> >are not a ton of 8-bit constants with value >= 128.
>
> I'd rather that they not be treated specially. If char defaulted to
> unsigned char, there would be little reason to create this special case.

I don't understand what you're getting at here. You can change char to
default to unsigned right now with llvm-gcc -funsigned-char. I don't
understand how that would change anything to be more useful though.

> >This is a very interesting idea, particularly for languages like C++ that
> >have a ton of types. Before making this change, I would want to see some
> >numbers though. In particular, I don't think that types typically take up
> >a large amount of the .bc file size: most of it is instructions.
> >
> >Are you seeing other cases?
>
> No. This would only save a bit less than two bytes per primitive and
> defined type. Maybe a few hundred bytes in a large LLVM file. Not a
> big savings, but a savings. The thing I like is that along with the
> size savings it appears to make the encode/decode simpler and quicker if
> anything. So good news all around.

Okay, that's fine. When implementing that, we should take care to create
the pointer types lazily instead of eagerly to avoid creating pointer
types that are not used.

> > > I think the original plan was to have multiple modules in them but this seems
> > > to have gone by the wayside. The result of linking two (or more) modules is a
> > > single module so except in some really bizarre corner cases the need for
> > > multiple modules would go away. I suppose we could get rid of the block id
> > > field for the file. I'll give this some thought and see if Chris has any
> > > objections.
> >
> >I don't have any problem with removing it.
>
> Cool.
> Before you chop, remember debug libraries.

I think that debug libraries should be handled in other ways. The original
idea was to have .bc files hold lots of other random cruft with them. With
more experience, this seems like a bad idea.

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
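
The point about creating implied pointer types lazily rather than eagerly can be illustrated with a small type table. This is a sketch under stated assumptions, not how the LLVM bytecode reader is written: the class `TypeTable`, its methods, and the use of plain strings to stand in for type objects are all hypothetical, and it reuses the even/odd numbering from the proposal in this thread.

```python
class TypeTable:
    """Maps type ids to types; implied pointer types are built on first use."""

    def __init__(self, base_types):
        self._base = list(base_types)  # base type i lives at type id 2*i
        self._pointers = {}            # base index -> pointer type, filled on demand

    def lookup(self, type_id: int) -> str:
        index, is_ptr = divmod(type_id, 2)
        if type_id < 0 or index >= len(self._base):
            raise KeyError(type_id)
        if not is_ptr:
            return self._base[index]
        if index not in self._pointers:
            # Lazy step: the pointer type only exists once something asks for it.
            self._pointers[index] = self._base[index] + "*"
        return self._pointers[index]

    def materialized_pointers(self) -> int:
        """How many pointer types have actually been created so far."""
        return len(self._pointers)


table = TypeTable(["void", "bool", "ubyte", "sbyte"])
before = table.materialized_pointers()   # nothing created up front
first = table.lookup(5)                   # id 5 is the implied pointer to "ubyte"
after = table.materialized_pointers()
```

An eager table would build one pointer type per base type when the file is read, whether or not any instruction refers to it; the lazy version pays only for the pointer types that are looked up.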