On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:
> At 06:43 PM 8/20/2004, Chris Lattner wrote:
> >I don't understand what you're getting at here. You can change char to
> >default to unsigned right now with llvm-gcc -funsigned-char. I don't
> >understand how that would change anything to be more useful though.
>
> Well, in the old days, char strings were handled just like any other kind
> of array of primitive types.

And, they still are :)

> In that world, when char defaulted to signed
> char, most of the heavily used ASCII symbols took two bytes to
> encode.

Um. What? ASCII is a 7-bit encoding. It defines values 0-127 which, even
with a sign bit is encoded into one byte. Recall that in the "old days"
computers had a parity bit as the 8th-bit because the memory failure
rates were so high (think vacuum tubes).

> Thus, (and I'm guessing here), you guys decided to treat char
> strings as a special case to save space in the bytecode file.

Actually, LLVM doesn't really treat character strings specially EXCEPT
in the bcwriter and bcreader. There is no notion in LLVM of a "string",
just primitive types and arrays of them. It is up to the front end
compiler to define what it means by a "string". In the bytecode
libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
as "strings" for reading and writing efficiency. They are, however,
still just arrays of one of the two primitive single-byte types.

> If all pointer types are implied, not a problem to create them. However,
> in larger files it may cost a little due to slightly larger type
> numbers. I'm not sure about the tradeoff here, but I expect that implied
> pointers would still save more just because of pointers to function types.

Pointers are used heavily in almost all languages. I can almost
guarantee that the "tradeoff" would be larger bytecode files. The use of
pointers to function types is not all that frequent so I wouldn't expect
it to save much.

In any event, we're not going to do anything with this until there are
solid numbers. I'm working on improving llvm-bcanalyzer to provide them.

Reid
On Mon, 23 Aug 2004, Reid Spencer wrote:
> > If all pointer types are implied, not a problem to create them. However,
> > in larger files it may cost a little due to slightly larger type
> > numbers. I'm not sure about the tradeoff here, but I expect that implied
> > pointers would still save more just because of pointers to function types.
>
> Pointers are used heavily in almost all languages. I can almost
> guarantee that the "tradeoff" would be larger bytecode files. The use of
> pointers to function types is not all that frequent so I wouldn't expect

Note that every LLVM function involves creating a pointer to function, so
it might be a good idea to implicitly encode pointers for every function
type (there is no other way to use a function type in any case). If you
guys are microoptimizing the bc format, this is an idea. *shrug*

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
Chris & Reid,

In fact, the primitive that means "function" could stand for "function
pointer", and a new number could be added to the end of the list, if
needed, that just means "function". As far as I can see, the "function"
type slots are never used.

-- Robert.

At 11:36 PM 8/23/2004, you wrote:
>On Mon, 23 Aug 2004, Reid Spencer wrote:
> > > If all pointer types are implied, not a problem to create them. However,
> > > in larger files it may cost a little due to slightly larger type
> > > numbers. I'm not sure about the tradeoff here, but I expect that implied
> > > pointers would still save more just because of pointers to function
> > > types.
> >
> > Pointers are used heavily in almost all languages. I can almost
> > guarantee that the "tradeoff" would be larger bytecode files. The use of
> > pointers to function types is not all that frequent so I wouldn't expect
>
>Note that every LLVM function involves creating a pointer to function, so
>it might be a good idea to implicitly encode pointers for every function
>type (there is no other way to use a function type in any case). If you
>guys are microoptimizing the bc format, this is an idea. *shrug*
>
>-Chris
>
>--
>http://llvm.org/
>http://nondot.org/sabre/
>
>_______________________________________________
>LLVM Developers mailing list
>LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

Robert Mykland               Voice: (831) 462-6725
Founder/CTO                  Ascenium Corporation
At 09:37 PM 8/23/2004, you wrote:
>On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:
> > At 06:43 PM 8/20/2004, Chris Lattner wrote:
> > >I don't understand what you're getting at here. You can change char to
> > >default to unsigned right now with llvm-gcc -funsigned-char. I don't
> > >understand how that would change anything to be more useful though.
> >
> > Well, in the old days, char strings were handled just like any other kind
> > of array of primitive types.
>
>And, they still are :)

No. If you define an array of int, the various int values that initialize
the table are defined separately and then their value indexes are used in
the definition of the int array, which appears under its created type in
the constants list. Character strings are handled differently from this.

> > In that world, when char defaulted to signed
> > char, most of the heavily used ASCII symbols took two bytes to
> > encode.
>
>Um. What? ASCII is a 7-bit encoding. It defines values 0-127 which, even
>with a sign bit is encoded into one byte. Recall that in the "old days"
>computers had a parity bit as the 8th-bit because the memory failure
>rates were so high (think vacuum tubes).

Actually, by "old days" I meant LLVM version 0.9. In LLVM 0.9 they were
most often encoded as two bytes because most of the most commonly used
ASCII symbols are above 0x30.

> > Thus, (and I'm guessing here), you guys decided to treat char
> > strings as a special case to save space in the bytecode file.
>
>Actually, LLVM doesn't really treat character strings specially EXCEPT
>in the bcwriter and bcreader. There is no notion in LLVM of a "string",
>just primitive types and arrays of them. It is up to the front end
>compiler to define what it means by a "string". In the bytecode
>libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
>as "strings" for reading and writing efficiency. They are, however,
>still just arrays of one of the two primitive single-byte types.

Okay, but this discussion is about the physical protocol of the
bytecode. That's what I'm referring to.

> > If all pointer types are implied, not a problem to create them. However,
> > in larger files it may cost a little due to slightly larger type
> > numbers. I'm not sure about the tradeoff here, but I expect that implied
> > pointers would still save more just because of pointers to function types.
>
>Pointers are used heavily in almost all languages. I can almost
>guarantee that the "tradeoff" would be larger bytecode files. The use of
>pointers to function types is not all that frequent so I wouldn't expect
>it to save much. In any event, we're not going to do anything with this
>until there are solid numbers. I'm working on improving llvm-bcanalyzer
>to provide them.

Right now I see pointer types being created for practically every literal
type defined anyway. I doubt you'd see much file bloat due to pointer
types being implied for everything. These pointer types are already being
defined.

However, I could see how it could conceivably save more file space to only
define pointer types where absolutely necessary, thus keeping the overall
number of types to an absolute minimum. Chris mentioned this philosophy
and I think it's a good one. Perhaps we could also find a way to declare
pointer types without having to declare the literal type if the literal
type is never used. Functions are but one example of this.

Regards,
-- Robert.

Robert Mykland               Voice: (831) 462-6725
Founder/CTO                  Ascenium Corporation
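The distinction Robert draws in the message above, between an int array
built from value indexes and a character string handled as a special case,
can be pictured with a toy model. This is only a sketch: the function
names, the dict-based constant pool, and the choice to write strings as
raw bytes are all assumptions made for illustration, not the actual
bytecode layout used by the bcwriter.

```python
def encode_int_array(values, pool):
    """Encode an int array as a list of slot indexes into a constant pool.

    Hypothetical model: each distinct value is defined once in `pool`
    (value -> slot index), and the array refers to those slots.
    """
    slots = []
    for value in values:
        if value not in pool:
            pool[value] = len(pool)  # define the constant on first use
        slots.append(pool[value])
    return slots


def encode_char_array(text):
    """Encode an ASCII string directly as one byte per character.

    Hypothetical model of the "string" special case: no per-character
    constants are defined, the bytes are written as-is.
    """
    return text.encode("ascii")


pool = {}
int_array = encode_int_array([7, 7, 9], pool)   # refers to pool slots
char_array = encode_char_array("hi")            # raw bytes, no pool entries
```

In this model the int array costs one pool entry per distinct value plus
one index per element, while the string costs exactly one byte per
character, which is the kind of saving the special case is meant to buy.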
On Thu, 2004-08-26 at 12:45, Robert Mykland wrote:
> At 09:37 PM 8/23/2004, you wrote:
> >On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:
> > > At 06:43 PM 8/20/2004, Chris Lattner wrote:
> > > >I don't understand what you're getting at here. You can change char to
> > > >default to unsigned right now with llvm-gcc -funsigned-char. I don't
> > > >understand how that would change anything to be more useful though.
> > >
> > > Well, in the old days, char strings were handled just like any other kind
> > > of array of primitive types.
> >
> >And, they still are :)
>
> No. If you define an array of int, the various int values that initialize
> the table are defined separately and then their value indexes are used in
> the definition of the int array, which appears under its created type in
> the constants list. Character strings are handled differently from this.

Okay, I see what you mean now. Yes, bytecode treats character arrays
(well, to be precise arrays of UByte or SByte) separately from arrays of
other primitive types.

> > > In that world, when char defaulted to signed
> > > char, most of the heavily used ASCII symbols took two bytes to
> > > encode.
> >
> >Um. What? ASCII is a 7-bit encoding. It defines values 0-127 which, even
> >with a sign bit is encoded into one byte. Recall that in the "old days"
> >computers had a parity bit as the 8th-bit because the memory failure
> >rates were so high (think vacuum tubes).
>
> Actually, by "old days" I meant LLVM version 0.9. In LLVM 0.9 they were
> most often encoded as two bytes because most of the most commonly used
> ASCII symbols are above 0x30.

Okay. LLVM 0.9 predates my involvement so I wouldn't know :)

> > > Thus, (and I'm guessing here), you guys decided to treat char
> > > strings as a special case to save space in the bytecode file.
> >
> >Actually, LLVM doesn't really treat character strings specially EXCEPT
> >in the bcwriter and bcreader. There is no notion in LLVM of a "string",
> >just primitive types and arrays of them. It is up to the front end
> >compiler to define what it means by a "string". In the bytecode
> >libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
> >as "strings" for reading and writing efficiency. They are, however,
> >still just arrays of one of the two primitive single-byte types.
>
> Okay, but this discussion is about the physical protocol of the
> bytecode. That's what I'm referring to.

Okay.

> > > If all pointer types are implied, not a problem to create them. However,
> > > in larger files it may cost a little due to slightly larger type
> > > numbers. I'm not sure about the tradeoff here, but I expect that implied
> > > pointers would still save more just because of pointers to function types.
> >
> >Pointers are used heavily in almost all languages. I can almost
> >guarantee that the "tradeoff" would be larger bytecode files. The use of
> >pointers to function types is not all that frequent so I wouldn't expect
> >it to save much. In any event, we're not going to do anything with this
> >until there are solid numbers. I'm working on improving llvm-bcanalyzer
> >to provide them.
>
> Right now I see pointer types being created for practically every literal
> type defined anyway. I doubt you'd see much file bloat due to pointer
> types being implied for everything. These pointer types are already being
> defined.

The bloat doesn't occur because of the extra type definitions. A pointer
type definition is incredibly small (basically a VBR number or 1-2 bytes
to reference the element type). The bloat occurs because of the automatic
doubling of the slot numbers. Say you had 100 non-pointer types of which
15 needed pointers for a total of 115. If we automatically made even
indices in the type table be non-pointer types and odd indices be pointer
types, we'd have 200 entries in the table. Now the entries above 127 all
require 2 VBR bytes to reference instead of just 1 as per the current
mechanism. If those higher slot number types are used frequently, this
could add serious bloat to the file.

What we *really* want to do here is (a) only define types that are used
and (b) sort the types by frequency of use so that the most frequently
used types have the smallest slot numbers. This will ensure the most
compact encoding.

You should try out llvm-bcanalyzer sometime. It will show you that on
average the size of the type table is about 1% of the file size whereas
the size of the instructions is around 90% (unless there's a lot of
symbols in which case the symbol table rivals the instruction lists).

> However, I could see how it could conceivably save more file space to only
> define pointer types where absolutely necessary, thus keeping the overall
> number of types to an absolute minimum. Chris mentioned this philosophy
> and I think it's a good one. Perhaps we could also find a way to declare
> pointer types without having to declare the literal type if the literal
> type is never used. Functions are but one example of this.

True 'nuff.

Reid
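The slot-number argument in Reid's reply above can be checked with a
little arithmetic. The sketch below assumes a VBR scheme that stores 7
payload bits per byte, so slots 0-127 take one byte and higher slots take
two, as the message describes. The helper names, the uniform count of 10
references per type, the choice of the first 15 types as the ones needing
pointers, and the omission of any slots reserved for primitive types are
all hypothetical simplifications, not measurements from real bytecode.

```python
def vbr_size(n: int) -> int:
    """Bytes needed for non-negative n at 7 payload bits per byte."""
    if n < 0:
        raise ValueError("slot numbers are non-negative")
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


def reference_cost(slot_uses: dict[int, int]) -> int:
    """Total bytes spent referencing types, given uses per slot number."""
    return sum(vbr_size(slot) * uses for slot, uses in slot_uses.items())


def assign_by_frequency(uses_per_type: list[int]) -> dict[int, int]:
    """Give the most frequently used types the smallest slot numbers."""
    ranked = sorted(uses_per_type, reverse=True)
    return dict(enumerate(ranked))


USES = 10  # hypothetical: every type is referenced 10 times

# Current mechanism: 100 non-pointer types plus 15 pointer types, packed
# into slots 0..114.
compact = {slot: USES for slot in range(115)}

# Implied pointers: type i sits at slot 2*i and its pointer at 2*i + 1.
# Only the 15 pointer slots that are actually needed get referenced.
interleaved = {2 * i: USES for i in range(100)}
interleaved.update({2 * i + 1: USES for i in range(15)})

compact_cost = reference_cost(compact)          # every slot fits in 1 byte
interleaved_cost = reference_cost(interleaved)  # slots >= 128 cost 2 bytes

# Frequency sorting: one heavily used type stuck in the last of 200 slots.
unsorted_uses = [1] * 199 + [1000]
unsorted_cost = reference_cost(dict(enumerate(unsorted_uses)))
sorted_cost = reference_cost(assign_by_frequency(unsorted_uses))
```

Under these assumptions the packed table costs 1150 bytes of type
references and the interleaved table 1510, because 36 of the non-pointer
types land at slot 128 or above. In the second scenario, moving the one
hot type from slot 199 to slot 0 cuts the reference cost from 2270 to
1271 bytes, which is the effect point (b) is after.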