Robert Mykland
2003-Aug-26 15:44 UTC
[LLVMdev] Question: Bytecode Representation of Type Definitions Table
Distinguished LLVM Creators,

I've been looking through the bytecode representation of the type definition table and had a few questions about it. There's an enum in Types.h that defines all bytecodes that represent the primitive types and a few other necessary things:

     0 = 0x00 = Void
     1 = 0x01 = Bool
     2 = 0x02 = UByte
     3 = 0x03 = SByte
     4 = 0x04 = UShort (16 bits)
     5 = 0x05 = Short (16 bits)
     6 = 0x06 = UInt (32 bits)
     7 = 0x07 = Int (32 bits)
     8 = 0x08 = ULong (64 bits)
     9 = 0x09 = Long (64 bits)
    10 = 0x0a = Float (32 bits)
    11 = 0x0b = Double (64 bits)
    12 = 0x0c = Type definition
    13 = 0x0d = Label
    14 = 0x0e = Function
    15 = 0x0f = Struct
    16 = 0x10 = Array
    17 = 0x11 = Pointer
    18 = 0x12 = Opaque

As far as I can figure, the type definition table itself starts back at 0x0e and I'm thinking that's because the label is the last thing that wouldn't have to be only part of a derived type. But it still seems to make some of the low entries in the table ambiguous (at least to me!). I compiled a nice little hello world program into LLVM and then into bytecodes (see complete results attached). Here is the start of the type definition table:

Entry 0x0e: Pointer to type 0x0f
0000001a  11 0f

Entry 0x0f: Array of SByte [14] (presumably for "Hello World!\n" constant)
0000001c  10 03 0e

Entry 0x10: Pointer to type 0x12
0000001f  11                                  |....n...n.......|
00000020  12

Entry 0x11: Pointer to SByte
00000021  11 03

Entry 0x12: Function returning Pointer ( UInt )
00000023  0e 11 01 06

Okay, so looking at entry 0x10: is it a pointer to Opaque or a pointer to a function returning Pointer ( UInt )? I'm guessing the latter. Similarly, entry 0x0e could be a pointer to Struct or a pointer to Array of SByte [14]. Again I'm guessing the latter.

I'm worried this low table stuff isn't unambiguous in all cases, but then again I'm a nervous guy. If you could set my mind at ease with regard to the lack of ambiguity that would be great.

And what's with this Opaque type anyway? It's in the enum but I haven't found an instance of its use, unless of course it's used in entry 0x10. The whole missing Opaque thing makes me nervous too. It seems like it was just put there to be unclear. :-)

But seriously, is it used for anything now? Will it start to get used sometime?

Regards,
-- Robert.

Robert Mykland
Voice: (831) 462-6725

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hello.hexdump
Type: application/octet-stream
Size: 26620 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20030826/977bdee1/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hello.c
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20030826/977bdee1/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hello.s
Type: application/octet-stream
Size: 10437 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20030826/977bdee1/attachment-0001.obj>
Chris Lattner
2003-Aug-26 16:04 UTC
[LLVMdev] Question: Bytecode Representation of Type Definitions Table
> As far as I can figure, the type definition table itself starts back at
> 0x0e and I'm thinking that's because the label is the last thing that
> wouldn't have to be only part of a derived type.

Exactly right. The types starting with the function type never appear explicitly in the table; they don't occupy a "slot". Derived types are only used to build concrete types from other things. :)

> But it still seems to make some of the low entries in the table
> ambiguous (at least to me!). I compiled a nice little hello world
> program into LLVM and then into bytecodes (see complete results
> attached). Here is the start of the type definition table:

Ok.

> Entry 0x0e: Pointer to type 0x0f
> 0000001a 11 0f

Yes, since type 0x0F is '[14 x sbyte]', this is '[14 x sbyte]*'. Forward references are required for things like recursive types.

> Entry 0x0f: Array of SByte [14] (presumably for "Hello World!\n" constant)
> 0000001c 10 03 0e

Yup.

> Entry 0x10: Pointer to type 0x12
> 0000001f 11 |....n...n.......|
> 00000020 12

Yup: 'sbyte* (uint)*'

> Entry 0x11: Pointer to SByte
> 00000021 11 03

'sbyte*'

> Entry 0x12: Function returning Pointer ( UInt )
> 00000023 0e 11 01 06

'sbyte* (uint)'

> Okay, so looking at entry 0x10: is it a pointer to Opaque or a pointer to a
> function returning Pointer ( UInt )? I'm guessing the latter. Similarly,
> entry 0x0e could be a pointer to Struct or a pointer to Array of SByte
> [14]. Again I'm guessing the latter.

You're right. The parsing algorithm goes like this:

Read a byte. This defines the 'typeid' to use for the type. This is one of the values from the Type.h file, including things like structure, pointer, opaque, function, ... as well as the primitive types.

If it's a derived type, extra information is read indicating what the types of the parameters are for functions, what the pointee of a pointer is, etc. These type IDs are type #'s, not primitive ID numbers. You cannot refer to a "generic" structure or function or anything like that. Forward references are allowed.

> I'm worried this low table stuff isn't unambiguous in all cases, but
> then again I'm a nervous guy. If you could set my mind at ease with
> regard to the lack of ambiguity that would be great.

It seems to work so far. :) It should be unambiguous; we haven't had any problems.

> And what's with this Opaque type anyway? It's in the enum but I haven't
> found an instance of its use, unless of course it's used in entry
> 0x10. The whole missing Opaque thing makes me nervous too. It seems like
> it was just put there to be unclear. :-)

Opaque type is for a type that does not have a definition yet. In C, for example, if you say 'struct foo;' and never provide the body, you get an llvm type like:

  %struct.foo = opaque;

This allows you to build definitions like '%struct.foo*', etc. Later, when the type is resolved in the linking phase, all of these types are updated to have their "true" values.

> But seriously, is it used for anything now? Will it start to get used
> sometime?

It is used extensively for a lot of things, including the "forward referencing" of types in the bytecode and asm files...

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
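
To make the parsing rule above concrete, here is a minimal Python sketch that decodes the five table entries quoted in this thread. It is only an illustration, not the actual LLVM reader: the names (parse_type_table, render, HELLO_TABLE) are made up for this example, every field is assumed to fit in a single byte, the entry count is passed in by hand, and only the pointer, array, and function layouts visible in the hexdump are handled. The struct and opaque layouts do not appear in the thread, so the sketch refuses them rather than guessing.

```python
# Primitive type IDs as listed in the original question.
PRIMITIVES = {
    0x00: "void", 0x01: "bool", 0x02: "ubyte", 0x03: "sbyte",
    0x04: "ushort", 0x05: "short", 0x06: "uint", 0x07: "int",
    0x08: "ulong", 0x09: "long", 0x0A: "float", 0x0B: "double",
    0x0C: "type", 0x0D: "label",
}
FUNCTION, ARRAY, POINTER = 0x0E, 0x10, 0x11
FIRST_SLOT = 0x0E  # first table entry gets type number 0x0e


def parse_type_table(data, count):
    """Read `count` entries; the leading byte of each entry is its typeid."""
    entries, pos = [], 0
    for _ in range(count):
        kind = data[pos]
        pos += 1
        if kind == POINTER:      # 11 <pointee type #>
            entries.append(("pointer", data[pos]))
            pos += 1
        elif kind == ARRAY:      # 10 <element type #> <length>
            entries.append(("array", data[pos], data[pos + 1]))
            pos += 2
        elif kind == FUNCTION:   # 0e <return type #> <n> <param type #>...
            ret, nparams = data[pos], data[pos + 1]
            params = tuple(data[pos + 2:pos + 2 + nparams])
            entries.append(("function", ret, params))
            pos += 2 + nparams
        else:
            # Struct/opaque layouts are not shown in the thread (assumption:
            # unsupported here rather than guessed).
            raise NotImplementedError("typeid 0x%02x" % kind)
    return entries


def render(type_number, entries):
    """Operands are type numbers: below 0x0e primitive, otherwise a slot.

    Note: this naive version would loop forever on a recursive type.
    """
    if type_number < FIRST_SLOT:
        return PRIMITIVES[type_number]
    entry = entries[type_number - FIRST_SLOT]
    if entry[0] == "pointer":
        return render(entry[1], entries) + "*"
    if entry[0] == "array":
        return "[%d x %s]" % (entry[2], render(entry[1], entries))
    ret, params = entry[1], entry[2]
    args = ", ".join(render(p, entries) for p in params)
    return "%s (%s)" % (render(ret, entries), args)


# Bytes 0x1a..0x26 from the hexdump quoted in the thread.
HELLO_TABLE = bytes.fromhex("110f 10030e 1112 1103 0e110106")
entries = parse_type_table(HELLO_TABLE, 5)
for number in range(FIRST_SLOT, FIRST_SLOT + len(entries)):
    print("0x%02x: %s" % (number, render(number, entries)))
```

The point of the sketch is the split between the two roles a byte can play: the first byte of an entry is a typeid from the enum, while every operand byte is a type number, which is why 0x12 inside entry 0x10 means "slot 0x12" and not "Opaque". Because operands are resolved only after the whole table is read, forward references such as entry 0x0e pointing at 0x0f work without special handling.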