First off, most of my information about integer representation in LLVM comes from http://llvm.org/docs/LangRef.html#t_integer and I could use a few things cleared up.

First, I guess that smaller integer sizes, say i1 (boolean), are stuffed into a full word size for the CPU the code is compiled on (so 8 bits, or 32 bits, or whatever). Suppose someone made an i4 and compiled it on 32/64-bit Windows/nix/BSD on a standard x86 or x64 system, and set the value to 15 (the maximum for an unsigned i4). If it is still rounded up to the next nearest size when compiled (i8 or i32 or what-not), then what happens when 1 is added to it: is it represented in memory as 16, or, if all but the first 4 bits are ignored, as zero? Is there any enforcement of the range of a given integer? In other words, regardless of architecture, would an i4 be constrained to only 0 through 15, or is that the realm of the language to enforce? I would think it would be the language's job, as enforcing it at the LLVM level would add noticeable overhead on non-machine-size integers; but given that it would then be the language's responsibility, how can the language be certain which values are appropriate for the architecture it is being compiled on? My quick guess is that something like an i4 would be compiled as if it were an i8 and treated identically to an i8 in all circumstances. Is this correct?

Second, what if the specified integer size is rather large, say i512? Would this cause a compile error (something along the lines of the specified integer size being too large to fit in the machine architecture), or would it rather compile in the code necessary to do bignum math on it? (Or would something like that be in the realm of the language designer? Having it at the LLVM level would also make sense; after all, what knows better how to compile something for speed on the target system than the compiler itself?) My quick guess is that specifying an integer bit size too large for the machine would cause a compile error, but the docs do not hint at that (especially with the given example of i1942652, a really big integer of over one million bits). Is this correct?

Third, assuming either or both of the above has to be enforced/implemented by the language designer, what would be the best way for the language to ask LLVM what the appropriate machine integer sizes are, so that if an i64 is specified, the language does bignum math on a 32-bit compile but uses a native int on a 64-bit compile? I ask this instead of just directly testing whether the CPU is 32-bit or 64-bit because some processors allow double-width integers, so a 64-bit integer on some 32-bit CPUs is just fine, as is a 128-bit int on a 64-bit CPU. How can I query what the best appropriate integer sizes are?

Some background on the questions: I am making a JIT'd, speed-critical 'scripting language' for a certain app of mine. Its integer types have a bit-size part, like LLVM does it: i4/s4 is a signed integer of 4 bits, u4 is an unsigned integer of 4 bits, etc.
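One point worth noting about the s4/u4 scheme (a minimal sketch, not part of the original thread; the function names are made up): LLVM's integer types themselves carry no sign, so both would lower to plain i4, and the signed/unsigned distinction shows up only in which instructions the frontend emits, e.g. sext versus zext, sdiv versus udiv:

    ; LLVM integer types are signless; a frontend's u4 and s4 would both be
    ; i4, with signedness expressed by the choice of operation instead.
    define i32 @widen_u4(i4 %v) {
    entry:
      %w = zext i4 %v to i32       ; unsigned widening for a u4
      ret i32 %w
    }

    define i32 @widen_s4(i4 %v) {
    entry:
      %w = sext i4 %v to i32       ; signed widening for an s4
      ret i32 %w
    }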
On Fri, Sep 5, 2008 at 12:42 PM, OvermindDL1 <overminddl1 at gmail.com> wrote:

> First off, most of my information about the integer representation in
> LLVM is from http://llvm.org/docs/LangRef.html#t_integer and I could
> use some things cleared up.

Okay... that's a good start :)

> First, I guess that smaller integer sizes, say, i1 (boolean) are
> stuffed into a full word size for the cpu it is compiled on (so 8 bits,
> or 32 bits or whatever).

The code is compiled so that it works. :) At least hopefully; I think there are still some bugs lurking with unusual types. CodeGen will use a native register to do arithmetic.

> What if someone made an i4 and compiled it on 32/64 bit
> windows/nix/bsd on a standard x86 or x64 system, and they set the
> value to 15 (the max size of an unsigned i4), if it is still rounded
> up to the next nearest size when compiled (i8 or i32 or what-not),
> what if when that value has 15, but a 1 was added to it, it will be
> represented in memory at 16, or if you ignore all but the first 4 bits
> it would be zero. Is there any enforcement in the range of a given
> integer (in other words, regardless of architecture, would an i4 be
> constrained to only be 0 to 15, or is this the realm of the language
> to enforce, I would think it would be as having it at LLVM level would
> add noticeable overhead on non-machine size integers, and given that
> it would be in the realm of the language to deal with, how can the
> language be certain what values are directly appropriate for the
> architecture it is being compiled on)?
> In just a quick guess, I would say that something like an i4 would be
> compiled as if it was an i8, treated identically to an i8 in all
> circumstances, is this correct?

An i4 is a four-bit integer; it is guaranteed to act like a true i4 for all arithmetic operations. CodeGen will mask the integers appropriately to achieve the desired behavior.

> Second, what if the specified integer size is rather large, say that
> an i512 was specified, would this cause a compile error (something
> along the lines of the specified integer size being too large to fit
> in the machine architecture), or would it rather compile in the
> necessary code to do bignum math on it (or would something like that
> be in the realm of the language designer, although having it at LLVM
> level would also make sense, after all, what best knows how to compile
> something for speed on the target system other then the compiler
> itself)?
> In just a quick guess, I would say that specifying an integer bit size
> too large for the machine would cause a compile error, but the docs do
> not hint at that (especially with the given example of: i1942652 a
> really big integer of over 1 million bits), is this correct?

The language and the optimizers don't have any issues with such types, at least in theory. Ignoring bugs, CodeGen can currently handle anything up to i128 for all operations on most architectures; if the operations aren't natively supported, it falls back to calling the implementation in libgcc.

-Eli
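As a concrete illustration of Eli's point (a minimal sketch, not from the thread; the function name is made up), the questioner's 15 + 1 scenario looks like this in LLVM assembly, and the result really is 0, not 16:

    ; i4 arithmetic wraps at 4 bits: 15 is the largest unsigned i4 value,
    ; so adding 1 wraps around to 0.
    define i32 @wrap_demo() {
    entry:
      %x = add i4 15, 1            ; wraps to 0
      %y = zext i4 %x to i32       ; codegen masks the unused register bits
                                   ; before widening, so the i32 is also 0
      ret i32 %y
    }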
Hi,

> First, I guess that smaller integer sizes, say, i1 (boolean) are
> stuffed into a full word size for the cpu it is compiled on (so 8bits,
> or 32 bits or whatever).

on x86-32, an i1 gets placed in an 8 bit register.

> What if someone made an i4 and compiled it on 32/64 bit
> windows/nix/bsd on a standard x86 or x64 system, and they set the
> value to 15 (the max size of an unsigned i4), if it is still rounded
> up to the next nearest size when compiled (i8 or i32 or what-not),

The extra bits typically contain rubbish, but you can't tell. For example, suppose in the bitcode you decide to print out the value of the i4 by calling printf. So in the bitcode you first (say) zero-extend the i4 to an i32 which you pass to printf. Well, the code-generators will generate the following (or equivalent) for the zero-extension: mask off the rubbish bits in the i8 register (i.e. set them to zero), then zero-extend the result to a full 32 bit register. This all happens transparently.

> what if when that value has 15, but a 1 was added to it, it will be
> represented in memory at 16, or if you ignore all but the first 4 bits
> it would be zero.

It acts like an i4: the bits corresponding to the i4 will have the right value (0) while the rest will have some rubbish.

> Is there any enforcement in the range of a given
> integer (in other words, regardless of architecture, would an i4 be
> constrained to only be 0 to 15, or is this the realm of the language
> to enforce, I would think it would be as having it at LLVM level would
> add noticeable overhead on non-machine size integers, and given that
> it would be in the realm of the language to deal with, how can the
> language be certain what values are directly appropriate for the
> architecture it is being compiled on)?
> In just a quick guess, I would say that something like an i4 would be
> compiled as if it was an i8, treated identically to an i8 in all
> circumstances, is this correct?

No it is not. It acts exactly like an i4, even though on x86-32 the code-generators implement this in an i8. There is a whole pile of logic in lib/CodeGen/SelectionDAG/Legalize*Types.cpp in order to get this effect (currently you have to pass -enable-legalize-types to llc to turn on codegen support for funky integer sizes).

> Second, what if the specified integer size is rather large, say that
> an i512 was specified, would this cause a compile error (something
> along the lines of the specified integer size being too large to fit
> in the machine architecture), or would it rather compile in the
> necessary code to do bignum math on it (or would something like that
> be in the realm of the language designer, although having it at LLVM
> level would also make sense, after all, what best knows how to compile
> something for speed on the target system other then the compiler
> itself)?

The current maximum the code generators support is i256. If you try to use bigger integers it will work fine in the bitcode, but if you try to do code generation the compiler will crash.

> In just a quick guess, I would say that specifying an integer bit size
> too large for the machine would cause a compile error, but the docs do
> not hint at that (especially with the given example of: i1942652 a
> really big integer of over 1 million bits), is this correct?

No, you can use i256 on a 32 bit machine for example.

> Third, assuming either or both of the above things had to be
> enforced/implemented by the language designer, what would be the best
> way for the language to ask LLVM what the appropriate machine integer
> sizes are, so that if an i64 is specified, then bignum math would be
> done by the language on a 32-bit compile, but would just be a native
> int on a 64-bit compile. The reason this is asked instead of just
> directly testing the cpu bit (32-bit, 64-bit, whatever) is that some
> processors allow double sized integers to be specified, so a 64-bit
> integer on some 32-bit cpu's is just fine, as is a 128-bit int on a
> 64-bit cpu, thus how can I query what are the best appropriate integer
> sizes?

I don't know, sorry.

Ciao,

Duncan.
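For illustration (a sketch, not from the thread; the function name is invented), something like the following is perfectly legal IR that the parser and optimizers will accept; whether it survives code generation depends on the backend's limits, which at the time of this thread topped out at i256, so the backend would crash on this rather than emit bignum-style code:

    ; Legal bitcode: a 512-bit addition.  The IR level imposes no practical
    ; width limit; the constraint is in the code generators.
    define i512 @big_add(i512 %a, i512 %b) {
    entry:
      %sum = add i512 %a, %b
      ret i512 %sum
    }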
On Sep 5, 2008, at 3:07 PM, Duncan Sands wrote:

> The current maximum the code generators support is i256. If you try to
> use bigger integers it will work fine in the bitcode, but if you try
> to do code generation the compiler will crash.

FYI, there is one other issue here, PR2660. While codegen in general can handle types like i256, individual targets don't always have calling convention rules to cover them. For example, returning an i128 on x86-32 or an i256 on x86-64 doesn't fit in the registers designated for returning values on those targets.

Dan
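As a rough sketch of how a frontend might sidestep the calling-convention gap Dan describes (the function name and the out-parameter approach are just one possible workaround, not something prescribed in the thread): instead of returning a very wide integer by value, the wide result can be written through a pointer argument, keeping only the arithmetic itself at the wide width.

    ; Avoid returning i128 by value (which may not fit in the target's
    ; return registers); store the result through a pointer instead.
    define void @mul_i128(i128* %out, i128 %a, i128 %b) {
    entry:
      %p = mul i128 %a, %b
      store i128 %p, i128* %out
      ret void
    }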