On 2008-05-07, at 03:05, Chris Lattner wrote:

> On May 6, 2008, at 11:49 PM, Jonathan S. Shapiro wrote:
>
>> There are other languages that specify a "word" type along these lines. Would it be worth considering adding such a type to the IR, or is there a reason not to do so that I am failing to see?
>
> What would this be used for? How is it defined? How does arithmetic work on it?

Looking up the intptr type via TargetData is not a significant issue for me, but I can see the appeal, and how its absence could constitute a significant barrier to generating portable IR (provided, of course, a portable language). Regardless, it would allow me to hardcode a good deal more codegen if the LLVM IR had an intptr type. The semantics I would imagine for an intptr type are:

• Lowered to i32 or i64 for code generation.
• Treated as an ordinary integer for all operations except casts.
• Can be the operand to ptrtoint, but not the result.
• Can be the result of inttoptr, but not the operand.
• Can be bitcast to an actual pointer type.
• Whether sext, zext, and trunc are applicable, I could be convinced either way. It muddies the semantics of these operations.

On 2008-05-07, at 12:51, Mike Stump wrote:

> The word opaque comes to mind. The basic question is, do you want people to be able to understand this stuff, or is it more like the bitcode file, you don't care that it is unreadable?

I think this is unworkable. You can't assign an opaque value to a vreg (what if it became a struct type?); only a pointer to an opaque type can be used as a vreg. Never mind the horror of calling a runtime function to perform integer arithmetic.

-- Gordon
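To make the portability concern concrete, here is a minimal sketch of what a front end has to do when the IR has no intptr type: pick i32 or i64 itself from the target's pointer size and bake that choice into the text it emits. Both helper names are hypothetical (they are not LLVM API), and the pointer size is passed in as a parameter rather than obtained from TargetData.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical front-end helper: with no intptr type in the IR, the front
   end must choose the concrete integer type itself. The pointer size is a
   parameter here; a real front end would obtain it from the target. */
static const char *lowered_intptr_type(unsigned pointer_size_bytes)
{
    assert(pointer_size_bytes == 4 || pointer_size_bytes == 8);
    return pointer_size_bytes == 4 ? "i32" : "i64";
}

/* Writes "%dst = add <ty> %a, %b" into buf and returns the snprintf result.
   The chosen width is baked into the text, so the emitted IR is no longer
   neutral with respect to pointer size. */
static int emit_intptr_add(char *buf, size_t len, unsigned pointer_size_bytes,
                           const char *dst, const char *a, const char *b)
{
    return snprintf(buf, len, "%%%s = add %s %%%s, %%%s", dst,
                    lowered_intptr_type(pointer_size_bytes), a, b);
}
```

For the same source operation, a 4-byte-pointer target yields "%x = add i32 %a, %b" and an 8-byte-pointer target yields "%x = add i64 %a, %b", so the two bitcode files differ.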
On May 7, 2008, at 12:20 PM, Gordon Henriksen wrote:

>> The word opaque comes to mind.
>
> I think this is unworkable.

My comment wasn't a solution; it was a value judgement on the proposed solution.
On Wed, 2008-05-07 at 15:20 -0400, Gordon Henriksen wrote:

> Looking up the intptr type via TargetData is not a significant issue for me, but I can see the appeal, and how its absence could constitute a significant barrier to generating portable IR (provided, of course, a portable language). Regardless, it would allow me to hardcode a good deal more codegen if the LLVM IR had an intptr type. The semantics I would imagine for an intptr type are:
>
> • Lowered to i32 or i64 for code generation.
> • Treated as an ordinary integer for all operations except casts.
> • Can be the operand to ptrtoint, but not the result.
> • Can be the result of inttoptr, but not the operand.
> • Can be bitcast to an actual pointer type.
> • Whether sext, zext, and trunc are applicable, I could be convinced either way. It muddies the semantics of these operations.

This doesn't seem like what we are after for BitC. In the BitC case, WORD is not an integer type that contains a pointer value. It is an integer type that is guaranteed to be wide enough to hold an arbitrary vector index. The size dependency is a consequence of the fact that not all address spaces are the same size.

We allow Word <-> integer conversion through explicit conversion operators, but the language spec does not allow Word to be intermixed with other integral types in arithmetic operations.

shap
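A rough C approximation of the Word discipline described above may help: wrapping the value in a struct makes the compiler reject implicit mixing with other integer types, so explicit conversion functions are the only way in or out. This is a sketch only; the names are hypothetical, modelling Word's width as size_t is an assumption, and BitC's actual rules are defined by its own spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of a BitC-style Word: a struct wrapper, so the
   compiler rejects implicit mixing with other integer types. Its width is
   modelled as size_t, i.e. wide enough to index any vector in the address
   space. */
typedef struct { size_t v; } Word;

/* Explicit conversion operators are the only way in or out. */
static Word word_from_u32(uint32_t x)
{
    Word w = { x };
    return w;
}

/* Narrowing conversion: *ok reports whether the value fit in 32 bits. */
static uint32_t word_to_u32_checked(Word w, int *ok)
{
    *ok = (w.v <= UINT32_MAX);
    return (uint32_t)w.v;
}

/* Arithmetic is defined only between Words. */
static Word word_add(Word a, Word b)
{
    Word w = { a.v + b.v };
    return w;
}

/* Word as a vector index: reading one element of a character vector. */
static char vec_ref(const char *vec, Word len, Word i)
{
    assert(i.v < len.v);
    return vec[i.v];
}
```

An expression such as word_add(a, 2) fails to compile because 2 is not a Word; the caller has to write word_add(a, word_from_u32(2)), which is the explicit-conversion requirement in miniature.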
On Wed, 2008-05-07 at 13:24 -0700, Chris Lattner wrote:

> Querying TargetData only works if you know the size of the pointer. :)

Yes. For BitC purposes, querying TargetData would be sufficient as long as we don't care whether the emitted IR is neutral w.r.t. pointer size. Given this, I think that introducing an iWord type is not yet sufficiently well motivated from the BitC perspective.

But it would sure be convenient if we could query TargetData at compile time to determine the target pointer size. Not essential, by any means, but it seems unnecessary to encode the knowledge redundantly (in both the IR layer and the front end).

In the end, the use case that concerns me is things like character vectors, because the index spans depend on the address space size. I'm not clear whether it is a goal to have an IR that is capable of being a neutral representation w.r.t. address space size. If it *is* a goal, then I don't see how to do it without some form of iIntPtr or iWord type, but I'm still very new to all this.

> > • Lowered to i32 or i64 for code generation.
>
> Ok
>
> > • Treated as an ordinary integer for all operations except casts.
>
> Ok. What does this mean for add? This basically means that an intptr add cannot have usefully defined semantics. Can you give an example of when it is useful?

I had the same reaction. If it is lowered, then it should work normally for add. It is not quite as useless as you suggest, because things of the form

    add iIntPtr x iIntPtr -> iIntPtr

will still work correctly after lowering is performed.

I also see no reason why casts should be excluded at the IR level. That seems to me like a front-end issue. At the IR level, iIntPtr is just a late-bound integral type like any other. Perhaps Mike and I are thinking about unrelated things.

> > • Can be the operand to ptrtoint, but not the result.
> > • Can be the result of inttoptr, but not the operand.
>
> I assume these are backwards. intptr_t is an integer, not a pointer.

I agree, but Mike was consistent enough here that I wondered if I had failed to understand what he was after properly.

> > • Whether sext, zext, and trunc are applicable, I could be convinced either way. It muddies the semantics of these operations.

These seem important in order to allow explicit conversions to the normal integer types.

shap
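Once iWord has been lowered, each explicit conversion to a normal integer type still needs a concrete cast, and which one depends on the target. The sketch below is a hypothetical lowering helper, not LLVM API. It returns NULL when the widths already match, because LLVM requires trunc to narrow and sext/zext to widen, so an instruction like "trunc i32 to i32" is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering helper: after iWord has been resolved to i32 or i64,
   pick the LLVM cast opcode for an explicit conversion to a fixed-width
   integer. Returns NULL when the widths match, because LLVM rejects
   same-width trunc/zext/sext (e.g. "trunc i32 to i32"). */
static const char *int_cast_opcode(unsigned from_bits, unsigned to_bits,
                                   int source_is_signed)
{
    assert(from_bits > 0 && to_bits > 0);
    if (from_bits == to_bits)
        return NULL;                 /* no instruction needed */
    if (from_bits > to_bits)
        return "trunc";
    return source_is_signed ? "sext" : "zext";
}

/* Lower iWord for a target whose pointers are ptr_bits wide. */
static unsigned lower_word_bits(unsigned ptr_bits)
{
    assert(ptr_bits == 32 || ptr_bits == 64);
    return ptr_bits;
}
```

The same source-level conversion from Word to a 32-bit integer becomes a trunc on a 64-bit target and no instruction at all on a 32-bit target, which is why iWord cannot simply be textually replaced by i32 or i64 without also rewriting its casts.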
On Wed, 7 May 2008, Gordon Henriksen wrote:

>> What would this be used for? How is it defined? How does arithmetic work on it?
>
> Looking up the intptr type via TargetData is not a significant issue for me, but I can see the appeal, and how its absence could constitute a significant barrier to generating portable IR (provided, of course, a portable language). Regardless, it would allow me to hardcode a good deal more codegen if the LLVM IR had an intptr type. The semantics I would imagine for an intptr type are:

Querying TargetData only works if you know the size of the pointer. :)

> • Lowered to i32 or i64 for code generation.

Ok

> • Treated as an ordinary integer for all operations except casts.

Ok. What does this mean for add? This basically means that an intptr add cannot have usefully defined semantics. Can you give an example of when it is useful?

> • Can be the operand to ptrtoint, but not the result.
> • Can be the result of inttoptr, but not the operand.

I assume these are backwards. intptr_t is an integer, not a pointer.

> • Can be bitcast to an actual pointer type.

No. int <-> ptr is done with inttoptr and ptrtoint.

> • Whether sext, zext, and trunc are applicable, I could be convinced either way. It muddies the semantics of these operations.

Right.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
On 2008-05-07, at 16:24, Chris Lattner wrote:

> On Wed, 7 May 2008, Gordon Henriksen wrote:
>
>>> What would this be used for? How is it defined? How does arithmetic work on it?
>>
>> Looking up the intptr type via TargetData is not a significant issue for me, but I can see the appeal, and how its absence could constitute a significant barrier to generating portable IR (provided, of course, a portable language). Regardless, it would allow me to hardcode a good deal more codegen if the LLVM IR had an intptr type. The semantics I would imagine for an intptr type are:
>
> Querying TargetData only works if you know the size of the pointer. :)

Exactly. :)

I'm going to play devil's advocate here for a moment. intptr would tidy up my own output a smidgen, but I do have other target dependencies, so it's of no great concern to me. But I could see how someone wanting LLVM bitcode to play the role of Java bytecode or MSIL might find it important or even essential. And the question has come up many times.

I can also see how this is entirely useless in C and thus less than interesting. :)

>> • Treated as an ordinary integer for all operations except casts.
>
> Ok. What does this mean for add?

Sure.

    %x = add intptr %a, %b

is semantically identical to:

    %tmp1 = bitcast intptr %a to i8*
    %tmp2 = getelementptr i8* %tmp1, intptr %b
    %x = bitcast i8* %tmp2 to intptr

Or, put another way, it's an i32 add on a 32-bit target and an i64 add on a 64-bit target.

> This basically means that an intptr add cannot have usefully defined semantics.

How do you figure? I consider getelementptr to have usefully defined semantics, even though they are target-dependent. :)

> Can you give an example of when it is useful?

Sure, grep for getIntPtrType.

But seriously, in any situation where a front-end language would use size_t, ptrdiff_t, System.IntPtr, a value in a tagged object model, etc., it could use this type instead of conditionally selecting i32 or i64.
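The "i32 add on a 32-bit target, i64 add on a 64-bit target" reading can be simulated on any host by doing the add in 64 bits and masking to the pointer width, i.e. arithmetic modulo 2^ptr_bits. This is a sketch of that reading only, with hypothetical helper names; it is not how LLVM implements anything.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the proposed "add intptr" semantics: an ordinary two's-complement
   add performed at the target's pointer width, i.e. modulo 2^ptr_bits.
   The host computes in 64 bits and masks down to simulate a 32-bit target. */
static uint64_t intptr_add(uint64_t a, uint64_t b, unsigned ptr_bits)
{
    assert(ptr_bits == 32 || ptr_bits == 64);
    uint64_t mask = (ptr_bits == 64) ? UINT64_MAX : UINT64_C(0xFFFFFFFF);
    return (a + b) & mask;
}

/* On the host itself, uintptr_t arithmetic is exactly this operation at the
   host's own pointer width. */
static uintptr_t host_intptr_add(uintptr_t a, uintptr_t b)
{
    return a + b;
}
```

With a = 0xFFFFFFF0 and b = 0x20, the 32-bit result wraps to 0x10 while the 64-bit result is 0x100000010: the operation is well defined, but its result is target-dependent, just as getelementptr's is.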
This is not applicable to Java or C, which either have no such pointer-sized integer type or have no portable representation. But it would be applicable to many other languages that do. The advantage provided is improved portability of bitcode and (very slightly) reduced complexity in front-end compilers. I don't consider these overwhelming advantages, given that bitcode is pretty non-portable as-is.

>> • Can be the operand to ptrtoint, but not the result.
>> • Can be the result of inttoptr, but not the operand.
>
> I assume these are backwards. intptr_t is an integer, not a pointer.

They are not.

>> • Can be bitcast to an actual pointer type.
>
> No. int <-> ptr is done with inttoptr and ptrtoint.

No. These casts have distinct semantics. Let me be more explicit.

To be useful, an intptr type would need conversions to and from both fixed-width integer types and pointers. It's not necessary to overload existing casts. If we did choose to, the casts that apply to pointers are semantically closer matches than the casts that apply to integers, because they correctly reflect the potential data loss between the fixed-width integer type and the target-dependent type.

== Pointer conversions ==

For pointer conversions, bitcast has the correct semantics.

    void *p;
    (void *) (ptrdiff_t) p; // This is a no-op on every platform.
    (void *) (int32_t) p;   // This is target-dependent and could truncate.

Pointer-to-intptr-to-pointer conversions can be condensed or eliminated in the same way that bitcasts between pointer types can. By contrast, inttoptr(ptrtoint) cannot be converted to a bitcast or no-op, because if the integer type is smaller than the pointer type, the conversion is lossy.

== Conversions to fixed-width integer types ==

    size_t ip;
    (uint16_t) ip;
    (uint32_t) ip;
    (uint64_t) ip;

This has the same semantics as ptrtoint: depending on the target, it could be an extend, a truncate, or a no-op.
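The asymmetry between the two pointer round trips above can be checked in C. A round trip through uintptr_t is guaranteed by C99 to compare equal to the original pointer (when the optional uintptr_t type exists), while a round trip through a 32-bit integer loses the high bits of any address that does not fit. The second helper works on a raw address value so that it behaves the same on any host; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round trip through the pointer-sized integer: C99 guarantees the result
   compares equal to the original pointer when uintptr_t is provided. */
static int roundtrips_via_uintptr(void *p)
{
    return (void *)(uintptr_t)p == p;
}

/* Simulated round trip through a 32-bit integer, using a raw address value
   so the example behaves the same on any host. It is lossy whenever the
   address does not fit in 32 bits, which is why inttoptr(ptrtoint x)
   cannot in general be folded back to x. */
static uint64_t roundtrip_via_u32(uint64_t address)
{
    return (uint64_t)(uint32_t)address;
}
```

A low address such as 0x1000 survives the 32-bit round trip, but a typical 64-bit user-space address such as 0x7FFF00001000 comes back as 0x1000.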
    size_t ip;
    ssize_t sip;
    (uint64_t) ip;
    (int64_t) sip;

However, signed intptr types do exist, so it's quite arguable that sign-extension behavior should not be fixed the way it is in ptrtoint (which always zero-extends) and in getelementptr (which always sign-extends its indices).

For OCaml, this might be beneficial, actually: a great many ptrtoint and inttoptr operations occur due to the tagged object model. Since these are lossy casts, it might be beneficial if they could be recognized target-independently as no-ops.

== Conversions from fixed-width integer types ==

    int16_t s;
    int32_t i;
    int64_t l;
    (size_t) s;
    (size_t) i;
    (size_t) l;

Same issues as with conversions to fixed-width integer types:

• inttoptr is a better match for the semantics.
• But sign-extension behavior should be controllable.

On 2008-05-07, at 19:09, Chris Lattner wrote:

> On Wed, 7 May 2008, Jonathan S. Shapiro wrote:
>>
>> On a 32-bit platform, doesn't one want to use i32?
>
> Why? What is wrong with i64?

Lots of things, actually.

It doesn't have the proper semantics for arithmetic. As a concrete example, System.IntPtr.operator/ in .NET is quite distinct from either System.Int32.operator/ or System.Int64.operator/.

Nor does it have the correct size in memory or as an argument, although converting to a pointer is a usable workaround in both cases. Likewise, alignment.

Finally, computing 64-bit intermediate results on 32-bit platforms in order to preserve unwanted i64 semantics is quite undesirable. Consider this, a reasonable sort of thing to compute with an intptr:

    int f(void *p, void *q, int i, int j) {
        size_t ip = (size_t) p, iq = (size_t) q;
        return (iq - (ip + i)) / j;
    }

If size_t is defined as uint64_t and sizeof(void*) = 4, the divide must be computed in 64 bits (even though the high portion will be discarded) in order to preserve semantics in the uninteresting case that iq - (ip + i) > 0xFFFFFFFFU. Now imagine a target without a 64-bit divider. :) I guess each intermediate result could be cast back to a pointer and then back to an integer, but that seems unlovely.

On 2008-05-07, at 08:25, Jonathan S. Shapiro wrote:

> Meaning: on a machine having 32-bit registers, iWord is a type treated by the IR as indistinguishable from i32. On a machine having 64-bit registers, iWord is a type treated by the IR as indistinguishable from i64. Arithmetic works in the usual way. If "iWord" is "i32" on your target, then it is acceptable in any position and condition where "i32" would be acceptable in the IR specification. In short, iWord can be substituted for the appropriate integral type the instant you commit to a particular target.

This doesn't work because, for instance, trunc i32 to i32 is an illegal instruction. I say this under the assumption that entirely preventing interoperation with other forms of integers at the IR level is too strict.

-- Gordon
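The divide argument can be made concrete by evaluating f() both ways for a 32-bit target: once with size_t modelled as the target's 32-bit intptr, and once with it modelled as a fixed 64-bit integer whose high half is discarded at the end. The helper names are hypothetical, and returning the low 32 bits as uint32_t stands in for the final conversion to int. The usual arithmetic conversions turn i and j into size_t, so the casts below mirror that.

```c
#include <assert.h>
#include <stdint.h>

/* f() from the message, for a 32-bit target: ip and iq are the integer
   values of p and q, and size_t is modelled as a 32-bit intptr, so a
   32-bit divide suffices. */
static uint32_t f_intptr32(uint32_t ip, uint32_t iq, int32_t i, int32_t j)
{
    assert(j != 0);
    uint32_t num = iq - (ip + (uint32_t)i);
    return num / (uint32_t)j;
}

/* The same computation with size_t modelled as a fixed 64-bit integer:
   the divide must be done in 64 bits even though only the low 32 bits of
   the result are kept. */
static uint32_t f_fixed64(uint32_t ip, uint32_t iq, int32_t i, int32_t j)
{
    assert(j != 0);
    uint64_t num = (uint64_t)iq - ((uint64_t)ip + (uint64_t)i);
    return (uint32_t)(num / (uint64_t)j);
}
```

In the ordinary case (ip = 0x1000, iq = 0x2000, i = 0x10, j = 4) both return 0x3FC. When q lies below p (ip = 0x10, iq = 0x8, i = 0, j = 2), the 32-bit version returns 0x7FFFFFFC but the 64-bit version returns 0xFFFFFFFC, because division does not commute with truncation. Preserving the i64 answer in that uninteresting case is what forces the 64-bit divide.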