Russell Wallace via llvm-dev
2016-Mar-26 11:58 UTC
[llvm-dev] Is pointer tagging defined behavior?
Dynamic languages commonly use an implementation technique where you take a pointer to an object (aligned on eight bytes so the lower three bits are zero), cast to intptr_t, change the lower three bits to a tag value indicating the type of the object, then later test the tag value, remove the tag, cast back to a pointer and dereference the pointer.

As I understand it, the standard says this is implementation defined. Does LLVM consider it to be defined behavior?

If so, is this still true if you write your own memory manager that allocates chunks of memory (rounded up to 8 bytes) from a big char array?

(Assuming a mainstream platform such as x64 - I'm not talking about a scenario where there is an unusual CPU architecture.)
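For concreteness, the round trip described above might look like the following minimal C sketch. The names `tag_pointer`, `get_tag`, `untag_pointer` and the tag values are hypothetical, chosen only for illustration. The sketch assumes a mainstream platform where converting a pointer to `uintptr_t` and back yields the original pointer once the same integer value is restored.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x7)   /* the three low bits of an 8-byte-aligned pointer */

/* Hypothetical tag values; a real runtime would choose its own. */
enum { TAG_CONS = 1, TAG_STRING = 2 };

/* Attach a tag to an 8-byte-aligned pointer. */
static uintptr_t tag_pointer(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* the low three bits must be free */
    assert(tag <= TAG_MASK);          /* the tag must fit in three bits */
    return bits | tag;
}

/* Read the tag without disturbing the value. */
static unsigned get_tag(uintptr_t v) {
    return (unsigned)(v & TAG_MASK);
}

/* Clear the tag bits and convert back to a pointer that can be dereferenced. */
static void *untag_pointer(uintptr_t v) {
    return (void *)(v & ~TAG_MASK);
}
```

A caller would store the result of `tag_pointer(&obj, TAG_STRING)` as an integer, branch on `get_tag`, and dereference only what `untag_pointer` returns.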
Bruce Hoult via llvm-dev
2016-Mar-26 12:32 UTC
[llvm-dev] Is pointer tagging defined behavior?
On Sat, Mar 26, 2016 at 2:58 PM, Russell Wallace via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Dynamic languages commonly use an implementation technique where you take
> a pointer to an object (aligned on eight bytes so the lower three bits are
> zero), cast to intptr_t, change the lower three bits to a tag value
> indicating the type of the object, then later test the tag value, remove
> the tag, cast back to a pointer and dereference the pointer.

That doesn't sound exactly right.

In the implementations I've seen, pointers always have tags with all 0 bits. So if the thing is actually a pointer, you AND with 0x7, find the result is zero, and then just go ahead and use the original value as a pointer. If the tag bits are nonzero then you don't have a pointer at all; you have an integer or character or single float.

However, it's not out of the question that you might use some tag values to indicate pointers to special kinds of objects that the runtime knows about, such as strings or arrays.

Even so, the tagged pointer is still guaranteed to look like a pointer into somewhere in the first 8 bytes of the same object (objects are never smaller than 8 bytes), so that's perfectly well defined.

The only possible objection is that the pointer will be misaligned. I believe you can find a discussion here in the last several months in which it was stated that misaligned pointers are always ok on any machine, provided that they are not dereferenced. In the case of a tagged pointer, the pointer is always aligned before being dereferenced, either by masking, or subtracting, or as an immediate offset (possibly combined with a field offset).
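A rough C sketch of the scheme described in this message, under stated assumptions: the type `value`, the tag constants, and `struct string_obj` are all hypothetical names invented for illustration. Plain pointers carry an all-zero tag, a nonzero tag marks an immediate integer, and one further tag marks a pointer to a runtime-known string object whose tag is removed by subtraction folded into the field offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value;            /* hypothetical name for a tagged machine word */

#define TAG_MASK   ((uintptr_t)0x7)
#define TAG_PTR    0u               /* ordinary object pointer: tag bits all zero */
#define TAG_FIXNUM 1u               /* hypothetical tag for an immediate integer */
#define TAG_STRING 2u               /* hypothetical tag for a pointer to a string object */

struct string_obj {                 /* hypothetical runtime object, at least 8 bytes */
    size_t length;
    const char *chars;
};

/* A zero tag means the word can be used as a pointer unchanged. */
static int is_plain_pointer(value v) {
    return (v & TAG_MASK) == TAG_PTR;
}

/* Shift in unsigned arithmetic so negative inputs do not overflow a signed shift. */
static value make_fixnum(intptr_t n) {
    return ((uintptr_t)n << 3) | TAG_FIXNUM;
}

/* Right-shifting a negative signed value is implementation-defined in C;
   mainstream compilers perform an arithmetic shift. */
static intptr_t fixnum_value(value v) {
    assert((v & TAG_MASK) == TAG_FIXNUM);
    return (intptr_t)v >> 3;
}

static value make_string_ref(struct string_obj *s) {
    assert(((uintptr_t)s & TAG_MASK) == 0);   /* object must be 8-byte aligned */
    return (uintptr_t)s | TAG_STRING;
}

/* Remove the tag by subtraction, combined with the field offset, so the
   address is aligned again before it is dereferenced. */
static size_t string_length(value v) {
    assert((v & TAG_MASK) == TAG_STRING);
    const char *base = (const char *)v - TAG_STRING;
    return *(const size_t *)(base + offsetof(struct string_obj, length));
}
```

In `string_length`, the tagged word points two bytes into the object, which is the "pointer into the first 8 bytes of the same object" case; a compiler can fold the subtraction and the field offset into a single immediate displacement.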
Sanjoy Das via llvm-dev
2016-Mar-26 18:57 UTC
[llvm-dev] Is pointer tagging defined behavior?
On Sat, Mar 26, 2016 at 5:32 AM, Bruce Hoult via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Sat, Mar 26, 2016 at 2:58 PM, Russell Wallace via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Dynamic languages commonly use an implementation technique where you take
>> a pointer to an object (aligned on eight bytes so the lower three bits are
>> zero), cast to intptr_t, change the lower three bits to a tag value
>> indicating the type of the object, then later test the tag value, remove the
>> tag, cast back to a pointer and dereference the pointer.
>
> That doesn't sound exactly right.
>
> In the implementations I've seen, pointers always have tags with all 0 bits.
> So if the thing is actually a pointer you AND with 0x7 and find the result
> is zero then you just go ahead and use the original value as a pointer.

V8 does (or used to do?) the opposite -- pointers are tagged in their low bits, and integers are not. Since most of the time you're accessing an offset within the pointer anyway, you don't have to mask the low bits out, but can change the offset instead (i.e. load from Ptr+15 instead of Ptr+16, since pointers have their lowest bit set (say)). Not tagging integers then makes most integer math cheaper, e.g. addition of two unpacked integers can just be an add instruction, multiplication needs only one shift instead of two, etc.

-- Sanjoy
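A small C sketch of the inverse scheme described above, assuming a one-bit tag where small integers keep their low bit clear and heap pointers have it set. This is an illustration only, not V8's actual code; `word`, `HEAP_TAG`, `struct heap_obj` and the helper names are hypothetical, and overflow checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word;      /* hypothetical name for a tagged machine word */

#define HEAP_TAG 1u          /* heap pointers have the lowest bit set */

struct heap_obj {            /* hypothetical heap object layout */
    word header;
    word field;
};

/* A small integer n is stored as 2*n, so its low bit is zero. */
static word smi_from_int(intptr_t n) { return (uintptr_t)n << 1; }
static intptr_t smi_to_int(word w)   { return (intptr_t)w >> 1; }
static int is_smi(word w)            { return (w & 1) == 0; }

/* 2a + 2b == 2(a + b): a single add, no untagging or retagging. */
static word smi_add(word a, word b) { return a + b; }

/* (2a >> 1) * 2b == 2(a * b): one shift rather than two. */
static word smi_mul(word a, word b) {
    return (word)((intptr_t)a >> 1) * b;
}

static word tag_heap_ptr(struct heap_obj *p) {
    assert(((uintptr_t)p & HEAP_TAG) == 0);   /* object must be at least 2-byte aligned */
    return (uintptr_t)p | HEAP_TAG;
}

/* Fold the tag into the displacement: load from (offset - 1) past the tagged
   address instead of masking the low bit out first. */
static word load_field(word tagged) {
    const char *p = (const char *)tagged;
    return *(const word *)(p + offsetof(struct heap_obj, field) - HEAP_TAG);
}
```

With this layout on a 64-bit target, `load_field` reads from the tagged address plus 7 rather than plus 8, the same adjustment as the Ptr+15 versus Ptr+16 example in the message.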