On 15 February 2011 18:30, David A. Greene <greened at obbligato.org> wrote:> { int32, int8, { int8 } } > > Do I understand you correctly?Hi David, I'm actually looking for answers, not requesting features... ;) That structure would actually solve the problem for this specific case, but not for the general case. There are far too many exceptions to be worth make a special case for every one of them.> There are ways to do that without losing too much information. For > example, we render the above without using arrays at all: > > %I = type { i32, i8, i16 } > %J = type { %I, i8, i16 }Not if you follow the Itanium C++ ABI. In your example it works because { i8, i16 } pads nicely to 4 bytes, so there is no tail padding. If there is tail padding, the size of the Base class is different from the size of the Base class inside the Derived one. So, in my example "B : public A { char }": %A = type { i32, i8 } %B = type { %A, i8 } A has 8 bytes, as it should, but inside B it has only 5, so B's first field offset is 5, not 8. This is why we have to do: %B = { [5 x i8], i8, [3 x i8] } Adding the 3 bytes at the end is NOT the problem, but revoking the type (and it's natural alignment) from %A is.> There are some places where LLVM can do a better job, I think. > StructLayout should "just work" in more cases. But the kind of > generality you're talking about just isn't going to work very well in a > low-level IR. Nor should it. It's not what the IR is designed to do.My idea was that StructLayout could have more (optional) sources of information, to do a better job at figuring out sizes and offsets. We even thought about creating a Pass that will transform from natural structures, unions and bitfields to the horrible mess it results to when lowered, but that's avoiding the problem, not solving them. The IR was designed for type safety and we have far too many hacks in the type system that all C++ front-ends have to do. Maybe the original design wasn't followed so closely, or we need a new design document that clearly states what the goals are, because the way it is, it's not clear, and it's definitely not good for C++. -- cheers, --renato
Renato Golin <renato.golin at arm.com> writes:>> There are ways to do that without losing too much information. For >> example, we render the above without using arrays at all: >> >> %I = type { i32, i8, i16 } >> %J = type { %I, i8, i16 } > > Not if you follow the Itanium C++ ABI. > > In your example it works because { i8, i16 } pads nicely to 4 bytes,That's why we use { i8, i16 }. It's not by accident. We do adhere to the Itanium ABI.> so there is no tail padding. If there is tail padding, the size of the > Base class is different from the size of the Base class inside the > Derived one.Yes, that's true for non-POD types. In that case the base class really has two different representations which need to be two different types in LLVM. This is really ugly stuff. I fixed a whole slew of bugs around just this issue last year. :) In these cases you may have to resort to arrays or at least a bunch of consecutive i8s. I'm sure we do so though I would have to verify that.> So, in my example "B : public A { char }": > > %A = type { i32, i8 } > %B = type { %A, i8 } > > A has 8 bytes, as it should, but inside B it has only 5, so B's first > field offset is 5, not 8. This is why we have to do: > > %B = { [5 x i8], i8, [3 x i8] }Wait, that's not what you showed before: // CHECK: %struct.J = type { [8 x i8], i8, [3 x i8] } struct J : I { char c; }; %B = { [5 x i8], i8, [3 x i8] } is not correct for the Itanium ABI because tail padding cannot be overlaid in "POD for the purposes of layout" types (secs. 1.1 and 2.2). You had it right the first time. :)> Adding the 3 bytes at the end is NOT the problem, but revoking the > type (and it's natural alignment) from %A is.What do you mean by "revoking?" Do you mean inferring the type of %A within %B given %B's layout? Why do you need to get the alignment information anyway? The byte offsets are fixed by the ABI so in the end, bits is bits and addresses is addresses. Ugly casts may be necessary but nothing too drastic that will seriously prevent optimization.> My idea was that StructLayout could have more (optional) sources of > information, to do a better job at figuring out sizes and offsets. We > even thought about creating a Pass that will transform from natural > structures, unions and bitfields to the horrible mess it results to > when lowered, but that's avoiding the problem, not solving them.I'm still not exactly sure what problem you're trying to solve. Is it a correctness issue in your code generator? That said, I have thought along similar lines to make frontends easier to construct. I imagined metadata on struct types to indicate layout requirements but the current metadata system is not appropriate since it does not consider metadata to be semantically important for correctness. But even with that solution, the frontend would still need to add the metadata to struct types. There's really no way around the frontend needing to understand the ABI at some level. It has to convey the language semantics to LLVM, which is by design language-agnostic.> The IR was designed for type safety and we have far too many hacks in > the type system that all C++ front-ends have to do. Maybe the original > design wasn't followed so closely, or we need a new design document > that clearly states what the goals are, because the way it is, it's > not clear, and it's definitely not good for C++.I'm not sure the IR was designed for type safety. The original designers can speak to that. But any language that has things like inttoptr and ptrtoint is inherently not type-safe. The typing helps certain classes of analysis and transformation but in the case of C++ inheritence there's not a whole lot that applies. You need a higher-level IR to take care of that stuff. -Dave
On 15 February 2011 22:14, David A. Greene <greened at obbligato.org> wrote:> That's why we use { i8, i16 }. It's not by accident. We do adhere to > the Itanium ABI.Oh, I see. But the padding is not a problem, it could be [3 x i8] or { i8, i16 }, it doesn't matter, since it's never going to be used. But the [8 x i8] that was the original Base type is, and that's cast'd away to plain array (with alignment 1).> Yes, that's true for non-POD types. In that case the base class really > has two different representations which need to be two different types > in LLVM. This is really ugly stuff. I fixed a whole slew of bugs > around just this issue last year. :)As you can see, it's my turn now... ;)>> %B = { [5 x i8], i8, [3 x i8] } > > Wait, that's not what you showed before:My bad, different case that one...> What do you mean by "revoking?" Do you mean inferring the type of %A > within %B given %B's layout? Why do you need to get the alignment > information anyway? The byte offsets are fixed by the ABI so in the > end, bits is bits and addresses is addresses. Ugly casts may be > necessary but nothing too drastic that will seriously prevent > optimization.Ok, that was my first real question. I'm not too focused on optimizations, so I don't know how much that would actually stop them from happening. What I see is that variables get cast'd away to arrays, passed to routines and cast's back (maybe to some different type) to do some arithmetic operation. If LLVM can understand that, that's fine.> I'm still not exactly sure what problem you're trying to solve. Is it a > correctness issue in your code generator?No. Our front-end follows the standard and the ABI to the letter, the problem starts when I have to match all ABI decisions to LLVM types. I could convert every single structure into arrays and rely only on our front-end, but that would make the IR very hard to debug.> That said, I have thought along similar lines to make frontends easier > to construct. I imagined metadata on struct types to indicate layout > requirements but the current metadata system is not appropriate since it > does not consider metadata to be semantically important for correctness.That's a valid point. The OpenCL guys were also discussing metadata and I think it's fair to consider it a first class citizen. But I also understand perfectly well why it hasn't, so far. Because metadata is SO generic, if you consider it first-class, people will start abusing of it for little personal modifications and the IR will stop being standard and diverge. So maybe, sticking things to the IR without metadata is still the best course for extra information (like build attributes, target sub-features, ABIs), but that's a completely different discussion.> But even with that solution, the frontend would still need to add the > metadata to struct types. There's really no way around the frontend > needing to understand the ABI at some level. It has to convey the > language semantics to LLVM, which is by design language-agnostic.Oh, I totally agree. Some knowledge is best left for the front-end, but there are some things that could be passed onto StructLayout (such as POD/nonPOD) with a little bit of effort and make the IR much easier to understand and debug.> I'm not sure the IR was designed for type safety. The original > designers can speak to that. But any language that has things like > inttoptr and ptrtoint is inherently not type-safe. The typing helps > certain classes of analysis and transformation but in the case of C++ > inheritence there's not a whole lot that applies. You need a > higher-level IR to take care of that stuff.Fair point. And that's where the idea of having the passes came from. Maybe having a high-level IR with language/ABI/sub-target information that gets converted to the current low-level IR only at the last time is the best course of action... But that unleashes a completely different beast that I'm not willing to handle right now... ;) cheers, --renato