thr3ads.net - llvm dev - [LLVMdev] Structure Types and ABI sizes [Feb 2011]

If this information is useful, please help other people find it:
Share via:

Renato Golin

2011-Feb-15 22:53 UTC

[LLVMdev] Structure Types and ABI sizes

On 15 February 2011 22:14, David A. Greene <greened at obbligato.org>
wrote:> That's why we use { i8, i16 }.  It's not by accident.  We do adhere
to
> the Itanium ABI.
Oh, I see. But the padding is not a problem, it could be [3 x i8] or {
i8, i16 }, it doesn't matter, since it's never going to be used.

But the [8 x i8] that was the original Base type is, and that's cast'd
away to plain array (with alignment 1).

> Yes, that's true for non-POD types.  In that case the base class really
> has two different representations which need to be two different types
> in LLVM.  This is really ugly stuff.  I fixed a whole slew of bugs
> around just this issue last year.  :)
As you can see, it's my turn now... ;)

>> %B = { [5 x i8], i8, [3 x i8] }
>
> Wait, that's not what you showed before:
My bad, different case that one...

> What do you mean by "revoking?"  Do you mean inferring the type
of %A
> within %B given %B's layout?  Why do you need to get the alignment
> information anyway?  The byte offsets are fixed by the ABI so in the
> end, bits is bits and addresses is addresses.  Ugly casts may be
> necessary but nothing too drastic that will seriously prevent
> optimization.
Ok, that was my first real question. I'm not too focused on
optimizations, so I don't know how much that would actually stop them
from happening.

What I see is that variables get cast'd away to arrays, passed to
routines and cast's back (maybe to some different type) to do some
arithmetic operation. If LLVM can understand that, that's fine.

> I'm still not exactly sure what problem you're trying to solve.  Is
it a
> correctness issue in your code generator?
No. Our front-end follows the standard and the ABI to the letter, the
problem starts when I have to match all ABI decisions to LLVM types. I
could convert every single structure into arrays and rely only on our
front-end, but that would make the IR very hard to debug.

> That said, I have thought along similar lines to make frontends easier
> to construct.  I imagined metadata on struct types to indicate layout
> requirements but the current metadata system is not appropriate since it
> does not consider metadata to be semantically important for correctness.
That's a valid point. The OpenCL guys were also discussing metadata
and I think it's fair to consider it a first class citizen.

But I also understand perfectly well why it hasn't, so far. Because
metadata is SO generic, if you consider it first-class, people will
start abusing of it for little personal modifications and the IR will
stop being standard and diverge.

So maybe, sticking things to the IR without metadata is still the best
course for extra information (like build attributes, target
sub-features, ABIs), but that's a completely different discussion.

> But even with that solution, the frontend would still need to add the
> metadata to struct types.  There's really no way around the frontend
> needing to understand the ABI at some level.  It has to convey the
> language semantics to LLVM, which is by design language-agnostic.
Oh, I totally agree. Some knowledge is best left for the front-end,
but there are some things that could be passed onto StructLayout (such
as POD/nonPOD) with a little bit of effort and make the IR much easier
to understand and debug.

> I'm not sure the IR was designed for type safety.  The original
> designers can speak to that.  But any language that has things like
> inttoptr and ptrtoint is inherently not type-safe.  The typing helps
> certain classes of analysis and transformation but in the case of C++
> inheritence there's not a whole lot that applies.  You need a
> higher-level IR to take care of that stuff.
Fair point. And that's where the idea of having the passes came from.
Maybe having a high-level IR with language/ABI/sub-target information
that gets converted to the current low-level IR only at the last time
is the best course of action...

But that unleashes a completely different beast that I'm not willing
to handle right now... ;)

cheers,
--renato

David A. Greene

2011-Feb-15 23:21 UTC

head link

[LLVMdev] Structure Types and ABI sizes

Renato Golin <renato.golin at arm.com> writes:
> On 15 February 2011 22:14, David A. Greene <greened at obbligato.org>
wrote:
>> That's why we use { i8, i16 }.  It's not by accident.  We do
adhere to
>> the Itanium ABI.
>
> Oh, I see. But the padding is not a problem, it could be [3 x i8] or {
> i8, i16 }, it doesn't matter, since it's never going to be used.
Right.  We do it for aesthetics.  :)
> But the [8 x i8] that was the original Base type is, and that's
cast'd
> away to plain array (with alignment 1).
Yep.  But again, I don't think you're losing much.
>> Yes, that's true for non-POD types.  In that case the base class
really
>> has two different representations which need to be two different types
>> in LLVM.  This is really ugly stuff.  I fixed a whole slew of bugs
>> around just this issue last year.  :)
>
> As you can see, it's my turn now... ;)
Good luck!  :-/
>>> %B = { [5 x i8], i8, [3 x i8] }
>>
>> Wait, that's not what you showed before:
>
> My bad, different case that one...
Whew!  Thought I might have another nasty bug.  :)
> What I see is that variables get cast'd away to arrays, passed to
> routines and cast's back (maybe to some different type) to do some
> arithmetic operation. If LLVM can understand that, that's fine.
It'll understand it but make sure it's casting to the correct types and
using the correct offsets given the base types.  We actually generate a
bunch of ugly ptrtoint + arithmetic + inttptr + GEP stuff.  I wrote an
instcombine pass to fold that down to a single GEP where possible.  I
don't know if that's valid in general (due to inbounds and other GEP
semantics) but it is for the cases we use it because we "know" where
it
came from.

LLVM understands GEP better than ptrtoint/inttoptr, which is why we make
the transform.  Whether you can do this is probably dependent on how you
lower things.
>> I'm still not exactly sure what problem you're trying to solve.
 Is it a
>> correctness issue in your code generator?
>
> No. Our front-end follows the standard and the ABI to the letter, the
> problem starts when I have to match all ABI decisions to LLVM types. I
> could convert every single structure into arrays and rely only on our
> front-end, but that would make the IR very hard to debug.
That's true and is the primary reason we try to maintain as much of the
original structure as possible (i.e. not using arrays).  We used to emit
more array stuff but I actually put a fair amount of effort to improve
this in our frontend.  But we always have to include explicit padding at
some point and that means odd-looking members from time to time.  I
haven't found it too terribly burdensome.
>> That said, I have thought along similar lines to make frontends easier
>> to construct.  I imagined metadata on struct types to indicate layout
>> requirements but the current metadata system is not appropriate since
it
>> does not consider metadata to be semantically important for
correctness.
>
> That's a valid point. The OpenCL guys were also discussing metadata
> and I think it's fair to consider it a first class citizen.
I would like to see that too but it might be an uphill battle.  :(
> But I also understand perfectly well why it hasn't, so far. Because
> metadata is SO generic, if you consider it first-class, people will
> start abusing of it for little personal modifications and the IR will
> stop being standard and diverge.
Absolutely.  Even so, I would consider debug info to be a semantic
correctness issue.  If the compiler can't produce debuggable code, it is
useless.  So we at least have some precendent for metadata being
"important."  The key with the debug info is that it's documented.
It's
its own little language, really.
> Oh, I totally agree. Some knowledge is best left for the front-end,
> but there are some things that could be passed onto StructLayout (such
> as POD/nonPOD) with a little bit of effort and make the IR much easier
> to understand and debug.
I completely agree.
> Fair point. And that's where the idea of having the passes came from.
> Maybe having a high-level IR with language/ABI/sub-target information
> that gets converted to the current low-level IR only at the last time
> is the best course of action...
>
> But that unleashes a completely different beast that I'm not willing
> to handle right now... ;)
Oh, come on!  Where's your sense of adventure?  :)

                          -Dave

Renato Golin

2011-Feb-16 00:47 UTC

head link

[LLVMdev] Structure Types and ABI sizes

On 15 February 2011 23:21, David A. Greene <greened at obbligato.org>
wrote:> LLVM understands GEP better than ptrtoint/inttoptr, which is why we make
> the transform.  Whether you can do this is probably dependent on how you
> lower things.
We're using only GEPs from start, so we might not hit all the problems you
did.

I was actually surprised that the change to support those ABI
constraints wasn't too big...

>> That's a valid point. The OpenCL guys were also discussing metadata
>> and I think it's fair to consider it a first class citizen.
>
> I would like to see that too but it might be an uphill battle.  :(
Maybe even steeper than introducing new fields to the IR...

>> But that unleashes a completely different beast that I'm not
willing
>> to handle right now... ;)
>
> Oh, come on!  Where's your sense of adventure?  :)
:D

One thing at a time... It's in my TODO list to prepare a document with
all the hacks one has to do to make C++ be translated to IR.

As far as I know, llvm-gcc and clang had the same problems and solved
the same hacky way, so there might be other people that would agree on
some more radical changes. But we need more evidence to start with...
One thing at a time... ;)

-- 
cheers,
--renato

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Feb 2011 - [LLVMdev] Structure Types and ABI sizes

[LLVMdev] Structure Types and ABI sizes

[LLVMdev] Structure Types and ABI sizes

[LLVMdev] Structure Types and ABI sizes

Maybe Matching Threads