thr3ads.net - llvm dev - [LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version) [Sep 2012]


Dan Gohman

2012-Sep-10 21:11 UTC

[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version)

On Sep 10, 2012, at 11:29 AM, Chandler Carruth <chandlerc at google.com> wrote:
>
> Hey Dan, I've talked with you about this in person and on IRC, but I've
> not yet laid out my thoughts in a single place, so I'll put them here.
>
> TL;DR: I really like the idea of using metadata to tag each member of a
> struct with TBAA, and re-using the TBAA metadata nodes we already have.
> I'm not as fond of the description of padding in the metadata node.
>
> Currently padding is really hard to represent because there is sometimes
> a member of an LLVM struct which represents padding (packed structs and
> cases where the frontend type requires more alignment than the datalayout
> string specifies) and other times there isn't. The current proposal
> doesn't entirely fix this because we still will need some way to annotate
> the members of structs inserted purely for the purpose of padding.

This is not a problem in the current proposal, because it represents padding
completely independently from the LLVM struct type.

> Further, we have the problem that sometimes what is needed is a
> representation of a "hole", that is a region which is neither padding
> nor part of the struct itself. The canonical example is the tail padding
> of a base class where the derived class's first member has low alignment
> constraints.

I don't see how a hole in a base class which isn't being used by a subclass
is different from padding, from the optimizer's perspective. The optimizer
doesn't know about class hierarchies (unless you're proposing something
much more significant).

> 
> I would propose that we solve these problems by a somewhat more invasive
> change, but one which will significantly simplify both LLVM and frontends
> (at least Clang, I suspect other frontends):
>
> Remove non-packed struct types completely. Make LLVM structs represent a
> contiguous sequence of bytes, explicitly partitioned into fields with
> particular primitive types.
>
> The idea would be to make all struct types be packed[1], and to represent
> padding as explicit members of the struct. These could in turn have a
> "padding" TBAA metadata node which would specify that member as being
> padding. This would simplify the metadata representation because there
> would *always* be a member to hang the padding tag off of. It would
> simplify struct layout analysis in LLVM because the difference between
> alloc-size and type-size would be irrelevant. It would dramatically
> simplify Clang's record layout building, which already has to fall back to
> packed LLVM structs in many cases because normal structs produce offsets
> that conflict with the ABI's layout requirements.
>
> Essentially, LLVM is trying to simplify ABI layout by providing a
> datalayout summary description of target alignments, and building structs
> with that algorithm. But unless this *exactly* matches the ABI in question,
> it actually makes the job harder because now we have to try, potentially
> fail, and end up with all the code to use the packed mode anyways. My
> theory is that there are too many ABIs in the world (and too weird rules
> within them) for us to ever really get this right at the LLVM layer.
> Instead, we should force the frontend to explicitly lay out the bytes as it
> sees fit.

The current situation is not bleak. ABIs don't often vary that much in the
way they lay out structs and arrays, especially within a given architecture.

I actually think it's kind of nice that LLVM has this native concept of
"normal" struct layout built in. It encourages people to avoid doing their
own custom struct layout unless they have a good reason to.

I think your proposal would solve the original problem here, but it's not
obviously better than the metadata approach. Bytes in memory in LLVM don't
have inherent types, so optimizers can't rely on the type, or on any
individual copy, to understand the lifetimes of data in storage allocated
for padding. Consequently, a type just becomes a way to attach information
to a copy. And it's not clear that using a type is better than using
metadata.

The metadata approach is nice because it separates the use cases into two
families. On one side, copies with no metadata are simple to create, simple
to understand, and simple to implement. On the other side, people who need
more features can add metadata to get there, and things are more complex all
around, but that's the price of using advanced features. This is the shape
of problem that metadata was intended to solve.

Dan

Chandler Carruth

2012-Sep-10 22:19 UTC

[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version)

On Mon, Sep 10, 2012 at 2:11 PM, Dan Gohman <gohman at apple.com> wrote:
> On Sep 10, 2012, at 11:29 AM, Chandler Carruth <chandlerc at google.com>
> wrote:
> >
> > Hey Dan, I've talked with you about this in person and on IRC, but I've
> > not yet laid out my thoughts in a single place, so I'll put them here.
> >
> > TL;DR: I really like the idea of using metadata to tag each member of a
> > struct with TBAA, and re-using the TBAA metadata nodes we already have.
> > I'm not as fond of the description of padding in the metadata node.
> >
> > Currently padding is really hard to represent because there is sometimes
> > a member of an LLVM struct which represents padding (packed structs and
> > cases where the frontend type requires more alignment than the datalayout
> > string specifies) and other times there isn't. The current proposal
> > doesn't entirely fix this because we still will need some way to annotate
> > the members of structs inserted purely for the purpose of padding.
>
> This is not a problem in the current proposal, because it represents
> padding completely independently from the LLVM struct type.
>
It is a complexity of your proposal that I find unfortunate. =/ I suspect
it will also add a small amount of complexity to the consumers of the
analysis based on my work on SROA, although it's possible that pass is just
unique in the questions it needs to ask.

> > Further, we have the problem that sometimes what is needed is a
> > representation of a "hole", that is a region which is neither padding
> > nor part of the struct itself. The canonical example is the tail padding
> > of a base class where the derived class's first member has low alignment
> > constraints.
>
> I don't see how a hole in a base class which isn't being used by a
> subclass is different from padding, from the optimizer's perspective. The
> optimizer doesn't know about class hierarchies (unless you're proposing
> something much more significant).
>
Because there is real data packed into that space. Because the LLVM type
system has no way to represent this, Clang turns such base classes into an
opaque array of i8 with the appropriate size, and *all* structural
information about this base class is lost. I'm trying to propose something
that is sufficiently powerful to no longer require such hacks in the
frontend.
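
A minimal sketch of the tail-padding case described above, using hypothetical
types `Base` and `Derived` that are not from the thread. On targets following
the Itanium C++ ABI, a base class that is not a POD for layout purposes
typically has its tail padding reused for the derived class's first member;
other ABIs (MSVC's, for example) may lay this out differently, so the helper
only reports what the compiler in use actually did.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class. The user-provided constructor is what makes it
// "not a POD for layout purposes" under the Itanium C++ ABI (an assumption
// about the target; other ABIs apply other rules).
struct Base {
    Base() : i(0), c(0) {}
    int  i;
    char c;   // usually followed by tail padding up to the alignment of int
};

// Hypothetical derived class whose first member has low alignment.
struct Derived : Base {
    Derived() : d(0) {}
    char d;   // may be placed inside Base's tail padding
};

// Byte offset of Derived::d from the start of the object, measured at run time.
std::size_t offset_of_d() {
    Derived obj;
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&obj.d) -
        reinterpret_cast<const char*>(&obj));
}

// True when d lives inside the bytes that sizeof(Base) attributes to Base,
// i.e. when the "padding" of Base holds real data of Derived.
bool derived_reuses_base_tail_padding() {
    return offset_of_d() < sizeof(Base);
}
```

When `derived_reuses_base_tail_padding()` returns true, a copy of
`sizeof(Base)` bytes through a `Base` view would also touch `Derived::d`,
which is why such a region cannot be treated as plain padding.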

>
> >
> > I would propose that we solve these problems by a somewhat more invasive
> > change, but one which will significantly simplify both LLVM and frontends
> > (at least Clang, I suspect other frontends):
> >
> > Remove non-packed struct types completely. Make LLVM structs represent a
> > contiguous sequence of bytes, explicitly partitioned into fields with
> > particular primitive types.
> >
> > The idea would be to make all struct types be packed[1], and to
> > represent padding as explicit members of the struct. These could in turn
> > have a "padding" TBAA metadata node which would specify that member as
> > being padding. This would simplify the metadata representation because
> > there would *always* be a member to hang the padding tag off of. It would
> > simplify struct layout analysis in LLVM because the difference between
> > alloc-size and type-size would be irrelevant. It would dramatically
> > simplify Clang's record layout building, which already has to fall back
> > to packed LLVM structs in many cases because normal structs produce
> > offsets that conflict with the ABI's layout requirements.
> >
> > Essentially, LLVM is trying to simplify ABI layout by providing a
> > datalayout summary description of target alignments, and building structs
> > with that algorithm. But unless this *exactly* matches the ABI in
> > question, it actually makes the job harder because now we have to try,
> > potentially fail, and end up with all the code to use the packed mode
> > anyways. My theory is that there are too many ABIs in the world (and too
> > weird rules within them) for us to ever really get this right at the LLVM
> > layer. Instead, we should force the frontend to explicitly lay out the
> > bytes as it sees fit.
>
> The current situation is not bleak. ABIs don't often vary that much in the
> way they lay out structs and arrays, especially within a given
> architecture.
>
Well, clearly we disagree here. ;] Have you read Clang's record layout
building code recently? Or tried to change it? It is a nightmare.

LLVM simply doesn't provide the tools to reasonably express things like
packing of low-alignment members into padding with nested sub-aggregates,
or bitfield packing.

> I actually think it's kind of nice that LLVM has this native concept of
> "normal" struct layout built in. It encourages people to avoid doing their
> own custom struct layout unless they have a good reason to.
>
But every frontend ends up with a reason to. And they all do it
differently, and their lives are made harder and more complex by trying to
utilize LLVM's struct layout when it works, but roll their own when it
doesn't. Having looked extensively at Clang's recently, I am confident that
it would be much simpler, and produce dramatically more clear LLVM types
after my proposal. That argues for LLVM's system not being good enough, and
my contention is that there simply is not any language-neutral struct
layout system other than: the bytes in memory. So let's go back to
representing a sequence of bytes in memory as an aggregation of primitive
types.

>
> I think your proposal would solve the original problem here, but it's not
> obviously better than the metadata approach.

I think it will be conceptually cleaner, and I think it will allow the
frontend to expose more structural information when currently it is unable
to due to that structural information not fitting into LLVM's struct layout
model.

> Bytes in memory in LLVM don't have inherent types, so optimizers can't
> rely on the type, or on any individual copy, to understand the lifetimes
> of data in storage allocated for padding. Consequently, a type just
> becomes a way to attach information to a copy. And it's not clear that
> using a type is better than using metadata.
>
Aggregate types are most useful when applied to bytes of memory as a means
of partitioning accesses to those bytes of memory into operations on
distinct primitives, and simplifying the formation of byte offset pointers
to load and the properly typed loaded values in the IR (i.e., avoiding
complex GEPs and/or bitcasts before every load).

I don't think metadata helps with that at all.
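
To illustrate the point about typed members versus raw bytes, here is a
small C++ analogy rather than LLVM IR. The record `Typed` and all helper
names are hypothetical. The typed accessor lets the aggregate type supply
the offset and the result type; the opaque accessor must hard-code both,
which is roughly what a frontend is left with once a record has been
lowered to an array of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record with its padding written out as an explicit member.
struct Typed {
    std::uint8_t  tag;
    std::uint8_t  pad[3];
    std::uint32_t value;
};
static_assert(offsetof(Typed, value) == 4, "sketch assumes value at byte 4");

// Build a record with zeroed padding bytes.
Typed make_typed(std::uint8_t tag, std::uint32_t value) {
    Typed t;
    t.tag = tag;
    std::memset(t.pad, 0, sizeof t.pad);
    t.value = value;
    return t;
}

// Typed access: offset and type both come from the aggregate type.
std::uint32_t load_value_typed(const Typed& t) {
    return t.value;
}

// Opaque access: the caller has to know the byte offset (4) and the type.
std::uint32_t load_value_opaque(const unsigned char* bytes) {
    std::uint32_t v;
    std::memcpy(&v, bytes + 4, sizeof v);
    return v;
}

// Copy a record into a plain byte buffer and read the field back opaquely.
std::uint32_t roundtrip_through_bytes(std::uint32_t value) {
    Typed t = make_typed(7, value);
    unsigned char bytes[sizeof(Typed)];
    std::memcpy(bytes, &t, sizeof t);
    return load_value_opaque(bytes);
}
```

Both accessors read the same four bytes; the difference is only in where
the knowledge of the layout lives.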

> The metadata approach is nice because it separates the use cases into two
> families. On one side, copies with no metadata are simple to create,
> simple to understand, and simple to implement. On the other side, people
> who need more features can add metadata to get there, and things are more
> complex all around, but that's the price of using advanced features. This
> is the shape of problem that metadata was intended to solve.

Nothing I propose would change copies though? I'm just trying to make the
struct type able to have members for each actual primitive member that is
accessed within a range of memory...

And I'm still suggesting to use metadata to differentiate between padding
and non-padding -- I *don't* think that belongs in the type system. I think
only the *layout* belongs in the type system because that is what powers
GEPs, the primary means of computing offsets.
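
As a rough sketch of how layout alone determines field offsets, the helpers
below compute offsets for a field list under two rules: a "natural" rule
that inserts implicit padding to satisfy each field's alignment, and a
packed rule where every field starts where the previous one ends. The
`Field` type, the function names, and the example alignments (a 4-byte
aligned i32) are assumptions for illustration, not LLVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one primitive field: size and alignment in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Round n up to the next multiple of a (a must be nonzero).
std::size_t align_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Offsets when implicit padding is inserted before each under-aligned field.
std::vector<std::size_t> natural_offsets(const std::vector<Field>& fields) {
    std::vector<std::size_t> offsets;
    std::size_t off = 0;
    for (const Field& f : fields) {
        off = align_up(off, f.align);
        offsets.push_back(off);
        off += f.size;
    }
    return offsets;
}

// Offsets in a packed layout: alignment is ignored entirely.
std::vector<std::size_t> packed_offsets(const std::vector<Field>& fields) {
    std::vector<std::size_t> offsets;
    std::size_t off = 0;
    for (const Field& f : fields) {
        offsets.push_back(off);
        off += f.size;
    }
    return offsets;
}

// Total bytes covered by a packed layout (no hidden tail padding).
std::size_t packed_size(const std::vector<Field>& fields) {
    std::size_t total = 0;
    for (const Field& f : fields) total += f.size;
    return total;
}

// Roughly { i8, i32, i8 } on an assumed target where i32 is 4-byte aligned.
std::vector<Field> implicit_padding_example() {
    return {{1, 1}, {4, 4}, {1, 1}};
}

// Roughly <{ i8, [3 x i8], i32, i8, [3 x i8] }>: same bytes, padding spelled out.
std::vector<Field> explicit_padding_example() {
    return {{1, 1}, {3, 1}, {4, 4}, {1, 1}, {3, 1}};
}
```

In the explicit-padding example the i32 sits at the same byte offset as in
the naturally laid out struct, but every padding region is now a member
that something (a metadata tag, for instance) could be attached to.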

Dan Gohman

2012-Sep-11 00:21 UTC

[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version)

On Sep 10, 2012, at 3:19 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Sep 10, 2012 at 2:11 PM, Dan Gohman <gohman at apple.com> wrote:
> > On Sep 10, 2012, at 11:29 AM, Chandler Carruth <chandlerc at google.com>
> > wrote:
> > >
> > > Hey Dan, I've talked with you about this in person and on IRC, but
> > > I've not yet laid out my thoughts in a single place, so I'll put them
> > > here.
> > >
> > > TL;DR: I really like the idea of using metadata to tag each member of
> > > a struct with TBAA, and re-using the TBAA metadata nodes we already
> > > have. I'm not as fond of the description of padding in the metadata
> > > node.
> > >
> > > Currently padding is really hard to represent because there is
> > > sometimes a member of an LLVM struct which represents padding (packed
> > > structs and cases where the frontend type requires more alignment than
> > > the datalayout string specifies) and other times there isn't. The
> > > current proposal doesn't entirely fix this because we still will need
> > > some way to annotate the members of structs inserted purely for the
> > > purpose of padding.
> >
> > This is not a problem in the current proposal, because it represents
> > padding completely independently from the LLVM struct type.
>
> It is a complexity of your proposal that I find unfortunate. =/ I suspect
> it will also add a small amount of complexity to the consumers of the
> analysis based on my work on SROA, although it's possible that pass is
> just unique in the questions it needs to ask.

As we discussed on IRC, it seems that the best thing to do here would be to
start by rewriting clang's struct layout code to use LLVM's packed struct
types instead of falling back to i8 arrays or other fall-back types. If that
works well, it sounds like it would be an improvement regardless, and it
would be a much better vantage point from which to consider actual changes
to the LLVM IR type system.

For now, I will proceed with my most recent metadata proposal. It's just
metadata, so if something better comes along, we can drop it.

Dan
