thr3ads.net - llvm dev - [LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version) [Sep 2012]

Dan Gohman

2012-Sep-06 23:24 UTC

[LLVMdev] PROPOSAL: IR representation of detailed struct assignment information (new version)

Hello,

Pursuant to feedback,

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052927.html

here is a new proposal for detailed struct assignment information.

Here's the example showing the basic problem:

struct bar {
 char x;
 float y;
 double z;
};
void copy_bar(struct bar *a, struct bar *b) {
 *a = *b;
}

The solution I now propose here is to have front-ends describe the copy
using metadata. For example:

 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false), !tbaa.struct !4
 […]
 !0 = metadata !{metadata !"Simple C/C++ TBAA"}
 !1 = metadata !{metadata !"omnipotent char", metadata !0}
 !2 = metadata !{metadata !"float", metadata !1}
 !3 = metadata !{metadata !"double", metadata !1}
 !4 = metadata !{metadata !5, i64 3, metadata !6, metadata !7}
 !5 = metadata !{i64 1, metadata !1}
 !6 = metadata !{i64 4, metadata !2}
 !7 = metadata !{i64 8, metadata !3}

Metadata nodes !0 through !3 are regular TBAA nodes as are already in use.

Metadata node !4 here is a top-level description of the memcpy. It holds a
list of virtual members. An integer represents a padding field of that
size. A metadata tuple represents an actual data field. The tuple's members
are an integer size and a TBAA tag for the field.
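
As a rough illustration (an editorial sketch in C, not part of the proposal),
the virtual member list held by !4 can be derived from the C layout of
struct bar. The names member, push_member, and describe_bar are hypothetical,
and the expected sizes assume a typical ABI where float is 4 bytes and double
is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

struct bar {
    char x;
    float y;
    double z;
};

/* One virtual member of the copy: a real data field or a padding gap. */
struct member {
    size_t size;    /* size in bytes */
    int is_padding; /* 1 for a gap, 0 for a data field */
};

/* Appends one virtual member to out[], checking capacity first. */
static size_t push_member(struct member *out, size_t n, size_t cap,
                          size_t size, int is_padding) {
    assert(n < cap);
    out[n].size = size;
    out[n].is_padding = is_padding;
    return n + 1;
}

/* Walks struct bar in layout order and records each field and each gap.
   Returns the number of virtual members written to out[]. */
static size_t describe_bar(struct member *out, size_t cap) {
    const size_t offsets[] = { offsetof(struct bar, x),
                               offsetof(struct bar, y),
                               offsetof(struct bar, z) };
    const size_t sizes[] = { sizeof(char), sizeof(float), sizeof(double) };
    size_t n = 0, pos = 0;

    for (size_t i = 0; i < 3; i++) {
        if (offsets[i] > pos) /* gap before this field */
            n = push_member(out, n, cap, offsets[i] - pos, 1);
        n = push_member(out, n, cap, sizes[i], 0);
        pos = offsets[i] + sizes[i];
    }
    if (sizeof(struct bar) > pos) /* tail padding, if any */
        n = push_member(out, n, cap, sizeof(struct bar) - pos, 1);
    return n;
}
```

On such an ABI, describe_bar should report a 1-byte field, a 3-byte gap, a
4-byte field, and an 8-byte field, which lines up with the
!5, i64 3, !6, !7 sequence in !4.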

Comments and questions are welcome.

Dan

Duncan Sands

2012-Sep-07 11:01 UTC

Hi Dan, another approach is to exploit the fact that intrinsics can have
metadata parameters.  We could then have a new "structmemcpy" intrinsic
which would take a variable number of parameters (yes, a variable number
of parameters is problematic, see below), basically a list describing
each field, something like this:

   void llvm.structmemcpy (
     i8* dest,
     i8* src,
     i64 offset0, i64 size0, i32 align0, i1 volatile0, metadata tbaa0, ; field0
     i64 offset1, i64 size1, i32 align1, i1 volatile1, metadata tbaa1, ; field1
     ...
     i64 offsetN, i64 sizeN, i32 alignN, i1 volatileN, metadata tbaaN  ; fieldN
   )

The first "field" to be copied would be at bytes [offset0, offset0+size0).
The second field at [offset1, offset1+size1), though it might be better
to have offsets be from the end of the previous field, in which case it
would be: [offset0+size0+offset1, offset0+size0+offset1+size1).

The memory in [0, offset0) would thus be a gap ("padding"), and likewise
between the end of each field and the start of the next.  There is a small
hassle expressing a gap at the end of the struct, but this can be overcome
by the trick of placing a fake zero size field after the last byte in
the struct.
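
The relative-offset variant can be sketched in C as follows. This is an
editorial illustration only: field_desc, byte_range, and resolve_ranges are
hypothetical names, and nothing here is an existing LLVM interface. Each
description carries the gap since the end of the previous field plus the
field's size, and the function turns those into absolute half-open byte
ranges.

```c
#include <assert.h>
#include <stddef.h>

/* One field description; offset is the gap since the previous field's end. */
struct field_desc { size_t offset; size_t size; };

/* Half-open byte range [begin, end) within the struct. */
struct byte_range { size_t begin; size_t end; };

/* Resolves n relative descriptions into absolute ranges in out[].
   Returns the total number of bytes described, gaps included. */
static size_t resolve_ranges(const struct field_desc *fields, size_t n,
                             struct byte_range *out) {
    assert(n == 0 || (fields != NULL && out != NULL));
    size_t pos = 0; /* end of the previous field */
    for (size_t i = 0; i < n; i++) {
        out[i].begin = pos + fields[i].offset;
        out[i].end = out[i].begin + fields[i].size;
        pos = out[i].end;
    }
    return pos;
}
```

For struct bar the descriptions would be {0, 1}, {3, 4}, {0, 8}, giving the
ranges [0, 1), [4, 8), [8, 16). A struct with 7 bytes of tail padding could
end with a fake {7, 0} entry, so that the returned total still covers the
whole struct.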

What I like about this is that it puts the vital information directly
into the intrinsic in a structured way, rather than having it be "on
the side" in metadata.

The big problem of course is that we aren't really set up to have
intrinsics for which different instances can have a different number
of parameters.  This can be handled to some extent by using multiple
declarations, sticking the number of parameters onto the name like
is done for intrinsics which can have different types, like:
  @llvm.struct.memcpy.p0i8.p0i8.i64.8 <- this one takes 8 parameters,
or maybe 8 field descriptions, or something like that.
But there may be many issues here.

Ciao, Duncan.

Chris Lattner

2012-Sep-07 17:28 UTC

On Sep 6, 2012, at 4:24 PM, Dan Gohman <gohman at apple.com> wrote:
> Hello,
> 
> Pursuant to feedback,
> 
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052927.html
> 
> here is a new proposal for detailed struct assignment information.
Thanks Dan,

> Metadata node !4 here is a top-level description of the memcpy. It holds a
> list of virtual members. An integer represents a padding field of that
> size. A metadata tuple represents an actual data field. The tuple's members
> are an integer size and a TBAA tag for the field.

How about just making "!4" be a list of triples, where the first two
elements are integer offset/size, and the third entry is a TBAA pointer, or null
for padding?  This would give us easier to read (and pretty print) llvm IR and
may be more memory efficient as well.

-Chris

Dan Gohman

2012-Sep-07 20:14 UTC

On Sep 7, 2012, at 10:28 AM, Chris Lattner <clattner at apple.com> wrote:
> How about just making "!4" be a list of triples, where the first two
> elements are integer offset/size, and the third entry is a TBAA pointer, or
> null for padding?  This would give us easier to read (and pretty print) llvm IR
> and may be more memory efficient as well.

So, like this?

!4 = metadata !{i64 0, i64 1, metadata !1, i64 1, i64 3, i8* null, i64 4, i64 4,
metadata !2, i64 8, i64 8, metadata !3}
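
To make the triple layout concrete, here is an editorial C sketch (not LLVM
code). The names triple, bar_triples, and padding_bytes are hypothetical, and
TBAA tags are modeled as plain strings with NULL standing in for padding. The
check verifies that the triples tile the copied bytes without gaps or
overlaps, which is a property a consumer of such a list might plausibly rely
on.

```c
#include <assert.h>
#include <stddef.h>

/* One (offset, size, tag) triple from the flattened list. */
struct triple {
    size_t offset;
    size_t size;
    const char *tbaa; /* TBAA type name, or NULL for padding */
};

/* The list for struct bar, mirroring the !4 node shown above. */
static const struct triple bar_triples[] = {
    { 0, 1, "omnipotent char" },
    { 1, 3, NULL },
    { 4, 4, "float" },
    { 8, 8, "double" },
};

/* Returns the number of padding bytes if the triples tile [0, total)
   with no gaps or overlaps, or -1 otherwise. */
static long padding_bytes(const struct triple *t, size_t n, size_t total) {
    assert(t != NULL);
    size_t pos = 0, padding = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].offset != pos) /* gap or overlap */
            return -1;
        if (t[i].tbaa == NULL)
            padding += t[i].size;
        pos += t[i].size;
    }
    return pos == total ? (long)padding : -1;
}
```

With the 16-byte memcpy from the original example,
padding_bytes(bar_triples, 4, 16) reports 3 padding bytes.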

Dan

Chandler Carruth

2012-Sep-10 18:29 UTC

Hey Dan, I've talked with you about this in person and on IRC, but I've not
yet laid out my thoughts in a single place, so I'll put them here.

TL;DR: I really like the idea of using metadata to tag each member of a
struct with TBAA, and re-using the TBAA metadata nodes we already have. I'm
not as fond of the description of padding in the metadata node.

Currently padding is really hard to represent because there is sometimes a
member of an LLVM struct which represents padding (packed structs and cases
where the frontend type requires more alignment than the datalayout string
specifies) and other times there isn't. The current proposal doesn't
entirely fix this because we still will need some way to annotate the
members of structs inserted purely for the purpose of padding.

Further, we have the problem that sometimes what is needed is a
representation of a "hole", that is a region which is neither padding nor
part of the struct itself. The canonical example is the tail padding of a
base class where the derived class's first member has low alignment
constraints.

I would propose that we solve these problems by a somewhat more invasive
change, but one which will significantly simplify both LLVM and frontends
(at least Clang, I suspect other frontends):

Remove non-packed struct types completely. Make LLVM structs represent a
contiguous sequence of bytes, explicitly partitioned into fields with
particular primitive types.

The idea would be to make all struct types be packed[1], and to represent
padding as explicit members of the struct. These could in turn have a
"padding" TBAA metadata node which would specify that member as being
padding. This would simplify the metadata representation because there
would *always* be a member to hang the padding tag off of. It would
simplify struct layout analysis in LLVM because the difference between
alloc-size and type-size would be irrelevant. It would dramatically
simplify Clang's record layout building, which already has to fall back to
packed LLVM structs in many cases because  normal structs produce offsets
that conflict with the ABI's layout requirements.

Essentially, LLVM is trying to simplify ABI layout by providing a
datalayout summary description of target alignments, and building structs
with that algorithm. But unless this *exactly* matches the ABI in question,
it actually makes the job harder because now we have to try, potentially
fail, and end up with all the code to use the packed mode anyways. My
theory is that there are too many ABIs in the world (and too weird rules
within them) for us to ever really get this right at the LLVM layer.
Instead, we should force the frontend to explicitly lay out the bytes as it
sees fit.

Ok, now to the "how does this all work" part:

- No more alignment needed in the datalayout string[2].
- Other places where today we have optional alignment, if omitted the
alignment will be '1' instead of '0'. This will essentially require
alignment to be specified in more places.
- Array elements are packed[3]. If the elements of an array must be padded
out to a particular alignment, the array should be of a struct containing
the element and a padding member of the appropriate size. This will allow
us to tag that member with metadata as padding as well.
- Auto-upgrade uses old datalayout with alignments to synthesize necessary
align specifiers on instructions etc.
- TBAA metadata will identify members of a struct type which are padding
and hold no interesting data.
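
A C analogue of the "always packed, padding is an explicit member" idea might
look like the sketch below. This is an editorial illustration, not LLVM IR.
It assumes the GCC/Clang __attribute__((packed)) extension, and the names
bar_explicit, pad0, check_explicit_layout, and layouts_match are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naturally laid out struct: the compiler inserts padding implicitly. */
struct bar {
    char x;
    float y;
    double z;
};

/* Packed struct: every byte is an explicit member, padding included. */
struct __attribute__((packed)) bar_explicit {
    char x;
    unsigned char pad0[3]; /* explicit padding, could carry a "padding" tag */
    float y;
    double z;
};

/* In a packed struct the size is exactly the sum of the member sizes. */
static void check_explicit_layout(void) {
    assert(sizeof(struct bar_explicit) ==
           1 + 3 + sizeof(float) + sizeof(double));
}

/* Returns 1 if the explicit layout reproduces the natural ABI layout. */
static int layouts_match(void) {
    return sizeof(struct bar_explicit) == sizeof(struct bar)
        && offsetof(struct bar_explicit, y) == offsetof(struct bar, y)
        && offsetof(struct bar_explicit, z) == offsetof(struct bar, z);
}
```

Because the padding is a named member, there is always something to attach a
tag to, and the frontend rather than the datalayout string decides where each
byte goes.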

This would at least remove one dimension of complexity from Clang's record
layout building by removing the need to try non-packed structs and fall back
to packed. It should even allow us to retain the struct type for a base
class with derived class members packed into previously "padding" bytes at
the end. As it stands, even the current proposal doesn't seem to support
retaining the llvm struct type for the base class in this case, or easily
annotating the fields of that base class with TBAA information.

Thoughts?
-Chandler

Some points of clarification:
[1]: I say "packed" repeatedly but never "bit packed" or "byte packed". My
inclination is to make the rule within LLVM "byte packed" and fix the idea
of a byte as an i8. I think it's hopeless to support non-8-bit-bytes in
LLVM, and we should just move past that illusion. However, it would
certainly be possible to make this be "bit packed" and add bit padding with
appropriate metadata. I might even like that if it gives us a cleaner
semantic model, or helps tag certain bits as undef.

[2]: We could potentially keep some of this information here if there are
other parts of LLVM that use it... I'm not deeply familiar with all the
consumers of the datalayout string.

[3]: I'm torn on this one. It might be nice to have arrays get an optional
alignment that establishes the stride of the elements, particularly if we
want the semantics to be that between array elements we have a "hole"
rather than padding. However, I'm not aware of any place where this is a
practical or important constraint, and it seems to add complexity that we
don't need. If needed, it could always be added later.

Krzysztof Parzyszek

2012-Sep-10 18:54 UTC

On 9/10/2012 1:29 PM, Chandler Carruth wrote:
> The idea would be to make all struct types be packed[1], and to
> represent padding as explicit members of the struct.
> [...]
> Thoughts?

Frankly, I like this idea a lot.  I have one comment though: the data 
type used for the padding fields would need to always be the same, or 
else we run into the issue of having two types that are equal with 
respect to the non-padded data, but differ in the types (but not 
lengths) of the padding.  Those should be considered identical.

It brings my attention back to this:

On 8/31/2012 3:15 AM, Renato Golin wrote:
> On 30 August 2012 21:30, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
>> I guess I'm late to the party, but another possibility would be to model
>> structure types as lists of members with their offsets from the beginning of
>> the parent aggregate.  This would require extensive changes to LLVM, so I'm
>> not sure if it's an option.
>
> This has been proposed already, and could also be used by bitfields,
> but the changes were too many and was not accepted.
>
> I think the biggest reason against was that it was strongly based on
> C++ semantics and not generic enough to be considered IR material.


This would simply omit any non-member information from a type, and 
provide explicit placement (offset) of the members.  What were the 
specific concerns regarding this idea in the past?


-K

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Peter Cooper

2012-Sep-10 18:56 UTC

Hi Chandler

I also brainstormed a little with Dan on this and one idea we had was to add a
new LLVM type for the hole/padding.  This would be a type for which it is legal
to load/store/move around as part of a larger move operation, but is otherwise
unusable in LLVM.  Dan named it x32 for a 32-bit type for example.

I think this would fit well within what you are proposing as then it is easy to
see the holes/padding without even needing metadata.  The TBAA metadata would
still be needed, but now you could simply have a list of tbaa nodes, where the
index in the list corresponds to the field, whether a real field or one of the
'x' ones.

Pete



Dan Gohman

2012-Sep-10 21:11 UTC

On Sep 10, 2012, at 11:29 AM, Chandler Carruth <chandlerc at google.com> wrote:
> Hey Dan, I've talked with you about this in person and on IRC, but I've not
> yet laid out my thoughts in a single place, so I'll put them here.
>
> TL;DR: I really like the idea of using metadata to tag each member of a
> struct with TBAA, and re-using the TBAA metadata nodes we already have. I'm
> not as fond of the description of padding in the metadata node.
>
> Currently padding is really hard to represent because there is sometimes a
> member of an LLVM struct which represents padding (packed structs and cases
> where the frontend type requires more alignment than the datalayout string
> specifies) and other times there isn't. The current proposal doesn't
> entirely fix this because we still will need some way to annotate the
> members of structs inserted purely for the purpose of padding.

This is not a problem in the current proposal, because it represents padding
completely independently from the LLVM struct type.

> Further, we have the problem that sometimes what is needed is a
> representation of a "hole", that is a region which is neither padding nor
> part of the struct itself. The canonical example is the tail padding of a
> base class where the derived class's first member has low alignment
> constraints.

I don't see how a hole in a base class which isn't being used by a subclass is
different from padding, from the optimizer's perspective. The optimizer
doesn't know about class hierarchies (unless you're proposing something
much more significant).

> 
> I would propose that we solve these problems by a somewhat more invasive
> change, but one which will significantly simplify both LLVM and frontends
> (at least Clang, I suspect other frontends):
>
> Remove non-packed struct types completely. Make LLVM structs represent a
> contiguous sequence of bytes, explicitly partitioned into fields with
> particular primitive types.
>
> The idea would be to make all struct types be packed[1], and to represent
> padding as explicit members of the struct. These could in turn have a
> "padding" TBAA metadata node which would specify that member as being
> padding. This would simplify the metadata representation because there
> would *always* be a member to hang the padding tag off of. It would
> simplify struct layout analysis in LLVM because the difference between
> alloc-size and type-size would be irrelevant. It would dramatically
> simplify Clang's record layout building, which already has to fall back to
> packed LLVM structs in many cases because normal structs produce offsets
> that conflict with the ABI's layout requirements.
>
> Essentially, LLVM is trying to simplify ABI layout by providing a
> datalayout summary description of target alignments, and building structs
> with that algorithm. But unless this *exactly* matches the ABI in question,
> it actually makes the job harder because now we have to try, potentially
> fail, and end up with all the code to use the packed mode anyways. My
> theory is that there are too many ABIs in the world (and too weird rules
> within them) for us to ever really get this right at the LLVM layer.
> Instead, we should force the frontend to explicitly lay out the bytes as it
> sees fit.

The current situation is not bleak. ABIs don't often vary that much in the
way they lay out structs and arrays, especially within a given architecture.

I actually think it's kind of nice that LLVM has this native concept of "normal"
struct layout built in. It encourages people to avoid doing their own custom
struct layout unless they have a good reason to.

I think your proposal would solve the original problem here, but it's not
obviously better than the metadata approach. Bytes in memory in LLVM don't have
inherent types, so optimizers can't rely on the type, or on any individual copy,
to understand the lifetimes of data in storage allocated for padding.
Consequently, a type just becomes a way to attach information to a copy. And
it's not clear that using a type is better than using metadata.

The metadata approach is nice because it separates the use cases into two
families. On one side, copies with no metadata are simple to create, simple to
understand, and simple to implement. On the other side, people who need more
features can add metadata to get there, and things are more complex all around,
but that's the price of using advanced features. This is the shape of problem
that metadata was intended to solve.

Dan

Hal Finkel

2012-Sep-10 21:54 UTC

On Mon, 10 Sep 2012 11:29:37 -0700
Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Sep 6, 2012 at 4:24 PM, Dan Gohman <gohman at apple.com> wrote:
> 
> > Hello,
> >
> > Pursuant to feedback,
> >
> > http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052927.html
> >
> > here is a new proposal for detailed struct assignment information.
> >
> > Here's the example showing the basic problem:
> >
> > struct bar {
> >  char x;
> >  float y;
> >  double z;
> > };
> > void copy_bar(struct bar *a, struct bar *b) {
> >  *a = *b;
> > }
> >
> > The solution I now propose here is to have front-ends describe the
> > copy using metadata. For example:
> >
> >  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32
> > 8, i1 false), !tbaa.struct !4
> >  […]
> >  !0 = metadata !{metadata !"Simple C/C++ TBAA"}
> >  !1 = metadata !{metadata !"omnipotent char", metadata !0}
> >  !2 = metadata !{metadata !"float", metadata !1}
> >  !3 = metadata !{metadata !"double", metadata !1}
> >  !4 = metadata !{metadata !5, i64 3, metadata !6, metadata !7}
> >  !5 = metadata !{i64 1, metadata !1}
> >  !6 = metadata !{i64 4, metadata !2}
> >  !7 = metadata !{i64 8, metadata !3}
> >
> > Metadata nodes !0 through !3 are regular TBAA nodes as are already
> > in use.
> >
> > Metadata node !4 here is a top-level description of the memcpy. It
> > holds a list of virtual members. An integer represents a padding
> > field of that size. A metadata tuple represents an actual data
> > field. The tuple's members are an integer size and a TBAA tag for
> > the field.
> >
> 
> Hey Dan, I've talked with you about this in person and on IRC, but
> I've not yet laid out my thoughts in a single place, so I'll put them
> here.
> 
> TL;DR: I really like the idea of using metadata to tag each member of
> a struct with TBAA, and re-using the TBAA metadata nodes we already
> have. I'm not as fond of the description of padding in the metadata
> node.
> 
> Currently padding is really hard to represent because there is
> sometimes a member of an LLVM struct which represents padding (packed
> structs and cases where the frontend type requires more alignment
> than the datalayout string specifies) and other times there isn't.
> The current proposal doesn't entirely fix this because we still will
> need some way to annotate the members of structs inserted purely for
> the purpose of padding.
> 
> Further, we have the problem that sometimes what is needed is a
> representation of a "hole", that is, a region which is neither padding
> nor part of the struct itself. The canonical example is the tail
> padding of a base class where the derived class's first member has
> low alignment constraints.
> 
> I would propose that we solve these problems by a somewhat more
> invasive change, but one which will significantly simplify both LLVM
> and frontends (at least Clang, I suspect other frontends):
> 
> Remove non-packed struct types completely. Make LLVM structs
> represent a contiguous sequence of bytes, explicitly partitioned into
> fields with particular primitive types.
> 
> The idea would be to make all struct types be packed[1], and to
> represent padding as explicit members of the struct. These could in
> turn have a "padding" TBAA metadata node which would specify that
> member as being padding. This would simplify the metadata
> representation because there would *always* be a member to hang the
> padding tag off of. It would simplify struct layout analysis in LLVM
> because the difference between alloc-size and type-size would be
> irrelevant. It would dramatically simplify Clang's record layout
> building, which already has to fall back to packed LLVM structs in
> many cases because normal structs produce offsets that conflict with
> the ABI's layout requirements.
> 
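A C-level analogue of the "everything packed, padding as an explicit member" idea might look like the sketch below. It assumes a GCC or Clang compiler, since __attribute__((packed)) is an extension, and the names bar_explicit, pad0, and check_explicit_layout are hypothetical. It only illustrates the layout idea at the source level; it says nothing about how LLVM would implement it.

```c
#include <assert.h>
#include <stddef.h>

/* The natural layout the compiler picks for the example struct. */
struct bar {
  char x;
  float y;
  double z;
};

/* The same layout spelled out by hand: packed, with the padding as a named
   member. __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) bar_explicit {
  char x;
  char pad0[3]; /* explicit padding; this is the member a "padding" tag would hang off */
  float y;
  double z;
};

/* Checks that the hand-written packed layout matches the natural one on this
   target and returns the shared size in bytes. */
size_t check_explicit_layout(void) {
  assert(offsetof(struct bar_explicit, y) == offsetof(struct bar, y));
  assert(offsetof(struct bar_explicit, z) == offsetof(struct bar, z));
  assert(sizeof(struct bar_explicit) == sizeof(struct bar));
  return sizeof(struct bar_explicit);
}
```

The pad0 size of 3 is an assumption about the target (4-byte aligned float); a frontend following this scheme would compute it from its own ABI rules.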
> Essentially, LLVM is trying to simplify ABI layout by providing a
> datalayout summary description of target alignments, and building
> structs with that algorithm. But unless this *exactly* matches the
> ABI in question, it actually makes the job harder because now we have
> to try, potentially fail, and end up with all the code to use the
> packed mode anyways. My theory is that there are too many ABIs in the
> world (and too weird rules within them) for us to ever really get
> this right at the LLVM layer.
This layout logic needs to live somewhere; why can't it live in LLVM?
Does LLVM not have all of the necessary information for some ABIs? If
we push all of the necessary information and the associated logic into
the LLVM layer, then it can be used by multiple frontends.

 -Hal
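
For reference, the kind of alignment-driven layout being debated can be sketched in a few lines of C. This is a simplified illustration, not LLVM's actual code; struct field, align_to, and natural_layout are hypothetical names. It places each field at the next offset that satisfies its alignment and rounds the total size up to the largest alignment seen.

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one field, as a target description might give them.
   Alignments are assumed to be nonzero powers of two. */
struct field {
  size_t size;
  size_t align;
};

/* Round `n` up to the next multiple of `align` (a power of two). */
static size_t align_to(size_t n, size_t align) {
  return (n + align - 1) & ~(align - 1);
}

/* Assign an offset to each field and return the total size, including tail
   padding up to the largest field alignment. */
size_t natural_layout(const struct field *fields, size_t n, size_t *offsets) {
  size_t offset = 0, max_align = 1;

  for (size_t i = 0; i < n; i++) {
    size_t a = fields[i].align;
    assert(a != 0 && (a & (a - 1)) == 0); /* power of two */
    offset = align_to(offset, a);         /* insert padding if needed */
    offsets[i] = offset;
    offset += fields[i].size;
    if (a > max_align) max_align = a;
  }
  return align_to(offset, max_align);
}
```

Fed the sizes and alignments of char, float, and double on a typical 64-bit target (1/1, 4/4, 8/8), it gives offsets 0, 4, and 8 and a total of 16 bytes. An ABI whose rules differ from this loop is the case where a frontend has to fall back to packed structs.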
> Instead, we should force the frontend
> to explicitly lay out the bytes as it sees fit.
> 
> 
> Ok, now to the "how does this all work" part:
> 
> - No more alignment needed in the datalayout string[2].
> - Other places where today we have optional alignment, if omitted the
> alignment will be '1' instead of '0'. This will essentially require
> alignment to be specified in more places.
> - Array elements are packed[3]. If the elements of an array must be
> padded out to a particular alignment, the array should be of a struct
> containing the element and a padding member of the appropriate size.
> This will allow us to tag that member with metadata as padding as
> well.
> - Auto-upgrade uses old datalayout with alignments to synthesize
> necessary align specifiers on instructions etc.
> - TBAA metadata will identify members of a struct type which are
> padding and hold no interesting data.
> 
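The array point in the list above can be pictured at the C level as follows: a minimal sketch, assuming GCC or Clang (for __attribute__((packed))) and a 4-byte float. The types vec3 and vec3_slot and the function slot_offset are invented for the example. A 12-byte element that needs a 16-byte stride is wrapped in a struct carrying 4 bytes of explicit padding, and the array is declared over the wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* A 12-byte element that the frontend wants laid out at a 16-byte stride. */
struct __attribute__((packed)) vec3 {
  float v[3];
};

/* Element plus explicit padding; the array is declared over this wrapper. */
struct __attribute__((packed)) vec3_slot {
  struct vec3 value;
  char pad[4]; /* explicit padding member that could be tagged as padding */
};

/* Byte offset of element i in a packed array of slots. */
size_t slot_offset(size_t i) {
  assert(sizeof(struct vec3_slot) == 16); /* holds when float is 4 bytes */
  return i * sizeof(struct vec3_slot);
}
```

With this wrapper the stride comes from the element type itself, so no separate array alignment is needed to describe it.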
> This would at least remove one dimension of complexity from Clang's
> record layout building by removing the need to try non-packed structs
> and fall back to packed. It should even allow us to retain the struct
> type for a base class with derived class members packed into
> previously "padding" bytes at the end. Currently, even the current
> proposal doesn't seem to support retaining the llvm struct type for
> the base class in this case, or easily annotating the fields of that
> base class with TBAA information.
> 
> Thoughts?
> -Chandler
> 
> Some points of clarification:
> [1]: I say "packed" repeatedly but never "bit packed" or "byte
> packed". My inclination is to make the rule within LLVM "byte packed"
> and fix the idea of a byte as an i8. I think it's hopeless to support
> non-8-bit-bytes in LLVM, and we should just move past that illusion.
> However, it would certainly be possible to make this be "bit packed"
> and add bit padding with appropriate metadata. I might even like that
> if it gives us a cleaner semantic model, or helps tag certain bits as
> undef.
> 
> [2]: We could potentially keep some of this information here if there
> are other parts of LLVM that use it... I'm not deeply familiar with
> all the consumers of the datalayout string.
> 
> [3]: I'm torn on this one. It might be nice to have arrays get an
> optional alignment that establishes the stride of the elements,
> particularly if we want the semantics to be that between array
> elements we have a "hole" rather than padding. However, I'm not aware
> of any place where this is a practical or important constraint, and
> it seems to add complexity that we don't need. If needed, it could
> always be added later.


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
