thr3ads.net - llvm dev - [LLVMdev] Language lawyer question [Mar 2008]

If this information is useful, please help other people find it:
Share via:

Dale Johannesen

2008-Mar-12 03:42 UTC

[LLVMdev] Language lawyer question

Looking through the gcc testsuite turned up an interesting edge case.   
Let's assume our target leaves a hole for alignment in struct x, as do  
x86 and powerpc.  Do you think the following code can validly abort?

   struct x { char c; short s; };
   int i;    char *p;
   memset(&X, 0, sizeof(struct x));
   memset(&Y, 22, sizeof(struct x));
   X = Y;
   for (i=0, p=(char *)&X; i<sizeof(struct x); i++, p++)
     if (*p != 22)
       abort();

The memset's and char-by-char comparison are clearly valid references;  
the questionable bit is the struct copy, which llvm-gcc currently does  
field-by-field, skipping the hole.  C99 says

In simple assignment (=), the value of the right operand is converted  
to the type of the
assignment expression and replaces the value stored in the object  
designated by the left
operand.

I would take "replaces the value" to mean "replaces the entire
value",
but it could be read otherwise, I suppose.

The current code seems to me to assume holes in structs can't ever be  
validly accessed, which isn't right, as we see here.  They are often  
nondeterministic (this is explicit for initialization in C99) but not  
always.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080311/85880817/attachment.html>

Patrick Meredith

2008-Mar-12 04:47 UTC

head link

[LLVMdev] Language lawyer question

I thought pointer referencing like this was only valid for arrays.  I  
could be wrong,  but it might be that looping over the struct like that
is invalid, making it undefined behavior (and then the hole doesn't  
matter because there is no valid way to access it).  That said, I've  
definitely
seen a lot of code that uses pointers to reference struct contents.

On Mar 11, 2008, at 10:42 PM, Dale Johannesen wrote:
> Looking through the gcc testsuite turned up an interesting edge  
> case.  Let's assume our target leaves a hole for alignment in  
> struct x, as do x86 and powerpc.  Do you think the following code  
> can validly abort?
>
>   struct x { char c; short s; };
>   int i;    char *p;
>   memset(&X, 0, sizeof(struct x));
>   memset(&Y, 22, sizeof(struct x));
>   X = Y;
>   for (i=0, p=(char *)&X; i<sizeof(struct x); i++, p++)
>     if (*p != 22)
>       abort();
>
> The memset's and char-by-char comparison are clearly valid  
> references; the questionable bit is the struct copy, which llvm-gcc  
> currently does field-by-field, skipping the hole.  C99 says
>
> In simple assignment (=), the value of the right operand is  
> converted to the type of the
> assignment expression and replaces the value stored in the object  
> designated by the left
> operand.
>
> I would take "replaces the value" to mean "replaces the
entire
> value", but it could be read otherwise, I suppose.
>
> The current code seems to me to assume holes in structs can't ever  
> be validly accessed, which isn't right, as we see here.  They are  
> often nondeterministic (this is explicit for initialization in C99)  
> but not always.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080311/ef90cc14/attachment.html>

Shantonu Sen

2008-Mar-12 05:22 UTC

head link

[LLVMdev] Language lawyer question

Does the test case indicate the why it was added?

More of an implementation observation than a language interpretation,  
but if you add a "long l[6];" field, llvm-gcc continues to do
field-by-
field copies, but at l[7] it turns into machine-word copies, then at  
some point it turns into a rep/movsl (on Intel), and then at another  
threshold it turns into a memcpy(3) callout.

What part of LLVM's codegen for copying "struct x { char c; short s;  
long l[6] };" considers a movb + movw + 6 movl's to efficient in  
either time or space (I was using -Os)? What changes when the overall  
structure gets to 64 bytes such that it decides its more efficient to  
copy a word at a time?

I think the test case is bogus in terms of language correctness, but  
it might be indicative of a missed optimization for doing structure  
copies. Is that what GCC's test case is actually trying to validate?  
If so, it probably falls under a "gcc test case" and not a "C
test
case", if one can differentiate them.

Maybe it would be reasonable for llvm-gcc to NOT copy the padding at - 
O0 and do explicit field copies, but to copy the padding as a side  
effect of an inlined memcpy() implementation for copying sizeof(struct  
x) when optimization is used. Copying using the largest appropriate  
registers/instructions given the structure size and alignment seems  
like it would always be faster than field copies, even for small  
structures.

Shantonu

Sent from my MacBook

On Mar 11, 2008, at 9:47 PM, Patrick Meredith wrote:
> I thought pointer referencing like this was only valid for arrays.   
> I could be wrong,  but it might be that looping over the struct like  
> that
> is invalid, making it undefined behavior (and then the hole doesn't  
> matter because there is no valid way to access it).  That said, I've  
> definitely
> seen a lot of code that uses pointers to reference struct contents.
>
> On Mar 11, 2008, at 10:42 PM, Dale Johannesen wrote:
>
>> Looking through the gcc testsuite turned up an interesting edge  
>> case.  Let's assume our target leaves a hole for alignment in  
>> struct x, as do x86 and powerpc.  Do you think the following code  
>> can validly abort?
>>
>>   struct x { char c; short s; };
>>   int i;    char *p;
>>   memset(&X, 0, sizeof(struct x));
>>   memset(&Y, 22, sizeof(struct x));
>>   X = Y;
>>   for (i=0, p=(char *)&X; i<sizeof(struct x); i++, p++)
>>     if (*p != 22)
>>       abort();
>>
>> The memset's and char-by-char comparison are clearly valid  
>> references; the questionable bit is the struct copy, which llvm-gcc  
>> currently does field-by-field, skipping the hole.  C99 says
>>
>> In simple assignment (=), the value of the right operand is  
>> converted to the type of the
>> assignment expression and replaces the value stored in the object  
>> designated by the left
>> operand.
>>
>> I would take "replaces the value" to mean "replaces the
entire
>> value", but it could be read otherwise, I suppose.
>>
>> The current code seems to me to assume holes in structs can't ever
>> be validly accessed, which isn't right, as we see here.  They are  
>> often nondeterministic (this is explicit for initialization in C99)  
>> but not always.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080311/56040969/attachment.html>

Dale Johannesen

2008-Mar-12 05:52 UTC

head link

[LLVMdev] Language lawyer question

On Mar 11, 2008, at 9:47 PM, Patrick Meredith wrote:
> I thought pointer referencing like this was only valid for arrays.   
> I could be wrong,  but it might be that looping over the struct like  
> that
> is invalid, making it undefined behavior (and then the hole doesn't  
> matter because there is no valid way to access it).  That said, I've  
> definitely
> seen a lot of code that uses pointers to reference struct contents.
Anything can be addressed as characters.  C99 6.5, see last line:

An object shall have its stored value accessed only by an lvalue  
expression that has one of
the following types:73)
—atype compatible with the effective type of the object,
—aqualiﬁed version of a type compatible with the effective type of  
the object,
—atype that is the signed or unsigned type corresponding to the  
effective type of the
object,
—atype that is the signed or unsigned type corresponding to a  
qualiﬁed version of the
effective type of the object,
—anaggregate or union type that includes one of the aforementioned  
types among its
members (including, recursively,amember of a subaggregate or contained  
union), or
—a character type.

C89 and C++ have similar language.

On Mar 11, 2008, at 10:22 PM, Shantonu Sen wrote:> Does the test case indicate the why it was added?
Actually it's testing something else entirely and tripped over this  
before it got to what it's supposed to be testing:(
Just assumed it would work, as it does with gcc's codegen, of course.
> More of an implementation observation than a language  
> interpretation, but if you add a "long l[6];" field, llvm-gcc  
> continues to do field-by-field copies, but at l[7] it turns into  
> machine-word copies, then at some point it turns into a rep/movsl  
> (on Intel), and then at another threshold it turns into a memcpy(3)  
> callout.
>
> What part of LLVM's codegen for copying "struct x { char c; short
s;
> long l[6] };" considers a movb + movw + 6 movl's to efficient in  
> either time or space (I was using -Os)? What changes when the  
> overall structure gets to 64 bytes such that it decides its more  
> efficient to copy a word at a time?
Yeah, it's not efficient either.  I didn't want to get into that since  
fixing the correctness issue, if there is one, will automatically fix  
this too.
I think the justification is that breaking the struct into fields  
early makes it easier to do other optimizations.
> I think the test case is bogus in terms of language correctness,
Why?
> but it might be indicative of a missed optimization for doing  
> structure copies. Is that what GCC's test case is actually trying to  
> validate? If so, it probably falls under a "gcc test case" and
not a
> "C test case", if one can differentiate them.
There certainly are "gcc test cases", but so far I don't think
this is
one.
> Maybe it would be reasonable for llvm-gcc to NOT copy the padding at  
> -O0 and do explicit field copies, but to copy the padding as a side  
> effect of an inlined memcpy() implementation for copying  
> sizeof(struct x) when optimization is used. Copying using the  
> largest appropriate registers/instructions given the structure size  
> and alignment seems like it would always be faster than field  
> copies, even for small structures.
> On Mar 11, 2008, at 10:42 PM, Dale Johannesen wrote:
>
>> Looking through the gcc testsuite turned up an interesting edge  
>> case.  Let's assume our target leaves a hole for alignment in  
>> struct x, as do x86 and powerpc.  Do you think the following code  
>> can validly abort?
>>
>>   struct x { char c; short s; };
>>   int i;    char *p;
>>   memset(&X, 0, sizeof(struct x));
>>   memset(&Y, 22, sizeof(struct x));
>>   X = Y;
>>   for (i=0, p=(char *)&X; i<sizeof(struct x); i++, p++)
>>     if (*p != 22)
>>       abort();
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080311/d8d7aa61/attachment.html>

Duncan Sands

2008-Mar-12 08:11 UTC

head link

[LLVMdev] Language lawyer question

Hi,
> Looking through the gcc testsuite turned up an interesting edge case.   
> Let's assume our target leaves a hole for alignment in struct x, as do
> x86 and powerpc.  Do you think the following code can validly abort?
I think you should ask this on the gcc mailing list.  For what it's
worth I'm sure that Ada does not require the values in padding to
be preserved.
> The current code seems to me to assume holes in structs can't ever be  
> validly accessed, which isn't right, as we see here.
Of course they can be accessed, but that doesn't in itself mean that
what they contain needs to be preserved by structure copies.

Ciao,

Duncan.

Daniel Berlin

2008-Mar-12 15:43 UTC

head link

[LLVMdev] Language lawyer question

On Tue, Mar 11, 2008 at 11:42 PM, Dale Johannesen <dalej at apple.com>
wrote:>
> Looking through the gcc testsuite turned up an interesting edge case. 
Let's
> assume our target leaves a hole for alignment in struct x, as do x86 and
> powerpc.  Do you think the following code can validly abort?
>
>
>   struct x { char c; short s; };
>   int i;    char *p;
>   memset(&X, 0, sizeof(struct x));
>   memset(&Y, 22, sizeof(struct x));
>   X = Y;
>   for (i=0, p=(char *)&X; i<sizeof(struct x); i++, p++)
>     if (*p != 22)
>       abort();
>
> The memset's and char-by-char comparison are clearly valid references;
the
> questionable bit is the struct copy, which llvm-gcc currently does
> field-by-field, skipping the hole.  C99 says
>
>
> In simple assignment (=), the value of the right operand is converted to
the
> type of the
> assignment expression and replaces the value stored in the object
designated
> by the left
> operand.
>
Padding is allowed to be skipped in structures in this case.
See 6.2.6.1.
Even further, all padding is allowed to take any value no matter how
you try to set it (IE it always allowed to have an undefined value,
even if you memset it).


We happen to allow memset to clear padding bits, but we don't have to,
AFAIK.

--Dan

Dale Johannesen

2008-Mar-12 17:01 UTC

head link

[LLVMdev] Language lawyer question

On Mar 12, 2008, at 8:43 AM, Daniel Berlin wrote:
> On Tue, Mar 11, 2008 at 11:42 PM, Dale Johannesen <dalej at
apple.com>
> wrote:
> Padding is allowed to be skipped in structures in this case.
> See 6.2.6.1.
> Even further, all padding is allowed to take any value no matter how
> you try to set it (IE it always allowed to have an undefined value,
> even if you memset it).
>
> We happen to allow memset to clear padding bits, but we don't have  
> to, AFAIK.
Yes, 6.2.6.1 is clear in C99, thanks.  (C90 does not seem to have a  
corresponding section, though.)

This is derived from gcc.dg/vmx/varargs-4.c from the gcc testsuite, if  
anybody feels like fixing it.

Mike Stump

2008-Mar-12 19:49 UTC

head link

[LLVMdev] Language lawyer question

On Mar 11, 2008, at 8:42 PM, Dale Johannesen wrote:> Looking through the gcc testsuite turned up an interesting edge  
> case.  Let's assume our target leaves a hole for alignment in struct  
> x, as do x86 and powerpc.  Do you think the following code can  
> validly abort?
No.  The value of the object referred to be the left hand side must be  
replaced by the object on the right.  The objects are defined to be a  
sequence of sizeof(x) bytes (I'm assuming the testcase does struct x  
X, Y;).  If a byte isn't so updated, trivially, the object has not  
been replaced.

Mike Stump

2008-Mar-12 20:02 UTC

head link

[LLVMdev] Language lawyer question

On Mar 12, 2008, at 12:49 PM, Mike Stump wrote:> No.
Doh, you know, I was trying to find the quoted wording...  What the  
big print giveth, the small print takes away.  Ignore me...  I was  
wrong.

Daniel Berlin

2008-Mar-12 20:05 UTC

head link

[LLVMdev] Language lawyer question

On Wed, Mar 12, 2008 at 3:49 PM, Mike Stump <mrs at apple.com>
wrote:> On Mar 11, 2008, at 8:42 PM, Dale Johannesen wrote:
>  > Looking through the gcc testsuite turned up an interesting edge
>  > case.  Let's assume our target leaves a hole for alignment in
struct
>  > x, as do x86 and powerpc.  Do you think the following code can
>  > validly abort?
>
>  No.  The value of the object referred to be the left hand side must be
>  replaced by the object on the right.  The objects are defined to be a
>  sequence of sizeof(x) bytes (I'm assuming the testcase does struct x
>  X, Y;).  If a byte isn't so updated, trivially, the object has not
>  been replaced.
Except that the standard specifically says the value of the padding
bytes are undefined, and that you can do structure copies by member.

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Mar 2008 - [LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

[LLVMdev] Language lawyer question

Apparently Analagous Threads