Hi all, This patch is to fix a problem on PPC64 where an unaligned memcpy is generated. The testcase is this: $ cat testcase.c void Qux() { char Bar[11] = {0}; } What happens is that we produce LLVM code like this: call void @llvm.memcpy.i64( i8* %event_list2, i8* getelementptr ([11 x i8]* @C.103.30698, i32 0, i32 0), i64 11, i32 8 ) Notice that it has an 8-byte alignment. However, the Bar variable has 1-byte alignment, because it's being placed into the "redzone". The generic code for the function is: $ more testcase.c.t03.generic Qux () { static char C.0[11] = {0}; char Bar[11]; Bar = C.0; } Anyway, it turns out that the gimplifier was generating the correct alignment, but it was being overridden in assemble_variable(): /* On some machines, it is good to increase alignment sometimes. */ if (! DECL_USER_ALIGN (decl)) { #ifdef DATA_ALIGNMENT align = DATA_ALIGNMENT (TREE_TYPE (decl), align); #endif #ifdef CONSTANT_ALIGNMENT => if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) != error_mark_node) align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align); #endif } By setting the DECL_USER_ALIGN field, we can get the correct alignment. Comments? -bw Index: gimplify.c ==================================================================--- gimplify.c (revision 43658) +++ gimplify.c (working copy) @@ -2756,7 +2756,8 @@ TREE_STATIC (new) = 1; TREE_READONLY (new) = 1; DECL_INITIAL (new) = ctor; - if (align > DECL_ALIGN (new)) + /* LLVM LOCAL - Set DECL_USER_ALIGN when equal */ + if (align >= DECL_ALIGN (new)) { DECL_ALIGN (new) = align; DECL_USER_ALIGN (new) = 1;
On Nov 6, 2007, at 5:45 PM, Bill Wendling wrote:> $ more testcase.c.t03.generic > Qux () > { > static char C.0[11] = {0}; > char Bar[11]; > > Bar = C.0; > } > > Anyway, it turns out that the gimplifier was generating the correct > alignment, but it was being overridden in assemble_variable(): > > /* On some machines, it is good to increase alignment sometimes. */ > if (! DECL_USER_ALIGN (decl)) > { > #ifdef DATA_ALIGNMENT > align = DATA_ALIGNMENT (TREE_TYPE (decl), align); > #endif > #ifdef CONSTANT_ALIGNMENT > => if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) != > error_mark_node) > align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align); > #endif > }Not sure I'm getting it, is this Bar itself or the constructed C.0 that's getting realigned?> By setting the DECL_USER_ALIGN field, we can get the correct > alignment. > > Comments? > > -bw > > Index: gimplify.c > ==================================================================> --- gimplify.c (revision 43658) > +++ gimplify.c (working copy) > @@ -2756,7 +2756,8 @@ > TREE_STATIC (new) = 1; > TREE_READONLY (new) = 1; > DECL_INITIAL (new) = ctor; > - if (align > DECL_ALIGN (new)) > + /* LLVM LOCAL - Set DECL_USER_ALIGN when equal */ > + if (align >= DECL_ALIGN (new)) > { > DECL_ALIGN (new) = align; > DECL_USER_ALIGN (new) = 1; > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Nov 6, 2007, at 5:45 PM, Bill Wendling wrote:> Hi all, > > This patch is to fix a problem on PPC64 where an unaligned memcpy is > generated. The testcase is this: > > $ cat testcase.c > void Qux() { > char Bar[11] = {0}; > } > > What happens is that we produce LLVM code like this: > > call void @llvm.memcpy.i64( i8* %event_list2, i8* getelementptr ([11 > x i8]* @C.103.30698, i32 0, i32 0), i64 11, i32 8 ) > > Notice that it has an 8-byte alignment. However, the Bar variable has > 1-byte alignment, because it's being placed into the "redzone". The > generic code for the function is: > > $ more testcase.c.t03.generic > Qux () > { > static char C.0[11] = {0}; > char Bar[11]; > > Bar = C.0; > } > > Anyway, it turns out that the gimplifier was generating the correct > alignment, but it was being overridden in assemble_variable(): > > /* On some machines, it is good to increase alignment sometimes. */ > if (! DECL_USER_ALIGN (decl)) > { > #ifdef DATA_ALIGNMENT > align = DATA_ALIGNMENT (TREE_TYPE (decl), align); > #endif > #ifdef CONSTANT_ALIGNMENT > => if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) != > error_mark_node) > align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align); > #endif > } > > By setting the DECL_USER_ALIGN field, we can get the correct > alignment.I don't get it Bill. llvm should be making memcpy with the right alignment regardless of how Bar is formed. Changing the alignment of bar papers over this problem, but we will still get improper alignment for other cases. Also, DECL_USER_ALIGN isn't appropriate to tweak here... the user isn't playing around with __attribute__((aligned)) -Chris
On Nov 6, 2007, at 6:07 PM, Dale Johannesen wrote:> > On Nov 6, 2007, at 5:45 PM, Bill Wendling wrote: >> $ more testcase.c.t03.generic >> Qux () >> { >> static char C.0[11] = {0}; >> char Bar[11]; >> >> Bar = C.0; >> } >> >> Anyway, it turns out that the gimplifier was generating the correct >> alignment, but it was being overridden in assemble_variable(): >> >> /* On some machines, it is good to increase alignment >> sometimes. */ >> if (! DECL_USER_ALIGN (decl)) >> { >> #ifdef DATA_ALIGNMENT >> align = DATA_ALIGNMENT (TREE_TYPE (decl), align); >> #endif >> #ifdef CONSTANT_ALIGNMENT >> => if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) !>> error_mark_node) >> align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align); >> #endif >> } > > Not sure I'm getting it, is this Bar itself or the constructed C.0 > that's getting realigned? >It's C.0 that's getting realigned. -bw
On Nov 6, 2007, at 8:25 PM, Chris Lattner wrote:> On Nov 6, 2007, at 5:45 PM, Bill Wendling wrote: > >> Hi all, >> >> This patch is to fix a problem on PPC64 where an unaligned memcpy is >> generated. The testcase is this: >> >> $ cat testcase.c >> void Qux() { >> char Bar[11] = {0}; >> } >> >> What happens is that we produce LLVM code like this: >> >> call void @llvm.memcpy.i64( i8* %event_list2, i8* getelementptr ([11 >> x i8]* @C.103.30698, i32 0, i32 0), i64 11, i32 8 ) >> >> Notice that it has an 8-byte alignment. However, the Bar variable has >> 1-byte alignment, because it's being placed into the "redzone". The >> generic code for the function is: >> >> $ more testcase.c.t03.generic >> Qux () >> { >> static char C.0[11] = {0}; >> char Bar[11]; >> >> Bar = C.0; >> } >> >> Anyway, it turns out that the gimplifier was generating the correct >> alignment, but it was being overridden in assemble_variable(): >> >> /* On some machines, it is good to increase alignment sometimes. */ >> if (! DECL_USER_ALIGN (decl)) >> { >> #ifdef DATA_ALIGNMENT >> align = DATA_ALIGNMENT (TREE_TYPE (decl), align); >> #endif >> #ifdef CONSTANT_ALIGNMENT >> => if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) !>> error_mark_node) >> align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align); >> #endif >> } >> >> By setting the DECL_USER_ALIGN field, we can get the correct >> alignment. > > I don't get it Bill. llvm should be making memcpy with the right > alignment regardless of how Bar is formed. Changing the alignment of > bar papers over this problem, but we will still get improper alignment > for other cases. Also, DECL_USER_ALIGN isn't appropriate to tweak > here... the user isn't playing around with __attribute__((aligned)) >Bar isn't the one that's getting realigned with the above code, it's the constructor. The patch only sets the DECL_USER_ALIGN field for the constructor (not Bar) when the constructor's created. My thoughts were that if the constructor was initially given the alignment of the variable (Bar, in this case), then it shouldn't be changed in the assemble_variable() function, unless it's also changing the alignment of Bar (which it's not). I'm actually not sure why it doesn't change Bar's alignment when it's changing Bar's constructor's alignment. That seems like a GCC bug, IMHO, though they probably don't get hit with it or "fix" it later on or something. My guess is that you want something akin to TreeToLLVM::EmitBuiltinMemCopy where it gets the min of the alignments? -bw
How about this patch then? -bw Index: gcc/llvm-convert.cpp ==================================================================--- gcc/llvm-convert.cpp (revision 43658) +++ gcc/llvm-convert.cpp (working copy) @@ -758,7 +758,7 @@ } -Value *TreeToLLVM::Emit(tree exp, Value *DestLoc) { +Value *TreeToLLVM::Emit(tree exp, Value *DestLoc, unsigned Alignment) { assert((isAggregateTreeType(TREE_TYPE(exp)) == (DestLoc != 0) || TREE_CODE(exp) == MODIFY_EXPR) && "Didn't pass DestLoc to an aggregate expr, or passed it to scalar!"); @@ -811,7 +811,7 @@ case STRING_CST: case REALPART_EXPR: case IMAGPART_EXPR: - Result = EmitLoadOfLValue(exp, DestLoc); + Result = EmitLoadOfLValue(exp, DestLoc, Alignment); break; case OBJ_TYPE_REF: Result = EmitOBJ_TYPE_REF(exp); break; case ADDR_EXPR: Result = EmitADDR_EXPR(exp); break; @@ -2524,7 +2524,8 @@ /// EmitLoadOfLValue - When an l-value expression is used in a context that /// requires an r-value, this method emits the lvalue computation, then loads /// the result. -Value *TreeToLLVM::EmitLoadOfLValue(tree exp, Value *DestLoc) { +Value *TreeToLLVM::EmitLoadOfLValue(tree exp, Value *DestLoc, + unsigned SrcAlign) { // If this is an SSA value, don't emit a load, just use the result. if (isGCC_SSA_Temporary(exp)) { assert(DECL_LLVM_SET_P(exp) && "Definition not found before use!"); @@ -2540,7 +2541,7 @@ LValue LV = EmitLV(exp); bool isVolatile = TREE_THIS_VOLATILE(exp); const Type *Ty = ConvertType(TREE_TYPE(exp)); - unsigned Alignment = expr_align(exp) / 8; + unsigned DstAlign = expr_align(exp) / 8; if (!LV.isBitfield()) { if (!DestLoc) { @@ -2548,17 +2549,17 @@ Value *Ptr = CastToType(Instruction::BitCast, LV.Ptr, PointerType::get(Ty)); LoadInst *LI = Builder.CreateLoad(Ptr, isVolatile, "tmp"); - LI->setAlignment(Alignment); + LI->setAlignment(DstAlign); return LI; } else { EmitAggregateCopy(DestLoc, LV.Ptr, TREE_TYPE(exp), false, isVolatile, - Alignment); + std::min(DstAlign, SrcAlign)); return 0; } } else { // This is a bitfield reference. LoadInst *LI = Builder.CreateLoad(LV.Ptr, isVolatile, "tmp"); - LI->setAlignment(Alignment); + LI->setAlignment(DstAlign); Value *Val = LI; unsigned ValSizeInBits = Val->getType()->getPrimitiveSizeInBits(); @@ -3021,7 +3022,7 @@ EmitAggregateCopy(DestLoc, LV.Ptr, TREE_TYPE(exp), isVolatile, false, Alignment); } else if (!isVolatile && TREE_CODE(TREE_OPERAND(exp, 0)) != RESULT_DECL) { - Emit(TREE_OPERAND(exp, 1), LV.Ptr); + Emit(TREE_OPERAND(exp, 1), LV.Ptr, Alignment); } else { // Need to do a volatile store into TREE_OPERAND(exp, 1). To do this, we // emit it into a temporary memory location, then do a volatile copy into Index: gcc/llvm-internal.h ==================================================================--- gcc/llvm-internal.h (revision 43658) +++ gcc/llvm-internal.h (working copy) @@ -375,7 +375,7 @@ /// expression that fits into an LLVM scalar value, the result is returned. If /// the result is an aggregate, it is stored into the location specified by /// DestLoc. - Value *Emit(tree_node *exp, Value *DestLoc); + Value *Emit(tree_node *exp, Value *DestLoc, unsigned Alignment = 0); /// EmitLV - Convert the specified l-value tree node to LLVM code, returning /// the address of the result. @@ -512,7 +512,7 @@ // Expressions. void EmitINTEGER_CST_Aggregate(tree_node *exp, Value *DestLoc); - Value *EmitLoadOfLValue(tree_node *exp, Value *DestLoc); + Value *EmitLoadOfLValue(tree_node *exp, Value *DestLoc, unsigned SrcAlign); Value *EmitOBJ_TYPE_REF(tree_node *exp, Value *DestLoc); Value *EmitADDR_EXPR(tree_node *exp); Value *EmitOBJ_TYPE_REF(tree_node *exp); spang:llvm-9999-01 admin$ cd .. spang:Clean admin$ ls *.c t.c testcase.c spang:Clean admin$ cat > a.c struct A { int a[11]; }; void Qux() { static struct A C = {0}; struct A __attribute__ ((aligned (1))) Bar; Bar = C; }
Hi,> I don't get it Bill. llvm should be making memcpy with the right > alignment regardless of how Bar is formed. Changing the alignment of > bar papers over this problem, but we will still get improper alignment > for other cases. Also, DECL_USER_ALIGN isn't appropriate to tweak > here... the user isn't playing around with __attribute__((aligned))the fundamental problem is that the DestLoc argument to Emit doesn't come with alignment (or volatility for that matter). When emitting an aggregate into DestLoc only the alignment of the source is used, not that of DestLoc. But that's wrong if DestLoc is less aligned than the source (either because the source is highly aligned, eg because the user marked it as being highly aligned, or because the destination is little aligned as in the testcase). So why not pass the alignment and volatility in too? Probably Chris doesn't want to add extra arguments to Emit, so instead we could define a "DestLoc descriptor" struct: struct DLD { Value *Ptr; unsigned Alignment; bool Volatile; } declare one on the stack and pass its address, something like this: DLD D = { Ptr, Align, isVolatile }; Emit(exp, &D); Ciao, Duncan.
Hi,> > I don't get it Bill. llvm should be making memcpy with the right > > alignment regardless of how Bar is formed. Changing the alignment of > > bar papers over this problem, but we will still get improper alignment > > for other cases. Also, DECL_USER_ALIGN isn't appropriate to tweak > > here... the user isn't playing around with __attribute__((aligned)) > > the fundamental problem is that the DestLoc argument to Emit doesn't > come with alignment (or volatility for that matter). When emitting > an aggregate into DestLoc only the alignment of the source is used, > not that of DestLoc. But that's wrong if DestLoc is less aligned > than the source (either because the source is highly aligned, eg > because the user marked it as being highly aligned, or because the > destination is little aligned as in the testcase). So why not pass > the alignment and volatility in too? Probably Chris doesn't want > to add extra arguments to Emit, so instead we could define a > "DestLoc descriptor" struct: > struct DLD { > Value *Ptr; > unsigned Alignment; > bool Volatile; > } > declare one on the stack and pass its address, something like this: > DLD D = { Ptr, Align, isVolatile }; > Emit(exp, &D);I just did this in a sudden fit of activity. It was quite easy and looks like it catches a ton of volatility and alignment bugs. Maybe it will even work and fix the original problem :) I will let you know later. Ciao, Duncan.