Uday Kumar Reddy B via llvm-dev
2019-Sep-03 18:53 UTC
[llvm-dev] load/stores adjusted to align - prevent aggregate replacement/elimination
Hello,

This is a question regarding the replacement of malloc'ed single-element arrays by scalars, which LLVM's opt normally appears to perform well. Now, when there are arrays of vector types with elements of size 32 bytes or more (e.g., <8 x float> *), it's common to adjust loads/stores so that they align on element type boundaries (since GNU malloc would typically align only to 16-byte boundaries, say on x86-64). On such IR, I notice that the scalar replacement / register promotion of a malloc'ed vector element no longer works.

https://godbolt.org/z/9KByAf

(Commenting out the alignment adjustment makes it work perfectly.)

Are there any attributes/hints that could be used when generating those alignment arithmetic instructions to help the optimizer here?

Thanks,
~ Uday
Uday Kumar Reddy B via llvm-dev
2019-Sep-03 19:03 UTC
[llvm-dev] load/stores adjusted to align - prevent aggregate replacement/elimination
On Wed, 4 Sep 2019 at 00:23, Uday Kumar Reddy B <uday at polymagelabs.com> wrote:
>
> Hello,
>
> This is a question reg. replacement of malloc'ed single element arrays
> by scalars, which LLVM's opt appears to normally perform well. Now,
> when there are arrays of vector types with elements of size 32 bytes
> or more (eg <8 x float> *), it's common to adjust load/store's so that
> they align on element type boundaries (since GNU malloc would
> typically align only to 16 byte boundaries, say on x86-64). On such
> IR, I notice that the scalar replacement / register promotion of a
> malloc'ed vector element doesn't work any more.
>
> https://godbolt.org/z/9KByAf

It looks like the short URLs aren't working, so I'm appending the snippet below.

> (Commenting out the alignment adjustment makes it work perfectly.)
>
> Are there any attributes/hints that might be used in generating those
> alignment arithmetic instructions to help the optimizer here?
>
> Thanks,
> ~ Uday

--------------------------------------------------------------------------------------------
declare i8* @malloc(i64)

define <8 x float> @xyz(<8 x float>* %0) {
  %2 = call i8* @malloc(i64 63)
  %3 = bitcast i8* %2 to <8 x float>*
  %4 = ptrtoint <8 x float>* %3 to i64
  %5 = add i64 %4, 31
  %6 = udiv i64 %5, 32
  %7 = mul i64 %6, 32
  %8 = inttoptr i64 %7 to <8 x float>*
  %9 = getelementptr <8 x float>, <8 x float>* %8, i64 0
  ; uncomment this and comment the one above to allow full scalar rep
  ; %9 = getelementptr <8 x float>, <8 x float>* %3, i64 0
  store <8 x float> zeroinitializer, <8 x float>* %9
  br label %10

10:                                               ; preds = %13, %1
  %11 = phi i64 [ %23, %13 ], [ 0, %1 ]
  %12 = icmp slt i64 %11, 100
  br i1 %12, label %13, label %24

13:                                               ; preds = %10
  %14 = ptrtoint <8 x float>* %0 to i64
  %15 = add i64 %14, 31
  %16 = udiv i64 %15, 32
  %17 = mul i64 %16, 32
  %18 = inttoptr i64 %17 to <8 x float>*
  %19 = getelementptr <8 x float>, <8 x float>* %18, i64 %11
  %20 = load <8 x float>, <8 x float>* %19
  %21 = load <8 x float>, <8 x float>* %9
  %22 = fadd <8 x float> %20, %21
  store <8 x float> %22, <8 x float>* %9
  %23 = add i64 %11, 1
  br label %10

24:                                               ; preds = %10
  %25 = load <8 x float>, <8 x float>* %9
  ret <8 x float> %25
}
--------------------------------------------------------------------------------------------
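
For readers less used to reading IR, the alignment adjustment in the snippet (ptrtoint, add 31, udiv 32, mul 32, inttoptr) can be sketched at the C level roughly as follows. The helper name align_up is hypothetical and does not come from the thread; this is only an illustration of the arithmetic, not a suggested fix for the optimization problem.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): round a pointer up to the
 * next multiple of `align`, mirroring the integer arithmetic in the IR. */
static void *align_up(void *p, uintptr_t align) {
    assert(align != 0);                            /* avoid division by zero */
    uintptr_t addr = (uintptr_t)p;                 /* ptrtoint               */
    uintptr_t bumped = addr + (align - 1);         /* add i64 ..., 31        */
    uintptr_t aligned = (bumped / align) * align;  /* udiv 32, then mul 32   */
    return (void *)aligned;                        /* inttoptr               */
}
```

With align = 32, the round-up skips at most 31 bytes, which is why the snippet allocates 63 bytes (32 + 31): at least 32 usable bytes always remain after the adjusted pointer.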