thr3ads.net - llvm dev - [llvm-dev] load/stores adjusted to align - prevent aggregate replacement/elimination [Sep 2019]


Uday Kumar Reddy B via llvm-dev

2019-Sep-03 18:53 UTC

[llvm-dev] load/stores adjusted to align - prevent aggregate replacement/elimination

Hello,

This is a question regarding the replacement of malloc'ed single-element
arrays by scalars, which LLVM's opt normally appears to perform well. Now,
when there are arrays of vector types whose elements are 32 bytes or
larger (e.g., <8 x float>*), it's common to adjust the addresses used by
loads/stores so that they align on element-type boundaries, since GNU
malloc typically aligns only to 16-byte boundaries (say, on x86-64). On
such IR, I notice that scalar replacement / register promotion of a
malloc'ed vector element no longer takes place.

https://godbolt.org/z/9KByAf
(Commenting out the alignment adjustment makes it work perfectly.)
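The alignment adjustment is the usual round-up idiom: add (alignment - 1), divide by the alignment, then multiply back. A minimal C sketch of that arithmetic, assuming a 32-byte element (the size of <8 x float>); `align_up` and `alloc_aligned_elem` are hypothetical helper names, not part of the original report. It also shows why a 63-byte allocation (32 + 31) always leaves room for one aligned element, whatever address malloc returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round addr up to the next multiple of align.
   For align == 32 this is the same add 31 / udiv 32 / mul 32 sequence
   used in the IR snippet in this thread. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
    return (addr + align - 1) / align * align;
}

/* Hypothetical helper: allocate room for one elem_size-byte element that
   starts on an elem_size boundary. Over-allocating by elem_size - 1 bytes
   (32 + 31 = 63 for <8 x float>) guarantees the rounded-up element still
   lies inside the buffer. The caller must free(*raw_out), not the
   returned pointer. */
static void *alloc_aligned_elem(size_t elem_size, void **raw_out) {
    assert(elem_size > 0); /* elem_size - 1 would wrap around otherwise */
    unsigned char *raw = malloc(elem_size + (elem_size - 1));
    *raw_out = raw;
    if (raw == NULL)
        return NULL;
    return (void *)align_up((uintptr_t)raw, (uintptr_t)elem_size);
}
```

For example, `alloc_aligned_elem(32, &raw)` returns a 32-byte-aligned pointer inside a 63-byte block; it is `raw` that must later be passed to `free()`.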

Are there any attributes/hints that might be used in generating those
alignment arithmetic instructions to help the optimizer here?

Thanks,
~ Uday

Uday Kumar Reddy B via llvm-dev

2019-Sep-03 19:03 UTC


[llvm-dev] load/stores adjusted to align - prevent aggregate replacement/elimination

On Wed, 4 Sep 2019 at 00:23, Uday Kumar Reddy B <uday at polymagelabs.com>
wrote:
> https://godbolt.org/z/9KByAf
Looks like the short URLs aren't working, so I'm appending the
snippet below anyway.
--------------------------------------------------------------------------------------------
declare i8* @malloc(i64)

define <8 x float> @xyz(<8 x float>* %0) {
  %2 = call i8* @malloc(i64 63)
  %3 = bitcast i8* %2 to <8 x float>*
  %4 = ptrtoint <8 x float>* %3 to i64
  %5 = add i64 %4, 31
  %6 = udiv i64 %5, 32
  %7 = mul i64 %6, 32
  %8 = inttoptr i64 %7 to <8 x float>*
  %9 = getelementptr <8 x float>, <8 x float>* %8, i64 0
  ; uncomment this and comment out the one above to get full scalar replacement
  ; %9 = getelementptr <8 x float>, <8 x float>* %3, i64 0
  store <8 x float> zeroinitializer, <8 x float>* %9
  br label %10

10:                                               ; preds = %13, %1
  %11 = phi i64 [ %23, %13 ], [ 0, %1 ]
  %12 = icmp slt i64 %11, 100
  br i1 %12, label %13, label %24

13:                                               ; preds = %10
  %14 = ptrtoint <8 x float>* %0 to i64
  %15 = add i64 %14, 31
  %16 = udiv i64 %15, 32
  %17 = mul i64 %16, 32
  %18 = inttoptr i64 %17 to <8 x float>*
  %19 = getelementptr <8 x float>, <8 x float>* %18, i64 %11
  %20 = load <8 x float>, <8 x float>* %19
  %21 = load <8 x float>, <8 x float>* %9
  %22 = fadd <8 x float> %20, %21
  store <8 x float> %22, <8 x float>* %9
  %23 = add i64 %11, 1
  br label %10

24:                                               ; preds = %10
  %25 = load <8 x float>, <8 x float>* %9
  ret <8 x float> %25
}
-------------------------------
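Roughly, @xyz corresponds to the following C. This is a hand translation for illustration, not the source the IR was generated from: `vec8f` is a plain struct standing in for <8 x float> (so it lacks the vector type's natural alignment), `align32` and `xyz_sketch` are hypothetical names, and the sketch frees the scratch buffer, which the IR never does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for <8 x float>; a plain struct, so unlike the IR vector type
   it is only float-aligned. */
typedef struct { float lane[8]; } vec8f;

/* Round a pointer up to the next multiple of 32 (add 31, udiv 32, mul 32). */
static void *align32(void *p) {
    return (void *)(((uintptr_t)p + 31) / 32 * 32);
}

/* Rough C equivalent of @xyz. The buffer `in` must hold at least 100
   vec8f elements after its start is rounded up to 32 bytes. */
static vec8f xyz_sketch(void *in) {
    assert(in != NULL);
    /* %2..%9: malloc 63 bytes and round the pointer up to 32. */
    void *raw = malloc(63);
    if (raw == NULL)
        abort();
    vec8f *acc = align32(raw);
    *acc = (vec8f){{0}};                 /* store zeroinitializer */

    /* %14..%18: computed once here; the IR recomputes it every iteration. */
    vec8f *src = align32(in);
    for (int64_t i = 0; i < 100; i++) {  /* %11, icmp slt 100 */
        for (int k = 0; k < 8; k++)      /* lane-wise fadd <8 x float> */
            acc->lane[k] += src[i].lane[k];
    }
    vec8f result = *acc;                 /* %25 */
    free(raw);                           /* the IR never frees the buffer */
    return result;
}
```

A caller must pass a buffer of at least 100 * 32 + 31 bytes so that 100 elements remain after the round-up. In the IR, the accumulator behind %9 is promoted to a register when %9 is derived from %3 directly, but not after the ptrtoint/inttoptr round trip.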
