thr3ads.net - llvm dev - [LLVMdev] Disable vectorization for unaligned data [Jul 2013]

Francois Pichet

2013-Jul-19 20:14 UTC

[LLVMdev] Disable vectorization for unaligned data

What is the proper solution to disable auto-vectorization for unaligned
data?

I have an out-of-tree target and I added this:

bool OpusTargetLowering::allowsUnalignedMemoryAccesses(EVT VT,
                                                       bool *Fast) const {
  if (VT.isVector())
    return false;
....
}

After that, I could see that vectorization is still done on unaligned data,
except that LLVM copies the data back and forth between the source and the
top of the stack and works from there. This is very costly; I would rather
get scalar operations.

Then I tried to add:
  unsigned getMemoryOpCost(unsigned Opcode, Type *Src,
                           unsigned Alignment,
                           unsigned AddressSpace) const {
    if (Src->isVectorTy() && Alignment != 16)
      return 10000; // <== high number to try to avoid unaligned load/store.
    return TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment,
                                                AddressSpace);
  }

Except that this doesn't work, because Alignment will always be 4, even for
data like:
       int   data[16][16] __attribute__ ((aligned (16))),

because the individual elements are still only 4-byte aligned.

I am not sure what the right way to do this is.
Thanks.
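
To make the alignment point above concrete, here is a minimal standalone sketch (plain C++, not LLVM code). It assumes a 4-byte int and uses alignas(16) as the standard spelling of the aligned attribute; the helper names knownAlignment and alignmentOfElement are made up for illustration. It shows that only some elements of a 16-byte-aligned array sit on a 16-byte boundary, which is why a per-element view of the access can only promise 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Largest power of two, capped at `cap`, that divides the address.
inline std::size_t knownAlignment(std::uintptr_t addr, std::size_t cap) {
  assert(cap != 0 && (cap & (cap - 1)) == 0 && "cap must be a power of two");
  std::size_t a = 1;
  while (a < cap && addr % (a * 2) == 0)
    a *= 2;
  return a;
}

// Same layout as the array in the message: the array as a whole is 16-byte
// aligned, but each int element is only guaranteed its natural alignment.
alignas(16) int data[16][16];

// Alignment (up to 16) of the address of data[i][j].
// Assumption: sizeof(int) == 4, so each row is 64 bytes long.
inline std::size_t alignmentOfElement(int i, int j) {
  return knownAlignment(reinterpret_cast<std::uintptr_t>(&data[i][j]), 16);
}
```

Under these assumptions, a 128-bit access starting at data[i][0] or data[i][4] is 16-byte aligned, while one starting at data[i][1] is not; the type of a single element cannot tell the two cases apart.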

Eli Friedman

2013-Jul-19 20:32 UTC

On Fri, Jul 19, 2013 at 1:14 PM, Francois Pichet <pichet2000 at gmail.com>
wrote:
> What is the proper solution to disable auto-vectorization for unaligned
> data?

Why are you trying to do this?  If auto-vectorization is making a
given loop slower on your target, that means the cost metrics are off,
and we should fix them.  If code size is an issue, you should tell the
optimizer that you want to optimize for size.

-Eli
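
As a rough illustration of the cost-metric argument, the sketch below is a toy profitability check in plain C++. It is not the loop vectorizer's actual code; the function name, the abstract cost units, and the parameter names are all hypothetical. The idea it shows is that vectorization is chosen only when the vector loop is cheaper per original iteration than the scalar loop, so a vector memory cost that reflected the real expense on the target would keep the loop scalar.

```cpp
#include <cassert>

// Toy model: costs are abstract units per loop iteration.
// scalarCostPerIter: cost of one iteration of the scalar loop.
// vectorCostPerIter: cost of one iteration of the vector loop, which
//                    covers `vf` original iterations.
inline bool profitableToVectorize(unsigned long long scalarCostPerIter,
                                  unsigned long long vectorCostPerIter,
                                  unsigned vf) {
  assert(vf > 0 && "vectorization factor must be positive");
  // Equivalent to vectorCostPerIter / vf < scalarCostPerIter, without
  // the rounding that integer division would introduce.
  return vectorCostPerIter < scalarCostPerIter * vf;
}
```

With a scalar cost of 4 and a vectorization factor of 4, a vector iteration costing 6 is profitable, while one costing 10006 (a single memory operation penalized by 10000) is not.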

Francois Pichet

2013-Jul-19 20:39 UTC

Because unaligned loads/stores are illegal on my target, and
ExpandUnalignedStore expands them into too many loads/stores.

It seems that ExpandUnalignedStore is called after the vectorization cost
analysis is done, so its cost is not taken into account.
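
To put a number on "too many loads/stores", here is a small counting sketch in plain C++. It assumes the expansion has the shape described earlier in the thread (the vector is written to an aligned stack slot and then copied out in register-sized pieces); the real ExpandUnalignedStore may differ in detail, and both function names are hypothetical.

```cpp
#include <cassert>

// Memory operations needed to store one vector through an aligned stack
// slot: one vector store into the slot, then one scalar load and one
// scalar store per register-sized piece copied to the real destination.
inline unsigned memOpsForStoreViaStack(unsigned vectorBytes,
                                       unsigned regBytes) {
  assert(regBytes != 0 && vectorBytes % regBytes == 0);
  unsigned pieces = vectorBytes / regBytes;
  return 1 + 2 * pieces;
}

// Memory operations needed to store the same data with plain scalar
// stores, one per register-sized piece.
inline unsigned memOpsForScalarStores(unsigned vectorBytes,
                                      unsigned regBytes) {
  assert(regBytes != 0 && vectorBytes % regBytes == 0);
  return vectorBytes / regBytes;
}
```

For a 16-byte vector and 4-byte registers, this model gives 9 memory operations for the expanded store against 4 for the scalar version, which is the overhead a cost computed before the expansion never sees.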




Arnold Schwaighofer

2013-Jul-20 16:52 UTC

On Jul 19, 2013, at 3:14 PM, Francois Pichet <pichet2000 at gmail.com>
wrote:
> 
> Except that this doesn't work because Alignment will always be 4 even
> for data like:
>        int   data[16][16] __attribute__ ((aligned (16))),
>
> Because individual elements are still 4-byte aligned.

We will have to hook up some logic in the loop vectorizer that computes the
alignment of the vectorized version of the memory access so that we can pass
it to "getMemoryOpCost". Currently, as you have observed, we just pass the
scalar loop's memory access alignment, which is pessimistic.

Instcombine will later replace the alignment with a stronger one for
vectorized code, but that is obviously too late for the cost model in the
vectorizer.
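
One way to picture the missing computation is the sketch below, in plain C++. It is not the loop vectorizer's logic and the function name and parameters are hypothetical. It assumes the access address has the form base + startOffset + k * stepBytes, where the base object's alignment is a known power of two; the alignment guaranteed for every such access is then the largest power of two dividing all three quantities.

```cpp
#include <cassert>
#include <cstdint>

// Alignment guaranteed for every access at base + startOffset + k * stepBytes.
// baseAlign:   alignment of the underlying object (a power of two).
// startOffset: byte offset of the first access from the object's start.
// stepBytes:   bytes advanced per vector iteration (VF * element size).
inline std::uint64_t vectorAccessAlignment(std::uint64_t baseAlign,
                                           std::uint64_t startOffset,
                                           std::uint64_t stepBytes) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0 &&
         "base alignment must be a power of two");
  // The lowest set bit of the OR is the largest power of two that divides
  // baseAlign, startOffset and stepBytes at once (zero terms drop out).
  std::uint64_t bits = baseAlign | startOffset | stepBytes;
  return bits & (~bits + 1);
}
```

For the 16-byte-aligned int array from the thread, a loop that starts at offset 0 and advances 16 bytes per vector iteration yields 16, whereas the element type alone only promises 4; starting one element in (offset 4) drops the result back to 4.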

Francois Pichet

2013-Jul-21 14:29 UTC

OK, is there any quick workaround to limit vectorization to 16-byte-aligned
128-bit data, then?

All the memory copying done by ExpandUnalignedStore/ExpandUnalignedLoad is
just too expensive.


