thr3ads.net - llvm dev - [LLVMdev] Unaligned vector memory access for ARM/NEON. [Sep 2012]


Peter Couperus

2012-Sep-05 21:42 UTC

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello all,

I am a first-time writer here, but am a happy LLVM tinkerer.  It is a 
pleasure to use :).
We have come across some sub-optimal behavior when LLVM lowers loads of 
vectors of small integers, e.g. load <4 x i16>* %a, align 2: it emits 
a sequence of scalar loads rather than a single vld1 on armv7 
linux with NEON.
Looking at the code in svn, it appears the ARM backend is capable of 
lowering these loads as desired, and will if we use an appropriate 
darwin triple.
It appears this was actually enabled relatively recently.
Seemingly, the case where the Subtarget has NEON available should be 
handled the same on Darwin and Linux.
Is this true, or am I missing something?
Do the regulars have an opinion on the best way to handle this?
Thanks!

Pete

Jim Grosbach

2012-Sep-05 22:15 UTC

VLD1 expects a 64-bit aligned address unless the target explicitly says that
unaligned loads are OK.

For your situation, either the subtarget should set AllowsUnalignedMem to true
(if that's accurate), or the load address should be made 64-bit aligned.
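
To make the two alignment notions in this thread concrete, here is a minimal C sketch. It assumes the common case where a pointer converted to uintptr_t exposes the address bits, and the helper names is_aligned and ok_for_64bit_aligned_load are hypothetical, not part of LLVM. It only shows that a pointer to i16 data can be 2-byte aligned without being 64-bit aligned; it does not model what the ARM backend actually checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address p is a multiple of
 * `bytes`. `bytes` must be a non-zero power of two. Relies on the
 * (near-universal) mapping of pointers to integer addresses. */
static int is_aligned(const void *p, size_t bytes) {
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    return ((uintptr_t)p & (bytes - 1)) == 0;
}

/* Hypothetical helper: would four 16-bit lanes starting at p satisfy a
 * 64-bit (8-byte) alignment requirement? */
static int ok_for_64bit_aligned_load(const short *p) {
    return is_aligned(p, 8);
}
```

For a buffer that starts on an 8-byte boundary, the element one short further along passes the 2-byte check but fails the 8-byte one, which is the "align 2" situation in the original question.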

-Jim


Peter Couperus

2012-Sep-05 23:25 UTC

Hello Jim,

Thank you for the response.  I may be confused about the alignment rules 
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to 
indicate vld1.16 operates on 16-bit aligned data, unless I am 
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, it does mention that the accesses need to be "element" 
aligned, where I took element in this case to mean i16.

Anyhow, to make this a little more concrete:

void extend(short* a, int* b) {
   for(int i = 0; i < 8; i++)
     b[i] = (int)a[i];
}

When I compile this program with clang -O3 -ccc-host-triple 
armv7-none-linux-gnueabi -mfpu=neon -mllvm -vectorize, the intermediate 
LLVM assembly
looks OK (and it has an align 2 vector load), but the generated ARM 
assembly has the scalar loads.
When I compile with (4.6) gcc -std=c99 -ftree-vectorize -marm -mfpu=neon 
-O3, it uses vld1.16 and vst1.32 regardless of the parameter alignment.  
This is on armv7a.

The gcc version (and the clang version with our modified backend) runs 
fine, even on 2-byte aligned data.  Is this not a guarantee across 
armv7/armv7a generally?
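
As a portable point of comparison, the sketch below writes the same widening loop so that each group of four 16-bit lanes is read with memcpy, which makes no assumption beyond the 2-byte alignment that a short pointer already has. The name extend_chunked is hypothetical. Whether a given compiler and target turn the memcpy into a single vld1.16 is not something this sketch claims; it only fixes the semantics that any such lowering has to preserve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Widen eight 16-bit values to int, reading four lanes per step.
 * Mirrors a <4 x i16> load with "align 2": nothing stronger than the
 * natural alignment of short is assumed for `a`. */
static void extend_chunked(const short *a, int *b) {
    assert(a != NULL && b != NULL);
    for (int i = 0; i < 8; i += 4) {
        short lanes[4];
        /* memcpy places no alignment requirement on a + i. */
        memcpy(lanes, a + i, sizeof lanes);
        for (int j = 0; j < 4; j++)
            b[i + j] = (int)lanes[j];  /* sign-extending conversion */
    }
}
```

Calling it with a pointer one element past an 8-byte boundary exercises the 2-byte-aligned case discussed above.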

Pete




-------------- next part --------------
A non-text attachment was scrubbed...
Name: extend.c
Type: text/x-csrc
Size: 92 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120905/3e81319f/attachment.c>
