thr3ads.net - llvm dev - [LLVMdev] Limit loop vectorizer to SSE [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Arnold Schwaighofer

2013-Nov-15 23:15 UTC

[LLVMdev] Limit loop vectorizer to SSE

A fix for this is in r194876.

Thanks for reporting this!


On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com>
wrote:
> Nadav,
> 
> I believe aligned accesses to unaligned pointers is precisely the issue.
Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass.
There is no alignment assumption associated with %kernel_data prior to
vectorization. I can't tell if it's the loop vectorizer or the codegen
at fault, but the alignment assumption seems to sneak in somewhere.
> 
> v/r,
> Josh
> 
> [1] http://pastebin.com/kc95WtGG
> [2] http://pastebin.com/VY3ZLVJK
> 
> 
> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com>
wrote:
> 
> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at
linaro.org> wrote:
> 
>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at
gmail.com> wrote:
>> Agreed, is there a pass that will insert a runtime alignment check?
Also, what's the easiest way to get at
TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32?
Thanks!
>> 
>> I think that's a fair question, and it's about safety. If
you're getting this on the JIT, means we may be generating unsafe
transformations on the vectorizer.
>> 
>> Arnold, Nadav, I don't remember seeing code to generate any
run-time alignment checks on the incoming pointer, is there such a thing? If
not, shouldn't we add one?
> 
> 
> If the the vectorizer generates aligned memory accesses to unaligned
addresses then this is a serious bug.  But I don’t think that Josh said that the
vectorizer generated aligned accesses to unaligned pointers.
> 
> There is no point in LLVM checking for alignment because if the memory is
unaligned then the program will crash.  Users who want to crash with a readable
error message can simply write code that checks the pointer (by masking the high
bits and comparing to zero).
> 
>

Frank Winter

2013-Nov-16 01:23 UTC

head link

[LLVMdev] Limit loop vectorizer to SSE

I confirm that r194876 fixes the issue, i.e. segfault not caused.

My program still passed 16 byte aligned pointers to the function
which the loop vectorizer processes successfully:

LV: Vector loop of width 8 costs: 1.
LV: Selecting VF = : 8.
LV: Found a vectorizable loop (8) in func_orig.ll
LV: Unroll Factor is 1

Since the program runs fine, it seems to be allowed for the CPU
to issue a vector load (8 floats) to a 16 byte aligned address (as
opposed to 32 byte aligned). Or does in fact the loop vectorizer
handle this case in the preamble and the vector.body issues only
32 byte aligned accesses. In which case I would align the payload
to 32 byte in order to save a little in the preamble.

Frank



On 15/11/13 18:15, Arnold Schwaighofer wrote:> A fix for this is in r194876.
>
> Thanks for reporting this!
>
>
> On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com>
wrote:
>
>> Nadav,
>>
>> I believe aligned accesses to unaligned pointers is precisely the
issue. Consider the function `add_u8S` before[1] and after[2] the loop
vectorizer pass. There is no alignment assumption associated with %kernel_data
prior to vectorization. I can't tell if it's the loop vectorizer or the
codegen at fault, but the alignment assumption seems to sneak in somewhere.
>>
>> v/r,
>> Josh
>>
>> [1] http://pastebin.com/kc95WtGG
>> [2] http://pastebin.com/VY3ZLVJK
>>
>>
>> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at
apple.com> wrote:
>>
>> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at
linaro.org> wrote:
>>
>>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at
gmail.com> wrote:
>>> Agreed, is there a pass that will insert a runtime alignment check?
Also, what's the easiest way to get at
TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32?
Thanks!
>>>
>>> I think that's a fair question, and it's about safety. If
you're getting this on the JIT, means we may be generating unsafe
transformations on the vectorizer.
>>>
>>> Arnold, Nadav, I don't remember seeing code to generate any
run-time alignment checks on the incoming pointer, is there such a thing? If
not, shouldn't we add one?
>>
>> If the the vectorizer generates aligned memory accesses to unaligned
addresses then this is a serious bug.  But I don’t think that Josh said that the
vectorizer generated aligned accesses to unaligned pointers.
>>
>> There is no point in LLVM checking for alignment because if the memory
is unaligned then the program will crash.  Users who want to crash with a
readable error message can simply write code that checks the pointer (by masking
the high bits and comparing to zero).
>>
>>

Arnold Schwaighofer

2013-Nov-16 01:34 UTC

head link

[LLVMdev] Limit loop vectorizer to SSE

The vectorizer will now emit

= load <8 x i32>, align #TargetAlignmentOfScalari32

where before it would emit

= load <8 x i32>

(which has the semantics of “= load <8 xi32>, align 0” which means the
address is aligned with target abi alignment, see
http://llvm.org/docs/LangRef.html#load-instruction).

When the backend generates code for the former it will emit an unaligned move:
 = vmovups ...
wheres for the later it will use an aligned move:
 = vmovaps …

vmovups can load from unaligned addresses while vmovaps can not.

No, we currently don’t peel loops for alignment.

Best,
Arnold

On Nov 15, 2013, at 7:23 PM, Frank Winter <fwinter at jlab.org> wrote:
> I confirm that r194876 fixes the issue, i.e. segfault not caused.
> 
> My program still passed 16 byte aligned pointers to the function
> which the loop vectorizer processes successfully:
> 
> LV: Vector loop of width 8 costs: 1.
> LV: Selecting VF = : 8.
> LV: Found a vectorizable loop (8) in func_orig.ll
> LV: Unroll Factor is 1
> 
> Since the program runs fine, it seems to be allowed for the CPU
> to issue a vector load (8 floats) to a 16 byte aligned address (as
> opposed to 32 byte aligned). Or does in fact the loop vectorizer
> handle this case in the preamble and the vector.body issues only
> 32 byte aligned accesses. In which case I would align the payload
> to 32 byte in order to save a little in the preamble.
> 
> Frank
> 
> 
> 
> On 15/11/13 18:15, Arnold Schwaighofer wrote:
>> A fix for this is in r194876.
>> 
>> Thanks for reporting this!
>> 
>> 
>> On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at
gmail.com> wrote:
>> 
>>> Nadav,
>>> 
>>> I believe aligned accesses to unaligned pointers is precisely the
issue. Consider the function `add_u8S` before[1] and after[2] the loop
vectorizer pass. There is no alignment assumption associated with %kernel_data
prior to vectorization. I can't tell if it's the loop vectorizer or the
codegen at fault, but the alignment assumption seems to sneak in somewhere.
>>> 
>>> v/r,
>>> Josh
>>> 
>>> [1] http://pastebin.com/kc95WtGG
>>> [2] http://pastebin.com/VY3ZLVJK
>>> 
>>> 
>>> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at
apple.com> wrote:
>>> 
>>> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at
linaro.org> wrote:
>>> 
>>>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at
gmail.com> wrote:
>>>> Agreed, is there a pass that will insert a runtime alignment
check? Also, what's the easiest way to get at
TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32?
Thanks!
>>>> 
>>>> I think that's a fair question, and it's about safety.
If you're getting this on the JIT, means we may be generating unsafe
transformations on the vectorizer.
>>>> 
>>>> Arnold, Nadav, I don't remember seeing code to generate any
run-time alignment checks on the incoming pointer, is there such a thing? If
not, shouldn't we add one?
>>> 
>>> If the the vectorizer generates aligned memory accesses to
unaligned addresses then this is a serious bug.  But I don’t think that Josh
said that the vectorizer generated aligned accesses to unaligned pointers.
>>> 
>>> There is no point in LLVM checking for alignment because if the
memory is unaligned then the program will crash.  Users who want to crash with a
readable error message can simply write code that checks the pointer (by masking
the high bits and comparing to zero).
>>> 
>>> 
> 
>

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Nov 2013 - [LLVMdev] Limit loop vectorizer to SSE

[LLVMdev] Limit loop vectorizer to SSE

[LLVMdev] Limit loop vectorizer to SSE

[LLVMdev] Limit loop vectorizer to SSE

Possibly Parallel Threads