thr3ads.net - llvm dev - [LLVMdev] MCJIT generates MOVAPS on unaligned address [Aug 2014]

Frank Winter

2014-Aug-07 21:18 UTC

[LLVMdev] MCJIT generates MOVAPS on unaligned address

It's not reproducible with 'opt'. I call the SLP pass from my 
application and only then the wrong IR gets generated.

On the attached module I call via the function pass manager:

1) TargetLibraryInfo with the target triple
2) Set the data layout
3) Basic Alias Analysis
4) SLP vectorizer

This produces the wrong IR. On the other hand running the attached 
module through 'opt -slp-vectorizer' results in no code changes.

What could I be missing here?

Frank


On 08/07/2014 04:29 PM, Arnold Schwaighofer wrote:
>> On Aug 7, 2014, at 12:42 PM, Frank Winter <fwinter at jlab.org> wrote:
>>
>> MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed
>> Single-Precision Floating-Point Values) on a non-aligned memory address:
>>
>>     movaps    88(%rdx), %xmm0
>>
>> where %rdx comes in as a function argument with only natural alignment
>> (float*). This x86 instruction requires the memory address to be 16-byte
>> aligned, which 88 plus something aligned to 4 bytes isn't.
>>
>> Here is the corresponding IR code, which was produced by the SLP vectorizer:
>>
>> define void @func(float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) {
>> entrypoint:
>> ...
>>   %104 = getelementptr float* %arg0, i32 22
>> ...
>>   %204 = bitcast float* %104 to <4 x float>*
>>   store <4 x float> %198, <4 x float>* %204
>>
>> This in itself is not wrong. However, shouldn't the lowering pass
>> recognize the wrong alignment?
> The LLVM IR is wrong. Omitting the align directive on the store means ABI
> alignment of the target. The backend is “right” with respect to LLVM IR
> semantics to produce the movaps.
>
> The error is in the producer (looks like the SLP vectorizer) of said
> vector store. Could you provide a full test case where running the SLP
> vectorizer (opt -slp-vectorizer < t.ll) produces such an output?
>
> The following code in the SLP vectorizer should have made sure that we
> created an alignment of “4 bytes” given a data layout
> (http://llvm.org/docs/LangRef.html#data-layout) that specifies f32:32:32.
>
>      case Instruction::Store: {
>        StoreInst *SI = cast<StoreInst>(VL0);
>        unsigned Alignment = SI->getAlignment();
>        ...
>        StoreInst *S = Builder.CreateStore(VecValue, VecPtr);
>        if (!Alignment)
>          // Get the 4-byte alignment for the scalar float store from the data layout string.
>          Alignment = DL->getABITypeAlignment(SI->getPointerOperand()->getType());
>        S->setAlignment(Alignment);
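
The alignment argument in the original report can be checked with a few lines of arithmetic. The following is a minimal C++ sketch, not code from the thread; the helper names isAligned and countAlignedBases are hypothetical. It illustrates that a base pointer known only to be 4-byte aligned does not, in general, make base + 88 a 16-byte aligned address.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` is assumed to be a nonzero power of two.
inline bool isAligned(std::uint64_t address, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (address & (alignment - 1)) == 0;
}

// Counts the 4-byte-aligned base addresses in [0, 64) for which
// base + offset is 16-byte aligned, as MOVAPS requires.
inline int countAlignedBases(std::uint64_t offset) {
  int count = 0;
  for (std::uint64_t base = 0; base < 64; base += 4) {
    if (isAligned(base + offset, 16))
      ++count;
  }
  return count;
}
```

Since 88 leaves a remainder of 8 when divided by 16, only bases congruent to 8 modulo 16 give an aligned address. That is 4 of the 16 candidate bases, so a float* argument offers no guarantee that MOVAPS is safe here.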

-------------- next part --------------
A non-text attachment was scrubbed...
Name: module_H7ktW0.ll.gz
Type: application/x-gzip
Size: 65781 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/a42fe088/attachment.bin>

Jim Grosbach

2014-Aug-07 21:54 UTC

r214322 fixed one place where the SLP vectorizer uncovered a latent (very
latent) bug in the combiner. Perhaps something similar is going on here?

-Jim

Arnold Schwaighofer

2014-Aug-07 21:57 UTC

Your .ll file does not have a data layout. Opt will not initialize the
DataLayoutPass. The SLP vectorizer will not vectorize because there is no
DataLayoutPass.

  debug-cmake/bin/opt \
    -default-data-layout="e-m:e-i64:64-f80:128-n8:16:32:64-S128" \
    -basicaa -slp-vectorizer -S </Users/arnold/Downloads/module_H7ktW0.ll \
    | grep "<4 x" | grep store
  store <4 x float> %198, <4 x float>* %204, align 8

There is a bug in the SLPVectorizer, however: it should be “align 4”. We get
the alignment of the pointer type, which is not what we want; we want the
alignment of the stored/loaded value. It should be

if (!Alignment)
  Alignment = DL->getABITypeAlignment(SI->getValueOperand()->getType());

I am not sure that would fix your issue, though, because that would mean we
return the wrong alignment, not none.

If the DL->getABITypeAlignment call returns 0, then something has gone wrong in
setting up the data layout in your compilation pipeline.
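
The difference between querying the pointer type and the stored value type can be illustrated by analogy with host C++ alignments. This is only a sketch: it uses the compiler's alignof rather than LLVM's DataLayout, and the helper names valueAlignment and pointerAlignment are hypothetical. On a typical x86-64 host a float has alignment 4 while a float* has alignment 8, which matches the “align 8” seen in the opt output above.

```cpp
#include <cstddef>

// Alignment of the stored value type T. By analogy, this is what the
// vector store should be annotated with when the scalar store had none.
template <typename T>
constexpr std::size_t valueAlignment() {
  return alignof(T);
}

// Alignment of a pointer to T. By analogy, this is what the code queried
// by mistake: it depends on the pointer width, not on the pointee type.
template <typename T>
constexpr std::size_t pointerAlignment() {
  return alignof(T *);
}
```

The pointer alignment is the same for every pointee type, so it says nothing about how the float data itself is aligned.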


Arnold Schwaighofer

2014-Aug-07 22:59 UTC

> On Aug 7, 2014, at 2:57 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>
> Your .ll file does not have a data layout. Opt will not initialize the
> DataLayoutPass. The SLP vectorizer will not vectorize because there is no
> DataLayoutPass.
>
>   debug-cmake/bin/opt \
>     -default-data-layout="e-m:e-i64:64-f80:128-n8:16:32:64-S128" \
>     -basicaa -slp-vectorizer -S </Users/arnold/Downloads/module_H7ktW0.ll \
>     | grep "<4 x" | grep store
>   store <4 x float> %198, <4 x float>* %204, align 8
>
> There is a bug in the SLPVectorizer, however: it should be “align 4”. We get
> the alignment of the pointer type, which is not what we want; we want the
> alignment of the stored/loaded value. It should be
>
> if (!Alignment)
>   Alignment = DL->getABITypeAlignment(SI->getValueOperand()->getType());

r215162 fixes this bug.
