thr3ads.net - llvm dev - [LLVMdev] AVX Status? [Jun 2011]

Ralf Karrenberg

2011-Jun-01 12:52 UTC

[LLVMdev] AVX Status?

Hi,

The last time the AVX backend was mentioned on this list seems to have
been in November 2010, so I would like to ask about the current status.
Is anybody (e.g. at Cray?) still actively working on it?

I have tried both LLVM 2.9 final and the latest trunk. Simple cases
already work and produce nice code for <8 x float>.
Unfortunately, the backend gets confused by mask code, e.g. the result
of VCMPPS combined with mask operations (which LLVM currently requires
to work on <8 x i32>) and the corresponding bitcasts.

Consider these two examples:

define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
    nounwind readnone {
entry:
  %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(
      <8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
  %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(
      <8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
  ret <8 x float> %res
}

This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).

On the other hand, this does not work:

define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
    nounwind readnone {
entry:
  %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(
      <8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
  %cast = bitcast <8 x float> %cmp to <8 x i32>
  %mask = and <8 x i32> %cast, %m
  %blend_cond = bitcast <8 x i32> %mask to <8 x float>
  %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(
      <8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
  ret <8 x float> %res
}

llc (latest trunk) bails out with:

LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
...

The same holds for or and xor, where VXORPS etc. should be selected.
There seems to be some code for this, because
xor <8 x i32> %m, %m
works, probably because it can get rid of all the bitcasts.

Ideally, I guess we would want code like this instead of the intrinsics 
at some point:

define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
    nounwind readnone {
entry:
  %cmp = fcmp ugt <8 x float> %a, %b
  %mask = and <8 x i1> %cmp, %m
  %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
  ret <8 x float> %res
}

-> VCMPPS, VANDPS, VBLENDVPS

Nadav Rotem sent around a patch a few weeks ago that implements codegen
for select on SSE. Unfortunately, I have not yet had time to look at it
in more depth.

Can anybody comment on the current status of AVX?

Best,
Ralf

Syoyo Fujita

2011-Jun-02 16:27 UTC

Hello Ralf,

Chris said the AVX backend is not yet mature.

http://www.mail-archive.com/llvmbugs at cs.uiuc.edu/msg12442.html

I am also interested in the AVX codegen backend and am trying to write
patches to fix the currently unusable AVX codegen.
I have just submitted a patch to fix fpextend (VCVTSS2SD) and
sitofp (VCVTSI2SD) codegen, and it is under review.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110530/121689.html

AVX committers would definitely be welcome, but at the moment no one
is actively working on the AVX backend.

I am trying to write as many AVX patches as I can (at least enough to
run my AVX code correctly), but my time is very limited, so I hope
someone else will actively work on the AVX backend.



Bruno Cardoso Lopes

2011-Jun-02 21:55 UTC

Hi Ralf

On Wednesday, June 1, 2011, Ralf Karrenberg <Chareos at gmx.de> wrote:
> Hi,
>
> The last time the AVX backend was mentioned on this list seems to be
> from November 2010, so I would like to ask about the current status. Is
> anybody (e.g. at Cray?) still actively working on it?
I don't think so!
> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
> some trivial stuff is already working and produces nice code for code
> using <8 x float>.
Almost everything that could be matched in tablegen files only by
extending the 128-bit PatFrags and PatLeafs to their 256-bit
counterparts should work, but besides that (which is where the
interesting stuff happens) there's no support yet!
> Unfortunately, the backend gets confused about mask code as e.g.
> produced by VCMPPS together with mask operations (which LLVM requires to
> work on <8 x i32> atm) and corresponding bitcasts.
>
> Consider these two examples:
>
> define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>     nounwind readnone {
> entry:
>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(
>       <8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(
>       <8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
>   ret <8 x float> %res
> }
>
> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).
>
> On the other hand, this does not work:
>
> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>     nounwind readnone {
> entry:
>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(
>       <8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>   %cast = bitcast <8 x float> %cmp to <8 x i32>
>   %mask = and <8 x i32> %cast, %m
>   %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(
>       <8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>   ret <8 x float> %res
> }
>
> llc (latest trunk) bails out with:
>
> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>    0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>      0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>        0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
> ...
>
> The same counts for or and xor where VXORPS etc. should be selected.
Please file bug reports!
> There seems to be some code for this because
> xor <8 x i32> %m, %m
> works, probably because it can get rid of all bitcasts.
>
> Ideally, I guess we would want code like this instead of the intrinsics
> at some point:
>
> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
>     nounwind readnone {
> entry:
>   %cmp = fcmp ugt <8 x float> %a, %b
>   %mask = and <8 x i1> %cmp, %m
>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>   ret <8 x float> %res
> }
That would be nice indeed
> -> VCMPPS, VANDPS, BLENDVPS
>
> Nadav Rotem sent around a patch a few weeks ago in which he implemented
> codegen for the select for SSE, unfortunately I did not have time to
> look at it in more depth so far.
>
> Can anybody comment on the current status of AVX?
No codegen support yet (although some stuff works), but the assembler
support is complete!
>
> Best,
> Ralf
-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Ralf Karrenberg

2011-Jun-03 09:35 UTC

Thanks Syoyo and Bruno for your replies.

As suggested, I filed a bug report:
http://llvm.org/bugs/show_bug.cgi?id=10073

I am not familiar with .td files and the LLVM backend infrastructure 
yet, but I might give it a try and solve it myself if I find the time.

Best,
Ralf


David A. Greene

2011-Jun-03 21:46 UTC

Bruno Cardoso Lopes <bruno.cardoso at gmail.com> writes:
> Hi Ralf
>
> On Wednesday, June 1, 2011, Ralf Karrenberg <Chareos at gmx.de> wrote:
>> Hi,
>>
>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> I don't think so!
Yes, we are!  I am doing a lot of tuning work at the moment.  We have
been rather swamped with work for new products and I am now just getting
out from under that.  Expect to see more patches flowing in over the
next several weeks.  There's a LOT left to send up.
>> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
>> some trivial stuff is already working and produces nice code for code
>> using <8 x float>.
>
> Almost everything that could be matched in tablegen files only by
> extending the 128-bit PatFrags and PatLeafs to their 256-bit
> counterparts should work, but besides that (which is where the
> interesting stuff happens) there's no support yet!
Indeed.  The bulk of the work is in shuffle generation.

We have a full implementation.  I just have to get enough time to get it
merged.  :-/
>> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>>     nounwind readnone {
>> entry:
>>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(
>>       <8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
>>   %cast = bitcast <8 x float> %cmp to <8 x i32>
>>   %mask = and <8 x i32> %cast, %m
>>   %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(
>>       <8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>>   ret <8 x float> %res
>> }
>>
>> llc (latest trunk) bails out with:
>>
>> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>>    0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>>      0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>>        0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
>> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same counts for or and xor where VXORPS etc. should be selected.
>
> Please file bug reports!
It's a problem with integer code.  There are no 256-bit integer bitwise
instructions in AVX.  There are no 256-bit integer instructions period.
What's missing is the legalize code to handle this.  I have it in our
tree.
>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.
And it can use xorps to implement the operation.
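
The reason a floating-point-domain instruction can stand in for the missing integer one is that bitwise AND/OR/XOR do not depend on how the 256 bits are split into lanes. The Python sketch below models that with three hypothetical helpers: a plain 8 x i32 AND, the same AND done on a 4 x i64 view (the type that shows up in the selector error above), and an AND done on two 128-bit halves. Splitting into halves is shown only as one conceivable legalization strategy; the thread does not say which approach the out-of-tree code uses.

```python
import struct


def and_v8i32(x, y):
    """Bitwise AND per 32-bit lane."""
    return [p & q for p, q in zip(x, y)]


def and_via_v4i64(x, y):
    """Bitcast both operands to 4 x i64, AND them, bitcast back to 8 x i32."""
    xq = struct.unpack("<4Q", struct.pack("<8I", *x))
    yq = struct.unpack("<4Q", struct.pack("<8I", *y))
    rq = [p & q for p, q in zip(xq, yq)]
    return list(struct.unpack("<8I", struct.pack("<4Q", *rq)))


def and_split_128(x, y):
    """Split each 256-bit operand into two 128-bit halves, AND each half,
    and concatenate the results."""
    return and_v8i32(x[:4], y[:4]) + and_v8i32(x[4:], y[4:])


x = [0xFFFFFFFF, 0, 0x12345678, 0xDEADBEEF, 0x80000000, 1, 0xF0F0F0F0, 0x0F0F0F0F]
y = [0x0000FFFF, 0xFFFFFFFF, 0xFF00FF00, 0xFFFFFFFF, 0xFFFFFFFF, 3, 0x3C3C3C3C, 0xFFFFFFFF]

# All three routes produce the same 256 bits.
print(and_v8i32(x, y) == and_via_v4i64(x, y) == and_split_128(x, y))
```

Since the result bits are identical whichever lane view is used, the remaining work is in teaching legalization and selection to pick an instruction that exists, not in changing what is computed.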
>> Ideally, I guess we would want code like this instead of the intrinsics
>> at some point:
>>
>> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
>>     nounwind readnone {
>> entry:
>>   %cmp = fcmp ugt <8 x float> %a, %b
>>   %mask = and <8 x i1> %cmp, %m
>>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>>   ret <8 x float> %res
>> }
>
> That would be nice indeed
Some lowering code would be needed to convert from i1 masks to i8 masks
(the so-called packed vs. sparse mask issue).  I don't think I've added
anything to do this as our vectorizer doesn't generate code this way.
>> -> VCMPPS, VANDPS, BLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for the select for SSE, unfortunately I did not have time to
>> look at it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> No codegen support yet (although some stuff works), but the assembler
> support is complete!
There's some codegen support, but it's very, very, very incomplete.

                            -Dave
