thr3ads.net - llvm dev - [LLVMdev] AVX Status? [Jun 2011]

David A. Greene

2011-Jun-03 21:46 UTC

[LLVMdev] AVX Status?

Bruno Cardoso Lopes <bruno.cardoso at gmail.com> writes:
> Hi Ralf
>
> On Wednesday, June 1, 2011, Ralf Karrenberg <Chareos at gmx.de> wrote:
>> Hi,
>>
>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> I don't think so!
Yes, we are!  I am doing a lot of tuning work at the moment.  We have
been rather swamped with work for new products and I am now just getting
out from under that.  Expect to see more patches flowing in over the
next several weeks.  There's a LOT left to send up.
>> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
>> some trivial stuff is already working and produces nice code for code
>> using <8 x float>.
>
> Almost everything that could be matched in tablegen files only by
> extending the 128-bit PatFrags and PatLeafs to their 256-bit
> counterparts should work, but besides that (which is where the
> interesting stuff happens) there's no support yet!
Indeed.  The bulk of the work is in shuffle generation.

We have a full implementation.  I just have to get enough time to get it
merged.  :-/
>> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>> nounwind readnone {
>> entry:
>>    %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a,
>> <8 x float> %b, i8 1) nounwind readnone
>>    %cast = bitcast <8 x float> %cmp to <8 x i32>
>>    %mask = and <8 x i32> %cast, %m
>>    %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>>    %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
>> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>>    ret <8 x float> %res
>> }
>>
>> llc (latest trunk) bails out with:
>>
>> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>>    0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>>      0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>>        0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
>> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same goes for or and xor, where VXORPS etc. should be selected.
>
> Please file bug reports!
It's a problem with integer code.  There are no 256-bit integer bitwise
instructions in AVX.  There are no 256-bit integer instructions period.
What's missing is the legalize code to handle this.  I have it in our
tree.
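
To illustrate the point about integer code: bitwise operations do not care how the lanes are typed, so a 256-bit integer `and` yields the same bits as a float-domain instruction such as VANDPS applied to the same registers. The Python sketch below is only a scalar model of that equivalence, not the legalize code mentioned above. The helper names (`pack_v8i32`, `and_v8i32`, `and_bits_256`) and the input values are made up for illustration, and a register is modeled as 32 raw bytes.

```python
import struct

def pack_v8i32(lanes):
    """Model a 256-bit register holding <8 x i32> as 32 little-endian bytes."""
    return struct.pack("<8I", *lanes)

def unpack_v8i32(reg):
    """Reinterpret 32 raw bytes as eight unsigned 32-bit lanes."""
    return list(struct.unpack("<8I", reg))

def and_v8i32(a, b):
    """Lane-wise integer AND, i.e. what `and <8 x i32>` asks for."""
    return [x & y for x, y in zip(a, b)]

def and_bits_256(reg_a, reg_b):
    """Type-agnostic 256-bit AND over raw bytes (models what VANDPS computes)."""
    return bytes(x & y for x, y in zip(reg_a, reg_b))

# Hypothetical inputs: a compare result (all-ones or all-zeros per lane) and a mask.
cmp_lanes = [0xFFFFFFFF, 0, 0xFFFFFFFF, 0, 0, 0xFFFFFFFF, 0xFFFFFFFF, 0]
m_lanes = [0xFFFFFFFF, 0xFFFFFFFF, 0, 0, 0x80000000, 0x80000000, 1, 1]

via_integer = and_v8i32(cmp_lanes, m_lanes)
via_raw_bits = unpack_v8i32(and_bits_256(pack_v8i32(cmp_lanes), pack_v8i32(m_lanes)))
print(via_integer == via_raw_bits)
```

Because both paths agree bit for bit, a legalizer is free to bitcast the integer operands to a float vector type and reuse the float-domain instruction.
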
>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.
And it can use xorps to implement the operation.
>> Ideally, I guess we would want code like this instead of the intrinsics
>> at some point:
>>
>> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
>> nounwind readnone {
>> entry:
>>    %cmp = fcmp ugt <8 x float> %a, %b
>>    %mask = and <8 x i1> %cmp, %m
>>    %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>>    ret <8 x float> %res
>> }
>
> That would be nice indeed
Some lowering code would be needed to convert from i1 masks to i8 masks
(the so-called packed vs. sparse mask issue).  I don't think I've added
anything to do this as our vectorizer doesn't generate code this way.
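
As a rough illustration of the mask conversion: VBLENDVPS looks only at the top bit of each 32-bit mask lane and takes the second source where that bit is set, so an <8 x i1> condition has to be widened (for example by sign extension) before it can drive the blend. The Python sketch below models that lane by lane. The function names are hypothetical, the 32-bit lane width is an assumption for the <8 x float> case, and this is not the lowering code that would actually be needed in the backend.

```python
def select_v8(cond, if_true, if_false):
    """IR-level `select <8 x i1>`: one boolean per lane."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]

def sext_i1_to_i32(cond):
    """Widen each i1 to a full 32-bit lane: all-ones for true, zero for false."""
    return [0xFFFFFFFF if c else 0 for c in cond]

def blendv_ps(src1, src2, mask):
    """Model of VBLENDVPS: take src2 where the mask lane's top bit is set."""
    return [y if (m >> 31) & 1 else x for x, y, m in zip(src1, src2, mask)]

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0]
cond = [True, False, True, True, False, False, True, False]

# select picks a where the condition holds, so a goes in the second source slot.
lowered = blendv_ps(b, a, sext_i1_to_i32(cond))
print(lowered == select_v8(cond, a, b))
```

Note the operand order: in the blend, a set mask bit selects the second source, which is the opposite of how `select` orders its true and false operands.
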
>> -> VCMPPS, VANDPS, BLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for the select for SSE, unfortunately I did not have time to
>> look at it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> No codegen support yet (although some stuff works), but the assembler
> support is complete!
There's some codegen support, but it's very, very, very incomplete.

                            -Dave

Ralf Karrenberg

2011-Jun-04 11:09 UTC

[LLVMdev] AVX Status?

Hi David,
>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> Yes, we are!  I am doing a lot of tuning work at the moment.  We have
> been rather swamped with work for new products and I am now just getting
> out from under that.  Expect to see more patches flowing in over the
> next several weeks.  There's a LOT left to send up.
> We have a full implementation.  I just have to get enough time to get it
> merged.  :-/
This sounds great!

For my case, I only require some basic support, so I am optimistic that 
your next few patches will provide everything I need.
> It's a problem with integer code.  There are no 256-bit integer bitwise
> instructions in AVX.  There are no 256-bit integer instructions period.
> What's missing is the legalize code to handle this.  I have it in our
> tree.
>
>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.
>
> And it can use xorps to implement the operation.
Yes, that makes sense. But why does the same not work with "and" and
"or" (-> VANDPS/VORPS)?
Anyway, I am looking forward to testing your patches.

Would it be possible to send around a notification when the stuff goes 
upstream?
Thanks a lot :).

Best,
Ralf

David A. Greene

2011-Jun-07 22:08 UTC

[LLVMdev] AVX Status?

Ralf Karrenberg <Chareos at gmx.de> writes:
> This sounds great!
>
> For my case, I only require some basic support, so I am optimistic
> that your next few patches will provide everything I need.
If my evil plan works out, within the next 10 or so patches we should be
in a place where pushing everything up goes pretty quickly.  It's about
8 TableGen patches and then a patch to do ADD or some other simple thing
like that to start the so-called SIMD reorg.  Basically, if I can get
the SIMD reorg patch settled, everything after that is really simple
because it all looks uniform.  Of course, that reorg/ADD patch is going
to cause a lot of discussion, I suspect.  ;)
>>> There seems to be some code for this because
>>> xor <8 x i32> %m, %m
>>> works, probably because it can get rid of all bitcasts.
>>
>> And it can use xorps to implement the operation.
>
> Yes, that makes sense. But why does the same not work with "and" and
> "or" (-> VANDPS/VORPS)?
It can.  Maybe the pattern for ANDPS isn't there yet.  I'd have to dig
deeper into the failure.  The fact that there are inconsistencies like
this is one of the motivations behind the SIMD reorg.  There are plenty
of such inconsistencies in the existing SSE spec.  Hopefully after the
reorg, implementing a pattern like VANDPS given an existing one for
VXORPS is trivial.
> Anyway, I am looking forward to testing your patches.
So am I.  :)
> Would it be possible to send around a notification when the stuff goes
> upstream?
> Thanks a lot :).
I try to put [AVX] in the subject of patch mailings (to -commits) and
commit messages.  Once in a while I forget.  I'll try to remember to send
something to -dev when major stuff appears.

                                 -Dave
