Hi,

The last time the AVX backend was mentioned on this list seems to be from November 2010, so I would like to ask about the current status. Is anybody (e.g. at Cray?) still actively working on it?

I have tried both LLVM 2.9 final and the latest trunk, and it seems like some trivial stuff is already working and produces nice code for code using <8 x float>.
Unfortunately, the backend gets confused by mask code such as that produced by VCMPPS together with mask operations (which LLVM currently requires to work on <8 x i32>) and the corresponding bitcasts.

Consider these two examples:

    define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
    entry:
      %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
      %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
      ret <8 x float> %res
    }

This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).

On the other hand, this does not work:

    define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
    entry:
      %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
      %cast = bitcast <8 x float> %cmp to <8 x i32>
      %mask = and <8 x i32> %cast, %m
      %blend_cond = bitcast <8 x i32> %mask to <8 x float>
      %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
      ret <8 x float> %res
    }

llc (latest trunk) bails out with:

    LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
      0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
        0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
          0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
    ...

The same holds for or and xor, where VXORPS etc. should be selected.
There seems to be some code for this because

    xor <8 x i32> %m, %m

works, probably because it can get rid of all bitcasts.

Ideally, I guess we would want code like this instead of the intrinsics at some point:

    define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind readnone {
    entry:
      %cmp = fcmp ugt <8 x float> %a, %b
      %mask = and <8 x i1> %cmp, %m
      %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
      ret <8 x float> %res
    }

    -> VCMPPS, VANDPS, VBLENDVPS

Nadav Rotem sent around a patch a few weeks ago in which he implemented codegen for the select for SSE; unfortunately I have not had time to look at it in more depth so far.

Can anybody comment on the current status of AVX?

Best,
Ralf
Hello Ralf,

Chris said that the AVX backend is not yet mature:
http://www.mail-archive.com/llvmbugs at cs.uiuc.edu/msg12442.html

I am also interested in the AVX codegen backend and am trying to write patches to fix the currently unusable AVX codegen. I have just submitted a patch that fixes fpextend (VCVTSS2SD) and sitofp (VCVTSI2SD) codegen, and it is under review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110530/121689.html

More AVX committers would definitely be welcome, but at this time no one is actively working on the AVX backend. I am trying to write as many AVX patches as I can (at least enough to run my AVX code correctly), but my time is very limited, so I hope someone else will actively work on the AVX backend...

On Wed, Jun 1, 2011 at 9:52 PM, Ralf Karrenberg <Chareos at gmx.de> wrote:
> Can anybody comment on the current status of AVX?
Hi Ralf

On Wednesday, June 1, 2011, Ralf Karrenberg <Chareos at gmx.de> wrote:
> The last time the AVX backend was mentioned on this list seems to be
> from November 2010, so I would like to ask about the current status. Is
> anybody (e.g. at Cray?) still actively working on it?

I don't think so!

> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
> some trivial stuff is already working and produces nice code for code
> using <8 x float>.

Almost everything that could be matched in tablegen files only by extending the 128-bit PatFrags and PatLeafs to their 256-bit counterparts should work, but besides that (which is where the interesting stuff happens) there's no support yet!

> llc (latest trunk) bails out with:
>
> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
> ...
>
> The same counts for or and xor where VXORPS etc. should be selected.

Please file bug reports!

> Ideally, I guess we would want code like this instead of the intrinsics
> at some point:
>
> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
> nounwind readnone {
> entry:
>   %cmp = fcmp ugt <8 x float> %a, %b
>   %mask = and <8 x i1> %cmp, %m
>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>   ret <8 x float> %res
> }

That would be nice indeed

> Can anybody comment on the current status of AVX?

No codegen support yet (although some stuff works), but the assembler support is complete!

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc
Thanks Syoyo and Bruno for your replies.

As suggested, I filed a bug at http://llvm.org/bugs/show_bug.cgi?id=10073 .

I am not familiar with .td files and the LLVM backend infrastructure yet, but I might give it a try and solve it myself if I find the time.

Best,
Ralf

On 02.06.2011 23:55, Bruno Cardoso Lopes wrote:
> Please file bug reports!
Bruno Cardoso Lopes <bruno.cardoso at gmail.com> writes:

>> The last time the AVX backend was mentioned on this list seems to be
>> from November 2010, so I would like to ask about the current status. Is
>> anybody (e.g. at Cray?) still actively working on it?
>
> I don't think so!

Yes, we are! I am doing a lot of tuning work at the moment. We have been rather swamped with work for new products and I am now just getting out from under that. Expect to see more patches flowing in over the next several weeks. There's a LOT left to send up.

>> I have tried both LLVM 2.9 final and the latest trunk, and it seems like
>> some trivial stuff is already working and produces nice code for code
>> using <8 x float>.
>
> Almost everything that could be matched in tablegen files only by
> extending the 128-bit PatFrags and PatLeafs to their 256-bit
> counterparts should work, but besides that (which is where the
> interesting stuff happens) there's no support yet!

Indeed. The bulk of the work is in shuffle generation. We have a full implementation. I just have to get enough time to get it merged. :-/

>> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>> nounwind readnone {
>> entry:
>>   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a,
>> <8 x float> %b, i8 1) nounwind readnone
>>   %cast = bitcast <8 x float> %cmp to <8 x i32>
>>   %mask = and <8 x i32> %cast, %m
>>   %blend_cond = bitcast <8 x i32> %mask to <8 x float>
>>   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
>> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
>>   ret <8 x float> %res
>> }
>>
>> llc (latest trunk) bails out with:
>>
>> LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
>>   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
>>     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
>>       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340,
>> 0x2510f40, 0x2511140 [ORD=3] [ID=12]
>> ...
>>
>> The same counts for or and xor where VXORPS etc. should be selected.
>
> Please file bug reports!

It's a problem with integer code. There are no 256-bit integer bitwise instructions in AVX. There are no 256-bit integer instructions period. What's missing is the legalize code to handle this. I have it in our tree.

>> There seems to be some code for this because
>> xor <8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.

And it can use xorps to implement the operation.

>> Ideally, I guess we would want code like this instead of the intrinsics
>> at some point:
>>
>> define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i1> %m)
>> nounwind readnone {
>> entry:
>>   %cmp = fcmp ugt <8 x float> %a, %b
>>   %mask = and <8 x i1> %cmp, %m
>>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>>   ret <8 x float> %res
>> }
>
> That would be nice indeed

Some lowering code would be needed to convert from i1 masks to i8 masks (the so-called packed vs. sparse mask issue). I don't think I've added anything to do this as our vectorizer doesn't generate code this way.

>> Can anybody comment on the current status of AVX?
>
> No codegen support yet (although some stuff works), but the assembler
> support is complete!

There's some codegen support, but it's very, very, very incomplete.

-Dave
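As a side note on the integer-versus-float point above, the following small Python sketch illustrates why a 256-bit integer and can, in principle, be implemented with a floating-point-domain instruction such as VANDPS. A bitwise AND gives the same 256 result bits whether the register is viewed as v8i32, v4i64, or raw bytes. This is only an illustration of that bit-level property, with made-up helper names; it is not the legalize code from the Cray tree mentioned above, and it says nothing about how that code is structured.

```python
import struct


def and_as_v8i32(x: bytes, y: bytes) -> bytes:
    """AND two 256-bit values viewed as eight 32-bit lanes."""
    xs, ys = struct.unpack("<8I", x), struct.unpack("<8I", y)
    return struct.pack("<8I", *(p & q for p, q in zip(xs, ys)))


def and_as_v4i64(x: bytes, y: bytes) -> bytes:
    """AND the same 256 bits viewed as four 64-bit lanes."""
    xs, ys = struct.unpack("<4Q", x), struct.unpack("<4Q", y)
    return struct.pack("<4Q", *(p & q for p, q in zip(xs, ys)))


def and_bytewise(x: bytes, y: bytes) -> bytes:
    """AND the same 256 bits with no lane structure at all."""
    return bytes(p & q for p, q in zip(x, y))


if __name__ == "__main__":
    # A register holding float data and a register holding an integer mask.
    floats = struct.pack("<8f", 1.0, -2.0, 3.5, 0.0, -0.0, 1e10, 7.25, -1.0)
    mask = struct.pack("<8I", 0xFFFFFFFF, 0, 0x80000000, 0xFFFFFFFF,
                       0, 0x7FFFFFFF, 0xFFFFFFFF, 0)
    same = (and_as_v8i32(floats, mask)
            == and_as_v4i64(floats, mask)
            == and_bytewise(floats, mask))
    print(same)
```

Because the lane width does not matter for and, or and xor, the v4i64 node in the error dump and the <8 x i32> operation in the IR describe the same bit operation. Arithmetic such as integer add does not have this property, so it would need a different strategy.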