> > which goes through memory. Is there some idiom I'm missing so that it would use
> > for instance movmsk for SSE or vcmpgt & cr6 for altivec?
>
> I don't think you are missing anything: LLVM IR has no support for horizontal
> operations like or'ing the elements of a vector of boolean together. The code
> generators do try to recognize a few idioms and synthesize horizontal
> operations from them, but I think only addition is currently recognized, and

Thanks Duncan,

you're right: that does compile to a mess of spills to memory, not unlike the original.

I went to have a further look at this, and it seems the existing SelectInst is pretty close to what is needed:

  Value *IRBuilder::CreateSelect(Value *C, Value *True, Value *False, const Twine &Name)

Currently, when "C" is a vector, this asserts that True and False are both vector types of the same size as "C". I was thinking of weakening this condition so that if True and False are both i1 types, it will be allowed and will result in something which can be branched on.

I have quite a bit of reading ahead, it seems!

Stephen.
On 04.09.2012 00:08, Stephen wrote:
>>> which goes through memory. Is there some idiom I'm missing so that it would use
>>> for instance movmsk for SSE or vcmpgt & cr6 for altivec?
>>
>> I don't think you are missing anything: LLVM IR has no support for horizontal
>> operations like or'ing the elements of a vector of boolean together. The code
>> generators do try to recognize a few idioms and synthesize horizontal
>> operations from them, but I think only addition is currently recognized, and
>
> Thanks Duncan,
>
> you're right - that does compile to a mess of spills to memory not
> unlike the original.
>
> I went to have a look at this further: It seems the existing SelectInst
> is pretty close to what is needed.
> Value *IRBuilder::CreateSelect(Value *C, Value *True, Value *False,
> const Twine &Name)
> Currently, this asserts that the True & False are both vector types of
> the same size as "C". I was thinking of weakening this condition so that
> if True and False are both i1 types, it will be allowed and will result
> in something which can be branched on.
>
> I have quite a bit of reading ahead it seems!

This looks quite similar to something I filed a bug on (12312). Michael Liao submitted fixes for this, so I think if you change it to

  %16 = fcmp ogt <4 x float> %15, %cr
  %17 = sext <4 x i1> %16 to <4 x i32>
  %18 = bitcast <4 x i32> %17 to i128
  %19 = icmp ne i128 %18, 0
  br i1 %19, label %true1, label %false2

it should do the trick (one cmpps + one ptest + one br instruction). This, however, requires sse41, which I don't know if you have. You say the extractelements go through memory, which I've never seen; then again, our code didn't try to extract the i1 directly. Even without the fixes for ptest, the above sequence will result in only 2 extraction steps instead of 4 if you're on x64 and the CPU supports sse41, but I guess without sse41, and hence no pextrd/q, it probably also will go through memory.
On altivec this sequence might not produce anything good, though. The free sext (cmpps already leaves each lane all ones or all zeros) requires LLVM 2.7 on x86 to work at all (certainly not a problem nowadays, but other backends might differ), and the ptest sequence requires very recent svn. If you only have SSE, I don't think the current code can generate movmskps + test (probably the next best thing without sse41) instead of ptest.

Roland
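Roland's sequence works because sext turns each i1 lane into an all-ones or all-zero i32, so after the bitcast the i128 is non-zero exactly when at least one lane compared true; a movmskps-style reduction reaches the same answer by collecting only the sign bit of each lane. As a rough illustration, here is a portable C++ scalar model of both reductions. It is a sketch of what the instructions compute, not of the code LLVM actually emits; the names Mask4, compare_gt_sext, any_lane_via_i128, movmsk_model and any_lane_via_movmsk are hypothetical, and the 128-bit value is modelled as two 64-bit halves because standard C++ has no 128-bit integer type.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model: one 32-bit integer per lane of a <4 x i32> mask.
using Mask4 = std::array<std::int32_t, 4>;

// Models "fcmp ogt <4 x float>" followed by "sext <4 x i1> to <4 x i32>":
// a true lane becomes all ones (-1) and a false lane becomes 0, which is
// also the per-lane mask that cmpps leaves in the register.
Mask4 compare_gt_sext(const std::array<float, 4>& a,
                      const std::array<float, 4>& b) {
    Mask4 m{};
    for (int i = 0; i < 4; ++i) {
        // A comparison involving NaN is false, like the ordered "ogt".
        m[i] = (a[i] > b[i]) ? -1 : 0;
    }
    return m;
}

// Models "bitcast <4 x i32> to i128" + "icmp ne i128 %x, 0" (the test that
// ptest can perform). The 16 bytes are viewed as two 64-bit halves; the
// value is non-zero iff any lane is set.
bool any_lane_via_i128(const Mask4& m) {
    std::uint64_t halves[2];
    static_assert(sizeof(halves) == sizeof(Mask4), "both must be 128 bits");
    std::memcpy(halves, m.data(), sizeof(halves));
    return (halves[0] | halves[1]) != 0;
}

// Models movmskps: bit i of the result is the sign bit (bit 31) of lane i.
unsigned movmsk_model(const Mask4& m) {
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i) {
        bits |= (static_cast<std::uint32_t>(m[i]) >> 31) << i;
    }
    return bits;
}

// movmskps + test: branch if any lane's sign bit was set.
bool any_lane_via_movmsk(const Mask4& m) { return movmsk_model(m) != 0; }
```

Because the movmsk model keeps one bit per lane, the same 4-bit value can also answer "are all lanes true?" by comparing it against 0xF.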
Hi Roland,

> This, however, requires sse41 which I don't know if you have - you say
> the extractelements go through memory which I've never seen

Maybe Stephen is targeting something generic like i386 by accident.

Ciao,

Duncan.
Roland Scheidegger <sroland <at> vmware.com> writes:
> This looks quite similar to something I filed a bug on (12312). Michael
> Liao submitted fixes for this, so I think
> if you change it to
>   %16 = fcmp ogt <4 x float> %15, %cr
>   %17 = sext <4 x i1> %16 to <4 x i32>
>   %18 = bitcast <4 x i32> %17 to i128
>   %19 = icmp ne i128 %18, 0
>   br i1 %19, label %true1, label %false2
>
> should do the trick (one cmpps + one ptest + one br instruction).
> This, however, requires sse41 which I don't know if you have - you say
> the extractelements go through memory which I've never seen then again
> our code didn't try to extract the i1 directly (even without fixes for
> ptest the above sequence will result in only 2 extraction steps instead
> of 4 if you're on x64 and the cpu supports sse41 but I guess without
> sse41 and hence no pextrd/q it probably also will go through memory).
> Though on altivec this sequence might not produce anything good, the
> free sext requires llvm 2.7 on x86 to work at all (certainly shouldn't
> be a problem nowadays but on other backends it might be different) and
> for the ptest sequence very recent svn is required.
> I don't think the current code can generate movmskps + test (probably
> the next best thing without sse41) instead of ptest though if you only
> got sse.

Thanks Roland,

sign extending gets me part of the way at least. I'm on version 3.1 and, as you say in the bug report, there are a few extraneous instructions.

For the record, casting to a <4 x i8> seems to do a better job for x86 (shuffle, movd, test, jump). Using <4 x i32> seems to issue a pextrd for each element. For x64, it seems to be the same for either. I suppose it's all academic, seeing as the ptest patch looks good.

Looking at it again, I'm not sure how I saw memory spills. Certainly I can't reproduce them without using -O0. It's possible I did that accidentally when investigating the issue.

Thanks,

Stephen.
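Stephen doesn't post the IR for the <4 x i8> variant; a plausible reading, and only an assumption here, is sext to <4 x i8> followed by a bitcast to i32. Four 8-bit lanes fit in 32 bits, so once the bytes are packed together (presumably the shuffle) a single movd can move the whole mask into a general-purpose register for one test, whereas on 32-bit x86 the 128-bit value built from <4 x i32> has to be pulled out in 32-bit pieces (the pextrd per element he saw). The portable C++ sketch below models that reduction in scalar code; the function name any_lane_via_i32 is hypothetical, and the IR in its comment is the assumed shape rather than code taken from the thread.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of the presumed IR (the exact IR used in the
// thread is not shown):
//   %m  = fcmp ogt <4 x float> %a, %b
//   %s  = sext <4 x i1> %m to <4 x i8>
//   %bc = bitcast <4 x i8> %s to i32
//   %c  = icmp ne i32 %bc, 0
//   br i1 %c, label %true1, label %false2
bool any_lane_via_i32(const std::array<float, 4>& a,
                      const std::array<float, 4>& b) {
    std::array<std::int8_t, 4> lanes{};
    for (int i = 0; i < 4; ++i) {
        // sext i1 -> i8: a true lane becomes 0xFF (-1), a false lane 0x00.
        lanes[i] = (a[i] > b[i]) ? std::int8_t{-1} : std::int8_t{0};
    }
    // bitcast <4 x i8> -> i32: all four lanes now fit in one 32-bit integer,
    // so a single compare against zero answers "is any lane true?".
    std::uint32_t packed = 0;
    static_assert(sizeof(packed) == sizeof(lanes), "both must be 32 bits");
    std::memcpy(&packed, lanes.data(), sizeof(packed));
    return packed != 0;
}
```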