thr3ads.net - llvm dev - [LLVMdev] branch on vector compare? [Sep 2012]

Stephen

2012-Sep-03 22:08 UTC

[LLVMdev] branch on vector compare?

> > which goes through memory. Is there some idiom I'm missing so that it would
> > use for instance movmsk for SSE or vcmpgt & cr6 for altivec?
>
> I don't think you are missing anything: LLVM IR has no support for horizontal
> operations like or'ing the elements of a vector of boolean together. The code
> generators do try to recognize a few idioms and synthesize horizontal
> operations from them, but I think only addition is currently recognized, and

Thanks Duncan,

you're right - that does compile to a mess of spills to memory not
unlike the original.

I went to have a look at this further: it seems the existing SelectInst
is pretty close to what is needed.

  Value *IRBuilder::CreateSelect(Value *C, Value *True, Value *False,
                                 const Twine &Name)

Currently, this asserts that True and False are both vector types of
the same size as "C". I was thinking of weakening this condition so that
if True and False are both i1 types, it will be allowed and will result
in something which can be branched on.
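
To make the goal concrete, here is a minimal Python sketch of the reduction being asked for: collapsing a vector of per-lane compare results into a single value that a branch can test. It emulates a movmsk-style collapse (one sign bit per lane gathered into a small integer). The function names are hypothetical and this is only an illustration of the semantics, not the proposed SelectInst change nor anything LLVM emits.

```python
def movmsk(lanes_i32):
    """Gather the sign bit of each 32-bit lane into an integer, lane 0 in bit 0.

    This models what an instruction such as movmskps computes; it is an
    emulation for illustration only.
    """
    mask = 0
    for index, lane in enumerate(lanes_i32):
        mask |= ((lane >> 31) & 1) << index
    return mask


def any_lane(mask_bits):
    """True if at least one lane was set (a horizontal OR)."""
    return mask_bits != 0


def all_lanes(mask_bits, lane_count=4):
    """True if every lane was set (a horizontal AND)."""
    return mask_bits == (1 << lane_count) - 1


if __name__ == "__main__":
    # Lanes 0 and 2 hold an all-ones compare result, lanes 1 and 3 hold zero.
    compare_result = [0xFFFFFFFF, 0x00000000, 0xFFFFFFFF, 0x00000000]
    bits = movmsk(compare_result)
    print(bin(bits), any_lane(bits), all_lanes(bits))
```

Either predicate yields a single boolean, which is the kind of value a conditional branch needs.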

I have quite a bit of reading ahead it seems!
Stephen.

Roland Scheidegger

2012-Sep-04 01:45 UTC

[LLVMdev] branch on vector compare?

On 04.09.2012 00:08, Stephen wrote:
>>> which goes through memory. Is there some idiom I'm missing so that it would
>>> use for instance movmsk for SSE or vcmpgt & cr6 for altivec?
>>
>> I don't think you are missing anything: LLVM IR has no support for horizontal
>> operations like or'ing the elements of a vector of boolean together. The code
>> generators do try to recognize a few idioms and synthesize horizontal
>> operations from them, but I think only addition is currently recognized, and
>
> Thanks Duncan,
>
> you're right - that does compile to a mess of spills to memory not
> unlike the original.
>
> I went to have a look at this further: It seems the existing SelectInst
> is pretty close to what is needed.
> Value *IRBuilder::CreateSelect(Value *C, Value *True, Value *False,
> const Twine &Name)
> Currently, this asserts that the True & False are both vector types of
> the same size as "C". I was thinking of weakening this condition so that
> if True and False are both i1 types, it will be allowed and will result
> in something which can be branched on.
>
> I have quite a bit of reading ahead it seems!

This looks quite similar to something I filed a bug on (12312). Michael
Liao submitted fixes for this, so I think changing it to

  %16 = fcmp ogt <4 x float> %15, %cr
  %17 = sext <4 x i1> %16 to <4 x i32>
  %18 = bitcast <4 x i32> %17 to i128
  %19 = icmp ne i128 %18, 0
  br i1 %19, label %true1, label %false2

should do the trick (one cmpps + one ptest + one br instruction).

This, however, requires sse41, which I don't know if you have. You say
the extractelements go through memory, which I've never seen, but then
again our code didn't try to extract the i1 directly. (Even without the
fixes for ptest, the above sequence will result in only 2 extraction
steps instead of 4 if you're on x64 and the cpu supports sse41, but I
guess without sse41, and hence no pextrd/q, it will probably also go
through memory.)

On altivec, though, this sequence might not produce anything good. The
free sext requires llvm 2.7 on x86 to work at all (certainly not a
problem nowadays, but on other backends it might be different), and the
ptest sequence requires very recent svn.

If you only have sse, I don't think the current code can generate
movmskps + test (probably the next best thing without sse41) instead of
ptest.
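
For readers who want to check what the IR sequence above computes, the following is a small Python emulation of its four steps (fcmp ogt, sext, bitcast, icmp ne). It is a sketch of the semantics only: the helper names are made up for illustration, lane 0 is assumed to occupy the low bits as on a little-endian target, and it says nothing about which machine instructions a backend selects.

```python
import struct


def fcmp_ogt(lhs, rhs):
    """Per-lane ordered greater-than.

    Python's ">" already returns False when either operand is NaN, which
    matches the "ordered" behaviour of fcmp ogt.
    """
    return [x > y for x, y in zip(lhs, rhs)]


def sext_to_i32(mask):
    """Sign-extend each i1 lane to 32 bits: True -> all ones, False -> zero."""
    return [0xFFFFFFFF if bit else 0 for bit in mask]


def bitcast_to_i128(lanes):
    """Reinterpret four 32-bit lanes as one 128-bit integer (lane 0 lowest)."""
    return int.from_bytes(struct.pack("<4I", *lanes), "little")


def branch_taken(lhs, rhs):
    """Emulate the icmp ne against zero: True if any lane compared greater."""
    return bitcast_to_i128(sext_to_i32(fcmp_ogt(lhs, rhs))) != 0


if __name__ == "__main__":
    print(branch_taken([1.0, 2.0, 3.0, 4.0], [0.0, 5.0, 5.0, 5.0]))  # lane 0 is greater
    print(branch_taken([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]))  # no lane is greater
```

Because each lane is either all ones or all zeros after the sext, the 128-bit value is nonzero exactly when at least one lane compared true, so the final icmp acts as a horizontal OR.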

Roland

Duncan Sands

2012-Sep-04 06:14 UTC

[LLVMdev] branch on vector compare?

Hi Roland,
> This, however, requires sse41 which I don't know if you have - you say
> the extractelements go through memory which I've never seen
Maybe Stephen is targeting something generic like i386 by accident.

Ciao, Duncan.

Stephen

2012-Sep-04 22:24 UTC

[LLVMdev] branch on vector compare?

Roland Scheidegger <sroland <at> vmware.com> writes:
> This looks quite similar to something I filed a bug on (12312). Michael
> Liao submitted fixes for this, so I think
> if you change it to
>   %16 = fcmp ogt <4 x float> %15, %cr
>   %17 = sext <4 x i1> %16 to <4 x i32>
>   %18 = bitcast <4 x i32> %17 to i128
>   %19 = icmp ne i128 %18, 0
>   br i1 %19, label %true1, label %false2
> 
> should do the trick (one cmpps + one ptest + one br instruction).
> This, however, requires sse41 which I don't know if you have - you say
> the extractelements go through memory which I've never seen then again
> our code didn't try to extract the i1 directly (even without fixes for
> ptest the above sequence will result in only 2 extraction steps instead
> of 4 if you're on x64 and the cpu supports sse41 but I guess without
> sse41 and hence no pextrd/q it probably also will go through memory).
> Though on altivec this sequence might not produce anything good, the
> free sext requires llvm 2.7 on x86 to work at all (certainly shouldn't
> be a problem nowadays but on other backends it might be different) and
> for the ptest sequence very recent svn is required.
> I don't think the current code can generate movmskps + test (probably
> the next best thing without sse41) instead of ptest though if you only
> got sse.

Thanks Roland, sign extending gets me part of the way at least.
I'm on version 3.1 and, as you say in the bug report, there are a
few extraneous instructions. For the record, casting to a <4 x i8>
seems to do a better job for x86 (shuffle, movd, test, jump). Using
<4 x i32> seems to issue a pextrd for each element. For x64, it seems
to be the same for either. I suppose it's all academic seeing as the
ptest patch looks good.
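
The exact IR for the <4 x i8> variant is not shown in this thread; assuming it means sign-extending the <4 x i1> compare result to <4 x i8> and bitcasting that to i32 before the compare against zero, the sketch below emulates both lane widths in Python and checks that they give the same branch decision. The helper names are hypothetical, and the instruction counts quoted above are not modelled here.

```python
from itertools import product


def sext_lanes(mask, bits):
    """Sign-extend each i1 lane to `bits` bits: True -> all ones, False -> zero."""
    all_ones = (1 << bits) - 1
    return [all_ones if bit else 0 for bit in mask]


def pack_lanes(lanes, bits):
    """Model a bitcast of the vector to one wide integer (lane 0 in the low bits)."""
    value = 0
    for index, lane in enumerate(lanes):
        value |= lane << (index * bits)
    return value


def any_set_via(mask, bits):
    """sext to `bits`-wide lanes, bitcast to a single integer, compare ne 0."""
    return pack_lanes(sext_lanes(mask, bits), bits) != 0


if __name__ == "__main__":
    # Exhaustively compare the i8 and i32 variants over all 16 four-lane masks.
    for mask in product([False, True], repeat=4):
        assert any_set_via(mask, 8) == any_set_via(mask, 32) == any(mask)
    print("i8 and i32 variants agree on all 16 masks")
```

The narrower lanes only change how many bits the backend has to move and test (32 instead of 128); the result of the branch is the same.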

Looking at it again, I'm not sure how I saw memory spills. Certainly
I can't reproduce them without using -O0. It's possible I did
that accidentally when investigating the issue.

Thanks,
Stephen.
