thr3ads.net - llvm dev - [LLVMdev] Predicated Vector Operations [May 2013]


Arnold Schwaighofer

2013-May-09 21:15 UTC

[LLVMdev] Predicated Vector Operations

On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail.com> wrote:
> On Thu, May 9, 2013 at 8:10 AM,  <dag at cray.com> wrote:
>> Jeff Bush <jeffbush001 at gmail.com> writes:
>> 
>>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>>> %sum = fadd %tx, %ty
>>> %newvalue = select %mask, %sum, %oldvalue
>>> 
>>> I believe the generated instructions depend on whether %oldvalue is
>>> still live after the last instruction. If it is, you need to generate
>>> two instructions: a copy into a new physical register then predicated
>>> write to it.  If it is not used, then it is just a predicated write to
>>> the same register.
>>> 
>>>  move r1, r0
>>>  fadd r1{m0}, r2, r3
>>> 
>>> (r0 is now %oldvalue and r1 is %newvalue)
>>> 
>>> vs.
>>> 
>>>  fadd r0{m0}, r2, r3
>>> 
>>> (r0 was %oldvalue and is now %newvalue)
>> 
>> I'm assuming some parts of %oldvalue are still used.  The masked fadd
>> could preserve them for false values of the mask, depending on how
>> masking was defined.  Therefore, there's no need for a register copy.
>> If the masked operation does not preserve the old values in r0, then we
>> do need a register copy.
>> 
>> Preserving old values does complicate things for SSA, as you note.
>> 
>>>> The bottom line is that it is probably easier to set this up before LLVM
>>>> IR goes into SSA form.
>>> 
>>> That makes sense, but it's unclear to me how you would preserve that
>>> information after going into SSA form.
>> 
>> I should think the semantics of select would handle that.  After a
>> select all vector elements of the result are defined.  There is no
>> preservation of old values.  There cannot be, by definition of SSA.
>> 
>>> It seems to me that these are not really LLVM issues as much as the
>>> fact that SSA doesn't cleanly map to predicated instructions.
>> 
>> It entirely depends on how the predication is defined to work.
> 
> Good point.  I was thinking of it narrowly as preserving the old value
> in the register.  I guess I'd amend my previous statement to say that
> it actually does map just fine to SSA, but instruction selection
> becomes more complex.
> 
> It sounds like the current LLVM instruction selection algorithm can't
> really handle the use case I described cleanly (generating predicated
> arithmetic instructions that preserve the old register value).  Is
> that a fair statement?

I don’t think this is a fair statement. Tied register operands should handle
this use case just fine. The problem is similar to that of two-address
constraints. Two-address instructions work as follows: when we match an
instruction, we “tie” an input register to the output register.

Say you had an LLVM IR add:

%x = add i32 %y, %z

For x86, we generate the following machine IR instruction during ISel:

vr0<def, tied1> = ADD32rr vr1<use, tied0>, vr2<use>

Once we leave SSA form during CodeGen, we have to replace the two-address
constraint with copies:

vr0 = vr1
vr0 = ADD32rr vr0, vr2

Coalescing and allocation will then take care of removing unnecessary copies. I
think that predicated instructions would be handled similarly (to keep the
example short, I replaced your sequence of IR instructions with one
“virtual” IR instruction):

%newvalue = predicated_add %mask, %x, %y, %oldvalue

This (actually, your sequence of selects and add) would be matched during ISel
to:

vr0<def, tied4> = PRED_ADD mask_vr, vr1<use>, vr2<use>, vr3<use, tied0>

From here on the machinery is the same: the two-address pass would translate
such instructions to:

vr0 = vr3
vr0 = PRED_ADD mask_vr, vr1, vr2, vr0

If vr3 is not live after PRED_ADD, coalescing will merge vr0 and vr3. I don’t
think there is a fundamental difficulty in handling predicated instructions this
way, at least with respect to register constraints.
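
[Editorial sketch] The two steps described above (expanding a tied operand into a copy, then coalescing the copy away when the tied source is dead) can be modeled with a toy Python simulation. This is only an illustration under stated assumptions, not LLVM's actual two-address pass or register coalescer: the `Instr` class and the functions `lower_tied_operand` and `coalesce_copy` are hypothetical names, and liveness is supplied by hand as a set.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Instr:
    """Toy machine instruction (hypothetical model, not an LLVM class)."""
    dst: str
    opcode: str
    uses: List[str]
    tied_use: Optional[int] = None  # index into `uses` tied to `dst`, if any


def lower_tied_operand(instr: Instr) -> List[Instr]:
    """Rewrite `dst = OP ..., src<tied>` as `dst = COPY src; dst = OP ..., dst`."""
    if instr.tied_use is None:
        return [instr]
    src = instr.uses[instr.tied_use]
    new_uses = list(instr.uses)
    new_uses[instr.tied_use] = instr.dst
    return [
        Instr(instr.dst, "COPY", [src]),
        Instr(instr.dst, instr.opcode, new_uses, instr.tied_use),
    ]


def coalesce_copy(seq: List[Instr], live_after: Set[str]) -> List[Instr]:
    """Drop the COPY if its source is dead afterwards, reusing the source register.

    A real coalescer merges the two virtual registers everywhere; this toy
    version only renames within the two-instruction sequence.
    """
    if len(seq) != 2 or seq[0].opcode != "COPY":
        return seq
    copy, op = seq
    src = copy.uses[0]
    if src in live_after:
        return seq  # old value still needed elsewhere: the copy must stay
    uses = [src if u == op.dst else u for u in op.uses]
    return [Instr(src, op.opcode, uses, op.tied_use)]


# vr3 plays the role of %oldvalue and is tied to the result vr0.
pred_add = Instr("vr0", "PRED_ADD", ["mask_vr", "vr1", "vr2", "vr3"], tied_use=3)
lowered = lower_tied_operand(pred_add)
kept = coalesce_copy(lowered, live_after={"vr3"})  # copy + predicated add
merged = coalesce_copy(lowered, live_after=set())  # single in-place add
```

With `vr3` still live, the sequence keeps the copy, which corresponds to the `move r1, r0` / `fadd r1{m0}, r2, r3` case earlier in the thread. With `vr3` dead, it collapses to one instruction that writes the old register in place.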

Jeff Bush

2013-May-10 05:16 UTC


Ah, I think I get it now.  This was mentioned earlier in the thread,
but it didn't click at the time. It sounds like I can do instruction
selection with a pattern like (omitting selection of the sources):

let Constraints = "$dst = $oldvalue" in {
    def MASKEDARITH : MyInstruction<
        (outs VectorReg:$dst),
        (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
             VectorReg:$oldvalue),
        "add $dst {$mask}, $src1, $src2",
        [(set v16i32:$dst, (vselect v16i1:$mask,
                                    (add v16i32:$src1, v16i32:$src2),
                                    v16i32:$oldvalue))]>;
}

That's actually pretty clean.

Thanks
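
[Editorial sketch] To make the lane-wise meaning of the pattern above concrete, here is a small Python model of `vselect(mask, add(src1, src2), oldvalue)`, compared against the four-instruction select sequence from earlier in the thread. It is a sketch under assumptions: the helpers `vselect`, `vadd`, `masked_add`, and `select_sequence` are hypothetical, the vectors have 4 lanes rather than 16, and the values are integers (matching the `v16i32` pattern rather than the original `fadd` example).

```python
from typing import List


def vselect(mask: List[bool], a: List[int], b: List[int]) -> List[int]:
    """Lane-wise select: a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def vadd(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise integer add."""
    return [x + y for x, y in zip(a, b)]


def masked_add(mask, src1, src2, oldvalue):
    """What the matched pattern computes: vselect(mask, add(src1, src2), oldvalue)."""
    return vselect(mask, vadd(src1, src2), oldvalue)


def select_sequence(mask, x, y, oldvalue):
    """The original IR sequence: zero the inactive lanes, add, then select."""
    zero = [0] * len(mask)
    tx = vselect(mask, x, zero)
    ty = vselect(mask, y, zero)
    return vselect(mask, vadd(tx, ty), oldvalue)


mask = [True, False, True, False]
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
old = [7, 7, 7, 7]

result = masked_add(mask, x, y, old)
print(result)                                   # [11, 7, 33, 7]
print(result == select_sequence(mask, x, y, old))  # True
```

Lanes where the mask is false keep the corresponding lane of `old`, which is why `$oldvalue` has to be an explicit input tied to `$dst`.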


dag at cray.com

2013-May-10 16:53 UTC


Jeff Bush <jeffbush001 at gmail.com> writes:
> Ah, I think I get it now.  This was mentioned earlier in the thread,
> but it didn't click at the time. It sounds like I can do instruction
> selection with a pattern like (omitting selection of the sources):
>
> let Constraints = "$dst = $oldvalue" in {
>     def MASKEDARITH : MyInstruction<
>         (outs VectorReg:$dst),
>         (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>              VectorReg:$oldvalue),
>         "add $dst {$mask}, $src1, $src2",
>         [(set v16i32:$dst, (vselect v16i1:$mask,
>                                     (add v16i32:$src1, v16i32:$src2),
>                                     v16i32:$oldvalue))]>;
> }

Ok, but where does $oldvalue come from?  That is the tricky part as far
as I can see and is why this isn't quite the same as handling
two-address instructions.

I agree that the pattern itself is straightforward.  It's basically what
I've written here.

                                -David
