On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail.com> wrote:

> On Thu, May 9, 2013 at 8:10 AM, <dag at cray.com> wrote:
>> Jeff Bush <jeffbush001 at gmail.com> writes:
>>
>>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>>> %sum = fadd %tx, %ty
>>> %newvalue = select %mask, %sum, %oldvalue
>>>
>>> I believe the generated instructions depend on whether %oldvalue is
>>> still live after the last instruction. If it is, you need to generate
>>> two instructions: a copy into a new physical register then predicated
>>> write to it. If it is not used, then it is just a predicated write to
>>> the same register.
>>>
>>> move r1, r0
>>> fadd r1{m0}, r2, r3
>>>
>>> (r0 is now %oldvalue and r1 is %newvalue)
>>>
>>> vs.
>>>
>>> fadd r0{m0}, r2, r3
>>>
>>> (r0 was %oldvalue and is now %newvalue)
>>
>> I'm assuming some parts of %oldvalue are still used. The masked fadd
>> could preserve them for false values of the mask, depending on how
>> masking was defined. Therefore, there's no need for a register copy.
>> If the masked operation does not preserve the old values in r0, then we
>> do need a register copy.
>>
>> Preserving old values does complicate things for SSA, as you note.
>>
>>>> The bottom line is that it is probably easier to set this up before LLVM
>>>> IR goes into SSA form.
>>>
>>> That makes sense, but it's unclear to me how you would preserve that
>>> information after going into SSA form.
>>
>> I should think the semantics of select would handle that. After a
>> select all vector elements of the result are defined. There is no
>> preservation of old values. There cannot be, by definition of SSA.
>>
>>> It seems to me that these are not really LLVM issues as much as the
>>> fact that SSA doesn't cleanly map to predicated instructions.
>>
>> It entirely depends on how the predication is defined to work.
>
> Good point. I was thinking of it narrowly as preserving the old value
> in the register. I guess I'd amend my previous statement to say that
> it actually does map just fine to SSA, but instruction selection
> becomes more complex.
>
> It sounds like the current LLVM instruction selection algorithm can't
> really handle the use case I described cleanly (generating predicated
> arithmetic instructions that preserve the old register value). Is
> that a fair statement?

I don’t think this is a fair statement. Tied register operands should handle this use case just fine. This problem is similar to that of two-address constraints. Two-address instructions work as follows. When we match an instruction we “tie” input and output registers.

Say you had an LLVM-IR add:

  x = add i32 y, z

For x86 we generate the following machine IR instruction during ISel:

  vr0<def, tied1> = ADD32rr vr1<use, tied0>, vr2<use>

Once we go out of SSA during CodeGen we have to replace the two-address constraint by copies:

  vr0 = vr1
  vr0 = ADD32rr vr0, vr2

Coalescing and allocation will then take care of removing unnecessary copies.

I think that predicated instructions would be handled similarly (for the sake of making the example shorter I replaced your sequence of IR instructions by one “virtual” IR instruction):

  x = predicated_add %mask, %x, %y, %oldvalue

This (actually, your sequence of selects and add) would be matched during ISel to:

  vr0<def, tied1> = PRED_ADD mask_vr, vr1<use>, vr2<use>, vr3<use, tied0>

From here on the machinery is the same. The two-address pass would translate such instructions to:

  vr0 = vr3
  vr0 = PRED_ADD mask_vr, vr1, vr2, vr0

If vr3 is not live after PRED_ADD, coalescing will coalesce vr0 and vr3. I don’t think there is a fundamental difficulty in handling predicated instructions this way - at least with respect to register constraints.
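The two-address rewrite and the coalescing step described above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not LLVM's implementation: `Instr`, `lower_two_address`, `coalesce`, `fmt`, and the `live_out` set are hypothetical names invented for illustration, and the liveness check only handles straight-line code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Toy machine instruction (hypothetical, not an LLVM class)."""
    opcode: str
    dst: str
    srcs: List[str]
    tied: Optional[int] = None  # index into srcs that must share dst's register


def fmt(ins):
    return f"{ins.dst} = {ins.opcode} " + ", ".join(ins.srcs)


def lower_two_address(instrs):
    """Replace each tied constraint by an explicit copy into dst."""
    out = []
    for ins in instrs:
        if ins.tied is None or ins.srcs[ins.tied] == ins.dst:
            out.append(ins)
            continue
        tied_src = ins.srcs[ins.tied]
        out.append(Instr("COPY", ins.dst, [tied_src]))
        new_srcs = list(ins.srcs)
        new_srcs[ins.tied] = ins.dst  # the instruction now reads its own dst
        out.append(Instr(ins.opcode, ins.dst, new_srcs, ins.tied))
    return out


def coalesce(instrs, live_out=frozenset()):
    """Drop a COPY when its source is dead afterwards (toy liveness check)."""
    out = list(instrs)
    i = 0
    while i < len(out):
        ins = out[i]
        if ins.opcode == "COPY":
            src, dst = ins.srcs[0], ins.dst
            rest = out[i + 1:]
            src_needed = src in live_out or any(
                src in r.srcs or r.dst == src for r in rest
            )
            if not src_needed:
                def rename(reg):
                    return src if reg == dst else reg
                # Remove the copy and let dst share src's register.
                out[i:] = [
                    Instr(r.opcode, rename(r.dst), [rename(s) for s in r.srcs], r.tied)
                    for r in rest
                ]
                continue
        i += 1
    return out


prog = [Instr("PRED_ADD", "vr0", ["mask_vr", "vr1", "vr2", "vr3"], tied=3)]
lowered = lower_two_address(prog)
for ins in lowered:
    print(fmt(ins))
```

If `vr3` is passed in `live_out`, `coalesce` keeps the copy; otherwise the copy disappears and the predicated add writes in place, which matches the two cases discussed earlier in the thread.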
Ah, I think I get it now. This was mentioned earlier in the thread, but it didn't click at the time. It sounds like I can do instruction selection with a pattern like (omitting selection of the sources):

  let Constraints = "$dst = $oldvalue" in {
    def MASKEDARITH : MyInstruction<
      (outs VectorReg:$dst),
      (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
        VectorReg:$oldvalue),
      "add $dst {$mask}, $src1, $src2",
      [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
        v16i32:$src2), v16i32:$oldvalue))]>;
  }

That's actually pretty clean. Thanks

On Thu, May 9, 2013 at 2:15 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> I don’t think this is a fair statement. Tied register operands should handle this use case just fine. This problem is similar to that of two-address constraints. Two-address instructions work as follows. When we match an instruction we “tie” input and output registers.
>
> Say you had an LLVM-IR add:
>
>   x = add i32 y, z
>
> For x86 we generate the following machine IR instruction during ISel:
>
>   vr0<def, tied1> = ADD32rr vr1<use, tied0>, vr2<use>
>
> Once we go out of SSA during CodeGen we have to replace the two-address constraint by copies:
>
>   vr0 = vr1
>   vr0 = ADD32rr vr0, vr2
>
> Coalescing and allocation will then take care of removing unnecessary copies.
>
> I think that predicated instructions would be handled similarly (for the sake of making the example shorter I replaced your sequence of IR instructions by one “virtual” IR instruction):
>
>   x = predicated_add %mask, %x, %y, %oldvalue
>
> This (actually, your sequence of selects and add) would be matched during ISel to:
>
>   vr0<def, tied1> = PRED_ADD mask_vr, vr1<use>, vr2<use>, vr3<use, tied0>
>
> From here on the machinery is the same. The two-address pass would translate such instructions to:
>
>   vr0 = vr3
>   vr0 = PRED_ADD mask_vr, vr1, vr2, vr0
>
> If vr3 is not live after PRED_ADD, coalescing will coalesce vr0 and vr3. I don’t think there is a fundamental difficulty in handling predicated instructions this way - at least with respect to register constraints.
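As a sanity check on the idea that the select/fadd/select sequence from earlier in the thread computes the same value as a single masked add, here is a lane-by-lane model in Python. It is a sketch that assumes masked-off lanes keep the destination's previous contents, which, as noted in the thread, depends on how the target defines predication. The helper names (`select`, `fadd`, `select_sequence`, `masked_fadd`) and the four-lane test vectors are made up for illustration.

```python
def select(mask, a, b):
    """Per-lane select: take a[i] where mask[i] is set, else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def fadd(a, b):
    """Per-lane floating-point add."""
    return [x + y for x, y in zip(a, b)]


def select_sequence(mask, x, y, old):
    """The SSA form from the thread: two selects, an fadd, a final select."""
    zeros = [0.0] * len(mask)
    tx = select(mask, x, zeros)
    ty = select(mask, y, zeros)
    total = fadd(tx, ty)
    return select(mask, total, old)


def masked_fadd(mask, x, y, old):
    """Assumed semantics of 'fadd r0{m0}, r2, r3': only enabled lanes are written."""
    dst = list(old)  # dst starts out holding the old value (the tied operand)
    for lane, enabled in enumerate(mask):
        if enabled:
            dst[lane] = x[lane] + y[lane]
    return dst


mask = [True, False, True, False]
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
old = [-1.0, -2.0, -3.0, -4.0]

print(select_sequence(mask, x, y, old))
print(masked_fadd(mask, x, y, old))
```

Under that assumption both functions return the same vector, which is why a single `vselect` over an `add` is enough for the pattern as long as the old value is tied to the destination.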
Jeff Bush <jeffbush001 at gmail.com> writes:

> Ah, I think I get it now. This was mentioned earlier in the thread,
> but it didn't click at the time. It sounds like I can do instruction
> selection with a pattern like (omitting selection of the sources):
>
>   let Constraints = "$dst = $oldvalue" in {
>     def MASKEDARITH : MyInstruction<
>       (outs VectorReg:$dst),
>       (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>         VectorReg:$oldvalue),
>       "add $dst {$mask}, $src1, $src2",
>       [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
>         v16i32:$src2), v16i32:$oldvalue))]>;
>   }

Ok, but where does $oldvalue come from? That is the tricky part as far as I can see and is why this isn't quite the same as handling two-address instructions.

I agree that the pattern itself is straightforward. It's basically what I've written here.

-David