Jeff Bush <jeffbush001 at gmail.com> writes:

> Ah, I think I get it now. This was mentioned earlier in the thread,
> but it didn't click at the time. It sounds like I can do instruction
> selection with a pattern like (omitting selection of the sources):
>
> let Constraints = "$dst = $oldvalue" in {
>   def MASKEDARITH : MyInstruction<
>     (outs VectorReg:$dst),
>     (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>       VectorReg:$oldvalue),
>     "add $dst {$mask}, $src1, $src2",
>     [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
>       v16i32:$src2), v16i32:$oldvalue))]>;
> }

Ok, but where does $oldvalue come from? That is the tricky part as far
as I can see and is why this isn't quite the same as handling
two-address instructions.

I agree that the pattern itself is straightforward. It's basically
what I've written here.

-David

On May 10, 2013, at 11:53 AM, dag at cray.com wrote:

> Jeff Bush <jeffbush001 at gmail.com> writes:
>
>> Ah, I think I get it now. This was mentioned earlier in the thread,
>> but it didn't click at the time. It sounds like I can do instruction
>> selection with a pattern like (omitting selection of the sources):
>>
>> let Constraints = "$dst = $oldvalue" in {
>>   def MASKEDARITH : MyInstruction<
>>     (outs VectorReg:$dst),
>>     (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>>       VectorReg:$oldvalue),
>>     "add $dst {$mask}, $src1, $src2",
>>     [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
>>       v16i32:$src2), v16i32:$oldvalue))]>;
>> }
>
> Ok, but where does $oldvalue come from? That is the tricky part as far
> as I can see and is why this isn't quite the same as handling
> two-address instructions.

From the semantics of your program?

  %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
  %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
  %sum = fadd %tx, %ty
  %newvalue = select %mask, %sum, %oldvalue   << From here?

If you had a designated predicated instruction you would have the same
issue. The %oldvalue has to come from somewhere (or be undefined).

  %oldval = ... | undef
  %newvalue = predicated_fadd %mask, %left, %right, %oldval

I guess I don't understand your question.

Instcombine might remove the select %mask, %sum, undef, but that is
another issue ...

Arnold Schwaighofer <aschwaighofer at apple.com> writes:

>> Ok, but where does $oldvalue come from? That is the tricky part as far
>> as I can see and is why this isn't quite the same as handling
>> two-address instructions.
>
> From the semantics of your program?
>
>   %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>   %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>   %sum = fadd %tx, %ty
>   %newvalue = select %mask, %sum, %oldvalue   << From here?

Well yes. The issue isn't really in codegen, it's in IR generation. I
don't think it's unsolvable, just tricky.

> If you had a designated predicated instruction you would have the same
> issue. The %oldvalue has to come from somewhere (or be undefined).
>
>   %oldval = ... | undef
>   %newvalue = predicated_fadd %mask, %left, %right, %oldval

Certainly. This is an SSA issue, not a predication issue, really.

-David

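The equivalence discussed in the exchange above can be sketched in a few lines of Python. This is only a lane-by-lane model under stated assumptions: Python lists stand in for vector registers, and `select`, `fadd`, `select_form` and `predicated_fadd` are hypothetical helpers that mirror the names used in the thread, not LLVM APIs. It shows that the select-based sequence and a designated predicated instruction need the same extra operand, `oldvalue`, for the inactive lanes.

```python
# Lane-by-lane model; lists stand in for vector registers (an assumption
# made for illustration, not how LLVM represents vectors).

def select(mask, if_true, if_false):
    """Per-lane select: take if_true where the mask is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def fadd(a, b):
    """Per-lane floating-point add."""
    return [x + y for x, y in zip(a, b)]

def select_form(mask, x, y, oldvalue):
    """The select-based sequence from the thread."""
    zeros = [0.0] * len(mask)
    tx = select(mask, x, zeros)
    ty = select(mask, y, zeros)
    total = fadd(tx, ty)
    return select(mask, total, oldvalue)

def predicated_fadd(mask, left, right, oldval):
    """Hypothetical predicated add: inactive lanes keep oldval."""
    return [l + r if m else o
            for m, l, r, o in zip(mask, left, right, oldval)]

mask = [True, False, True, False]
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
old = [-1.0, -2.0, -3.0, -4.0]

# Both forms produce the same lanes, and both need `old` as an input.
print(select_form(mask, x, y, old))      # [11.0, -2.0, 33.0, -4.0]
print(predicated_fadd(mask, x, y, old))  # [11.0, -2.0, 33.0, -4.0]
```

In both forms the inactive lanes are defined only because `old` is passed in explicitly, which is the SSA point made above: the previous value has to be a named operand (or undef) rather than an implicit register state.
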
On Fri, May 10, 2013 at 9:53 AM, <dag at cray.com> wrote:

> Jeff Bush <jeffbush001 at gmail.com> writes:
>
>> Ah, I think I get it now. This was mentioned earlier in the thread,
>> but it didn't click at the time. It sounds like I can do instruction
>> selection with a pattern like (omitting selection of the sources):
>>
>> let Constraints = "$dst = $oldvalue" in {
>>   def MASKEDARITH : MyInstruction<
>>     (outs VectorReg:$dst),
>>     (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>>       VectorReg:$oldvalue),
>>     "add $dst {$mask}, $src1, $src2",
>>     [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
>>       v16i32:$src2), v16i32:$oldvalue))]>;
>> }
>
> Ok, but where does $oldvalue come from? That is the tricky part as far
> as I can see and is why this isn't quite the same as handling
> two-address instructions.

I may be missing some important detail here, but I assumed $oldvalue
and $dst were just SSA names for the same variable. For example, given
the following snippet for a compute kernel:

  if (x > 10)
    x = x - 10

If you wanted to run a bunch of parallel instances with each vector
lane representing an instance, I assume the IR would be something
roughly like (ignoring source selects for brevity):

  %mask = cmp gt %x1, 10
  %diff = sub %x1, 10
  %x2 = select %mask, %diff, %x1

At this point, %x1 is dead. %x1 and %x2 represent 'x' in the program
above.

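The three-instruction sequence in the message above can be modeled per lane as follows. This is a minimal Python sketch, assuming each list element is one vector lane (one kernel instance); `cmp_gt`, `sub`, `select` and `kernel` are hypothetical helper names chosen to match the pseudo-IR, not real LLVM operations.

```python
# Per-lane model of the kernel `if (x > 10) x = x - 10`.
# Each list element is one lane, i.e. one parallel instance of the kernel.

def cmp_gt(vec, scalar):
    """%mask = cmp gt %x1, 10"""
    return [lane > scalar for lane in vec]

def sub(vec, scalar):
    """%diff = sub %x1, 10 (computed for every lane, active or not)"""
    return [lane - scalar for lane in vec]

def select(mask, if_true, if_false):
    """%x2 = select %mask, %diff, %x1"""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def kernel(x1):
    mask = cmp_gt(x1, 10)
    diff = sub(x1, 10)
    x2 = select(mask, diff, x1)  # x1 supplies the inactive lanes
    return x2                    # x1 is not used after this point

print(kernel([5, 10, 11, 42]))  # [5, 10, 1, 32]
```

Here `x1` plays the role of `$oldvalue` and `x2` the role of `$dst`: the old value is simply the previous SSA name of `x`, and it is dead after the select, which is what lets the tied-operand constraint reuse its register.
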
Even on architectures that support masks, such as MIC, using masks is
not always beneficial because it adds register dependencies. We
analyzed these aspects and came to the conclusion that the vectorizer
can use masked load/store and gather/scatter intrinsics to protect
memory accesses. Adding masks to instructions is partially possible in
a target-specific machine pass.

- Elena

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Jeff Bush
Sent: Saturday, May 11, 2013 04:09
To: dag at cray.com
Cc: LLVM Dev
Subject: Re: [LLVMdev] Predicated Vector Operations

On Fri, May 10, 2013 at 9:53 AM, <dag at cray.com> wrote:

> Jeff Bush <jeffbush001 at gmail.com> writes:
>
>> Ah, I think I get it now. This was mentioned earlier in the thread,
>> but it didn't click at the time. It sounds like I can do instruction
>> selection with a pattern like (omitting selection of the sources):
>>
>> let Constraints = "$dst = $oldvalue" in {
>>   def MASKEDARITH : MyInstruction<
>>     (outs VectorReg:$dst),
>>     (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2,
>>       VectorReg:$oldvalue),
>>     "add $dst {$mask}, $src1, $src2",
>>     [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1,
>>       v16i32:$src2), v16i32:$oldvalue))]>;
>> }
>
> Ok, but where does $oldvalue come from? That is the tricky part as
> far as I can see and is why this isn't quite the same as handling
> two-address instructions.

I may be missing some important detail here, but I assumed $oldvalue
and $dst were just SSA names for the same variable. For example, given
the following snippet for a compute kernel:

  if (x > 10)
    x = x - 10

If you wanted to run a bunch of parallel instances with each vector
lane representing an instance, I assume the IR would be something
roughly like (ignoring source selects for brevity):

  %mask = cmp gt %x1, 10
  %diff = sub %x1, 10
  %x2 = select %mask, %diff, %x1

At this point, %x1 is dead. %x1 and %x2 represent 'x' in the program
above.

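The remark about masked loads protecting memory accesses can be illustrated with a small Python model. This is only a sketch under assumptions: a Python list stands in for memory, and `masked_load` is a hypothetical helper written for this note, not the signature of any LLVM intrinsic. The point it shows is that an inactive lane never touches memory, so a lane whose address would be out of bounds is harmless as long as its mask bit is clear.

```python
# Model of a masked vector load; a list stands in for memory (assumption).

def masked_load(memory, base, mask, passthru):
    """Load one element per active lane; inactive lanes take passthru
    and never access memory."""
    result = []
    for lane, active in enumerate(mask):
        if active:
            result.append(memory[base + lane])
        else:
            result.append(passthru[lane])
    return result

memory = [10, 20, 30]
mask = [True, True, True, False]  # lane 3 would read past the end
passthru = [0, 0, 0, 0]

print(masked_load(memory, 0, mask, passthru))  # [10, 20, 30, 0]

# With every lane active, the same load faults on lane 3.
try:
    masked_load(memory, 0, [True] * 4, passthru)
except IndexError:
    print("unmasked access to lane 3 is out of bounds")
```

Arithmetic such as the `sub` in the kernel above has no such hazard, since computing a value for an inactive lane and then discarding it with a select is safe; only operations with side effects or possible faults need the mask applied to the operation itself.
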
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev