On May 1, 2009, at 3:50 PM, Chris Lattner wrote:

> The goal is to replace the pattern fragment and the C++ code for
> X86::isMOVDDUPMask with something like:
>
> def movddup : PatFrag<(ops node:$lhs, node:$rhs),
>                       (vector_shuffle node:$lhs, node:$rhs,
>                                       0, 1, 0, 1, Cost<42>)
>
> Alternatively, the cost could be put on the instructions etc, whatever
> makes the most sense. incidentally, I'm not sure why movddup is
> currently defined to take a LHS/RHS: the RHS should always be undef so
> it should be coded into the movddup def.
>
> Another possible syntax would be to add a special kind of shuffle node
> to give more natural and clean syntax. This is probably the better
> solution:
>
> def movddup : Shuffle4<VR128, undef, 0, 1, 0, 1>, Cost<42>;

What does "cost" mean here? Currently isel cost means complexity of the
matched pattern. It's hard to compute this by hand so the current hack
is to allow manual cost adjustments.

I think it makes sense for isel to use HW cost (instruction latency,
code size) as a late tie breaker. In that case, shouldn't cost be part
of instruction itinerary?

Evan

On Tuesday 05 May 2009 01:02, Evan Cheng wrote:

> I think it makes sense for isel to use HW cost (instruction latency,
> code size) as a late tie breaker. In that case, shouldn't cost be part
> of instruction itinerary?

What latency? Each implementation has its own quirks and LLVM must be
flexible enough to handle them. So cost needs to be a function of the
CPU type as well as the instruction.

We do need a better cost/priority mechanism than AddedComplexity (the
naming alone of that is very confusing). Perhaps we can have some base
cost values per instruction and allow each CPU type to override them.

-Dave

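The "base cost per instruction, with per-CPU overrides" idea could be sketched roughly as below. This is only an illustration in C++: the CostTable class, the opcode and CPU name strings, and all numbers are hypothetical and are not part of any LLVM API.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical cost table; names and values are illustrative only.
class CostTable {
  // opcode -> default cost shared by all CPU types
  std::map<std::string, unsigned> Base;
  // (cpu, opcode) -> cost that replaces the default for that CPU
  std::map<std::pair<std::string, std::string>, unsigned> Override;

public:
  void setBase(const std::string &Opcode, unsigned Cost) {
    Base[Opcode] = Cost;
  }

  void setOverride(const std::string &CPU, const std::string &Opcode,
                   unsigned Cost) {
    Override[std::make_pair(CPU, Opcode)] = Cost;
  }

  // A per-CPU override wins; otherwise the base cost is used; opcodes
  // with no entry at all get the caller-supplied fallback.
  unsigned getCost(const std::string &CPU, const std::string &Opcode,
                   unsigned Fallback = 1) const {
    auto O = Override.find(std::make_pair(CPU, Opcode));
    if (O != Override.end())
      return O->second;
    auto B = Base.find(Opcode);
    return B != Base.end() ? B->second : Fallback;
  }
};
```

The units are left abstract on purpose: a lookup like this only needs costs to be comparable with each other, whether they stand for latency, code size, or a hand-tuned priority.
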
On May 5, 2009, at 9:31 AM, David Greene wrote:

> On Tuesday 05 May 2009 01:02, Evan Cheng wrote:
>
>> I think it makes sense for isel to use HW cost (instruction latency,
>> code size) as a late tie breaker. In that case, shouldn't cost be
>> part of instruction itinerary?
>
> What latency? Each implementation has its own quirks and LLVM must be
> flexible enough to handle them. So cost needs to be a function of
> the CPU type as well as the instruction.

For shuffles, I don't have a strong opinion. I just want dag combiner
to be able to say "if these two shuffles have greater or equal cost to
the equivalent combined shuffle, then merge the shuffles into one". It
doesn't matter what units these are in.

The other use is to break ties between multiple instructions that can
match the same shuffle pattern. For these, the precise units also don't
matter.

Looking further ahead to a world where we have vectorization, we will
need very precise cost models for various vector operands, scalar
operations etc. I don't think it necessarily makes sense to
overconstrain a solution for shuffles in the short term though.

-Chris
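
The merge rule quoted in the message above can be illustrated with a small C++ sketch. It makes several assumptions that the thread does not state: both shuffles are single-input (the second operand is undef), a mask is a plain vector of lane indices with -1 meaning undef, and the cost of "these two shuffles" is the sum of their individual costs. The Mask alias, composeMasks, and shouldMerge are hypothetical names, not LLVM's SelectionDAG API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mask type: one source-lane index per result lane,
// with -1 standing for an undef lane.
using Mask = std::vector<int>;

// Fold shuffle(shuffle(X, Inner), Outer) into one shuffle of X.
// Lane i of the result reads lane Outer[i] of the inner shuffle's
// output, which in turn reads lane Inner[Outer[i]] of X.
Mask composeMasks(const Mask &Inner, const Mask &Outer) {
  Mask Combined;
  Combined.reserve(Outer.size());
  for (int Idx : Outer)
    Combined.push_back(Idx < 0 ? -1 : Inner[static_cast<std::size_t>(Idx)]);
  return Combined;
}

// Merge when the pair costs at least as much as the combined shuffle.
// Summing the two costs is an assumption; the units are irrelevant as
// long as all three values use the same ones.
bool shouldMerge(unsigned CostInner, unsigned CostOuter,
                 unsigned CostCombined) {
  return CostInner + CostOuter >= CostCombined;
}
```

Because only the comparison matters, the three costs could come from any consistent source, which matches the point that the units are not important for this use.
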