On 2017-01-20 14:31, Hal Finkel wrote:
>
> On 01/20/2017 06:11 AM, Jonas Paulsson via llvm-dev wrote:
>> Hi,
>>
>> I wonder why getScalarizationOverhead() does not take into account
>> the number of operands of the instruction? This should influence the
>> number of extracts needed, so instead of
>>
>> Scalarization cost = NumEls * (insert + extract)
>>
>> it would be better to do
>>
>> Scalarization cost = NumEls * (insert + (extract * numOperands))
>
> I suspect this is an oversight (although we need to be a bit careful
> here because if two operands are the same, which is not uncommon, we
> don't want to double the cost).
>
> -Hal

Do you in those cases of an identical operand want to count just a cost of "1" for a register move, instead of the "extraction cost"?

/Jonas

On 01/20/2017 08:30 AM, Jonas Paulsson wrote:
>
>
> On 2017-01-20 14:31, Hal Finkel wrote:
>>
>> On 01/20/2017 06:11 AM, Jonas Paulsson via llvm-dev wrote:
>>> Hi,
>>>
>>> I wonder why getScalarizationOverhead() does not take into account
>>> the number of operands of the instruction? This should influence the
>>> number of extracts needed, so instead of
>>>
>>> Scalarization cost = NumEls * (insert + extract)
>>>
>>> it would be better to do
>>>
>>> Scalarization cost = NumEls * (insert + (extract * numOperands))
>>
>> I suspect this is an oversight (although we need to be a bit careful
>> here because if two operands are the same, which is not uncommon, we
>> don't want to double the cost).
>>
>> -Hal
>
> Do you in those cases of an identical operand want to count just a
> cost of "1" for a register move, instead of the "extraction cost"?

There should be no cost to reusing the operand. (mul a, a) should only extract a once, the fact that it is used twice should not increase the cost.

 -Hal

>
> /Jonas
>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

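For illustration, here is a minimal sketch of the proposed formula with the repeated-operand caveat applied: each distinct operand is extracted once per element, so (mul a, a) pays for one extract rather than two. This is not the actual getScalarizationOverhead() implementation. The names ElementCosts and scalarizationCost are hypothetical, the unit costs are placeholders, and operands are identified by plain strings only to keep the example self-contained.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-element costs; a real target would supply these values.
struct ElementCosts {
  unsigned Insert;   // cost of inserting one scalar result into the vector
  unsigned Extract;  // cost of extracting one scalar from a vector operand
};

// Sketch of: NumEls * (insert + extract * numUniqueOperands).
// Operands are named by strings purely for illustration.
unsigned scalarizationCost(unsigned NumEls, const ElementCosts &C,
                           const std::vector<std::string> &Operands) {
  // Count each distinct operand once, so a repeated operand adds no cost.
  std::set<std::string> Unique(Operands.begin(), Operands.end());
  unsigned NumUnique = static_cast<unsigned>(Unique.size());
  return NumEls * (C.Insert + C.Extract * NumUnique);
}
```

With unit insert and extract costs and four elements, (mul a, b) comes out at 4 * (1 + 2) = 12, while (mul a, a) comes out at 4 * (1 + 1) = 8.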
> On 20 Jan 2017, at 14:53, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>
> On 01/20/2017 08:30 AM, Jonas Paulsson wrote:
>>
>>
>> On 2017-01-20 14:31, Hal Finkel wrote:
>>>
>>> On 01/20/2017 06:11 AM, Jonas Paulsson via llvm-dev wrote:
>>>> Hi,
>>>>
>>>> I wonder why getScalarizationOverhead() does not take into account the number of operands of the instruction? This should influence the number of extracts needed, so instead of
>>>>
>>>> Scalarization cost = NumEls * (insert + extract)
>>>>
>>>> it would be better to do
>>>>
>>>> Scalarization cost = NumEls * (insert + (extract * numOperands))
>>>
>>> I suspect this is an oversight (although we need to be a bit careful here because if two operands are the same, which is not uncommon, we don't want to double the cost).
>>>
>>> -Hal
>>
>> Do you in those cases of an identical operand want to count just a cost of "1" for a register move, instead of the "extraction cost"?
>
> There should be no cost to reusing the operand. (mul a, a) should only extract a once, the fact that it is used twice should not increase the cost.
>
> -Hal

There appears to be a similar issue within the x86 AVX1 cost tables for cases where we have to split the 256-bit integer operations. Some binops add 1*extract_subvector + 1*insert_subvector to the 2*128-binop costs whilst others don’t bother adding anything at all. We need to try harder to determine if we should add 1 (duplicate input or constant folded extract) or 2 extracts to the final cost.

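As a rough sketch of the AVX1 split cost described in this message: two 128-bit binops, plus one insert_subvector to rebuild the result, plus either one or two extract_subvector operations depending on whether the inputs are duplicates or one of them is a constant whose extract folds away. This is only one reading of the message and not the contents of the X86 cost tables. SplitCosts and splitBinopCost are hypothetical names, and the single-insert assumption is an assumption of this example.

```cpp
#include <cassert>

// Hypothetical unit costs for a 256-bit integer binop split on AVX1.
struct SplitCosts {
  unsigned HalfWidthOp;       // cost of one 128-bit binop
  unsigned ExtractSubvector;  // cost of one extract_subvector
  unsigned InsertSubvector;   // cost of one insert_subvector
};

// Sketch: 2 * 128-bit op + (1 or 2) extracts + 1 insert.
// Only one extract is charged when both inputs are the same value or when
// one input is a constant, since that extract can be folded away.
unsigned splitBinopCost(const SplitCosts &C, bool SameOperands,
                        bool OneOperandIsConstant) {
  unsigned NumExtracts = (SameOperands || OneOperandIsConstant) ? 1 : 2;
  return 2 * C.HalfWidthOp + NumExtracts * C.ExtractSubvector +
         C.InsertSubvector;
}
```

With all unit costs set to 1, two distinct non-constant inputs give 2 + 2 + 1 = 5, and a duplicated or constant input gives 2 + 1 + 1 = 4.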