thr3ads.net - llvm dev - [LLVMdev] AVX2 Cost Table in X86TargetTransformInfo [May 2015]

suyog sarda

2015-May-04 17:23 UTC

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

Thanks, Nadav, for the info. That clears up my question :)

Yes, it's an integer ADD, and since AVX2 supports 256-bit integer
arithmetic, its cost is lower than on AVX1.

One question, though: shouldn't the cost of integer ADD/SUB/MUL (which
would be 1) then be explicitly specified in the AVX2 cost table? Right now
these entries are missing, and the cost of these operations is taken from
BaseTTI (which is generic). IMO, adding them would make things clearer.

Your thoughts on this?

Regards,
Suyog Sarda
On 4 May 2015 21:57, "Nadav Rotem" <nrotem at apple.com> wrote:
>
> > On May 4, 2015, at 2:36 AM, suyog sarda <sardask01 at gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a question regarding the AVX2 cost table in TargetTransformInfo.
> >
> > The table consists of entries for shift and div operations only. There
> > are no entries for ADD, SUB and MUL in the AVX2 cost table, although
> > those entries are present in the AVX cost table.
>
> Most of the cost information is inferred from the TargetLowering tables
> (where operations are marked as Legal, Custom, etc.)  Only exceptional
> instructions need to be recorded in the TargetTransformInfo cost tables.
>
> >
> > The reason for the query is that when my subtarget feature is AVX2,
> > in SLP vectorization, while calculating the scalar cost of ADD, it
> > doesn't find the entry in the cost table and falls back to the default
> > implementation, which returns cost 1. For AVX, it finds ADD in the cost
> > table and returns 4 as the scalar cost.
>
> >
> > I suspect this is something specific to the architectural differences
> > between AVX and AVX2. I am not familiar with the architecture specifics
> > in this case.
>
> I assume that this is an integer ADD, because AVX1 only supported
> floating-point arithmetic on 256-bit vectors, while AVX2 added support for
> 256-bit integer arithmetic. So it makes sense that the cost AVX1 gives this
> operation is much higher.
>
>
> >
> > I would be glad if someone could clarify this.
> >
> > Thanks.
> >
> > Regards,
> > Suyog Sarda
>
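As a rough worked example of the cost gap discussed in this message, assume (this breakdown is an assumption, not something stated in the thread) that AVX1, lacking 256-bit integer arithmetic, performs a v8i32 ADD as two 128-bit adds plus one extract and one insert of the upper 128-bit half, each counted as cost 1, while AVX2 issues a single 256-bit add:

```latex
\begin{aligned}
\mathrm{cost}_{\mathrm{AVX1}}(\texttt{ADD v8i32})
  &= \underbrace{2}_{\text{two 128-bit adds}}
   + \underbrace{1}_{\text{extract upper half}}
   + \underbrace{1}_{\text{insert upper half}} = 4, \\
\mathrm{cost}_{\mathrm{AVX2}}(\texttt{ADD v8i32})
  &= \underbrace{1}_{\text{one 256-bit add}} = 1.
\end{aligned}
```

Under that model the AVX1 figure matches the 4 returned from the AVX cost table, and the AVX2 figure matches the default cost of 1.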

Nadav Rotem

2015-May-04 17:50 UTC

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

> On May 4, 2015, at 10:23 AM, suyog sarda <sardask01 at gmail.com> wrote:
> 
> Thanks, Nadav, for the info. That clears up my question :)
> 
> Yes, it's an integer ADD, and since AVX2 supports 256-bit integer
> arithmetic, its cost is lower than on AVX1.
> 
> One question, though: shouldn't the cost of integer ADD/SUB/MUL (which
> would be 1) then be explicitly specified in the AVX2 cost table? Right now
> these entries are missing, and the cost of these operations is taken from
> BaseTTI (which is generic). IMO, adding them would make things clearer.
> 
> Your thoughts on this?
> 
> 
I prefer that we continue to rely on TargetLowering in order to avoid
duplicating the cost information.
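A minimal C++ sketch of the split described in this thread: a cost table holds only exceptional entries, and every other cost is derived from operation legality (the role TargetLowering plays). All names here (`Subtarget`, `CostEntry`, `kAVX1CostTable`, `lookup`, `isLegal`, `getArithmeticCost`) are hypothetical and are not the real LLVM API; the legality rule and the scalarization fallback are simplifying assumptions, and only the AVX1 value of 4 comes from the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only; these are not LLVM classes.
enum class Opcode { Add, Sub };
enum class VT { v4i32, v8i32 };  // 128-bit and 256-bit vectors of i32

struct Subtarget {
  bool hasAVX;
  bool hasAVX2;
};

struct CostEntry {
  Opcode op;
  VT type;
  unsigned cost;
};

// Exceptional entries only. On AVX1 a 256-bit integer add/sub can be done as
// two 128-bit halves, which is cheaper than the generic fallback below.
// The value 4 is the AVX1 cost reported in this thread.
constexpr std::array<CostEntry, 2> kAVX1CostTable{{
    {Opcode::Add, VT::v8i32, 4},
    {Opcode::Sub, VT::v8i32, 4},
}};

// Returns the matching entry, or nullptr if the table has none.
template <std::size_t N>
const CostEntry* lookup(const std::array<CostEntry, N>& table, Opcode op,
                        VT type) {
  for (const CostEntry& e : table)
    if (e.op == op && e.type == type) return &e;
  return nullptr;
}

// Stand-in for a legality query (assumption: 128-bit integer add/sub is
// always legal; 256-bit integer add/sub requires AVX2).
bool isLegal(const Subtarget& st, VT type) {
  return type == VT::v4i32 || st.hasAVX2;
}

unsigned numElements(VT type) { return type == VT::v4i32 ? 4u : 8u; }

unsigned getArithmeticCost(const Subtarget& st, Opcode op, VT type) {
  assert(st.hasAVX || !st.hasAVX2);  // AVX2 implies AVX
  // 1. Exceptional cases recorded in the per-subtarget cost table.
  if (st.hasAVX && !st.hasAVX2)
    if (const CostEntry* e = lookup(kAVX1CostTable, op, type)) return e->cost;
  // 2. Everything else is derived from legality: a legal operation costs 1;
  //    otherwise assume it is scalarized, one scalar op per element.
  return isLegal(st, type) ? 1u : numElements(type);
}
```

In this model an AVX2 subtarget already gets 1 for a v8i32 ADD from the legality path alone, so an explicit AVX2 table entry with cost 1 would only repeat that value.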

suyog sarda

2015-May-05 10:01 UTC

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

+LLVMdev (sorry for not copying the list earlier)
On 5 May 2015 12:40, "suyog sarda" <sardask01 at gmail.com> wrote:
> Hi Nadav,
>
> I stumbled upon one more question (sorry for not mentioning it earlier).
> The query below applies when -mavx2 is specified as the target feature.
>
> As I understand it, AVX1 is a subset of AVX2. In SLP, we get the scalar
> reduction cost in the getReduction() function, which queries the TTI
> (TargetTransformInfo) via getArithmeticInstrCost().
>
> Now for integer ADD, since AVX2 added support for 256-bit integer
> arithmetic, the entries for ADD (and SUB/MUL) are missing from
> AVX2CostTable (as you also pointed out earlier).
> The lookup fails to find an entry and moves on to the subsequent checks.
> When it reaches the AVX1 check, that check explicitly requires that AVX2
> is not enabled.
>
> (ST->hasAVX() && !ST->hasAVX2())
>
> Since we have specified -mavx2, this check also fails, and the cost
> falls back to BaseTTI.
>
> Shouldn't it just check hasAVX(), since AVX1 is a subset of AVX2?
>
> (ST->hasAVX())
>
> I have a situation where integer ADD is the reduction operation. When I
> specify AVX2, the scalar cost is much lower than with AVX1, and hence
> the code is not vectorized at all.
> If AVX2 vector instructions are costly, shouldn't it fall back to AVX1
> and generate AVX1 vector instructions?
>
> Correct me if I am wrong anywhere. Looking forward to your comments :)
>
> Thanks.
>
> Regards,
> Suyog
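To make the guard question in this message concrete, here is a small hypothetical C++ sketch (not LLVM code) that models only a v8i32 ADD on AVX-capable subtargets. It assumes the lookup order described in the message: the AVX2 table has no ADD entry, the AVX1 table has ADD v8i32 = 4, and otherwise the legality-derived default of 1 applies on AVX2. The names `Subtarget`, `addV8i32Cost`, and the `excludeAVX2` flag are illustrative; the flag switches between the quoted guard and the proposed `hasAVX()`-only guard.

```cpp
#include <cassert>

// Hypothetical subtarget description; not an LLVM class.
struct Subtarget {
  bool hasAVX;
  bool hasAVX2;
};

constexpr unsigned kAVX1TableAddCost = 4;   // AVX1 table value from the thread
constexpr unsigned kFallbackLegalCost = 1;  // legality-derived default on AVX2

// excludeAVX2 == true  models the quoted guard:   (ST->hasAVX() && !ST->hasAVX2())
// excludeAVX2 == false models the proposed guard: (ST->hasAVX())
// The AVX2 table is assumed to have no ADD entry, so it is not modeled here.
unsigned addV8i32Cost(const Subtarget& st, bool excludeAVX2) {
  assert(st.hasAVX);  // this sketch only models AVX-capable subtargets
  bool useAVX1Table =
      excludeAVX2 ? (st.hasAVX && !st.hasAVX2) : st.hasAVX;
  if (useAVX1Table) return kAVX1TableAddCost;
  // Reached only on AVX2 with the quoted guard: the operation is legal.
  return kFallbackLegalCost;
}
```

Under these assumptions, the quoted guard gives an AVX2 target a cost of 1 for a v8i32 ADD, while the `hasAVX()`-only guard would make it pick up the AVX1 entry of 4.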

Renato Golin

2015-May-06 08:47 UTC

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

On 4 May 2015 at 18:50, Nadav Rotem <nrotem at apple.com> wrote:
> I prefer that we continue to rely on TargetLowering in order to avoid
> duplicating the cost information.
+1

cheers,
--renato
