Thanks Nadav for the info. It clears my query :)

Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than on AVX1.

One query though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (which is generic). IMO, it will make things clearer.

Your thoughts on this?

Regards,
Suyog Sarda

On 4 May 2015 21:57, "Nadav Rotem" <nrotem at apple.com> wrote:
>
> > On May 4, 2015, at 2:36 AM, suyog sarda <sardask01 at gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a query regarding Cost Table for AVX2 in TargetTransformInfo.
> >
> > The table consists of entries for shift and div operations only. There
> > are no entries for ADD, SUB and MUL in the AVX2 cost table. Those
> > entries are present in the Cost Table for AVX.
>
> Most of the cost information is inferred from the TargetLowering tables
> (where operations are marked as Legal, Custom, etc.) Only exceptional
> instructions need to be recorded in the TargetTransformInfo cost tables.
>
> > The reason for the query is: when my subtarget feature is AVX2, in SLP
> > Vectorization, while calculating the scalar cost of ADD, it doesn't see
> > the entry in the cost table and falls back to the default implementation,
> > returning cost 1. While for AVX, it finds the ADD in the cost table and
> > returns 4 as scalar cost.
> >
> > I suspect this is something specific to the architecture difference
> > between AVX and AVX2. I am naive to architecture specifics in this case.
>
> I assume that this is integer ADD, because AVX1 only supported floating
> point arithmetic on 256bit vectors, while AVX2 added support for 256bit
> integer arithmetic. So, it makes sense that the cost that AVX1 gives this
> operation is much higher.
>
> > I would be glad if someone clarifies this.
> >
> > Thanks.
> >
> > Regards,
> > Suyog Sarda
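The table-then-fallback behaviour described in this message can be illustrated with a toy model. This is only a minimal sketch, not LLVM's implementation: `Op`, `Ty`, `CostEntry`, `findEntry`, `costFor`, `kAVXTable` and `kAVX2Table` are hypothetical names, the `v8i32` type and the shift/div costs are assumptions made up for illustration, and only the two values 4 (AVX table hit for ADD) and 1 (generic fallback) come from the thread.

```cpp
#include <vector>

// Hypothetical stand-ins for the opcode and vector-type enums.
enum class Op { Add, Sub, Mul, Shl, SDiv };
enum class Ty { v8i32 };

struct CostEntry {
  Op op;
  Ty ty;
  unsigned cost;
};

// Linear search; nullptr means "no entry here, ask the next tier".
const CostEntry *findEntry(const std::vector<CostEntry> &table, Op op, Ty ty) {
  for (const CostEntry &e : table)
    if (e.op == op && e.ty == ty)
      return &e;
  return nullptr;
}

// AVX2-style table: shift and div only, as described in the thread.
// The cost values 1 and 20 are placeholders, not real numbers.
const std::vector<CostEntry> kAVX2Table = {{Op::Shl, Ty::v8i32, 1},
                                           {Op::SDiv, Ty::v8i32, 20}};

// AVX-style table: contains ADD with the cost 4 quoted in the thread.
const std::vector<CostEntry> kAVXTable = {{Op::Add, Ty::v8i32, 4}};

// Generic fallback, standing in for the default cost of 1 from the thread.
unsigned genericCost() { return 1; }

// Use the table entry when present, otherwise the generic fallback.
unsigned costFor(const std::vector<CostEntry> &table, Op op, Ty ty) {
  const CostEntry *entry = findEntry(table, op, ty);
  return entry ? entry->cost : genericCost();
}
```

In this model, `costFor(kAVXTable, Op::Add, Ty::v8i32)` hits the table, while `costFor(kAVX2Table, Op::Add, Ty::v8i32)` finds nothing and takes the generic value, which is the difference the original question observed.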
> On May 4, 2015, at 10:23 AM, suyog sarda <sardask01 at gmail.com> wrote:
>
> Thanks Nadav for the info. It clears my query :)
>
> Yes its an integer ADD, and since AVX2 supports 256 bits integer
> arithmetic, so its cost is less than AVX1.
>
> One query though - shouldn't then the cost of integer ADD/SUB/MUL (which
> would be 1) be explicitly specified in AVX2 cost table? Because right now
> this entry is missing and cost of these operations are taken from BaseTTI
> (which is generic). IMO, it will make things more clear.
>
> Your thoughts on this??

I prefer that we continue to rely on TargetLowering in order to avoid duplicating the cost information.

> Regards,
> Suyog Sarda
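The idea of deriving a default cost from legality markings, rather than from a per-instruction table, might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual TargetLowering or BaseTTI logic: `Action`, `LoweringInfo` and `inferredCost` are hypothetical names, and the weights (1 per part for Legal, 2 per part for Custom, 3 per element for Expand) are assumed values chosen only to show the shape of the rule.

```cpp
// Hypothetical legality states, echoing the "Legal, Custom, etc."
// markings mentioned in the thread.
enum class Action { Legal, Custom, Expand };

struct LoweringInfo {
  Action action;     // how the target handles the op on the legalized type
  unsigned numParts; // number of legal registers the type is split into
  unsigned numElts;  // number of vector elements in the original type
};

// Assumed weighting: a legal op costs 1 per part, a custom-lowered op
// costs 2 per part, and an expanded op is scalarized at 3 units per
// element (extract, scalar op, insert). These numbers are illustrative.
unsigned inferredCost(const LoweringInfo &info) {
  switch (info.action) {
  case Action::Legal:
    return info.numParts * 1;
  case Action::Custom:
    return info.numParts * 2;
  case Action::Expand:
    return info.numElts * 3;
  }
  return 1; // unreachable for valid input; keeps compilers quiet
}
```

Under this model an operation that is legal on a single 256-bit register, e.g. `inferredCost({Action::Legal, 1, 8})`, costs 1 without any table entry, which is consistent with the preference stated above to keep only exceptional instructions in the cost tables.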
+LLVMdev (sorry for not broadcasting earlier)

On 5 May 2015 12:40, "suyog sarda" <sardask01 at gmail.com> wrote:
> Hi Nadav,
>
> I stumbled upon one more question (sorry for not specifying earlier).
> The query below applies when -mavx2 is specified as the target feature.
>
> As I understand it, AVX1 is a subset of AVX2. In SLP, we get the scalar
> reduction cost in the getReduction() function, which queries the TTI
> (TargetTransformInfo) via getArithmeticInstrCost().
>
> Now for integer ADD, since AVX2 added support for integer arithmetic, the
> entries for ADD (SUB/MUL) are missing in AVX2CostTable (which is what you
> also specified earlier).
> The lookup fails to find the entry and moves on to the subsequent checks.
> When it comes to the AVX1 check, it specifically checks that AVX2 is not
> specified:
>
>     (ST->hasAVX() && !ST->hasAVX2())
>
> Since we have specified -mavx2, this check also fails and the query falls
> back to BaseTTI.
>
> Shouldn't it just check for hasAVX(), since AVX1 is a subset of AVX2?
>
>     (ST->hasAVX())
>
> I have a situation where I have integer ADD as the reduction op. When I
> specify AVX2, the scalar cost is much lower than with AVX1, and hence the
> code is not vectorized at all.
> If AVX2 vector instructions are costly, shouldn't it fall back to AVX1
> and generate AVX1 vector instructions?
>
> Correct me if I am wrong somewhere. Awaiting your comments :)
>
> Thanks.
>
> Regards,
> Suyog
>
> On Mon, May 4, 2015 at 11:20 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
>> I prefer that we continue to rely on TargetLowering in order to avoid
>> duplicating the cost information.
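The two forms of the guard quoted in this message can be compared in a small toy model. This is a sketch only and does not claim which form is correct for LLVM: `Subtarget`, `Guard` and `addCost` are hypothetical names, and the model hard-codes just the two values from the thread (4 for the AVX table entry, 1 for the generic fallback). It also assumes, as the thread states, that the AVX2 table has no ADD entry, so the AVX2 check always falls through for ADD.

```cpp
// Hypothetical subtarget with the two feature queries from the thread.
struct Subtarget {
  bool avx;
  bool avx2;
  bool hasAVX() const { return avx; }
  bool hasAVX2() const { return avx2; }
};

// Which guard protects the AVX cost-table lookup.
enum class Guard {
  AVXWithoutAVX2, // (ST->hasAVX() && !ST->hasAVX2()), as quoted above
  AnyAVX          // (ST->hasAVX()), the alternative asked about
};

// Cost of an integer ADD in the toy model.
unsigned addCost(const Subtarget &st, Guard guard) {
  const unsigned kAVXTableAddCost = 4; // AVX table entry from the thread
  const unsigned kGenericCost = 1;     // fallback value from the thread

  // The AVX2 table has no ADD entry, so nothing is found there and the
  // lookup proceeds to the AVX guard.
  const bool useAVXTable = (guard == Guard::AVXWithoutAVX2)
                               ? (st.hasAVX() && !st.hasAVX2())
                               : st.hasAVX();
  return useAVXTable ? kAVXTableAddCost : kGenericCost;
}
```

With `-mavx2` modelled as `Subtarget{true, true}`, the quoted guard yields 1 and the relaxed guard yields 4. Given the earlier explanation in the thread that the AVX value is high because AVX1 lacks 256-bit integer arithmetic, the relaxed guard would apply that AVX1 figure to an AVX2 target in this model.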