----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, January 27, 2013 9:16:42 PM
> Subject: Re: Floats as Doubles in Vectors
>
> Hi Hal,
>
> I am not familiar with Blue Gene at all but I will do my best to
> answer.
>
> > is going to assume that the vectors can hold 256/32 == 8
> > single-precision values even though the real maximum is 4 values.
> > How should we handle this?
>
> The loop vectorizer uses TTI to find the size of the vector register.
> It uses this number to calculate the maximum vectorization factor.
> The formula is MaxVF = (RegSize/LargestElementTypeSize). Next it
> checks the costs of all of the possible vectorization factors
> from 1 (scalar) to MaxVF.
>
> I think that the best approach would be to improve the cost model,
> because on Blue Gene <8 x float> ops would be more expensive than
> <4 x float>.

I agree. The cost model needs to reflect this fact; I am wondering only
about things to be done on top of that. Why should the loop vectorizer
(or the BB vectorizer, for that matter) spend time considering
vectorization factors that will never be profitable?

> > 1. Do nothing and let the type-legalization cost take care of
> > penalizing the formation of <8 x float> vectors. This may work,
> > but will cause an unnecessary compile-time increase.
>
> Not only that, it would produce suboptimal code in many cases. For
> example, the loop vectorizer unrolls loops itself in order to
> increase ILP; we don't want the type legalizer to add additional ILP.
> Additionally, smaller types may get legalized into a single
> register, for example <8 x i8>.
>
> > 2. Add some kind of API to TTI that allows the default logic
> > (# = vector-size/type-size) to be overridden. This could be
> > something specific to the problem at hand, such as: bool
> > areFloatsStoredAsDoublesInVectors(), or something more general.
> > Maybe something like: unsigned getSizeAsVectorElement(Type *T),
> > which would return the number of bytes that the specified type
> > actually takes up when used as a vector element.
>
> I prefer to fix the cost model. Do you see any problems with this
> approach? I understand that it is more difficult to select the VF
> in the BB vectorizer, but I suspect that even if we add an
> additional API (such as getTargetMaxVF) we will run into problems
> with non-float data types.

There is no problem with fixing the cost model, but doing that will not
help the unnecessary compile-time increase.

Thanks again,
Hal

> Thanks,
> Nadav
>> I think that the best approach would be to improve the cost model,
>> because on Blue Gene <8 x float> ops would be more expensive than
>> <4 x float>.
>
> I agree. The cost model needs to reflect this fact; I am wondering
> only about things to be done on top of that. Why should the loop
> vectorizer (or the BB vectorizer, for that matter) spend time
> considering vectorization factors that will never be profitable?

I can think of scenarios where larger VFs would be profitable, for
example loops that only copy memory.

>>> 1. Do nothing and let the type-legalization cost take care of
>>> penalizing the formation of <8 x float> vectors. This may work,
>>> but will cause an unnecessary compile-time increase.
>>
>> Not only that, it would produce suboptimal code in many cases. For
>> example, the loop vectorizer unrolls loops itself in order to
>> increase ILP; we don't want the type legalizer to add additional ILP.
>> Additionally, smaller types may get legalized into a single
>> register, for example <8 x i8>.
>>
>>> 2. Add some kind of API to TTI that allows the default logic
>>> (# = vector-size/type-size) to be overridden. This could be
>>> something specific to the problem at hand, such as: bool
>>> areFloatsStoredAsDoublesInVectors(), or something more general.
>>> Maybe something like: unsigned getSizeAsVectorElement(Type *T),
>>> which would return the number of bytes that the specified type
>>> actually takes up when used as a vector element.
>>
>> I prefer to fix the cost model. Do you see any problems with this
>> approach? I understand that it is more difficult to select the VF
>> in the BB vectorizer, but I suspect that even if we add an
>> additional API (such as getTargetMaxVF) we will run into problems
>> with non-float data types.
>
> There is no problem with fixing the cost model, but doing that will
> not help the unnecessary compile-time increase.
>
> Thanks again,
> Hal

>> Thanks,
>> Nadav
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, January 27, 2013 10:37:47 PM
> Subject: Re: Floats as Doubles in Vectors
>
> >> I think that the best approach would be to improve the cost model,
> >> because on Blue Gene <8 x float> ops would be more expensive than
> >> <4 x float>.
> >
> > I agree. The cost model needs to reflect this fact; I am wondering
> > only about things to be done on top of that. Why should the loop
> > vectorizer (or the BB vectorizer, for that matter) spend time
> > considering vectorization factors that will never be profitable?
>
> I can think of scenarios where larger VFs would be profitable, for
> example loops that only copy memory.

Alright, I think that I can see that.

Thanks again,
Hal

> >>> 1. Do nothing and let the type-legalization cost take care of
> >>> penalizing the formation of <8 x float> vectors. This may work,
> >>> but will cause an unnecessary compile-time increase.
> >>
> >> Not only that, it would produce suboptimal code in many cases. For
> >> example, the loop vectorizer unrolls loops itself in order to
> >> increase ILP; we don't want the type legalizer to add additional ILP.
> >> Additionally, smaller types may get legalized into a single
> >> register, for example <8 x i8>.
> >>
> >>> 2. Add some kind of API to TTI that allows the default logic
> >>> (# = vector-size/type-size) to be overridden. This could be
> >>> something specific to the problem at hand, such as: bool
> >>> areFloatsStoredAsDoublesInVectors(), or something more general.
> >>> Maybe something like: unsigned getSizeAsVectorElement(Type *T),
> >>> which would return the number of bytes that the specified type
> >>> actually takes up when used as a vector element.
> >>
> >> I prefer to fix the cost model. Do you see any problems with this
> >> approach? I understand that it is more difficult to select the VF
> >> in the BB vectorizer, but I suspect that even if we add an
> >> additional API (such as getTargetMaxVF) we will run into problems
> >> with non-float data types.
> >
> > There is no problem with fixing the cost model, but doing that will
> > not help the unnecessary compile-time increase.
> >
> > Thanks again,
> > Hal
>
> >> Thanks,
> >> Nadav