Nadav, et al.,

On the BG/Q, the vectors hold 4 double-precision values. For vectorizing
single-precision code, there are single-precision-rounded instructions and
special load/store instructions which allow the double-precision numbers to
be treated as single-precision numbers. The problem is that the current
vectorization code (in the BBVectorizer and, as far as I can tell, also in
the LoopVectorizer) is going to assume that the vectors can hold
256/32 == 8 single-precision values even though the real maximum is 4
values. How should we handle this?

A couple of thoughts:

1. Do nothing and let the type legalization cost take care of penalizing
   the formation of <8 x float> vectors. This may work, but will cause an
   unnecessary compile-time increase.

2. Add some kind of API to TTI that allows the default logic
   (# == vector-size/type-size) to be overridden. This could be something
   specific to the problem at hand, such as:
   bool areFloatsStoredAsDoublesInVectors(), or something more general.
   Maybe something like: unsigned getSizeAsVectorElement(Type *T), which
   would return the number of bytes that the specified type actually takes
   up when used as a vector element.

I'm certainly open to suggestions. I don't know whether this situation
applies to other targets as well.

Thanks again,
Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
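For illustration, here is a rough sketch of the hook proposed in option 2
above and of how a vectorizer might consume it. The names
(getSizeAsVectorElement, the BG/Q variant, getMaxVF) are hypothetical and
are not part of the existing TTI interface; the default version simply
reproduces today's element-size assumption.

    #include "llvm/IR/Type.h"
    using namespace llvm;

    // Default: a vector element occupies its natural scalar size in bytes.
    unsigned getSizeAsVectorElement(Type *T) {
      return T->getScalarSizeInBits() / 8;
    }

    // Hypothetical BG/Q override: a float occupies a full double-width
    // lane, so it is reported as 8 bytes when used as a vector element.
    unsigned getSizeAsVectorElementBGQ(Type *T) {
      if (T->isFloatTy())
        return 8;
      return T->getScalarSizeInBits() / 8;
    }

    // A vectorizer could then derive the real maximum vectorization factor:
    unsigned getMaxVF(Type *ElemTy, unsigned VectorRegisterBits) {
      // 256-bit registers, float reported as 8 bytes -> MaxVF == 4, not 8.
      return (VectorRegisterBits / 8) / getSizeAsVectorElementBGQ(ElemTy);
    }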
Hi Hal,

I am not familiar with Blue Gene at all, but I will do my best to answer.

> is going to assume that the vectors can hold 256/32 == 8 single-precision
> values even though the real maximum is 4 values. How should we handle
> this?

The loop vectorizer uses TTI to find the size of the vector register. It
uses this number to calculate the maximum vectorization factor; the formula
is MaxVF = RegSize / LargestElementTypeSize. Next it checks the costs of
all of the possible vectorization factors from 1 (scalar) to MaxVF.

I think that the best approach would be to improve the cost model, because
on Blue Gene <8 x float> ops would be more expensive than <4 x float>.

> 1. Do nothing and let the type legalization cost take care of penalizing
> the formation of <8 x float> vectors. This may work, but will cause an
> unnecessary compile-time increase.

Not only that, it would produce suboptimal code in many cases. For example,
the loop vectorizer unrolls loops itself in order to increase ILP, and we
don't want the type legalizer to add additional ILP on top of that.
Additionally, smaller types may get legalized into a single register;
<8 x i8>, for example.

> 2. Add some kind of API to TTI that allows the default logic
> (# == vector-size/type-size) to be overridden. This could be something
> specific to the problem at hand, such as:
> bool areFloatsStoredAsDoublesInVectors(), or something more general.
> Maybe something like: unsigned getSizeAsVectorElement(Type *T), which
> would return the number of bytes that the specified type actually takes
> up when used as a vector element.

I prefer to fix the cost model. Do you see any problems with this approach?
I understand that it is more difficult to select the VF in the BB
vectorizer, but I suspect that even if we add an additional API (such as
getTargetMaxVF) we will run into problems with non-float data types.

Thanks,
Nadav
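To make the tradeoff concrete, here is a simplified sketch of the selection
scheme described above; it is not the actual LoopVectorizer code, and the
cost callback stands in for the real TTI cost queries. MaxVF comes straight
from the register width, and the chosen VF is whichever factor the cost
model says is cheapest per scalar iteration, so if <8 x float> operations
are charged for the splitting they incur on Blue Gene, VF = 4 wins without
any new TTI hook.

    #include <functional>

    // CostAtVF(VF) returns the target's estimated cost per scalar
    // iteration when vectorizing by a factor of VF (VF == 1 is scalar).
    unsigned selectVF(unsigned RegSizeInBits, unsigned LargestTypeSizeInBits,
                      const std::function<float(unsigned)> &CostAtVF) {
      unsigned MaxVF = RegSizeInBits / LargestTypeSizeInBits; // 256/32 == 8
      unsigned BestVF = 1;
      float BestCost = CostAtVF(1);              // scalar baseline
      for (unsigned VF = 2; VF <= MaxVF; VF *= 2) {
        float Cost = CostAtVF(VF);
        if (Cost < BestCost) {
          BestCost = Cost;
          BestVF = VF;
        }
      }
      // With a cost model that penalizes <8 x float>, this returns 4 here.
      return BestVF;
    }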
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, January 27, 2013 9:16:42 PM
> Subject: Re: Floats as Doubles in Vectors
>
> Hi Hal,
>
> I am not familiar with Blue Gene at all, but I will do my best to
> answer.
>
> > is going to assume that the vectors can hold 256/32 == 8
> > single-precision values even though the real maximum is 4 values.
> > How should we handle this?
>
> The loop vectorizer uses TTI to find the size of the vector register.
> It uses this number to calculate the maximum vectorization factor;
> the formula is MaxVF = RegSize / LargestElementTypeSize. Next it
> checks the costs of all of the possible vectorization factors from 1
> (scalar) to MaxVF.
>
> I think that the best approach would be to improve the cost model,
> because on Blue Gene <8 x float> ops would be more expensive than
> <4 x float>.

I agree; the cost model needs to reflect this fact. I am only wondering
about what should be done on top of that. Why should the loop vectorizer
(or the BB vectorizer, for that matter) spend time considering
vectorization factors that will never be profitable?

> > 1. Do nothing and let the type legalization cost take care of
> > penalizing the formation of <8 x float> vectors. This may work, but
> > will cause an unnecessary compile-time increase.
>
> Not only that, it would produce suboptimal code in many cases. For
> example, the loop vectorizer unrolls loops itself in order to
> increase ILP, and we don't want the type legalizer to add additional
> ILP on top of that. Additionally, smaller types may get legalized
> into a single register; <8 x i8>, for example.
>
> > 2. Add some kind of API to TTI that allows the default logic
> > (# == vector-size/type-size) to be overridden. This could be
> > something specific to the problem at hand, such as:
> > bool areFloatsStoredAsDoublesInVectors(), or something more general.
> > Maybe something like: unsigned getSizeAsVectorElement(Type *T),
> > which would return the number of bytes that the specified type
> > actually takes up when used as a vector element.
>
> I prefer to fix the cost model. Do you see any problems with this
> approach? I understand that it is more difficult to select the VF in
> the BB vectorizer, but I suspect that even if we add an additional
> API (such as getTargetMaxVF) we will run into problems with non-float
> data types.

There is no problem with fixing the cost model, but doing that will not
help with the unnecessary compile-time increase.

Thanks again,
Hal

> Thanks,
> Nadav
On 28 January 2013 03:16, Nadav Rotem <nrotem at apple.com> wrote:

> I think that the best approach would be to improve the cost model,
> because on Blue Gene <8 x float> ops would be more expensive than
> <4 x float>.

+1. Clean and simple.

The few cycles you lose by computing the cost of all of the options (and
realizing that one is 10 and the other is 100) should be negligible
compared to the complexity of a scheme where you can invalidate (or not)
your computation at any time, plus the decision of how to sort all the
shades of grey into those two categories.

If the cost model is accurate, the compiler will probably find better
solutions than we could possibly dream of. ;) Plus, as Nadav said, there
could be some extreme cases where it ends up being profitable.

cheers,
--renato