thr3ads.net - llvm dev - [LLVMdev] How to vectorize a vector type cast? [Feb 2012]

Gurd, Preston

2012-Feb-28 22:11 UTC

[LLVMdev] How to vectorize a vector type cast?

Since Clang does not seem to allow type casts, such as uchar4 to float4, between
vector types, it seems it is necessary to write them as element by element
conversions, such as

typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));

float4 to_float4(uchar4 in)
{
  float4 out = {in.x, in.y, in.z, in.w};
  return out;
}

Running this code through "clang -c -emit-llvm" and then through
"opt -O2 -S", produces the following IR:

define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
entry:
  %0 = bitcast i32 %in.coerce to <4 x i8>
  %1 = extractelement <4 x i8> %0, i32 0
  %conv = uitofp i8 %1 to float
  %vecinit = insertelement <4 x float> undef, float %conv, i32 0
  %2 = extractelement <4 x i8> %0, i32 1
  %conv2 = uitofp i8 %2 to float
  %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
  %3 = extractelement <4 x i8> %0, i32 2
  %conv4 = uitofp i8 %3 to float
  %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2
  %4 = extractelement <4 x i8> %0, i32 3
  %conv6 = uitofp i8 %4 to float
  %vecinit7 = insertelement <4 x float> %vecinit5, float %conv6, i32 3
  ret <4 x float> %vecinit7
}

Which does the cast as a sequence of scalar operations, whereas it could be done
as

   %1 = uitofp <4 x i8> %0 to <4 x float>
   ret <4 x float> %1

It seemed to me that the recently committed basic block vectorizer might be able
to do this kind of optimization, but the current version does not do so.

Is this optimization the kind of thing that the bb-vectorizer is intended to be
able to do? And, if so, do you have any suggestions as to how that may be done?
Or, if not, can you suggest another possible way to parallelize this kind of
code?
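One possible way to write the whole-vector cast directly, sketched here under some assumptions: Clang releases newer than the one discussed in this thread (and recent GCC) provide `__builtin_convertvector`, which converts a vector element by element to another vector type with the same number of elements. The typedefs below use the GCC-style `vector_size` attribute rather than `ext_vector_type` so that the sketch is not Clang-only; that substitution is an assumption of this example, not something from the original post, and under Clang the `ext_vector_type` typedefs above should work the same way.

```c
/* Sketch only: relies on __builtin_convertvector, which the Clang version
   discussed in this thread does not have. */
typedef float float4 __attribute__((vector_size(16)));        /* 4 x float */
typedef unsigned char uchar4 __attribute__((vector_size(4))); /* 4 x u8    */

float4 to_float4(uchar4 in)
{
    /* Element-wise conversion of all four lanes in one expression;
       source and destination must have the same element count. */
    return __builtin_convertvector(in, float4);
}
```

Running the result through "clang -O2 -S -emit-llvm" is a quick way to check whether a given compiler version turns this into a single vector uitofp.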

Thanks,

Preston

--
Preston Gurd <preston.gurd at intel.com>
  Intel Waterloo


Eli Friedman

2012-Mar-01 20:28 UTC

[LLVMdev] How to vectorize a vector type cast?

On Tue, Feb 28, 2012 at 2:11 PM, Gurd, Preston <preston.gurd at intel.com> wrote:
> Since Clang does not seem to allow type casts, such as uchar4 to float4,
> between vector types, it seems it is necessary to write them as element by
> element conversions, such as
>
> typedef float float4 __attribute__((ext_vector_type(4)));
> typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
>
> float4 to_float4(uchar4 in)
> {
>   float4 out = {in.x, in.y, in.z, in.w};
>   return out;
> }

I think that's right... we can represent them in IR, but I don't think
clang has a generic way to write them outside OpenCL mode.  Granted,
you can use platform-specific intrinsics (_mm_cvttps_epi32 etc.).
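
As a rough illustration of the intrinsics route for the uchar4-to-float4 case: the sketch below zero-extends four unsigned bytes to 32-bit integers and then converts them with `_mm_cvtepi32_ps`, the int-to-float counterpart of `_mm_cvttps_epi32`. The function name `u8x4_to_f32x4` is hypothetical, and the code assumes a little-endian x86 target with SSE2; it is one way to do this, not necessarily what Clang would generate.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert four unsigned bytes to four floats. */
static __m128 u8x4_to_f32x4(const uint8_t in[4])
{
    int32_t packed;
    memcpy(&packed, in, sizeof packed);        /* load the 4 bytes as one i32 */

    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtsi32_si128(packed);     /* bytes now in the low 32 bits */
    v = _mm_unpacklo_epi8(v, zero);            /* zero-extend  8 -> 16 bits */
    v = _mm_unpacklo_epi16(v, zero);           /* zero-extend 16 -> 32 bits */

    /* The conversion is signed, but every lane is in 0..255, so the
       result matches an unsigned conversion. */
    return _mm_cvtepi32_ps(v);
}
```

The result can be written out with `_mm_storeu_ps` if the individual floats are needed.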
> Running this code through "clang -c -emit-llvm" and then through "opt -O2
> -S", produces the following IR:
>
> define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
> entry:
>   %0 = bitcast i32 %in.coerce to <4 x i8>
>   %1 = extractelement <4 x i8> %0, i32 0
>   %conv = uitofp i8 %1 to float
>   %vecinit = insertelement <4 x float> undef, float %conv, i32 0
>   %2 = extractelement <4 x i8> %0, i32 1
>   %conv2 = uitofp i8 %2 to float
>   %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
>   %3 = extractelement <4 x i8> %0, i32 2
>   %conv4 = uitofp i8 %3 to float
>   %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2
>   %4 = extractelement <4 x i8> %0, i32 3
>   %conv6 = uitofp i8 %4 to float
>   %vecinit7 = insertelement <4 x float> %vecinit5, float %conv6, i32 3
>   ret <4 x float> %vecinit7
>
> Which does the cast as a sequence of scalar operations, whereas it could be
> done as
>
>    %1 = uitofp <4 x i8> %0 to <4 x float>
>    ret <4 x float> %1
>
> It seemed to me that the recently committed basic block vectorizer might be
> able to do this kind of optimization, but the current version does not do
> so.

Yes, that seems reasonable.

-Eli
