Since Clang does not seem to allow type casts between vector types, such as uchar4 to float4, it seems necessary to write them as element-by-element conversions, such as:

    typedef float float4 __attribute__((ext_vector_type(4)));
    typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));

    float4 to_float4(uchar4 in)
    {
        float4 out = {in.x, in.y, in.z, in.w};
        return out;
    }

Running this code through "clang -c -emit-llvm" and then through "opt -O2 -S" produces the following IR:

    define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
    entry:
      %0 = bitcast i32 %in.coerce to <4 x i8>
      %1 = extractelement <4 x i8> %0, i32 0
      %conv = uitofp i8 %1 to float
      %vecinit = insertelement <4 x float> undef, float %conv, i32 0
      %2 = extractelement <4 x i8> %0, i32 1
      %conv2 = uitofp i8 %2 to float
      %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
      %3 = extractelement <4 x i8> %0, i32 2
      %conv4 = uitofp i8 %3 to float
      %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2
      %4 = extractelement <4 x i8> %0, i32 3
      %conv6 = uitofp i8 %4 to float
      %vecinit7 = insertelement <4 x float> %vecinit5, float %conv6, i32 3
      ret <4 x float> %vecinit7
    }

This does the cast as a sequence of scalar operations, whereas it could be done as:

      %1 = uitofp <4 x i8> %0 to <4 x float>
      ret <4 x float> %1

It seemed to me that the recently committed basic block vectorizer might be able to do this kind of optimization, but the current version does not do so. Is this optimization the kind of thing that the bb-vectorizer is intended to be able to do? And, if so, do you have any suggestions as to how that may be done? Or, if not, can you suggest another possible way to parallelize this kind of code?

Thanks,
Preston

--
Preston Gurd <preston.gurd at intel.com>
Intel Waterloo
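A minimal sketch of the same element-by-element conversion, written with the GCC-style vector_size attribute instead of ext_vector_type so that it builds with both GCC and Clang. That substitution, and the type and function names v4f, v4u8 and to_float4_gnu, are assumptions for illustration, not part of the original post. The four lane conversions are identical operations on consecutive lanes of one source vector, which is the pattern a vectorizer would have to recognize to fold them into a single uitofp from <4 x i8> to <4 x float>.

```c
/* GCC-style vectors (hypothetical names): 4 x float (16 bytes) and
   4 x unsigned char (4 bytes). Both GCC and Clang accept vector_size. */
typedef float v4f __attribute__((vector_size(16)));
typedef unsigned char v4u8 __attribute__((vector_size(4)));

/* Element-by-element conversion: the source is written as four
   independent unsigned char -> float conversions, one per lane,
   mirroring the {in.x, in.y, in.z, in.w} initializer above. */
static v4f to_float4_gnu(v4u8 in)
{
    v4f out = { (float)in[0], (float)in[1], (float)in[2], (float)in[3] };
    return out;
}
```

Compiling this with `clang -O2 -S -emit-llvm` is one way to check whether a given compiler version folds the four lanes into one vector uitofp; the result depends on the version and target.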
On Tue, Feb 28, 2012 at 2:11 PM, Gurd, Preston <preston.gurd at intel.com> wrote:
> Since Clang does not seem to allow type casts, such as uchar4 to float4,
> between vector types, it seems it is necessary to write them as element by
> element conversions, such as
>
> typedef float float4 __attribute__((ext_vector_type(4)));
> typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
>
> float4 to_float4(uchar4 in)
> {
>     float4 out = {in.x, in.y, in.z, in.w};
>     return out;
> }

I think that's right... we can represent them in IR, but I don't think clang has a generic way to write them outside OpenCL mode. Granted, you can use platform-specific intrinsics (_mm_cvttps_epi32 etc.).

> Running this code through "clang -c -emit-llvm" and then through "opt -O2
> -S", produces the following IR:
>
> [...]
>
> Which does the cast as a sequence of scalar operations, whereas it could be
> done as
>
>   %1 = uitofp <4 x i8> %0 to <4 x float>
>   ret <4 x float> %1
>
> It seemed to me that the recently committed basic block vectorizer might be
> able to do this kind of optimization, but the current version does not do
> so.

Yes, that seems reasonable.

-Eli
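As a sketch of the platform-specific route mentioned in the reply: on x86 with SSE2, a uchar4-to-float4 conversion can be assembled from zero-extending unpacks followed by _mm_cvtepi32_ps. (The _mm_cvttps_epi32 intrinsic named above goes the other way, converting float to 32-bit integer with truncation.) The function name u8x4_to_f32x4 is hypothetical, the SSE2 path assumes a little-endian x86 target, and a plain scalar loop serves as the fallback on other targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: convert four unsigned bytes to four floats. */
static void u8x4_to_f32x4(const uint8_t in[4], float out[4])
{
#if defined(__SSE2__)
    int32_t packed;
    memcpy(&packed, in, 4);                      /* load the 4 bytes as one 32-bit value */
    __m128i zero = _mm_setzero_si128();
    __m128i b = _mm_cvtsi32_si128(packed);       /* bytes land in the low 32 bits */
    __m128i w = _mm_unpacklo_epi8(b, zero);      /* zero-extend u8 -> u16 lanes */
    __m128i d = _mm_unpacklo_epi16(w, zero);     /* zero-extend u16 -> u32 lanes */
    /* _mm_cvtepi32_ps treats lanes as signed int32; 0..255 converts exactly. */
    _mm_storeu_ps(out, _mm_cvtepi32_ps(d));
#else
    /* Portable fallback: element-by-element conversion. */
    for (int i = 0; i < 4; ++i)
        out[i] = (float)in[i];
#endif
}
```

Later Clang releases, and GCC 9 and newer, also provide __builtin_convertvector, which expresses the whole conversion as a single expression on vector types; it is left out here so the sketch also builds with older compilers.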