thr3ads.net - search: "4xf32"

Displaying 5 results from an estimated 5 matches for "4xf32".

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

...or_type(8) )) float float8;
float8 f8;
float4 f4a, f4b, f4c;
f4a = f8.hi;
f8.hi = f4b;
f8.lo = f4c;

where hi and lo represent the high half and low half of the vector. The outgoing IR is

%f4a = shufflevector <8xf32>%f8, undef, <4xi32> <0, 1, 2, 3>
%f8 = shufflevector <4xf32>%f4b, <4xf32>%f4c, <8xi32> <0, 1, 2, 3, 4, 5, 6, 7>

The problem with generating insert and extracts is that we can generate poor code

%tmp16 = extractelement <4 x float> %f4b, i32 0
%f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0...

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

...ented using it (not that I necessarily think they should be, it's just a nice side effect). If this is feasible, it would be nice to extend it all the way. This lets you do things like:

float3 x;
float4 y;
// ...
y.xyz = x;

as a single shufflevector, e.g.:

%y2 = shufflevector <4xf32> %y1, <3xf32> %x, <4, 5, 6, 3>

I assume my proposed generalization can't hurt codegen, since it could always be turned into a sequence of insert and extracts which would provide the same behaviour as today.

--
Stefanus Du Toit <stefanus.dutoit at rapidmind.com> Rap...
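The mask in the proposed form can be read as indexing into the concatenation of the two operands. The following is a rough Python sketch of that reading, not LLVM code: the function names are hypothetical, and it assumes that mask indices 0 to 3 refer to %y1 and indices 4 to 6 refer to %x, which is what the <4, 5, 6, 3> example suggests.

```python
def shufflevector(a, b, mask):
    """Model of the generalized shuffle (assumed semantics): each mask
    entry indexes into the concatenation of a and b."""
    combined = list(a) + list(b)
    return [combined[i] for i in mask]


def via_insert_extract(y, x):
    """The same result built element by element, mirroring a sequence of
    extractelement / insertelement operations."""
    result = list(y)
    for i in range(len(x)):
        elem = x[i]        # extractelement from x at index i
        result[i] = elem   # insertelement into the result at index i
    return result


# y.xyz = x, with symbolic element names standing in for float values
y1 = ["y0", "y1", "y2", "y3"]
x = ["x0", "x1", "x2"]

y2 = shufflevector(y1, x, [4, 5, 6, 3])
print(y2)  # ['x0', 'x1', 'x2', 'y3']
```

Under these assumptions both helpers give the same vector, which is the equivalence the message relies on when it says the generalized shuffle could always be lowered to inserts and extracts.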

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

...e, it's just a nice side effect).
>
> If this is feasible, it would be nice to extend it all the way. This
> lets you do things like:
>
> float3 x;
> float4 y;
>
> // ...
>
> y.xyz = x;
>
> as a single shufflevector, e.g.:
>
> %y2 = shufflevector <4xf32> %y1, <3xf32> %x, <4, 5, 6, 3>
>
> I assume my proposed generalization can't hurt codegen, since it could
> always be turned into a sequence of insert and extracts which would
> provide the same behaviour as today.
>
> --
> Stefanus Du Toit <stefanus.dutoi...

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

...areful in combining vector shuffles because we don't
> want to produce a vector shuffle whose mask is illegal or hard to code gen
> so we end up in this code to generate a sequence of unpcks and movhlps for
> this. With the new form, Legalize will divide the 8xf32 vector into two
> 4xf32 and since the two sides are the same, it will generate quad word moves
> to copy the values.

I think this specific issue can be fixed without extending the IL-level syntax; DAGCombiner could easily be made a lot more clever about cases like this. For example, before legalization, we can transf...

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

...; If this is feasible, it would be nice to extend it all the way. This
>> lets you do things like:
>>
>> float3 x;
>> float4 y;
>>
>> // ...
>>
>> y.xyz = x;
>>
>> as a single shufflevector, e.g.:
>>
>> %y2 = shufflevector <4xf32> %y1, <3xf32> %x, <4, 5, 6, 3>
>>
>> I assume my proposed generalization can't hurt codegen, since it
>> could
>> always be turned into a sequence of insert and extracts which would
>> provide the same behaviour as today.
>>
>> --
>&...