Displaying 4 results from an estimated 4 matches for "row0".
2014 Sep 26
2
[LLVMdev] Canonicalizing vector masking.
Hi, I received an internal test case from a game team (it wasn't about this
in particular), and I was wondering whether there is an opportunity to
canonicalize a particular code pattern:
%inputi = bitcast <4 x float> %input to <4 x i32>
%row0i = and <4 x i32> %inputi, <i32 -1, i32 0, i32 0, i32 0>
%row0 = bitcast <4 x i32> %row0i to <4 x float>
%row1i = and <4 x i32> %inputi, <i32 0, i32 -1, i32 0, i32 0>
%row1 = bitcast <4 x i32> %row1i to <4 x float>
%row2i = and <4 x i32>...
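To make the pattern concrete, here is a minimal Python sketch that models what each bitcast/and/bitcast triple computes on a 4-lane float vector. The helper names (`mask_row`, `select_row`) are hypothetical and not part of LLVM. The sketch only illustrates that ANDing with an all-ones/all-zeros lane mask yields the same result as keeping one lane and zeroing the others; it does not claim which form LLVM would pick as canonical.

```python
import struct

def bitcast_f32_to_i32(lanes):
    # Reinterpret each 32-bit float as an unsigned 32-bit integer.
    return [struct.unpack("<I", struct.pack("<f", x))[0] for x in lanes]

def bitcast_i32_to_f32(lanes):
    # Reinterpret each unsigned 32-bit integer as a 32-bit float.
    return [struct.unpack("<f", struct.pack("<I", x))[0] for x in lanes]

def mask_row(vec, lane):
    # Models: bitcast to <4 x i32>, and with a mask that is -1 in one lane, bitcast back.
    mask = [0xFFFFFFFF if i == lane else 0 for i in range(4)]
    bits = bitcast_f32_to_i32(vec)
    return bitcast_i32_to_f32([b & m for b, m in zip(bits, mask)])

def select_row(vec, lane):
    # Equivalent formulation: keep one lane, replace the others with +0.0.
    return [x if i == lane else 0.0 for i, x in enumerate(vec)]

v = [1.5, -2.0, 3.25, 4.0]  # values exactly representable as 32-bit floats
print(mask_row(v, 0))
print(select_row(v, 0))
```

Because a masked-out lane becomes all-zero bits, which is +0.0, the two formulations agree bit for bit on the kept lane and on the cleared lanes.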
2013 Oct 25
0
[LLVMdev] Is there pass to break down <4 x float> to scalars
...optimizer. A DSE is good enough.
>
> Which posted patch about TBAA? Do you have yet another solution besides
> decompose-vectors?
Ah, no, the TBAA thing is separate really. llvmpipe generally operates
on 4 rows at a time, so some functions end up with patterns like:
load <16 x i8> row0 ...
load <16 x i8> row1 ...
load <16 x i8> row2 ...
load <16 x i8> row3 ...
... do stuff ...
store <16 x i8> row0 ...
store <16 x i8> row1 ...
store <16 x i8> row2 ...
store <16 x i8> row3 ...
Since the row stride is variable, llvm...
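As an illustration of the access pattern described above, the following Python sketch loads four 16-byte rows at a runtime stride, transforms them, and stores them back. The function names and the flat `bytearray` buffer are assumptions made for this example and are not llvmpipe code. The only property it demonstrates is that with a stride known only at run time, rows may or may not overlap.

```python
ROW_BYTES = 16  # one <16 x i8> vector per row

def load_rows(buf, base, stride, count=4):
    # Load `count` rows of 16 bytes; each row starts `stride` bytes after the previous one.
    return [bytes(buf[base + r * stride: base + r * stride + ROW_BYTES])
            for r in range(count)]

def store_rows(buf, base, stride, rows):
    # Store each 16-byte row back at the same strided positions.
    for r, row in enumerate(rows):
        start = base + r * stride
        buf[start:start + ROW_BYTES] = row

def rows_overlap(stride):
    # Consecutive rows share bytes when the stride is smaller than the row width.
    return abs(stride) < ROW_BYTES

buf = bytearray(range(128))
rows = load_rows(buf, 0, 32)                          # load row0..row3
rows = [bytes(255 - b for b in row) for row in rows]  # "... do stuff ..."
store_rows(buf, 0, 32, rows)                          # store row0..row3
```

With a stride of 32 the rows are disjoint; with a stride of 8, for example, `rows_overlap` returns `True` and a store to one row would change bytes belonging to the next.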
2013 Oct 25
3
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi, Richard,
I think we are solving the same problem. I am working on a shader language
too. I am not satisfied with the current binaries because llvm opt keeps
the vector operations.
The GLSL shader language has an operation called "swizzle". It can select
sub-components of a vector. If a shader only takes components "xy" of a
vec4, it's certainly wasteful to generate 4
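For readers unfamiliar with swizzles, here is a small Python sketch of the selection they perform. The `swizzle` helper is hypothetical and written only for illustration; it is not a GLSL or LLVM API.

```python
# Map GLSL component letters to lane indices of a vec4.
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vec, pattern):
    # Return the components of `vec` named by `pattern`, in the order given.
    return [vec[COMPONENT_INDEX[c]] for c in pattern]

v = [1.0, 2.0, 3.0, 4.0]
print(swizzle(v, "xy"))    # only the first two lanes are read
print(swizzle(v, "wzyx"))  # components may also be reordered
```

In the "xy" case only two of the four lanes are ever read.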
2013 Oct 30
2
[LLVMdev] Is there pass to break down <4 x float> to scalars
...>
> > Which posted patch about TBAA? Do you have yet another solution besides
> > decompose-vectors?
>
> Ah, no, the TBAA thing is separate really. llvmpipe generally operates
> on 4 rows at a time, so some functions end up with patterns like:
>
> load <16 x i8> row0 ...
> load <16 x i8> row1 ...
> load <16 x i8> row2 ...
> load <16 x i8> row3 ...
> ... do stuff ...
> store <16 x i8> row0 ...
> store <16 x i8> row1 ...
> store <16 x i8> row2 ...
> store <16 x i8> row3 ......