2014 Sep 26
2
[LLVMdev] Canonicalizing vector masking.
Hi, I received an internal test case from a game team (it wasn't about this in particular), and I was wondering if there was maybe an opportunity to canonicalize a particular code pattern:

  %inputi = bitcast <4 x float> %input to <4 x i32>
  %row0i = and <4 x i32> %inputi, <i32 -1, i32 0, i32 0, i32 0>
  %row0 = bitcast <4 x i32> %row0i to <4 x float>
  %row1i = and <4 x i32> %inputi, <i32 0, i32 -1, i32 0, i32 0>
  %row1 = bitcast <4 x i32> %row1i to <4 x float>
  %row2i = and <4 x i32>...
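The following is a minimal Python model of what the pattern above computes, included only to make its semantics concrete. Each `and` with a constant mask of -1/0 lanes keeps one float lane bit-for-bit and turns every other lane into the all-zero bit pattern, which is +0.0. That is why such a mask is equivalent to a lane select against +0.0, the kind of form a canonicalization could target. The helper names (`mask_lanes`, `select_lanes`, etc.) are hypothetical, and nothing here reflects what LLVM actually does. The snippet is truncated before any proposal is given.

```python
import struct
from typing import List, Sequence

def f32_to_bits(x: float) -> int:
    # Reinterpret a float as its 32-bit IEEE-754 pattern (models `bitcast float -> i32`).
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    # Reinterpret a 32-bit pattern as a float (models `bitcast i32 -> float`).
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def mask_lanes(vec: Sequence[float], mask: Sequence[int]) -> List[float]:
    # Models the IR pattern: bitcast to <4 x i32>, `and` with a constant mask
    # whose lanes are -1 (keep all bits) or 0 (clear all bits), bitcast back.
    return [bits_to_f32(f32_to_bits(x) & (m & 0xFFFFFFFF)) for x, m in zip(vec, mask)]

def select_lanes(vec: Sequence[float], keep: Sequence[bool]) -> List[float]:
    # Hypothetical canonical form: keep lane i where keep[i] is true,
    # otherwise substitute +0.0 (the float whose bit pattern is all zeros).
    return [x if k else 0.0 for x, k in zip(vec, keep)]

v = [1.5, -2.0, 3.25, 4.0]
row0 = mask_lanes(v, [-1, 0, 0, 0])  # [1.5, 0.0, 0.0, 0.0]
row1 = mask_lanes(v, [0, -1, 0, 0])  # [0.0, -2.0, 0.0, 0.0]
```

Because the cleared lanes become +0.0 regardless of the input's sign, the select form has to use +0.0 rather than -0.0 for the equivalence to hold. NaN inputs in kept lanes are also preserved bit-for-bit by the `and`.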
2013 Oct 25
0
[LLVMdev] Is there pass to break down <4 x float> to scalars
...optimizer. A DSE is good enough.
>
> Which posted patch about TBAA? you have yet another solution except
> decompose-vectors?

Ah, no, the TBAA thing is separate really. llvmpipe generally operates on 4 rows at a time, so some functions end up with patterns like:

  load <16 x i8> row0 ...
  load <16 x i8> row1 ...
  load <16 x i8> row2 ...
  load <16 x i8> row3 ...
  ... do stuff ...
  store <16 x i8> row0 ...
  store <16 x i8> row1 ...
  store <16 x i8> row2 ...
  store <16 x i8> row3 ...

Since the row stride is variable, llvm...
2013 Oct 25
3
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi Richard, I think we are solving the same problem. I am working on a shader language too. I am not satisfied with the current binaries because vector operations are kept in llvm opt. The GLSL shader language has an operation called "swizzle", which can select sub-components of a vector. If a shader only takes components "xy" of a vec4, it's certainly wasteful to generate 4...
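To make the swizzle point concrete, here is a small Python model that assumes plain lists stand in for vec4 values. It compares computing an operation on all four lanes and then applying `.xy` against computing only the lanes the swizzle reads. All names (`swizzle`, `vector_then_swizzle`, `scalarized`, `COMPONENT_INDEX`) are hypothetical. This illustrates the wasted work described above and is not how any shader compiler or LLVM pass is implemented.

```python
from typing import Callable, Dict, List, Sequence

# GLSL component names for a vec4: x, y, z, w map to lanes 0..3.
COMPONENT_INDEX: Dict[str, int] = {"x": 0, "y": 1, "z": 2, "w": 3}

Op = Callable[[float, float], float]

def swizzle(v: Sequence[float], pattern: str) -> List[float]:
    # GLSL-style swizzle: v.xy -> swizzle(v, "xy"), v.wzyx -> swizzle(v, "wzyx").
    return [v[COMPONENT_INDEX[c]] for c in pattern]

def vector_then_swizzle(a: Sequence[float], b: Sequence[float], op: Op, pattern: str) -> List[float]:
    # Vector form: evaluate `op` on all four lanes, then keep only the swizzled ones.
    full = [op(x, y) for x, y in zip(a, b)]
    return swizzle(full, pattern)

def scalarized(a: Sequence[float], b: Sequence[float], op: Op, pattern: str) -> List[float]:
    # Scalarized form: evaluate `op` only for the lanes the swizzle reads.
    return [op(a[COMPONENT_INDEX[c]], b[COMPONENT_INDEX[c]]) for c in pattern]
```

For `a = [1.0, 2.0, 3.0, 4.0]`, `b = [10.0, 20.0, 30.0, 40.0]`, and addition, both forms return `[11.0, 22.0]` for `"xy"`. The vector form performs four additions and discards two, while the scalarized form performs only two.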
2013 Oct 30
2
[LLVMdev] Is there pass to break down <4 x float> to scalars
...
>
> > Which posted patch about TBAA? you have yet another solution except
> > decompose-vectors?
>
> Ah, no, the TBAA thing is separate really. llvmpipe generally operates
> on 4 rows at a time, so some functions end up with patterns like:
>
> load <16 x i8> row0 ...
> load <16 x i8> row1 ...
> load <16 x i8> row2 ...
> load <16 x i8> row3 ...
> ... do stuff ...
> store <16 x i8> row0 ...
> store <16 x i8> row1 ...
> store <16 x i8> row2 ...
> store <16 x i8> row3 ......