Alex wrote:
> Hello.
>
> Sorry, I am not sure whether this question should go to the llvm or the
> mesa3d-dev mailing list, so I am posting it to both.
>
> I am writing an LLVM backend for a modern graphics processor which has an
> ISA very similar to that of Direct3D.
>
> I am reading the code in the Gallium3D driver in a Mesa branch, which
> converts the shader programs (TGSI tokens) to LLVM IR.
>
> For the shader instructions also found in LLVM IR, the conversion is
> trivial:
>
> <code>
> llvm::Value * Instructions::mul(llvm::Value *in1, llvm::Value *in2)
> {
>     // m_builder is a llvm::IRBuilder
>     return m_builder.CreateMul(in1, in2, name("mul"));
> }
> </code>
>
> However, the special instructions cannot be mapped directly to LLVM IR.
> For "min", the conversion involves extracting the vector elements,
> creating a less-than compare, creating 'select' instructions, and
> creating 'insert-element' instructions:
>
> <code>
> llvm::Value * Instructions::min(llvm::Value *in1, llvm::Value *in2)
> {
>     // generate LLVM 'extract-element' instructions
>     std::vector<llvm::Value*> vec1 = extractVector(in1);
>     std::vector<llvm::Value*> vec2 = extractVector(in2);
>
>     Value *xcmp = m_builder.CreateFCmpOLT(vec1[0], vec2[0], name("xcmp"));
>     Value *selx = m_builder.CreateSelect(xcmp, vec1[0], vec2[0], name("selx"));
>
>     Value *ycmp = m_builder.CreateFCmpOLT(vec1[1], vec2[1], name("ycmp"));
>     Value *sely = m_builder.CreateSelect(ycmp, vec1[1], vec2[1], name("sely"));
>
>     Value *zcmp = m_builder.CreateFCmpOLT(vec1[2], vec2[2], name("zcmp"));
>     Value *selz = m_builder.CreateSelect(zcmp, vec1[2], vec2[2], name("selz"));
>
>     Value *wcmp = m_builder.CreateFCmpOLT(vec1[3], vec2[3], name("wcmp"));
>     Value *selw = m_builder.CreateSelect(wcmp, vec1[3], vec2[3], name("selw"));
>
>     // generate LLVM 'insert-element' instructions
>     return vectorFromVals(selx, sely, selz, selw);
> }
> </code>
>
> Eventually all of these should be folded back into a single 'min'
> instruction in the codegen, so I wonder whether generating a simple 'call'
> instruction to a 'min' function would make instruction selection easier
> (no folding and no complicated pattern matching in the instruction
> selection DAG).
>
> I don't have experience with the new vector instructions in LLVM, and
> perhaps that's why folding the swizzle and writemask looks complicated
> to me.
>
> Thanks.

I hope marcheu sees this too.

Um, I was thinking that we should eventually create intrinsic functions for
some of the commands, like LIT, that might not be single-instruction, but
that can be lowered eventually; and for commands like LG2, that might be
single-instruction for shaders, but probably not for non-shader chipsets.
Unfortunately, I'm still learning LLVM, so I might be completely and
totally off-base here.

Out of curiosity, which chipset are you working on? R600? NV50? Something
else?

~ C.
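[For reference, the extract/compare/select/insert sequence in the quoted min() computes, per lane, an ordered less-than compare feeding a select. A plain C++ sketch of those semantics (not the IRBuilder code; `Vec4` is an illustrative stand-in for LLVM's `<4 x float>`):]

```cpp
#include <array>

// Stand-in for LLVM's <4 x float> vector type.
using Vec4 = std::array<float, 4>;

// Per-lane semantics of the scalarized "min": each FCmpOLT + Select pair
// in the IRBuilder code computes one lane of this loop.
Vec4 min4(const Vec4 &a, const Vec4 &b) {
    Vec4 out{};
    for (int i = 0; i < 4; ++i) {
        bool cmp = a[i] < b[i];      // CreateFCmpOLT
        out[i] = cmp ? a[i] : b[i];  // CreateSelect
    }
    return out;
}
```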
Stephane Marchesin
2008-Dec-30 22:52 UTC
[LLVMdev] [Mesa3d-dev] Folding vector instructions
On Tue, Dec 30, 2008 at 21:30, Chris Lattner <clattner at apple.com> wrote:
> On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote:
>>> However, the special instructions cannot directly be mapped to LLVM IR,
>>> like "min"; the conversion involves extracting the vector, creating a
>>> less-than compare, creating 'select' instructions, and creating
>>> 'insert-element' instructions.
>
> Using scalar operations obviously works, but will probably produce very
> inefficient code. One positive thing is that all target-specific
> operations of supported vector ISAs (Altivec and SSE[1-4] currently) are
> exposed either through LLVM IR ops or through target-specific
> builtins/intrinsics. This means that you can get access to all the crazy
> SSE instructions, but it means that your codegen would have to handle
> this target-specific code generation.

Well, scalar is surely an option we're aiming at. NV50 or even your regular
FPU are examples of fully scalar architectures.

As for SSE generation, the problem was solved by using horizontal
parallelism (i.e. processing four fragments or vertices at once) instead of
vertical parallelism. Sadly, this doesn't work with GPUs. So what remains
are chips that are natively vector GPUs. The question is more whether we'll
be able to have LLVM build up vector instructions from scalar ones; from my
limited testing with SSE and simple test programs it seemed to work, so I
suppose the same can be obtained for GPU targets.

Stephane
On Tuesday 30 December 2008 15:30:35 Chris Lattner wrote:
> On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote:
> >> However, the special instructions cannot directly be mapped to LLVM IR,
> >> like "min"; the conversion involves extracting the vector, creating a
> >> less-than compare, creating 'select' instructions, and creating
> >> 'insert-element' instructions.
>
> Using scalar operations obviously works, but will probably produce very
> inefficient code. One positive thing is that all target-specific
> operations of supported vector ISAs (Altivec and SSE[1-4] currently) are
> exposed either through LLVM IR ops or through target-specific
> builtins/intrinsics. This means that you can get access to all the crazy
> SSE instructions, but it means that your codegen would have to handle
> this target-specific code generation.

I think Alex was referring here to an AOS layout, which is completely not
ready. The currently supported one is the SOA layout, which eliminates
scalar operations.

> The direction we're going is to expose more and more vector operations in
> LLVM IR. For example, compares and select are currently being worked on,
> so you can do a comparison of two vectors which returns a vector of
> bools, and use that as the compare value of a select instruction
> (selecting between two vectors). This would allow implementing min and a
> variety of other operations and is easier for the codegen to reassemble
> into a first-class min operation etc.
>
> I don't know what the status of this is, I think it is partially
> implemented but may not be complete yet.

Ah, that's good to know!

> >> I don't have experience with the new vector instructions in LLVM, and
> >> perhaps that's why folding the swizzle and writemask looks complicated
> >> to me.
>
> We have really good support for swizzling operations already with the
> shuffle_vector instruction. I'm not sure about writemask.

With SOA they're rarely used (essentially never, unless we "kill" a pixel):

  [4 x <4 x float>] { {xxxx, yyyy, zzzz, wwww}, {xxxx, yyyy, zzzz, wwww}, ... }

so with SOA both shuffles and writemasks come down to a simple selection of
the element within the array (whether that will be good or bad remains to
be seen, based on the code in the GPU LLVM backends that we'll have).

> Sure, it would be very reasonable to make these target-specific builtins
> when targeting a GPU, the same way we have target-specific builtins for
> SSE.

Actually, the current plan is to have essentially a "two pass" LLVM IR. I
wanted the first pass to never lower any of the GPU instructions, so we'd
have intrinsics or maybe even just function calls like gallium.lit,
gallium.dot, gallium.noise and such. Then Gallium would query the driver to
figure out which instructions the GPU supports, and run our custom LLVM
lowering pass that decomposes those into things the GPU supports.

Essentially, I'd like to do as many of the complicated things as possible
in Gallium, to keep the GPU LLVM backends in drivers as simple as possible.
This would make the pattern matching in the generator /a lot/ easier
(matching gallium.lit vs. the 9+ instructions it would be decomposed to)
and give us a more generic, GPU-independent layer above. But that hasn't
been done yet; I hope to be able to write that code while working on the
OpenCL implementation for Gallium.

z
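[An illustration of why matching gallium.lit beats matching its expansion: LIT alone decomposes into clamps, a compare, a select, and a pow. A plain C++ sketch of the LIT semantics as defined by ARB_vertex_program (this is the math only, not the actual lowering pass):]

```cpp
#include <array>
#include <algorithm>
#include <cmath>

// LIT (ARB_vertex_program): computes lighting coefficients from
// src = (N.L, N.H, unused, specular exponent).
// dst = (1, max(src.x, 0), src.x > 0 ? max(src.y, 0)^clamp(src.w) : 0, 1)
std::array<float, 4> lit(const std::array<float, 4> &src) {
    float diffuse  = std::max(src[0], 0.0f);
    float power    = std::clamp(src[3], -128.0f, 128.0f);  // exponent clamp
    float specular = (src[0] > 0.0f)
                         ? std::pow(std::max(src[1], 0.0f), power)
                         : 0.0f;
    return {1.0f, diffuse, specular, 1.0f};
}
```

Even this straight-line version needs two clamps, a compare, a select, and a pow; pattern-matching that whole tree back into one GPU instruction is far harder than recognizing a single gallium.lit call.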