thr3ads.net - llvm dev - [LLVMdev] llvm, gpu execution environments [May 2007]

If this information is useful, please help other people find it:
Share via:

Keith Whitwell

2007-May-18 10:23 UTC

[LLVMdev] llvm, gpu execution environments

I'm interested in understanding the extent of the assumptions which llvm 
makes about the types of hardware it is capable of targeting.

In particular, I'm investigating a proposal by Zack Rusin to use llvm as 
the shader compilation engine within Mesa, targeting GPU backends.

I'm aware of the Apple GLSL compiler, and also I've seen the Vector LLVA
paper.  However, I'm not sure that either of these quite bridges the gap 
to the execution environment provided by modern GPUs.

Though there are a couple of question marks, I'll pick the most obvious 
one:

It seems that LLVA and by extension Vector-LLVA assumes that looping and 
branching control flow can be expressed in terms of a simple "br"
branch
operation.

Typically GPU environments cannot provide such a facility as they tend 
to run 16, 32 or 64 simd threads all with the same program counter. 
Though this is a wide vector environment, each of the threads is 
typically a scalar program and at any branch point, some of those 
threads may take the branch and some not.  So, to provide dynamic 
branching facilities in this environment, you end up with per-channel 
execution masks, and opcodes like "IF", "THEN", and
"ELSE" which
manipulate those per-channel masks, and use stack semantics for pushing 
and popping masks to emulate nested control structures.

This is probably all very familiar to anybody who's thought about simd 
program execution.  But it means that GPUs, and low-level GPU 
abstractions tend not to have branch instructions.

The question then, is to what extent it is possible to target this type 
of execution environment with LLVM and the LLVA/Vector-LLVA ISAs???

Is it necessary (or feasible) to try to analyse LLVA programs and 
extract IF/THEN/ELSE semantics from a set of arbitary branch instructions?

Is it possible to extend LLVA with these 'high level' control flow 
instructions and end up generating those instead of branches, and if so 
how does that affect the rest of LLVM?

Is it for some reason just not feasible at all?

Keith

Keith Whitwell

2007-May-19 07:57 UTC

head link

[LLVMdev] llvm, gpu execution environments

Chris Lattner wrote:> On Fri, 18 May 2007, Keith Whitwell wrote:
>> I'm interested in understanding the extent of the assumptions which
llvm
>> makes about the types of hardware it is capable of targeting.
> 
> Different pieces of the compiler make different assumptions.  In 
> particular, the code generator we ship is good for targetting certain 
> classes of devices, but isn't fully general (it doesn't help if
you're
> synthesizing a netlist from llvm, for example).
> 
>> In particular, I'm investigating a proposal by Zack Rusin to use
llvm as
>> the shader compilation engine within Mesa, targeting GPU backends.
> 
> Ok
> 
>> It seems that LLVA and by extension Vector-LLVA assumes that looping
and
>> branching control flow can be expressed in terms of a simple
"br" branch
>> operation.
> 
> LLVA is not a part of LLVM, so I won't answer for it.
OK, I guess I misunderstood the papers I pulled down - my impression was 
that at some stage programs in llvm would be represented in LLVA.

What, out of interest, is the relationship between LLVM and LLVA?
>> Typically GPU environments cannot provide such a facility as they tend
>> to run 16, 32 or 64 simd threads all with the same program counter.
>> Though this is a wide vector environment, each of the threads is
>> typically a scalar program and at any branch point, some of those
>> threads may take the branch and some not.  So, to provide dynamic
>> branching facilities in this environment, you end up with per-channel
>> execution masks, and opcodes like "IF", "THEN", and
"ELSE" which
>> manipulate those per-channel masks, and use stack semantics for pushing
>> and popping masks to emulate nested control structures.
> 
> Right, it's basically a form of predication.
Yes, a set of high-level-ish instructions layered on dynamic predication 
to give a very close match to the normally understood dynamic branching 
and looping constructs we're all familiar with.

It's initially quite odd to see something like "ELSE" or
"WHILE" as
hardware opcodes, but it makes sense under the hood.
>> This is probably all very familiar to anybody who's thought about
simd
>> program execution.  But it means that GPUs, and low-level GPU
>> abstractions tend not to have branch instructions.
>>
>> The question then, is to what extent it is possible to target this type
>> of execution environment with LLVM and the LLVA/Vector-LLVA ISAs???
>> Is it necessary (or feasible) to try to analyse LLVA programs and
>> extract IF/THEN/ELSE semantics from a set of arbitary branch
instructions?
>> Is it possible to extend LLVA with these 'high level' control
flow
>> instructions and end up generating those instead of branches, and if so
>> how does that affect the rest of LLVM?
> 
> The code generator and llvm should be able to handle this just fine, with 
> only minimal extensions.
> 
> Basically, you want to model this as predicated execution, and you want 
> the code generator to predicate away as many branches etc as possible.
> 
> One observation can be made though: there will always be some programs 
> that you can't map onto the hardware.  For example, if you don't
have
> branches, you can't do loops that execute for a variable number of 
> iterations.
Actually, you can - there is a program counter, the loop keeps executing 
until the execution mask reaches zero.  Likewise branches are dynamic.

> As such, I'd structure the compiler as a typical code generator with an
> early predication pass that flattens branches.  If you get to the end of 
> the codegen and have some dynamic branches left, you detect the error 
> condition and reject the shader from the hardware path (so you have to 
> emulate it in software).
The hardware *does* support dynamic branching, and looping, provided it 
is expressed in IF/THEN/ELSE, LOOP/BREAK/CONTINUE/ENDLOOP type 
instructions.  Even CALL/RETURN.  The only thing it can't do is execute 
something like "GOTO" or "BRANCH" dynamically.
> Does this make sense?
Yes, but at slight cross-purposes.

There are no cases where compilation should fail to produce a hardware 
executable result, within the constraints of the high-level language we 
are compiling.  Dynamic branches and looping are entirely within the 
capability of the hardware, provided they are expressed in terms of the 
hardware IF/THEN/ELSE, LOOP/ENDLOOP, etc, opcodes.

But it seems like my initial understanding of the intermediate 
representation within llvm is incorrect & I probably should just dive 
into the source to figure out what's going on.

My concern was that llva throws away the information that I'd probably 
need to reconstruct these high-level opcodes required by the hardware - 
if the code generator can come in at a higher level while that 
information still exists, then a lot of things get easier.

Keith

Chris Lattner

2007-May-19 08:01 UTC

head link

[LLVMdev] llvm, gpu execution environments

On Fri, 18 May 2007, Keith Whitwell wrote:> I'm interested in understanding the extent of the assumptions which
llvm
> makes about the types of hardware it is capable of targeting.
Different pieces of the compiler make different assumptions.  In 
particular, the code generator we ship is good for targetting certain 
classes of devices, but isn't fully general (it doesn't help if
you're
synthesizing a netlist from llvm, for example).
> In particular, I'm investigating a proposal by Zack Rusin to use llvm
as
> the shader compilation engine within Mesa, targeting GPU backends.
Ok
> It seems that LLVA and by extension Vector-LLVA assumes that looping and
> branching control flow can be expressed in terms of a simple "br"
branch
> operation.
LLVA is not a part of LLVM, so I won't answer for it.
> Typically GPU environments cannot provide such a facility as they tend
> to run 16, 32 or 64 simd threads all with the same program counter.
> Though this is a wide vector environment, each of the threads is
> typically a scalar program and at any branch point, some of those
> threads may take the branch and some not.  So, to provide dynamic
> branching facilities in this environment, you end up with per-channel
> execution masks, and opcodes like "IF", "THEN", and
"ELSE" which
> manipulate those per-channel masks, and use stack semantics for pushing
> and popping masks to emulate nested control structures.
Right, it's basically a form of predication.
> This is probably all very familiar to anybody who's thought about simd
> program execution.  But it means that GPUs, and low-level GPU
> abstractions tend not to have branch instructions.
>
> The question then, is to what extent it is possible to target this type
> of execution environment with LLVM and the LLVA/Vector-LLVA ISAs???
> Is it necessary (or feasible) to try to analyse LLVA programs and
> extract IF/THEN/ELSE semantics from a set of arbitary branch instructions?
> Is it possible to extend LLVA with these 'high level' control flow
> instructions and end up generating those instead of branches, and if so
> how does that affect the rest of LLVM?
The code generator and llvm should be able to handle this just fine, with 
only minimal extensions.

Basically, you want to model this as predicated execution, and you want 
the code generator to predicate away as many branches etc as possible.

One observation can be made though: there will always be some programs 
that you can't map onto the hardware.  For example, if you don't have 
branches, you can't do loops that execute for a variable number of 
iterations.

As such, I'd structure the compiler as a typical code generator with an 
early predication pass that flattens branches.  If you get to the end of 
the codegen and have some dynamic branches left, you detect the error 
condition and reject the shader from the hardware path (so you have to 
emulate it in software).

Does this make sense?

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Seemingly Similar Threads

Search for more possibly parallel threads

llvm dev - May 2007 - [LLVMdev] llvm, gpu execution environments

[LLVMdev] llvm, gpu execution environments

[LLVMdev] llvm, gpu execution environments

[LLVMdev] llvm, gpu execution environments

Seemingly Similar Threads