thr3ads.net - llvm dev - [LLVMdev] Available code-generation parallelism [Nov 2008]


heisenbug

2008-Nov-03 23:55 UTC

[LLVMdev] Available code-generation parallelism

On 3 Nov., 10:06, Chris Lattner <clatt... at apple.com> wrote:
> On Nov 2, 2008, at 2:20 PM, Jonathan Brandmeyer wrote:
>
> > I am interested in making my LLVM front-end multi-threaded in a way
> > similar to the GCC compiler server proposal and was wondering about
> > the extent to which the LLVM passes support it.
>
> Do you have a link for this?  I'm not familiar with any parallelism  
> proposed by that project.  My understanding was that it was mostly  
> about sharing across invocations of the compiler.
>
> > Expression-at-a-time parallel construction:
> > If function definitions are built purely depth-first, such that the
> > parent pointers are not provided as they are created, what will break?
> > I noted that the function and module verifiers aren't complaining, at
> > least not yet. Is there a generic "fix up upward-pointing parent
> > pointers" pass that can be run afterwards? If not, do I need to
> > implement and perform that pass? I suspect that emitting code for
> > individual expressions in parallel will probably end up being too
> > fine-grained, which leads me to...
>
> Are you talking about building your AST or about building LLVM IR?
> The rules for constructing your AST are pretty much defined by you.
> The rules for constructing LLVM IR are a bit more tricky. The most
> significant issue right now is that certain objects in LLVM IR are
> uniqued (like constants) and these have use/def chains. Since use/def
> chain updating is not atomic or locked, this means that you can't
> create LLVM IR on multiple threads. This is something that I'm very
> much interested in solving someday, but no one is working on it at
> this time (that I'm aware of).
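Chris's point can be made concrete with a toy model. The C++ below is a minimal sketch, not LLVM's real classes: `ToyUser`, `ToyConstant`, and `ToyContext::getInt` are hypothetical names. Because constants are uniqued, every function that mentions the value 42 receives the same `ToyConstant` object. Two threads building unrelated functions would therefore still append to the same use list and insert into the same uniquing table, with no synchronization.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction that uses a constant.
struct ToyUser {
    std::string name;
};

// Hypothetical stand-in for a uniqued constant: exactly one object per
// value, plus the list of everything that uses it (its use/def chain).
struct ToyConstant {
    std::int64_t value;
    std::vector<ToyUser*> uses;  // not atomic, not locked

    // Data race if two threads call this on the same constant at once.
    void addUse(ToyUser* u) {
        assert(u != nullptr);
        uses.push_back(u);
    }
};

// Hypothetical context-wide uniquing table shared by every function.
class ToyContext {
public:
    // Returns the single shared object for `v`, creating it on first use.
    // The map insertion is unsynchronized as well.
    ToyConstant* getInt(std::int64_t v) {
        std::unique_ptr<ToyConstant>& slot = table_[v];
        if (!slot) slot.reset(new ToyConstant{v, {}});
        return slot.get();
    }

private:
    std::map<std::int64_t, std::unique_ptr<ToyConstant>> table_;
};
```

On a single thread, two functions calling `getInt(42)` get the same pointer, and that constant's use list grows to two entries. Making the same calls from two threads would be undefined behavior, which is the limitation described above.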
What about "inventing" pseudo-constants (which point to the right
thing) and building the piece of IR with them? When done, grab a mutex and
RAUW it in. Alternatively, submit it to a privileged thread that performs
the RAUW.
The trick is to prepare the def/use chain(s) to a degree that the
mutex is only held for a minimal time. If only IR-builder threads are
running concurrently, there is no danger that a real constant vanishes,
leaving behind a stale reference from a pseudo-constant.
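The pseudo-constant scheme above can be sketched as a single C++ file. This is a minimal sketch under stated assumptions, not an LLVM API. `Value`, `Inst`, `SharedConstants`, and `FunctionBuilder` are hypothetical, and "RAUW" is modeled loosely on LLVM's `Value::replaceAllUsesWith`. Each builder points its instructions at private placeholder values without taking any lock. Only the final commit runs under the shared mutex: it swaps each placeholder for the real uniqued constant and moves the placeholder's uses over.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Value;

// Hypothetical instruction: just a list of operand pointers.
struct Inst {
    std::vector<Value*> operands;
};

// Hypothetical value (real constant or placeholder) with its use list.
struct Value {
    std::int64_t constValue;
    std::vector<std::pair<Inst*, std::size_t>> uses;  // (user, operand index)
};

// Shared, uniqued constants; every access goes through `mu_`.
class SharedConstants {
public:
    // "RAUW it in": redirect every use of each placeholder to the real
    // constant with the same value. This is the only locked step.
    void commit(const std::vector<Value*>& placeholders) {
        std::lock_guard<std::mutex> lock(mu_);
        for (Value* ph : placeholders) {
            std::unique_ptr<Value>& slot = table_[ph->constValue];
            if (!slot) slot.reset(new Value{ph->constValue, {}});
            Value* real = slot.get();
            for (const auto& [user, idx] : ph->uses) {
                user->operands[idx] = real;
                real->uses.emplace_back(user, idx);
            }
            ph->uses.clear();
        }
    }

private:
    std::mutex mu_;
    std::map<std::int64_t, std::unique_ptr<Value>> table_;
};

// Per-thread builder: creates placeholders and instructions lock-free.
class FunctionBuilder {
public:
    Inst& addInst() {
        insts_.push_back(std::make_unique<Inst>());
        return *insts_.back();
    }
    // Thread-local "pseudo-constant" standing in for the real one.
    Value* pseudoConstant(std::int64_t v) {
        std::unique_ptr<Value>& slot = pseudo_[v];
        if (!slot) slot.reset(new Value{v, {}});
        return slot.get();
    }
    void addOperand(Inst& i, Value* v) {
        v->uses.emplace_back(&i, i.operands.size());
        i.operands.push_back(v);
    }
    // Hand all placeholders to the shared table in one locked batch.
    void finish(SharedConstants& shared) {
        std::vector<Value*> phs;
        for (auto& entry : pseudo_) phs.push_back(entry.second.get());
        shared.commit(phs);
    }

private:
    std::vector<std::unique_ptr<Inst>> insts_;
    std::map<std::int64_t, std::unique_ptr<Value>> pseudo_;
};
```

In a real program, two `FunctionBuilder`s would each live on their own thread. Only `commit` touches shared state, so the critical section scales with the number of placeholder uses rather than with the size of the function. The sketch never deletes from the shared table, which matches the assumption that no real constant vanishes while builders are running.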

Any major headaches I have ignored?

Cheers,

   Gabor

>
> > Function-at-a-time parallel construction:
> > Which (if any) LLVM objects support the object-level thread safety
> > guarantee?  If I construct two separate function pass managers in
> > separate threads and use them to optimize and emit object code for
> > separate llvm::Function definitions in the program, will this work?
> > Same question for llvm::Modules.
>
> Unfortunately, for the above reason... basically none.  The LLVM code  
> generators are actually very close to being able to run in parallel.  
> The major issue is that they run a few llvm IR level passes first (LSR  
> and codegen prepare) that hack on LLVM IR before the code generators  
> run.  Because of this, they inherit the limitations of LLVM IR  
> passes.  Very long term, I'd really like to make the code generator  
> not affect the LLVM IR being put into them, but this is not likely to  
> happen anytime in the near future.
>
> If you're interested in this, tackling the use/def atomicity issues  
> would be a great place to start.
>
> -Chris
>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu      
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chris Lattner

2008-Nov-04 07:59 UTC

[LLVMdev] Available code-generation parallelism

On Nov 3, 2008, at 3:55 PM, heisenbug wrote:
> What about "inventing" pseudo-constants (which point to the right
> thing) and building the piece of IR with them? When done, grab a mutex and
> RAUW it in. Alternatively, submit it to a privileged thread that performs
> the RAUW.
> The trick is to prepare the def/use chain(s) to a degree that the
> mutex is only held for a minimal time. If only IR-builder threads are
> running concurrently, there is no danger that a real constant vanishes,
> leaving behind a stale reference from a pseudo-constant.
That could work. It would also have to be done for global values as
well, and inline asm objects, etc. However, I don't see any
showstoppers. The implementation could be tricky, but a nice property of
your approach is that the single-threaded case could be made
particularly fast (instead of always doing atomic ops or locking).

-Chris

Jonathan Brandmeyer

2008-Nov-06 03:22 UTC

[LLVMdev] Available code-generation parallelism

On Mon, 2008-11-03 at 23:59 -0800, Chris Lattner wrote:
> On Nov 3, 2008, at 3:55 PM, heisenbug wrote:
> > What about "inventing" pseudo-constants (which point to the right
> > thing) and building the piece of IR with them? When done, grab a mutex and
> > RAUW it in. Alternatively, submit it to a privileged thread that performs
> > the RAUW.
> > The trick is to prepare the def/use chain(s) to a degree that the
> > mutex is only held for a minimal time. If only IR-builder threads are
> > running concurrently, there is no danger that a real constant vanishes,
> > leaving behind a stale reference from a pseudo-constant.
>
> That could work. It would also have to be done for global values as
> well, and inline asm objects, etc. However, I don't see any
> showstoppers. The implementation could be tricky, but a nice property of
> your approach is that the single-threaded case could be made
> particularly fast (instead of always doing atomic ops or locking).
It might work for the IR construction phase, but not for optimization
and emitting object code.  The locking issue is going to be severe
because it will be nearly (completely?) impossible to guarantee a
globally consistent lock order for any given def/use chain.  Therefore,
such a solution would require a kind of high-level contention manager
akin to software transactional memory (STM).  Even the fastest STMs in
research are much slower than locking.  I think that there is a better
way.
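A small example shows what a "globally consistent lock order" means in practice. This minimal C++ sketch uses hypothetical `LockedValue` and `moveUses` names and assumes one mutex per value guarding that value's use list. An operation that touches two use lists acquires both locks in address order, so two threads performing opposite moves cannot deadlock.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical value whose use list is guarded by its own mutex.
struct LockedValue {
    std::mutex mu;
    std::vector<int> uses;  // user ids, for illustration only
};

// Toy RAUW: move every use of `from` onto `to`. Both use lists change,
// so both locks are needed. Locking in a fixed global order (by address,
// compared with std::less for a total order) prevents deadlock when
// another thread calls moveUses(to, from) at the same time.
void moveUses(LockedValue& from, LockedValue& to) {
    if (&from == &to) return;
    LockedValue* first = std::less<LockedValue*>()(&from, &to) ? &from : &to;
    LockedValue* second = (first == &from) ? &to : &from;
    assert(first != second);
    std::lock_guard<std::mutex> l1(first->mu);
    std::lock_guard<std::mutex> l2(second->mu);
    to.uses.insert(to.uses.end(), from.uses.begin(), from.uses.end());
    from.uses.clear();
}
```

The ordering works here only because both values are known before either lock is taken. Presumably this is the crux of the objection above: an optimization pass usually discovers the next value by walking a use list it already holds locked, so no order can be fixed in advance.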

I would like to propose a different solution: lift all interned (uniqued)
objects to be unique at the Module level instead of globally. This will
require an initial pass to be performed (called IRFinalize?), and
equality of Type objects by pointer comparison will not be valid until
after this pass is complete. In most cases the Module is already the unit
of compilation once LLVM IR has been emitted, and it should be
straightforward to structure the pass so that it can work on single
functions if the user is compiling at that level.

The IRFinalize pass can allocate the bookkeeping storage for identifying
duplicate Constants and Types and then release it, so long as none of the
optimization and analysis passes 1) emit new Types or 2) are broken by
duplicate Constants.
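The IRFinalize proposal can be sketched for constants; Types would presumably be handled the same way. In this minimal C++ sketch, `Module`, `Constant`, `Inst`, and `irFinalize` are hypothetical names, not LLVM classes. During construction each module creates constants freely, so the same value may exist as several distinct objects and pointer equality means nothing yet. The finalize pass then merges duplicates within that one module using a temporary map, which is released when the pass returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical constant: no global uniquing, so duplicates are allowed.
struct Constant {
    std::int64_t value;
};

// Hypothetical instruction referring to constants by pointer.
struct Inst {
    std::vector<Constant*> operands;
};

// Hypothetical module that owns its own constants and instructions.
struct Module {
    std::vector<std::unique_ptr<Constant>> constants;
    std::vector<std::unique_ptr<Inst>> insts;

    // No shared table and no lock: only the thread that owns this
    // module touches it.
    Constant* newConstant(std::int64_t v) {
        constants.push_back(std::make_unique<Constant>(Constant{v}));
        return constants.back().get();
    }
};

// Sketch of the proposed "IRFinalize" pass: make constants unique within
// this module only. The bookkeeping map lives only for the pass.
void irFinalize(Module& m) {
    std::map<std::int64_t, Constant*> canonical;
    for (auto& inst : m.insts) {
        for (Constant*& op : inst->operands) {
            // The first object seen for a value becomes the canonical one.
            auto it = canonical.emplace(op->value, op).first;
            op = it->second;
        }
    }
    // Free the duplicates (and any constant no instruction uses).
    std::unordered_set<Constant*> live;
    for (const auto& entry : canonical) live.insert(entry.second);
    m.constants.erase(
        std::remove_if(m.constants.begin(), m.constants.end(),
                       [&](const std::unique_ptr<Constant>& c) {
                           return live.count(c.get()) == 0;
                       }),
        m.constants.end());
    // Holds only if every operand was created by this module's newConstant.
    assert(m.constants.size() == live.size());
}
```

Until `irFinalize` has run, comparing two `Constant*` values for identity gives wrong answers. This mirrors the caveat about pointer equality of Type objects. After the pass, each value has exactly one object per module, but two modules may still hold separate copies.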

-Jonathan

PS: What is RAUW?  I'll volunteer the clerical work of adding it to the
Lexicon if you'd be kind enough to hand me a small dose of clue :)
> -Chris
