On 08/05/2014 03:46 PM, Robin Morisset wrote:>
> Hello everyone,
>
>
> I have recently started on optimizing C11/C++11 atomics in LLVM, and 
> plan to focus on that for the next two months as an intern in the 
> PNaCl team. I’ve sent two patches on this topic to Phabricator that 
> fix http://llvm.org/bugs/show_bug.cgi?id=17281:
>
> http://reviews.llvm.org/D4796
>
> http://reviews.llvm.org/D4797
>
First, welcome!  This is definitely an area which needs some attention.  
As you've already seen, I'll be happy to help review some of these 
patches.  We will need to find other qualified reviewers though.  As 
you've already seen, some of these issues are *extremely* subtle.
Fair warning, given how challenging this is to get right, and how hard 
it is to debug when wrong, you should expect very slow progress and 
quite a bit of time required for reviews.  Split your reviews into the 
smallest possible pieces you can to give reviewers a chance at 
understanding them.>
>
> The first patch is X86-specific, and tries to apply operations with 
> immediates to atomics without going through a register. The main 
> trouble here is that the X86 backend appears to respect LLVM memory 
> model instead of the x86-TSO memory model, and may reorder 
> instructions. In order to prevent illegal reordering of atomic 
> accesses, the backend converts atomic accesses to pseudo-instructions 
> in X86InstrCompiler.td (RELEASE_MOV* and ACQUIRE_MOV*) that are opaque 
> to most of the rest of the backend, and only lowers those at the very 
> end of the pipeline. I have decided to follow the same approach, just 
> adding some more RELEASE_* pseudo-instructions rather than trying to 
> find every possibly misbehaving part of the backend in order to do 
> early lowering. This lowers the risk and complexity of the patch, but 
> at the cost of possibly missing some optimization possibilities.
>
> Another trouble I had with this patch is a failure of TableGen type 
> inference when adding negate/not to the scheme. As a result I have 
> left these two instructions commented out in this patch. Does anyone 
> have an idea for how to proceed with this ?
>
>
> The second patch is more straightforward: in the C11/C++11 memory 
> model (that LLVM basically borrows), optimizations like DSE can safely 
> fire across atomic accesses in many cases, basically as long as they 
> are not operating across a release-acquire pair (see paper referenced 
> in the comments). So I tweaked MemoryDependenceAnalysis to track such 
> pairs and only return a clobber result in such cases.
>
I've already made comments on these in the respective review threads.
Other readers - Please, we need extra reviewers on these!  If you have 
experience reasoning about memory model issues, please chime
in.>
> My next steps will probably be to improve the compilation of acquire 
> atomics in the ARM backend. In particular, they are currently compiled 
> by a load + dmb, while a load + dependant useless branch + isb is also 
> valid (see http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html 
> <http://www.cl.cam.ac.uk/%7Epes20/cpp/cpp0xmappings.html>for example)
> and may be faster. Even better: if there is already a dependant branch 
> (such as the loop for the lowering of CAS), it is just a cheap isb. 
> The main step will be switching off the InsertFencesForAtomic flag, 
> and do the lowering of atomics in the backend, because once an acquire 
> load has been transformed in an acquire fence, too much information 
> has been lost to apply this mapping.
>
I'm unfamiliar with the details of the ARM architecture, so I won't be 
able to help you much here.>
> Longer term, I hope to improve the fence elimination of the ARM 
> backend with a kind of PRE algorithm. Both of these improvements to 
> the ARM backend should be fairly straightforward to port to the POWER 
> architecture later, and I hope to also do that.
>
Any reason these couldn't be done at the IR level?>
> Does this approach seem worthwhile to you ? Can I do anything to help 
> the review process ?
>
Small patches.  Lots of justification.  Many test cases.  Ping folks 
periodically.
Philip
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/02a0267a/attachment.html>