I was working on compare and swap and ran into the following problem. Several architectures implement it with a load-locked/store-conditional sequence. This is good: for those archs I can write generic code to legalize a compare and swap (and most other atomic ops) into load-locked/store-conditional sequences. The arch then only has to provide the instructions for ldl and stc to support all atomic ops (this applies to mips, arm, ppc, and alpha). However, I have to split the basic block at the CAS instruction and create two more basic blocks.

This isn't currently possible during legalize, nor during the initial SelectionDAG formation (the tricks switch lowering uses only work for terminator instructions).

Anyone have an idea? The patch as it stands is attached below. X86 uses a pseudo instruction because the necessary instructions and prefixes aren't in the code gen yet, but I would imagine they will be (so ignore that ugliness). The true ugliness can be seen in the alpha impl, which open codes it, including a couple of relative branches. The code sequence for alpha is identical to ppc, mips, and arm, so it would be nice to lower these to the correct sequences before code gen rather than splitting (or hiding, as I did here) basic blocks after code gen.

Andrew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lcs.patch
Type: text/x-diff
Size: 15127 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080219/9554a14a/attachment.patch>
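To make the control flow concrete, here is a minimal C++ sketch of a compare and swap built from a load-locked/store-conditional pair. It is not taken from lcs.patch: `Word`, `load_locked`, `store_conditional`, `cas_via_llsc`, and the `forced_failures` knob are hypothetical names that model the hardware reservation in a single thread. The retry loop and the early exit on a failed compare are the reason the expansion needs extra basic blocks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one memory word plus the reservation that a
// load-locked instruction would establish. Not a hardware or LLVM API.
struct Word {
    std::uint32_t value = 0;
    bool reserved = false;
    int forced_failures = 0;  // knob: make the next N store-conditionals fail
};

// Model of "load locked": read the word and take a reservation on it.
std::uint32_t load_locked(Word &w) {
    w.reserved = true;
    return w.value;
}

// Model of "store conditional": store only if the reservation is still held.
bool store_conditional(Word &w, std::uint32_t v) {
    if (w.forced_failures > 0) {  // pretend another agent broke the reservation
        --w.forced_failures;
        w.reserved = false;
    }
    if (!w.reserved) return false;
    w.value = v;
    w.reserved = false;
    return true;
}

// Compare and swap that returns the value observed in memory.
std::uint32_t cas_via_llsc(Word &w, std::uint32_t expected,
                           std::uint32_t desired) {
    for (;;) {
        std::uint32_t old = load_locked(w);   // loop block: ldl + compare
        if (old != expected) {
            w.reserved = false;               // give up the reservation
            return old;                       // branch to the exit block
        }
        if (store_conditional(w, desired)) {  // store block: stc
            assert(w.value == desired);
            return old;
        }
        // stc failed: branch back to the loop block and retry
    }
}
```

With `w.value == 5`, `cas_via_llsc(w, 5, 9)` returns 5 and leaves 9 in `w`; a second call that still expects 5 returns 9 and stores nothing.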
The current *hack* solution is to mark your pseudo instruction with usesCustomDAGSchedInserter = 1. That allows a target to expand it at scheduling time by providing an EmitInstrWithCustomInserter() hook. You can create new basic blocks at that point.

Evan

On Feb 19, 2008, at 4:51 PM, Andrew Lenharth wrote:
> However, I have to split the basic block at the CAS instruction and
> create two more basic blocks.
>
> This isn't currently possible during legalize, nor during the initial
> SelectionDAG formation (the tricks switch lowering uses only work for
> terminator instructions).
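The block surgery such a hook would have to perform can be pictured with a toy model. The sketch below assumes a made-up `Block` structure with string instructions; it deliberately does not use LLVM's MachineBasicBlock API, and `expand_cas_pseudo`, `CAS_PSEUDO`, and the `ldl`/`cmp`/`stc`/branch strings are placeholders. It splits the block at the pseudo instruction into a head and a tail and adds two new blocks for the retry loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, for illustration only (not LLVM's data structures).
struct Block {
    std::string name;
    std::vector<std::string> instrs;
    std::vector<std::string> succs;  // names of successor blocks
};

// Expand the pseudo instruction at fn[b].instrs[i] into a load-locked /
// store-conditional loop. Returns the index of the tail block.
std::size_t expand_cas_pseudo(std::vector<Block> &fn, std::size_t b,
                              std::size_t i) {
    assert(b < fn.size() && i < fn[b].instrs.size());
    assert(fn[b].instrs[i] == "CAS_PSEUDO");

    const std::string loop_name = fn[b].name + ".loop";
    const std::string store_name = fn[b].name + ".store";
    const std::string tail_name = fn[b].name + ".tail";

    // Tail: everything after the pseudo, inheriting the original successors.
    auto rest = fn[b].instrs.begin() + static_cast<std::ptrdiff_t>(i) + 1;
    Block tail{tail_name,
               std::vector<std::string>(rest, fn[b].instrs.end()),
               fn[b].succs};
    // Loop: load locked, compare, leave the loop if the compare fails.
    Block loop{loop_name,
               {"ldl", "cmp", "bne " + tail_name},
               {store_name, tail_name}};
    // Store: store conditional, retry from the loop block if it failed.
    Block store{store_name,
                {"stc", "beq " + loop_name},
                {loop_name, tail_name}};

    // Head: keep what preceded the pseudo and fall through into the loop.
    fn[b].instrs.resize(i);
    fn[b].succs = {loop_name};

    // Place the three new blocks directly after the head.
    std::vector<Block> added{loop, store, tail};
    fn.insert(fn.begin() + static_cast<std::ptrdiff_t>(b) + 1,
              added.begin(), added.end());
    return b + 3;
}
```

Every target built on ldl/stc would need essentially the same expansion, differing only in the concrete opcodes.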
On 2/19/08, Evan Cheng <evan.cheng at apple.com> wrote:
> The current *hack* solution is to mark your pseudo instruction with
> usesCustomDAGSchedInserter = 1. That allows the targets to expand it
> at scheduling time by providing a EmitInstrWithCustomInserter() hook.
> You can create new basic blocks then.

I guess that can work in the short term. It just seems wasteful for every target that uses ldl/stc sequences to have to implement it separately. But if that is what we can do right now, I'll give that a shot.

Thanks,
Andrew
On Wednesday 20 February 2008 01:51, Andrew Lenharth wrote:
> Anyone have an idea? The patch as it stands is attached below. X86
> is a pseudo instruction because the necessary ones and prefixes aren't
> in the code gen yet, but I would imagine they will be (so ignore that
> ugliness). The true ugliness can be seen in the alpha impl which open
> codes it, including a couple relative branches. The code sequence for
> alpha is identical to ppc, mips, and arm, so it would be nice to lower
> these to the correct sequences before code gen rather than splitting
> (or hiding as I did here) basic blocks after code gen.

Andrew,

why is the intrinsic name not CAS? And having another version that returns just a bool might be better in some cases (1. does CAS return the value on all architectures? 2. you can just jump based on a flag and don't need to compare it again). Just my 2 cents though ...

torvald
On 2/21/08, Torvald Riegel <torvald at se.inf.tu-dresden.de> wrote:
> why is the intrinsic name not CAS? And having another version that returns
> just a bool might be better in some cases ( 1. does CAS return the value on
> all architectures? 2. you can just jump based on a flag and don't need to
> compare it again). Just my 2 cents though ...

I was going from Chandler's docs, but it could be renamed trivially (and I almost did at several points).

1) Yes, but on some architectures it may be easier to have a bool version than on others.

2.a) To get the bool, the x86 backend (and some others) would have to generate the compare instruction anyway, so you don't save anything by having a bool version.

2.b) In the case of a load-locked/store-conditional based backend, the bool version would save a compare if the store conditional has the typical "returns success or failure" semantics.

So, yes, a CAS that returned bool could be useful. However, it is pretty easy to pattern match CAS -> Compare in those backends that can save the compare by testing the result of the store conditional.

Andrew
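The relation between the two forms can be sketched in portable C++ with `std::atomic`, used here only as a stand-in for the intrinsic; `cas_value` and `cas_bool` are hypothetical names. The value-returning form is treated as primary, and the bool form is derived from it with one extra compare, which is the compare a backend might fold away when its store conditional already reports success or failure.

```cpp
#include <atomic>
#include <cassert>

// Value-returning form: yields whatever was in memory before the operation.
int cas_value(std::atomic<int> &mem, int expected, int desired) {
    // On failure compare_exchange_strong overwrites `expected` with the value
    // it observed; on success `expected` already equals the old value.
    mem.compare_exchange_strong(expected, desired);
    return expected;
}

// Boolean form derived from the value form: one additional compare.
bool cas_bool(std::atomic<int> &mem, int expected, int desired) {
    return cas_value(mem, expected, desired) == expected;
}
```

Starting from a value of 5, `cas_value(m, 5, 9)` returns 5 and stores 9, while a following `cas_value(m, 5, 1)` returns 9 and leaves the memory unchanged.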
Torvald Riegel wrote:
> Andrew,
>
> why is the intrinsic name not CAS?

Because, fundamentally, it loads, compares, and conditionally stores. There is no concept of a "swap" in SSA, so removing that aspect of the atomic primitives makes the *LLVM* representation easier to understand.

> And having another version that returns
> just a bool might be better in some cases ( 1. does CAS return the value on
> all architectures?

Check the page (http://chandlerc.net/llvm_atomics.html -- the implementation info is still current, even though the docs are not) for how this gets implemented. As Andrew has already pointed out, on x86 the LLVM behavior maps to the underlying architecture. Other architectures which might avoid a compare can easily do so by pattern matching in the codegen. I'm not saying this is a 100% correct mapping for all architectures, but it seems very clean, and it does not seem to introduce performance issues on any of them.

> 2. you can just jump based on a flag and don't need to
> compare it again). Just my 2 cents though ...

Again, pattern matching can enable the architectures which don't need to compare again to in fact not do so, but some architectures will *need* to compare again in order to determine the bool value.

My strongest feeling is that "swap" has no place in an SSA IR, and the idea of atomically loading, comparing, and storing is far more in keeping with SSA.

In fact, I thought the "swap" intrinsic had even been renamed to "ls" for load-store at some point this summer. Do you have those changes, Andrew?

In any event, those are the reasons I had for moving away from "swap" in the design process, and as Andrew said he was primarily basing the implementation on that work.

-Chandler