Hi all, I've decided the next step in atomics is the weak compare-and-exchange operation. As with the failure order, I'm going t outline the direction I'd like to take: 1. All cmpxchg instructions now return { iN, i1 } where the first value is what we got before (the loaded result), the second == 1 if an exchange took place. 1. "weak" is an optional modifier to the cmpxchg instructions. If anyone wants a bikeshed to paint, this would be a good one. I wasn't sure myself. Theoretically, we could only modify the return type of the weak variant, but I think that making it a global change helps keep the IR uniform. Additionally, this extra bit actually exists on most platforms even for the strong cmpxchg: on LL/SC ones by virtue of control flow, and x86 (for example) sets ZF based on it. Assuming that's OK, onto the DAG level. What I'd like to do is a little odd, but makes sense I think: 1. Keep ATOMIC_CMP_SWAP with its current semantics: strong cmpxchg, returning just the loaded value. 2. Add ATOMIC_CMP_SWAP_WITH_SUCCESS (better name suggestions welcome) that has a second i1 value (initially) as with the IR instruction. It's still a strong cmpxchg. 3. Expanding ATOMIC_CMP_SWAP_WITH_SUCCESS will yield an ATOMIC_CMP_SWAP and a SETCC. The advantage of this scheme is that existing targets can remain mostly unchanged if they're not interested in the extra work required. LL/SC architectures will be assumed to move over to something like the IR level pass already used by ARM if they want weak support (or they can add weak DAG nodes if they're masochists). x86 can switch its Custom handling over to the new ATOMIC_CMP_SWAP_WITH_SUCCESS to get rid of the vast majority of redundant comparisons coming out of C++ code with relative ease. I've attached two LLVM patches here, in case anyone wants something to actually play with. They're not quite ready yet, and Clang doesn't produce the right code obviously, but they give an idea of where I'd like to go and what the IR might look like: 1. The first patch adds "weak" in a fairly minimal set of places. 2. The second makes the x86 switch, which will get rid of the redundant comparisons. 3. A hypothetical 3rd patch (very simple) would actually make use of the "weak" information in the atomic expansion pass. What do people think of the idea? (I'm sending to llvmdev too; out of tree front-ends would need to be updated, and probably at least backend tests for cmpxchg). Cheers. Tim. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-IR-implement-read-write-of-cmpxchg-weak.patch Type: application/octet-stream Size: 103435 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140612/0b1cc835/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-Fix-X86-cmpxchg-codegen.patch Type: application/octet-stream Size: 5811 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140612/0b1cc835/attachment-0001.obj>
On Thu, Jun 12, 2014 at 5:22 PM, Tim Northover <t.p.northover at gmail.com> wrote:> 1. All cmpxchg instructions now return { iN, i1 } where the first > value is what we got before (the loaded result), the second == 1 if an > exchange took place. >Having to do extractvalue to get stuff out of these kinds of aggregates is... somewhat horrible... have you thought of any alternatives? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140612/f9d96965/attachment.html>
>> 1. All cmpxchg instructions now return { iN, i1 } where the first >> value is what we got before (the loaded result), the second == 1 if an >> exchange took place. > > Having to do extractvalue to get stuff out of these kinds of aggregates > is... somewhat horrible... have you thought of any alternatives?Well, sort of. There was that LL/SC thing from the other week, which could be made the canonical form of a weak cmpxchg (perhaps with an IRBuilder helper or something). But David's concerns about the "cmpxchg weak" -> LL/SC transformation being much easier than the reverse seem quite valid. I'm not sure even I want that any more. But within a single instruction, we've got two values to produce. If they're both part of the return then this is about the only option (that I'm aware of, anyway). If one's part of an input then it'd have to be a pointer like in the C++ interface; that would be horrific for CodeGen, it's just not what actually happens in any architecture I'm aware of. Did you have anything in mind? Cheers. Tim.
On Thu, Jun 12, 2014 at 5:50 PM, Chandler Carruth <chandlerc at google.com> wrote:> On Thu, Jun 12, 2014 at 5:22 PM, Tim Northover <t.p.northover at gmail.com> > wrote: > >> 1. All cmpxchg instructions now return { iN, i1 } where the first >> value is what we got before (the loaded result), the second == 1 if an >> exchange took place. >> > > Having to do extractvalue to get stuff out of these kinds of aggregates > is... somewhat horrible... have you thought of any alternatives?Having thought about this a bit, I don't see anything better. If we lived in a world w/o the select instruction, there might be a better design for this (and for the overflow observing arithmetic intrinsics). It is a bit sad, especially because I can't see a good reason to make the weak variant an intrinsic and force this through the call path... going back to the meat of the proposal... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140612/9a2467b1/attachment.html>
On Thu, Jun 12, 2014 at 5:22 PM, Tim Northover <t.p.northover at gmail.com> wrote:> I've decided the next step in atomics is the weak compare-and-exchange > operation. As with the failure order, I'm going t outline the > direction I'd like to take: > > 1. All cmpxchg instructions now return { iN, i1 } where the first > value is what we got before (the loaded result), the second == 1 if an > exchange took place. > 1. "weak" is an optional modifier to the cmpxchg instructions. If > anyone wants a bikeshed to paint, this would be a good one. I wasn't > sure myself. > > Theoretically, we could only modify the return type of the weak > variant, but I think that making it a global change helps keep the IR > uniform. Additionally, this extra bit actually exists on most > platforms even for the strong cmpxchg: on LL/SC ones by virtue of > control flow, and x86 (for example) sets ZF based on it. >Ok, I've mulled over all of the IR-level memory model changes. While I didn't *really* doubt that this was the right way to go, I'm increasingly happy with it. I specifically increasingly happy with the { iN, i1 } result. I do wish we had a better way in the IR to model this entire concept (which we see in the overflow arithmetic intrinsics as well) but we don't. With the current IR, this is clearly the best design (and I feel bad that I argued against it in the original design discussion years ago... ah, hindsight). If we ever figure out a better way to model this kind of operation, then we can reconsider all of them in light of that, but I'm not sure when or if that day will even arrive. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140613/fce1fd15/attachment.html>
OK, DAG... On Thu, Jun 12, 2014 at 5:22 PM, Tim Northover <t.p.northover at gmail.com> wrote:> Assuming that's OK, onto the DAG level. What I'd like to do is a > little odd, but makes sense I think: > > 1. Keep ATOMIC_CMP_SWAP with its current semantics: strong cmpxchg, > returning just the loaded value. > 2. Add ATOMIC_CMP_SWAP_WITH_SUCCESS (better name suggestions welcome) > that has a second i1 value (initially) as with the IR instruction. > It's still a strong cmpxchg. > 3. Expanding ATOMIC_CMP_SWAP_WITH_SUCCESS will yield an > ATOMIC_CMP_SWAP and a SETCC. > > The advantage of this scheme is that existing targets can remain > mostly unchanged if they're not interested in the extra work required. > > LL/SC architectures will be assumed to move over to something like the > IR level pass already used by ARM if they want weak support (or they > can add weak DAG nodes if they're masochists). x86 can switch its > Custom handling over to the new ATOMIC_CMP_SWAP_WITH_SUCCESS to get > rid of the vast majority of redundant comparisons coming out of C++ > code with relative ease. >So, I see where you're going here, but I'm curious -- why not just switch ATOMIC_CMP_SWAP to have a second i1 value, and still be strong? Is this *just* to support expanding? I wonder if that's really useful rather than just lowering it directly on the various targets.... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140613/63f2431c/attachment.html>
Hi Chandler,> So, I see where you're going here, but I'm curious -- why not just switch > ATOMIC_CMP_SWAP to have a second i1 value, and still be strong? Is this > *just* to support expanding? I wonder if that's really useful rather than > just lowering it directly on the various targets....I tried that originally, but quickly got into murky waters with all the targets I've got basically no clue about. It still could have been the best option (I started poking at Mips), but for one particularly nasty facet: you can select ATOMIC_CMP_SWAP in TableGen, but not ATOMIC_CMP_SWAP_WITH_SUCCESS (multiple results and all that). That means even getting selection back to the status quo would have been a lot of duplicated C++ code in multiple targets, all for what's basically a dead-end in most cases. It's far easier to make use of the i1 before the DAG level if possible. Anyway, that was my reasoning. Possibly more pragmatism than principle. Cheers. Tim.