Jeffrey Yasskin
2011-Oct-13 06:44 UTC
[LLVMdev] Are x86/ARM likely to support atomics larger than 2 pointers?
There's a discussion over on cfe-commits about how future-proof to make the C1x/C++11 atomic ABI. (http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047647.html)

One argument is that, because C ABI changes are painful, and processors may introduce larger atomic operations in the future, we should try to design the atomics implementation in such a way that it can take advantage of future instruction sets without needing an ABI change.

The other argument (apologies if I misstate this) is that atomics larger than 2 pointers aren't useful, so we shouldn't make anything more expensive than today's implementation needs, just to support hypothetical instructions that processors may never implement.

If any of the processor designers on this list want to chime in, this would be a good time to do so, so the wrong decision doesn't get written in stone until the next ABI change.

Thanks,
Jeffrey
John McCall
2011-Oct-13 08:22 UTC
[LLVMdev] Are x86/ARM likely to support atomics larger than 2 pointers?
On Oct 12, 2011, at 11:44 PM, Jeffrey Yasskin wrote:
> There's a discussion over on cfe-commits about how future-proof to
> make the C1x/C++11 atomic ABI.
> (http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047647.html)
>
> One argument is that, because C ABI changes are painful, and
> processors may introduce larger atomic operations in the future, we
> should try to design the atomics implementation in such a way that it
> can take advantage of future instruction sets without needing an ABI
> change.
>
> The other argument (apologies if I misstate this) is that atomics
> larger than 2 pointers aren't useful, so we shouldn't make anything
> more expensive than today's implementation needs, just to support
> hypothetical instructions that processors may never implement.

This is more-or-less my argument, but allow me the privilege of restating.

First, the basic rules of the language. For essentially any type T, there is a type _Atomic(T). Objects of this type guarantee atomic access to the value, using one of four operations: load, store, exchange, and compare-and-exchange. Since T is arbitrary, at least some types will require the use of locks "under the hood"; consider a 2 KB struct. Obviously, we don't want to use locks if we don't have to. _Atomic(T) is allowed to be larger and/or more aligned than T, which gives us some flexibility. It's also illegal to access an _Atomic(T) object except through specific functions, which gives us some more flexibility. So that's good.

In an unchanging world, the implementation choices here would be completely obvious:

1. The target processor can natively support the four atomic operations on certain operand sizes, given adequate alignment.
2. If T isn't too large for all of those, we give _Atomic(T) the size and alignment of the best match, and we directly emit the appropriate instructions for the operations.
3a. Otherwise, we're going to need a lock, so we give _Atomic(T) the normal size and alignment of T, and we issue calls to the C runtime library to do all the operations for us under a global lock (probably a striped spin lock).

But the world isn't unchanging; several major processor ISAs, including x86-32, x86-64, and ARM, are all regularly extended with new instructions. So now (3a) isn't necessarily the right thing to do: suppose that ARM develops a new "Wide Atomics Extension" (WAE), and now armv13 chips can do lock-free operations on 32-byte operands if they're 32-byte-aligned. We can make the C runtime functions on WAE-compliant systems check for 32-byte operands that are also 32-byte-aligned and just use the new instructions, but if the compiler isn't making objects large enough or aligned enough, that might not kick in, and then we'll be stuck using locks in situations where they're ideally unnecessary. So there's an alternative proposal:

3b. If T is small enough that it's plausible that the ISA might grow new atomic operations for it, then we should make _Atomic(T) an adequate size and alignment for those operations. Specifically, we should do this for sizes of 16, 32, and 64 bytes, as it's plausible that atomics might grow to a full cache line, but no larger.

Okay, now the arguments. I see them like this:

A. We have to make a decision. We can't make _Atomic(T) larger or more aligned for an existing type T without changing the ABI.
   A1. However, if a language extension adds a new type, that type can be given new rules.
B. If sizeof(T) isn't a power of 2, (3a) makes _Atomic(T) smaller than it would be under (3b).
C. (3a) makes _Atomic(T) less aligned than it would be under (3b), reducing the amount of wasted space when it's embedded in a struct.
D. Space usage is also important for performance, so the extra size and alignment that (3b) imposes in (B) and (C) are bad. They're particularly unconscionable if they don't gain us anything.
E. Under low contention, a good spin-locking implementation is probably slower than native atomic operations by ~3-4x. That's bad, but it's still quite cheap in the large scale of things.
F. Making the C runtime functions check for WAE-compatible operands is not free.
G. (3b) has no advantage at all if the ISA never grows something like WAE.
H. I don't see it as likely that anybody would implement something like WAE.
   H1. For one, I don't know of any current chips at all that support atomics on operands larger than two pointers, except transactional-memory chips that obviously don't have this problem in the first place.
   H2. Nor do I know of any plans for such an extension, or even serious proposals for one.
   H3. I also don't know of anyone who would find this particularly useful; having (say) a locklessly atomic quartet of pointers is not obviously more powerful than having a locklessly atomic pair of pointers. So the risk of making this less efficient is not terribly disappointing to me.
I. It is easy enough to add a language extension, say a new attribute, which makes a specific _Atomic(T) lock-free even if the ABI says objects of that size generally are not.

So in the final analysis, I think future-proofing for the possibility of WAE would waste memory for no reasonable prospect of gain.

John.
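
To make the space trade-off in (B) and (C) concrete, here is a rough sketch of how rules (2), (3a), and (3b) might pick the size and alignment of _Atomic(T). It is an illustration only, not code from Clang or any ABI document: the names atomic_layout, round_up_pow2, layout_atomic, and pad_limit are made up for this example, and the sketch assumes that the natively supported operand sizes are exactly the powers of two up to some limit.

```c
#include <stddef.h>

/* Size and alignment, in bytes, chosen for a hypothetical _Atomic(T). */
typedef struct {
    size_t size;
    size_t align;
} atomic_layout;

/* Round n up to the nearest power of two (n must be at least 1 and small
 * enough that the result fits in size_t). */
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/*
 * Lay out _Atomic(T) for a T with the given size and alignment.
 * pad_limit (an assumption of this sketch) is the largest size that gets
 * padded for lock-free access:
 *   - under (3a): the largest size the target handles natively today;
 *   - under (3b): the largest size the ISA might plausibly handle later.
 * Anything larger keeps T's own layout and is assumed to take the lock path.
 */
atomic_layout layout_atomic(size_t size, size_t align, size_t pad_limit) {
    atomic_layout out = { size, align };
    if (size <= pad_limit) {
        out.size = round_up_pow2(size);
        out.align = out.size;   /* native atomics need natural alignment */
    }
    return out;
}
```

Taking a 64-bit target where two pointers are 16 bytes and a cache line is 64 bytes (both assumed values), a struct of three pointers (24 bytes, 8-byte aligned) stays at 24 bytes with 8-byte alignment under (3a), via layout_atomic(24, 8, 16), but grows to 32 bytes with 32-byte alignment under (3b), via layout_atomic(24, 8, 64). A 2048-byte struct keeps its own layout under both policies.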