Scott Michel
2007-Jul-09 17:33 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
Torvald Riegel wrote:> Hi, > > I'd like to see support for something like this. I have some comments, and I > think there is existing work that you can reuse."reuse within the compiler."> "While the processor may spin and attempt the atomic operation more than once > before it is successful, research indicates this is extremely uncommon." > I don't understand this sentence, what do you mean?I'm not sure I can pinpoint the paper from which the statement is based, but I seem to recall something similar in the original LL-SC papers (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for lock-free algorithms.> You probably don't need to require CAS (compare-and-set) to return the > previous value (I think some architectures don't), but just return a boolean > value (success/failure).compare and swap?> What are the reasons because of which you picked the Load/Store model for > barriers and not some other kind (e.g., acquire/release/...)?Chandler looked at what the various current LLVM architectures and summarized what he found. What he found are the memory barriers that the various processors support.> Did you have a look at the atomic_ops project? > http://www.hpl.hp.com/research/linux/atomic_ops/ > It already has implementations for several architectures and several > compilers. It uses a different consistency model (different set of > constraints for operations) and groups necessary memory barriers with > instructions (helpful on some architectures). It supports a few more > operations. The author (Hans Boehm) seems to also be active in the area of > C/C++ memory models (or some support for this).LLVM doesn't emit external library calls -- there is no "-lllvm" to which programs have to link, so adding an atomic operation library is likely to be a non-starter. LLVM is interested in emitting instructions to make atomic operations (and higher level concurrency primitives) possible, which is why Chandler's work is usefully important. -scooter
Christopher Lamb
2007-Jul-09 19:27 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
On Jul 9, 2007, at 10:33 AM, Scott Michel wrote:> Torvald Riegel wrote:>> What are the reasons because of which you picked the Load/Store >> model for >> barriers and not some other kind (e.g., acquire/release/...)? > > Chandler looked at what the various current LLVM architectures and > summarized what he found. What he found are the memory barriers > that the > various processors support.That being said, it probably would not be difficult to add an optional pointer to the membarrier instructions. Thus one could map acquire/release semantics onto the membarrier, and it would be up to the various optimizations to conservatively determine whether code motion is legal given aliasing information. -- Christopher Lamb -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070709/18dc51d9/attachment.html>
Torvald Riegel
2007-Jul-09 19:39 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
On Monday 09 July 2007 19:33, Scott Michel wrote:> Torvald Riegel wrote: > > Hi, > > > > I'd like to see support for something like this. I have some comments, > > and I think there is existing work that you can reuse. > > "reuse within the compiler."within the LLVM compiler framework, to be precise.> > > "While the processor may spin and attempt the atomic operation more than > > once before it is successful, research indicates this is extremely > > uncommon." I don't understand this sentence, what do you mean? > > I'm not sure I can pinpoint the paper from which the statement is based, > but I seem to recall something similar in the original LL-SC papers > (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for > lock-free algorithms.Well, the statement says that often you have low contention. But that's something you want, not necessarily something you will get, and depends on the workload/algorithm. I'm missing the context. Is the actual statement as obvious as that you should try to use the atomic instructions offered by your processor, instead of doing blocking algorithms?> > You probably don't need to require CAS (compare-and-set) to return the > > previous value (I think some architectures don't), but just return a > > boolean value (success/failure). > > compare and swap?Well, do you need the swap, or is a compare-and-set sufficient most of the time? What do other architectures offer?> > > What are the reasons because of which you picked the Load/Store model for > > barriers and not some other kind (e.g., acquire/release/...)? > > Chandler looked at what the various current LLVM architectures and > summarized what he found. What he found are the memory barriers that the > various processors support.What you would want is to have a model that is (1) easy-to-use for the developers and (2) close to what the hardware offers. L/S membars are easy to use, but I think some architectures such as Itanium offer different membars with different costs. So if you pick the wrong model and have to use stronger membars (mfence Itanium) to implement your model, than you pay for that by decreased performance.> > > Did you have a look at the atomic_ops project? > > http://www.hpl.hp.com/research/linux/atomic_ops/ > > It already has implementations for several architectures and several > > compilers. It uses a different consistency model (different set of > > constraints for operations) and groups necessary memory barriers with > > instructions (helpful on some architectures). It supports a few more > > operations. The author (Hans Boehm) seems to also be active in the area > > of C/C++ memory models (or some support for this). > > LLVM doesn't emit external library calls -- there is no "-lllvm" to > which programs have to link, so adding an atomic operation library is > likely to be a non-starter. LLVM is interested in emitting instructions > to make atomic operations (and higher level concurrency primitives) > possible, which is why Chandler's work is usefully important.Please have a real look at atomic_ops first. It does have a library part to it -- but that's just for a nonblocking stack. All the atomic operations are macros and asm for the specific compiler/architecture pairs. So if you reuse that in the LLVM code generators, you save one large part of the work. Of course you can redo all this work, surely giving you a very fast start... Second, I guess there has been some serious effort put into selecting the specific model. So, for example, if you look at some of Hans' published slides etc., there are some arguments in favor of associating membars with specific instructions. Do you know reasons why LLVM shouldn't do this? Has anyone looked at the memory models that are being in discussion for C/C++? Although there is no consensus yet AFAIK, it should be good for LLVM to stay close. And please observe that I didn't state that the work is not important or not useful. We should just strive to select the best model we have, and reuse work if we can. And if we can reuse a tested implementation and model, that is a good thing. Torvald
Chandler Carruth
2007-Jul-09 20:36 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
> > > "While the processor may spin and attempt the atomic operation more than > > > once before it is successful, research indicates this is extremely > > > uncommon." I don't understand this sentence, what do you mean? > > > > I'm not sure I can pinpoint the paper from which the statement is based, > > but I seem to recall something similar in the original LL-SC papers > > (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for > > lock-free algorithms. > > Well, the statement says that often you have low contention. But that's > something you want, not necessarily something you will get, and depends on > the workload/algorithm. I'm missing the context. Is the actual statement as > obvious as that you should try to use the atomic instructions offered by your > processor, instead of doing blocking algorithms?LL/SC is not a blocking algorithm. I'm going to be changing some of the nomenclature on the page to reflect this, but while it spins, it does not actually lock. The idea (as I understand it, and I'm still hoping Scott can find the reference he gave me to a DEC paper outlining this) is that the spin only occurs if another process does something breaking atomicity for that _particular_ LL/SC pairing. Even when the spin occurs, it should only occur until that particular process gets through the op without interruption. This needs some statistical analysis however, and hopefully the research in the literature can be located and referenced.> > > > You probably don't need to require CAS (compare-and-set) to return the > > > previous value (I think some architectures don't), but just return a > > > boolean value (success/failure). > > > > compare and swap? > > Well, do you need the swap, or is a compare-and-set sufficient most of the > time? What do other architectures offer?All of the architectures offer swap, or have no penalty for swapping. This allows much easier algorithm development to my mind, as you have the best of both worlds -- success/failure information, and the value from memory.> > > What are the reasons because of which you picked the Load/Store model for > > > barriers and not some other kind (e.g., acquire/release/...)? > > > > Chandler looked at what the various current LLVM architectures and > > summarized what he found. What he found are the memory barriers that the > > various processors support. > > What you would want is to have a model that is (1) easy-to-use for the > developers and (2) close to what the hardware offers. L/S membars are easy to > use, but I think some architectures such as Itanium offer different membars > with different costs. So if you pick the wrong model and have to use stronger > membars (mfence Itanium) to implement your model, than you pay for that by > decreased performance.Itanium was the only architecture to offer these semantics, while the L/S membars are offered to varying levels of detail on several architectures. As Itanium is not yet a fully functional target, it was not prioritized. Moreover, as the only instructions (to my knowledge) on Itanium to have memory synchronization components are cmpxchg and fetchadd, these could be implemented correctly when implementing the lowering for the instructions in this proposal, while still providing full memory barriers when needed outside of the atomic instructions. If there is serious demand for building memory semantics into the atomic instructions, "aquire" and "release" flags could be used, and implementations appropriately handle them. This doesn't seem to anull the need for non-operation-based memory barriers.> > > Did you have a look at the atomic_ops project? > > > http://www.hpl.hp.com/research/linux/atomic_ops/ > > > It already has implementations for several architectures and several > > > compilers. It uses a different consistency model (different set of > > > constraints for operations) and groups necessary memory barriers with > > > instructions (helpful on some architectures). It supports a few more > > > operations. The author (Hans Boehm) seems to also be active in the area > > > of C/C++ memory models (or some support for this). > > > > LLVM doesn't emit external library calls -- there is no "-lllvm" to > > which programs have to link, so adding an atomic operation library is > > likely to be a non-starter. LLVM is interested in emitting instructions > > to make atomic operations (and higher level concurrency primitives) > > possible, which is why Chandler's work is usefully important. > > Please have a real look at atomic_ops first. It does have a library part to > it -- but that's just for a nonblocking stack. > > All the atomic operations are macros and asm for the specific > compiler/architecture pairs. > So if you reuse that in the LLVM code generators, you save one large part of > the work. Of course you can redo all this work, surely giving you a very fast > start...The implementations for the current proposal came from architecture manuals, the Linux kernel, and the Apache Portable Runtime. I will definitely be looking at the atomic_ops implementations to see if there are improvements that can be made, but ultimately this provides another model, but not a re-usable component as this must be done through codegen at some point.> Second, I guess there has been some serious effort put into selecting the > specific model. So, for example, if you look at some of Hans' published > slides etc., there are some arguments in favor of associating membars with > specific instructions. Do you know reasons why LLVM shouldn't do this?My reason for not associating them is due to the majority of hardware implementations not associating them. The override motive was to remain very close to the hardware. Could libraries and intrinsic functions in the FE provide these different interfaces to the constructs?> Has anyone looked at the memory models that are being in discussion for C/C++? > Although there is no consensus yet AFAIK, it should be good for LLVM to stay > close.Not that LLVM should shun C/C++, but those aren't its only languages. I think its better to approach the problem from the hardware, than the language. This keeps LLVM an accurate layer for expressing hardware operations, and allows languages to translate their constructs to appropriately use the hardware.-> And please observe that I didn't state that the work is not important or not > useful. We should just strive to select the best model we have, and reuse > work if we can. And if we can reuse a tested implementation and model, that > is a good thing.I absolutely agree. This is why every aspect of the current proposal came from a hardware representation, combined with the Linux kernel representations. We deviated toward the hardware to ensure the ability of these intrinsics to fully exploit and expose the hardware capabilities while remaining lowerable (if that were a word) across all targets. Does this clarify some? I am quite open to trying to add support for Itanium-style hardware representations if this is a significant issue, it was simply not a priority, and not a problem that I well understand. (The Linux kernel does not use these semantics that I could find, but then it may not be the best example.) -Chandler Carruth> > Torvald > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Scott Michel
2007-Jul-09 23:38 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
Torvald Riegel wrote:> On Monday 09 July 2007 19:33, Scott Michel wrote: >> Torvald Riegel wrote: >>> Hi, >>> >>> I'd like to see support for something like this. I have some comments, >>> and I think there is existing work that you can reuse. >> "reuse within the compiler." > > within the LLVM compiler framework, to be precise. > >>> "While the processor may spin and attempt the atomic operation more than >>> once before it is successful, research indicates this is extremely >>> uncommon." I don't understand this sentence, what do you mean? >> I'm not sure I can pinpoint the paper from which the statement is based, >> but I seem to recall something similar in the original LL-SC papers >> (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for >> lock-free algorithms. > > Well, the statement says that often you have low contention. But that's > something you want, not necessarily something you will get, and depends on > the workload/algorithm. I'm missing the context. Is the actual statement as > obvious as that you should try to use the atomic instructions offered by your > processor, instead of doing blocking algorithms?As Chandler pointed out, LL/SC isn't blocking. It belongs to the optimistic concurrency class of constructs. One of the earliest papers (IIRC, the first paper) on LL/SC was: Herlihy, M. 1993. A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15, 5 (Nov. 1993), 745-770. DOI= http://doi.acm.org/10.1145/161468.161469 LL/SC on the various RISC architectures are used for spin locks, but they don't have to be used that way. I suspect that current work on software transactional memory is LL/SC-like on memory regions -- if you look at the paper, there is a chunk of code in the examples that rolls back or restarts a computation if the SC operation fails.> Please have a real look at atomic_ops first. It does have a library part to > it -- but that's just for a nonblocking stack.It's a lot like Apple's (and gcc's) work to reconcile the Intel and PPC vector intrinsics. Nice work but an unnecessary dependency, in my personal and not so humble opinion.> Second, I guess there has been some serious effort put into selecting the > specific model. So, for example, if you look at some of Hans' published > slides etc., there are some arguments in favor of associating membars with > specific instructions. Do you know reasons why LLVM shouldn't do this?You mean the papers that don't have to do with garbage collection? :-) Seriously, I think that's the overall purpose for some of this work so that llvm can do a better job in instruction-level parallelism.> Has anyone looked at the memory models that are being in discussion for C/C++? > Although there is no consensus yet AFAIK, it should be good for LLVM to stay > close.Even when consensus is achieved, it still has to be implemented on the hardware. As you point out, LL/SC is used to create spinlocks. But LL/SC is somewhat more powerful than that. -scooter
Chris Lattner
2007-Jul-10 19:01 UTC
[LLVMdev] Proposal for atomic and synchronization instructions
On Mon, 9 Jul 2007, Torvald Riegel wrote:>>> http://www.hpl.hp.com/research/linux/atomic_ops/ >>> It already has implementations for several architectures and several >>> compilers. It uses a different consistency model (different set of >>> constraints for operations) and groups necessary memory barriers with >>> instructions (helpful on some architectures). It supports a few more >>> operations. The author (Hans Boehm) seems to also be active in the area >>> of C/C++ memory models (or some support for this). >> >> LLVM doesn't emit external library calls -- there is no "-lllvm" to >> which programs have to link, so adding an atomic operation library is >> likely to be a non-starter. LLVM is interested in emitting instructions >> to make atomic operations (and higher level concurrency primitives) >> possible, which is why Chandler's work is usefully important. > > Please have a real look at atomic_ops first. It does have a library part to > it -- but that's just for a nonblocking stack. > > All the atomic operations are macros and asm for the specific > compiler/architecture pairs. > So if you reuse that in the LLVM code generators, you save one large part of > the work. Of course you can redo all this work, surely giving you a very fast > start...The atomic_ops library does look very interesting. Parts of its model could be adapted, and the various operations seem like they would turn into a few inlined instructions each (not libcalls). -Chris -- http://nondot.org/sabre/ http://llvm.org/
Reasonably Related Threads
- [LLVMdev] Proposal for atomic and synchronization instructions
- [LLVMdev] Proposal for atomic and synchronization instructions
- [LLVMdev] Atomic Operation and Synchronization Proposal v2
- [LLVMdev] Atomic Operation and Synchronization Proposal v2
- [LLVMdev] Proposal for atomic and synchronization instructions