Looking through the various architectures, it seems that the minimal approach to atomic intrinsics isn't necessarily the best.

If we assume CAS and atomic add, then we can implement atomic N, where N is some other operation, with a loop. However, for the ll/sc architectures, this will lower into a double loop (the outer loop of load-op-CAS and the CAS loop). On such archs, the atomic op can be done as one loop. To generate the best code, we would have to recognize loops that equate to atomic N and raise them to a more efficient implementation. The alternative is to implement atomic N for all the Ns in gcc's atomic ops, let all the ll/sc archs generate efficient code easily, and just lower to a loop for x86, sparc, and ia64.

Which is a long way to ask: what do people think, design-wise? Should we have a large set of atomic ops that most platforms support natively and that the couple that don't can easily lower, or have a minimal set and try to raise the lowered gcc atomic ops to efficient code on archs that support ll/sc (essentially trying to recognize the ld, op, CAS loops during codegen)?

Andrew
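For readers who want to see the load-op-CAS loop concretely, here is a minimal sketch in C. It assumes GCC's `__sync_val_compare_and_swap` builtin; the helper name `fetch_and_or_via_cas` is hypothetical and only for illustration. On an ll/sc target the CAS is itself typically expanded into a retry loop, which is where the double loop described above would come from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): atomic fetch-and-or
 * built only from compare-and-swap. Returns the value seen before
 * the OR was applied. */
static inline int fetch_and_or_via_cas(int *ptr, int mask)
{
    assert(ptr != NULL);

    /* Initial guess at the current value. */
    int old = *(volatile int *)ptr;

    for (;;) {
        /* Try to install old | mask; the builtin returns whatever
         * was actually in memory before the attempt. */
        int seen = __sync_val_compare_and_swap(ptr, old, old | mask);
        if (seen == old)
            return old;   /* CAS succeeded */
        old = seen;       /* someone else won; retry with the fresh value */
    }
}
```

Calling `fetch_and_or_via_cas(&flags, 0x2)` with `flags == 0x5` would return `0x5` and leave `flags` at `0x7`.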
Andrew Lenharth wrote:
> Looking through the various architectures, it seems that the minimal
> approach to atomic intrinsics isn't necessarily the best.
>
> If we assume CAS and atomic add, then we can implement atomic N, where
> n is some other operation with a loop. however, for the ll/sc
> architectures, this will lower into a double loop (the outer loop of
> load-op-CAS and the CAS loop. On such archs, the atomic op can be
> done as one loop. To generate the best code, we would have to
> recognize loops that equated to atomic N, and raise them to a more
> efficient implementation. The alternative is to implement atomic N
> for all the Ns in gcc's atomic ops, and let all the ll/sc archs
> generate efficient code easily, and just lower to a loop for x86,
> sparc, and ia64.
>

Most of these atomic operations for the GCC builtins seem to be variations of fetch_and_phi, where phi is some integer or bitwise operation (and, or, add, sub, inc, dec, etc). There are relatively few of them, and they'd all be nearly identical to the fetch_and_add implementation on LL/SC architectures, so development effort is minimal.

I'd say implement them all (or exclude only those that get very, very little usage). There aren't that many different kinds of atomic builtins, and the analysis to raise atomic op loops looks like a lot of effort.

-- John T.

> Which is a long way to ask, what do people think design wise? Should
> we have a large set of atomic ops that most platforms support natively
> and the couple that don't can easily lower, or have a minimal set and
> try to raise the lowered gcc atomic ops to efficient code on archs
> that support ll/sc (essentially trying to recognize the ld, op, CAS
> loops during codegen).
>
> Andrew
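To illustrate how little differs between the fetch_and_phi variants, here is a rough sketch in C. The macro `DEFINE_FETCH_AND_PHI` and the functions it generates are hypothetical names, and the retry loop uses GCC's `__sync_val_compare_and_swap` as a portable stand-in, since ll/sc instructions are not reachable from plain C. On a real LL/SC target the same shape would presumably hold, with only the one-line operation changing between variants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (illustration only): stamps out fetch_and_<name>
 * as a retry loop. Only `expr`, the "phi" step, differs per variant. */
#define DEFINE_FETCH_AND_PHI(name, expr)                                 \
    static inline int fetch_and_##name(int *ptr, int val)                \
    {                                                                    \
        assert(ptr != NULL);                                             \
        int old = *(volatile int *)ptr;                                  \
        for (;;) {                                                       \
            int desired = (expr);                                        \
            int seen = __sync_val_compare_and_swap(ptr, old, desired);   \
            if (seen == old)                                             \
                return old;  /* update landed; return the prior value */ \
            old = seen;      /* lost the race; retry with fresh value */ \
        }                                                                \
    }

DEFINE_FETCH_AND_PHI(add, old + val)
DEFINE_FETCH_AND_PHI(sub, old - val)
DEFINE_FETCH_AND_PHI(and, old & val)
DEFINE_FETCH_AND_PHI(or,  old | val)
DEFINE_FETCH_AND_PHI(xor, old ^ val)
```

Each generated function returns the value that was in memory before the operation, matching the fetch_and_phi convention described above.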
On 3/3/08, Chris Lattner <sabre at nondot.org> wrote:
> On Mon, 3 Mar 2008, Andrew Lenharth wrote:
> > we have a large set of atomic ops that most platforms support natively
> > and the couple that don't can easily lower, or have a minimal set and
> > try to raise the lowered gcc atomic ops to efficient code on archs
> > that support ll/sc (essentially trying to recognize the ld, op, CAS
> > loops during codegen).
>
> I'd suggest starting with a minimal set. It's easier to add things lazily
> as needed than it is to take things out that end up not being needed.

Right, that is what is done. And they are sufficient, just not easy to make efficient. But I mostly agree, and we can wait until the PPC people complain that their locks are too slow.

Andrew
On Mon, 3 Mar 2008, Andrew Lenharth wrote:
> we have a large set of atomic ops that most platforms support natively
> and the couple that don't can easily lower, or have a minimal set and
> try to raise the lowered gcc atomic ops to efficient code on archs
> that support ll/sc (essentially trying to recognize the ld, op, CAS
> loops during codegen).

I'd suggest starting with a minimal set. It's easier to add things lazily as needed than it is to take things out that end up not being needed.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/