thr3ads.net - llvm dev - [LLVMdev] Proposal for atomic and synchronization instructions [Jul 2007]

If this information is useful, please help other people find it:
Share via:

Scott Michel

2007-Jul-09 17:33 UTC

[LLVMdev] Proposal for atomic and synchronization instructions

Torvald Riegel wrote:> Hi,
> 
> I'd like to see support for something like this. I have some comments,
and I
> think there is existing work that you can reuse.
"reuse within the compiler."
> "While the processor may spin and attempt the atomic operation more
than once
> before it is successful, research indicates this is extremely
uncommon."
> I don't understand this sentence, what do you mean?
I'm not sure I can pinpoint the paper from which the statement is based,
but I seem to recall something similar in the original LL-SC papers
(Maurice Herlihy, DEC Western Research Labs?) It's a foundation for
lock-free algorithms.
> You probably don't need to require CAS (compare-and-set) to return the 
> previous value (I think some architectures don't), but just return a
boolean
> value (success/failure).
compare and swap?
> What are the reasons because of which you picked the Load/Store model for 
> barriers and not some other kind (e.g., acquire/release/...)?
Chandler looked at what the various current LLVM architectures and
summarized what he found. What he found are the memory barriers that the
various processors support.
> Did you have a look at the atomic_ops project?
> http://www.hpl.hp.com/research/linux/atomic_ops/
> It already has implementations for several architectures and several 
> compilers. It uses a different consistency model (different set of 
> constraints for operations) and groups necessary memory barriers with 
> instructions (helpful on some architectures). It supports a few more 
> operations. The author (Hans Boehm) seems to also be active in the area of 
> C/C++ memory models (or some support for this).
LLVM doesn't emit external library calls -- there is no "-lllvm"
to
which programs have to link, so adding an atomic operation library is
likely to be a non-starter. LLVM is interested in emitting instructions
to make atomic operations (and higher level concurrency primitives)
possible, which is why Chandler's work is usefully important.


-scooter

Christopher Lamb

2007-Jul-09 19:27 UTC

head link

[LLVMdev] Proposal for atomic and synchronization instructions

On Jul 9, 2007, at 10:33 AM, Scott Michel wrote:
> Torvald Riegel wrote:
>> What are the reasons because of which you picked the Load/Store  
>> model for
>> barriers and not some other kind (e.g., acquire/release/...)?
>
> Chandler looked at what the various current LLVM architectures and
> summarized what he found. What he found are the memory barriers  
> that the
> various processors support.
That being said, it probably would not be difficult to add an  
optional pointer to the membarrier instructions. Thus one could map  
acquire/release semantics onto the membarrier, and it would be up to  
the various optimizations to conservatively determine whether code  
motion is legal given aliasing information.

--
Christopher Lamb



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20070709/18dc51d9/attachment.html>

Torvald Riegel

2007-Jul-09 19:39 UTC

head link

[LLVMdev] Proposal for atomic and synchronization instructions

On Monday 09 July 2007 19:33, Scott Michel wrote:> Torvald Riegel wrote:
> > Hi,
> >
> > I'd like to see support for something like this. I have some
comments,
> > and I think there is existing work that you can reuse.
>
> "reuse within the compiler."
within the LLVM compiler framework, to be precise.
>
> > "While the processor may spin and attempt the atomic operation
more than
> > once before it is successful, research indicates this is extremely
> > uncommon." I don't understand this sentence, what do you
mean?
>
> I'm not sure I can pinpoint the paper from which the statement is
based,
> but I seem to recall something similar in the original LL-SC papers
> (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for
> lock-free algorithms.
Well, the statement says that often you have low contention. But that's 
something you want, not necessarily something you will get, and depends on 
the workload/algorithm. I'm missing the context. Is the actual statement as 
obvious as that you should try to use the atomic instructions offered by your 
processor, instead of doing blocking algorithms?
> > You probably don't need to require CAS (compare-and-set) to return
the
> > previous value (I think some architectures don't), but just return
a
> > boolean value (success/failure).
>
> compare and swap?
Well, do you need the swap, or is a compare-and-set sufficient most of the 
time? What do other architectures offer?
>
> > What are the reasons because of which you picked the Load/Store model
for
> > barriers and not some other kind (e.g., acquire/release/...)?
>
> Chandler looked at what the various current LLVM architectures and
> summarized what he found. What he found are the memory barriers that the
> various processors support.
What you would want is to have a model that is (1) easy-to-use for the 
developers and (2) close to what the hardware offers. L/S membars are easy to 
use, but I think some architectures such as Itanium offer different membars 
with different costs. So if you pick the wrong model and have to use stronger 
membars (mfence Itanium) to implement your model, than you pay for that by 
decreased performance.
>
> > Did you have a look at the atomic_ops project?
> > http://www.hpl.hp.com/research/linux/atomic_ops/
> > It already has implementations for several architectures and several
> > compilers. It uses a different consistency model (different set of
> > constraints for operations) and groups necessary memory barriers with
> > instructions (helpful on some architectures). It supports a few more
> > operations. The author (Hans Boehm) seems to also be active in the
area
> > of C/C++ memory models (or some support for this).
>
> LLVM doesn't emit external library calls -- there is no
"-lllvm" to
> which programs have to link, so adding an atomic operation library is
> likely to be a non-starter. LLVM is interested in emitting instructions
> to make atomic operations (and higher level concurrency primitives)
> possible, which is why Chandler's work is usefully important.
Please have a real look at atomic_ops first. It does have a library part to 
it -- but that's just for a nonblocking stack.

All the atomic operations are macros and asm for the specific 
compiler/architecture pairs. 
So if you reuse that in the LLVM code generators, you save one large part of 
the work. Of course you can redo all this work, surely giving you a very fast 
start...

Second, I guess there has been some serious effort put into selecting the 
specific model. So, for example, if you look at some of Hans' published 
slides etc., there are some arguments in favor of associating membars with 
specific instructions. Do you know reasons why LLVM shouldn't do this?

Has anyone looked at the memory models that are being in discussion for C/C++? 
Although there is no consensus yet AFAIK, it should be good for LLVM to stay 
close.

And please observe that I didn't state that the work is not important or not
useful. We should just strive to select the best model we have, and reuse 
work if we can. And if we can reuse a tested implementation and model, that 
is a good thing.

Torvald

Chandler Carruth

2007-Jul-09 20:36 UTC

head link

[LLVMdev] Proposal for atomic and synchronization instructions

> > > "While the processor may spin and attempt the atomic
operation more than
> > > once before it is successful, research indicates this is
extremely
> > > uncommon." I don't understand this sentence, what do you
mean?
> >
> > I'm not sure I can pinpoint the paper from which the statement is
based,
> > but I seem to recall something similar in the original LL-SC papers
> > (Maurice Herlihy, DEC Western Research Labs?) It's a foundation
for
> > lock-free algorithms.
>
> Well, the statement says that often you have low contention. But that's
> something you want, not necessarily something you will get, and depends on
> the workload/algorithm. I'm missing the context. Is the actual
statement as
> obvious as that you should try to use the atomic instructions offered by
your
> processor, instead of doing blocking algorithms?
LL/SC is not a blocking algorithm. I'm going to be changing some of
the nomenclature on the page to reflect this, but while it spins, it
does not actually lock. The idea (as I understand it, and I'm still
hoping Scott can find the reference he gave me to a DEC paper
outlining this) is that the spin only occurs if another process does
something breaking atomicity for that _particular_ LL/SC pairing. Even
when the spin occurs, it should only occur until that particular
process gets through the op without interruption. This needs some
statistical analysis however, and hopefully the research in the
literature can be located and referenced.
>
> > > You probably don't need to require CAS (compare-and-set) to
return the
> > > previous value (I think some architectures don't), but just
return a
> > > boolean value (success/failure).
> >
> > compare and swap?
>
> Well, do you need the swap, or is a compare-and-set sufficient most of the
> time? What do other architectures offer?
All of the architectures offer swap, or have no penalty for swapping.
This allows much easier algorithm development to my mind, as you have
the best of both worlds -- success/failure information, and the value
from memory.
> > > What are the reasons because of which you picked the Load/Store
model for
> > > barriers and not some other kind (e.g., acquire/release/...)?
> >
> > Chandler looked at what the various current LLVM architectures and
> > summarized what he found. What he found are the memory barriers that
the
> > various processors support.
>
> What you would want is to have a model that is (1) easy-to-use for the
> developers and (2) close to what the hardware offers. L/S membars are easy
to
> use, but I think some architectures such as Itanium offer different membars
> with different costs. So if you pick the wrong model and have to use
stronger
> membars (mfence Itanium) to implement your model, than you pay for that by
> decreased performance.
Itanium was the only architecture to offer these semantics, while the
L/S membars are offered to varying levels of detail on several
architectures. As Itanium is not yet a fully functional target, it was
not prioritized.

Moreover, as the only instructions (to my knowledge) on Itanium to
have memory synchronization components are cmpxchg and fetchadd, these
could be implemented correctly when implementing the lowering for the
instructions in this proposal, while still providing full memory
barriers when needed outside of the atomic instructions. If there is
serious demand for building memory semantics into the atomic
instructions, "aquire" and "release" flags could be used,
and
implementations appropriately handle them. This doesn't seem to anull
the need for non-operation-based memory barriers.
> > > Did you have a look at the atomic_ops project?
> > > http://www.hpl.hp.com/research/linux/atomic_ops/
> > > It already has implementations for several architectures and
several
> > > compilers. It uses a different consistency model (different set
of
> > > constraints for operations) and groups necessary memory barriers
with
> > > instructions (helpful on some architectures). It supports a few
more
> > > operations. The author (Hans Boehm) seems to also be active in
the area
> > > of C/C++ memory models (or some support for this).
> >
> > LLVM doesn't emit external library calls -- there is no
"-lllvm" to
> > which programs have to link, so adding an atomic operation library is
> > likely to be a non-starter. LLVM is interested in emitting
instructions
> > to make atomic operations (and higher level concurrency primitives)
> > possible, which is why Chandler's work is usefully important.
>
> Please have a real look at atomic_ops first. It does have a library part to
> it -- but that's just for a nonblocking stack.
>
> All the atomic operations are macros and asm for the specific
> compiler/architecture pairs.
> So if you reuse that in the LLVM code generators, you save one large part
of
> the work. Of course you can redo all this work, surely giving you a very
fast
> start...
The implementations for the current proposal came from architecture
manuals, the Linux kernel, and the Apache Portable Runtime. I will
definitely be looking at the atomic_ops implementations to see if
there are improvements that can be made, but ultimately this provides
another model, but not a re-usable component as this must be done
through codegen at some point.
> Second, I guess there has been some serious effort put into selecting the
> specific model. So, for example, if you look at some of Hans' published
> slides etc., there are some arguments in favor of associating membars with
> specific instructions. Do you know reasons why LLVM shouldn't do this?
My reason for not associating them is due to the majority of hardware
implementations not associating them. The override motive was to
remain very close to the hardware. Could libraries and intrinsic
functions in the FE provide these different interfaces to the
constructs?
> Has anyone looked at the memory models that are being in discussion for
C/C++?
> Although there is no consensus yet AFAIK, it should be good for LLVM to
stay
> close.
Not that LLVM should shun C/C++, but those aren't its only languages.
I think its better to approach the problem from the hardware, than the
language. This keeps LLVM an accurate layer for expressing hardware
operations, and allows languages to translate their constructs to
appropriately use the hardware.-
> And please observe that I didn't state that the work is not important
or not
> useful. We should just strive to select the best model we have, and reuse
> work if we can. And if we can reuse a tested implementation and model, that
> is a good thing.
I absolutely agree. This is why every aspect of the current proposal
came from a hardware representation, combined with the Linux kernel
representations. We deviated toward the hardware to ensure the ability
of these intrinsics to fully exploit and expose the hardware
capabilities while remaining lowerable (if that were a word) across
all targets.

Does this clarify some? I am quite open to trying to add support for
Itanium-style hardware representations if this is a significant issue,
it was simply not a priority, and not a problem that I well
understand. (The Linux kernel does not use these semantics that I
could find, but then it may not be the best example.)

-Chandler Carruth
>
> Torvald
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Scott Michel

2007-Jul-09 23:38 UTC

head link

[LLVMdev] Proposal for atomic and synchronization instructions

Torvald Riegel wrote:> On Monday 09 July 2007 19:33, Scott Michel wrote:
>> Torvald Riegel wrote:
>>> Hi,
>>>
>>> I'd like to see support for something like this. I have some
comments,
>>> and I think there is existing work that you can reuse.
>> "reuse within the compiler."
> 
> within the LLVM compiler framework, to be precise.
> 
>>> "While the processor may spin and attempt the atomic operation
more than
>>> once before it is successful, research indicates this is extremely
>>> uncommon." I don't understand this sentence, what do you
mean?
>> I'm not sure I can pinpoint the paper from which the statement is
based,
>> but I seem to recall something similar in the original LL-SC papers
>> (Maurice Herlihy, DEC Western Research Labs?) It's a foundation for
>> lock-free algorithms.
> 
> Well, the statement says that often you have low contention. But that's
> something you want, not necessarily something you will get, and depends on 
> the workload/algorithm. I'm missing the context. Is the actual
statement as
> obvious as that you should try to use the atomic instructions offered by
your
> processor, instead of doing blocking algorithms?
As Chandler pointed out, LL/SC isn't blocking. It belongs to the
optimistic concurrency class of constructs. One of the earliest papers
(IIRC, the first paper) on LL/SC was:

Herlihy, M. 1993. A methodology for implementing highly concurrent data
objects. ACM Trans. Program. Lang. Syst. 15, 5 (Nov. 1993), 745-770.
DOI= http://doi.acm.org/10.1145/161468.161469

LL/SC on the various RISC architectures are used for spin locks, but
they don't have to be used that way. I suspect that current work on
software transactional memory is LL/SC-like on memory regions -- if you
look at the paper, there is a chunk of code in the examples that rolls
back or restarts a computation if the SC operation fails.
> Please have a real look at atomic_ops first. It does have a library part to
> it -- but that's just for a nonblocking stack.
It's a lot like Apple's (and gcc's) work to reconcile the Intel and
PPC
vector intrinsics. Nice work but an unnecessary dependency, in my
personal and not so humble opinion.
> Second, I guess there has been some serious effort put into selecting the 
> specific model. So, for example, if you look at some of Hans' published
> slides etc., there are some arguments in favor of associating membars with 
> specific instructions. Do you know reasons why LLVM shouldn't do this?
You mean the papers that don't have to do with garbage collection? :-)

Seriously, I think that's the overall purpose for some of this work so
that llvm can do a better job in instruction-level parallelism.
> Has anyone looked at the memory models that are being in discussion for
C/C++?
> Although there is no consensus yet AFAIK, it should be good for LLVM to
stay
> close.
Even when consensus is achieved, it still has to be implemented on the
hardware. As you point out, LL/SC is used to create spinlocks. But LL/SC
is somewhat more powerful than that.



-scooter

Chris Lattner

2007-Jul-10 19:01 UTC

head link

[LLVMdev] Proposal for atomic and synchronization instructions

On Mon, 9 Jul 2007, Torvald Riegel wrote:>>> http://www.hpl.hp.com/research/linux/atomic_ops/
>>> It already has implementations for several architectures and
several
>>> compilers. It uses a different consistency model (different set of
>>> constraints for operations) and groups necessary memory barriers
with
>>> instructions (helpful on some architectures). It supports a few
more
>>> operations. The author (Hans Boehm) seems to also be active in the
area
>>> of C/C++ memory models (or some support for this).
>>
>> LLVM doesn't emit external library calls -- there is no
"-lllvm" to
>> which programs have to link, so adding an atomic operation library is
>> likely to be a non-starter. LLVM is interested in emitting instructions
>> to make atomic operations (and higher level concurrency primitives)
>> possible, which is why Chandler's work is usefully important.
>
> Please have a real look at atomic_ops first. It does have a library part to
> it -- but that's just for a nonblocking stack.
>
> All the atomic operations are macros and asm for the specific
> compiler/architecture pairs.
> So if you reuse that in the LLVM code generators, you save one large part
of
> the work. Of course you can redo all this work, surely giving you a very
fast
> start...
The atomic_ops library does look very interesting.  Parts of its model 
could be adapted, and the various operations seem like they would turn 
into a few inlined instructions each (not libcalls).

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Jul 2007 - [LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

[LLVMdev] Proposal for atomic and synchronization instructions

Reasonably Related Threads