All, I was wondering if the following operations could be implemented atomically in LLVM (a la AtomicRMW): Multiplication, Division, Remainder, Bit Shifting, or Logical Not (this could be implemented by xor with a value of all 1's). Is there a reason implementation-wise that they are not there? Additionally, would it be possible to create something similar to AtomicRMW that returned the new value instead of the old one? If not, I would be interested in learning why. Thanks, Billy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140102/0d4a7849/attachment.html>
On 01/02/2014 07:40 PM, William Moses wrote:> All, > > I was wondering if the following operations could be implemented > atomically in LLVM (a la AtomicRMW): Multiplication, Division, > Remainder, Bit Shifting, or Logical Not (this could be implemented by > xor with a value of all 1's). > > Is there a reason implementation-wise that they are not there? >If all your concerned about is atomicity (and not progress or efficiency), any of these can be implemented using CAS and their normal non-atomic arithmetic instructions. As such, they would not require any extensions.> Additionally, would it be possible to create something similar to > AtomicRMW that returned the new value instead of the old one?This seems like a straight forward operation that could be described easily in the existing IR. Are you trying to model a particular hardware architecture? Generally, any background you can provide would be helpful in understanding your problem. Philip
On 3 Jan 2014, at 06:03, Philip Reames <listmail at philipreames.com> wrote:> If all your concerned about is atomicity (and not progress or efficiency), any of these can be implemented using CAS and their normal non-atomic arithmetic instructions. As such, they would not require any extensions.Although this is true, it's far harder for a back end to match a sequence of instructions and turn them into something sensible than it is to expand a single pseudo. This is a problem for all architectures that are not x86 currently, because CAS trivially gets turned into a load-linked, compare, store-conditional, redo-on-failure sequence and any more complex operation (including atomic operations on floating point values, which C11 defines) get expanded in the IR as load, op, CAS, which then get first transformed into load, op, load-linked, compare, store-conditional, redo-on-failure, when the optimal encoding for the original expression would simply be load-linked, op, store-conditiona, redo-on-failure. Each back end is then responsible for reimplementing the logic that tries to untangle the resulting mess. If the architecture has an atomic multiply (not sure of any architecture that do, as atomic operations with variable latency are not fun things to implement) then it's even worse, because they have to change a sequence of operations into a single instruction, possibly with stronger forward progress guarantees, which might not actually be a valid transformation in the general case. David