Hi Tom, Matt,

I'm working on a project that needs a few coherent atomic operations
(HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC.

The attached patch implements atomic compare-and-swap for SI+
(untested). I tried to stay within what was already available, but
there are a few issues I was unsure how to address:

1.) It currently uses v2i32 for both input and output. This needlessly
clobbers a register, but I haven't found a way to express 'output = the
first subreg of input' or 'output = input' in pattern constraints.

2.) I think these instructions will need the SLC bit set to work
correctly on HSA targets, and I'm not sure of the best way to do that.
I considered:
 * adding it as an operand to the newly introduced node
   (AMDGPUISD::CMP_SWAP) and setting it to 0/1 in the lowering pass.
   Can this be done without changing the order of the inputs? SLC is
   always the last operand, but the operand count is variable.
 * introducing HSA variants of the instructions with the SLC bit set
 * setting the bit in a DAG combine (is there a way to know whether I'm
   combining an atomic op, or do I need to split the SelectMUBUF
   functions?)

3.) Depending on 2), atomic load/store can either be (preferably) a
MUBUF load/store with GLC and SLC set, or hacked to be swp_noret
(store) or cmp_swap 0,0 (load); a sketch of the latter follows this
message.

thanks,
--
Jan Vesely <jan.vesely at rutgers.edu>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-AMDGPU-SI-Implement-atomic-compare-and-swap.patch
Type: text/x-patch
Size: 12265 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160325/6f91d7ab/attachment.bin>
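[A rough illustration of the cmp_swap 0,0 route from question 3: an
atomic load lowered through the AMDGPUISD::CMP_SWAP node the patch
introduces. This is an untested sketch, not part of the attached patch;
the hook name lowerAtomicLoadAsCmpSwap is invented for illustration.]

  // Untested sketch: lower an atomic load as cmp_swap(ptr, 0, 0).
  // With src == cmp == 0 the operation stores 0 only when the location
  // already holds 0, so memory never changes and the returned old
  // value is exactly the loaded value.
  SDValue SITargetLowering::lowerAtomicLoadAsCmpSwap(SDValue Op,
                                                     SelectionDAG &DAG) const {
    SDLoc DL(Op);
    AtomicSDNode *Load = cast<AtomicSDNode>(Op);
    EVT MemVT = Load->getMemoryVT();
    EVT VecVT = EVT::getVectorVT(*DAG.getContext(), MemVT, 2);

    // Both halves of the data pair (src and cmp) are zero.
    SDValue Zero = DAG.getConstant(0, DL, MemVT);
    SDValue Val = DAG.getNode(ISD::BUILD_VECTOR, DL, VecVT, Zero, Zero);

    SDValue Ops[] = {Load->getChain(), Load->getBasePtr(), Val};
    SDVTList VTList = DAG.getVTList(MVT::v2i32, MVT::Other);
    SDValue CmpSwap = DAG.getMemIntrinsicNode(AMDGPUISD::CMP_SWAP, DL,
                                              VTList, Ops, MemVT,
                                              Load->getMemOperand());

    // The old (loaded) value comes back in the first element of the pair.
    SDValue Old = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32,
                              CmpSwap.getValue(0),
                              DAG.getConstant(0, DL, MVT::i32));
    return DAG.getMergeValues({Old, CmpSwap.getValue(1)}, DL);
  }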
On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote:
> Hi Tom, Matt,
>
> I'm working on a project that needs a few coherent atomic operations
> (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC.
>
> The attached patch implements atomic compare-and-swap for SI+
> (untested). I tried to stay within what was already available, but
> there are a few issues I was unsure how to address:
>
> 1.) It currently uses v2i32 for both input and output. This
> needlessly clobbers a register, but I haven't found a way to express
> 'output = the first subreg of input' or 'output = input' in pattern
> constraints.
>
> 2.) I think these instructions will need the SLC bit set to work
> correctly on HSA targets, and I'm not sure of the best way to do
> that. I considered:
>  * adding it as an operand to the newly introduced node
>    (AMDGPUISD::CMP_SWAP) and setting it to 0/1 in the lowering pass.
>    Can this be done without changing the order of the inputs? SLC is
>    always the last operand, but the operand count is variable.
>  * introducing HSA variants of the instructions with the SLC bit set
>  * setting the bit in a DAG combine (is there a way to know whether
>    I'm combining an atomic op, or do I need to split the SelectMUBUF
>    functions?)
>
> 3.) Depending on 2), atomic load/store can either be (preferably) a
> MUBUF load/store with GLC and SLC set, or hacked to be swp_noret
> (store) or cmp_swap 0,0 (load).

Have you seen this patch: http://reviews.llvm.org/D17280

-Tom

> thanks,
> --
> Jan Vesely <jan.vesely at rutgers.edu>

> From ffd2c28fb68e958682dc60090a2b1fb669897423 Mon Sep 17 00:00:00 2001
> From: Jan Vesely <jan.vesely at rutgers.edu>
> Date: Tue, 22 Mar 2016 13:54:55 -0400
> Subject: [PATCH 1/1] AMDGPU,SI: Implement atomic compare and swap
>
> ---
>  lib/Target/AMDGPU/AMDGPUISelLowering.cpp |  1 +
>  lib/Target/AMDGPU/AMDGPUISelLowering.h   |  1 +
>  lib/Target/AMDGPU/CIInstructions.td      |  3 +-
>  lib/Target/AMDGPU/SIISelLowering.cpp     | 41 +++++++++++++++
>  lib/Target/AMDGPU/SIISelLowering.h       |  1 +
>  lib/Target/AMDGPU/SIInstrInfo.td         |  9 ++++
>  lib/Target/AMDGPU/SIInstructions.td      |  2 +-
>  test/CodeGen/AMDGPU/global_atomics.ll    | 89 ++++++++++++++++++++++++++++++++
>  8 files changed, 145 insertions(+), 2 deletions(-)
>
> diff --git a/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
> index 0820898..6c5b1ed 100644
> --- a/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
> +++ b/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
> @@ -2782,6 +2782,7 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
>    NODE_NAME_CASE(INTERP_P2)
>    NODE_NAME_CASE(STORE_MSKOR)
>    NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
> +  NODE_NAME_CASE(CMP_SWAP)
>    case AMDGPUISD::LAST_AMDGPU_ISD_NUMBER: break;
>    }
>    return nullptr;
> diff --git a/lib/Target/AMDGPU/AMDGPUISelLowering.h b/lib/Target/AMDGPU/AMDGPUISelLowering.h
> index 4627f34..fce8e76 100644
> --- a/lib/Target/AMDGPU/AMDGPUISelLowering.h
> +++ b/lib/Target/AMDGPU/AMDGPUISelLowering.h
> @@ -313,6 +313,7 @@ enum NodeType : unsigned {
>    STORE_MSKOR,
>    LOAD_CONSTANT,
>    TBUFFER_STORE_FORMAT,
> +  CMP_SWAP,
>    LAST_AMDGPU_ISD_NUMBER
>  };
>
> diff --git a/lib/Target/AMDGPU/CIInstructions.td b/lib/Target/AMDGPU/CIInstructions.td
> index 593300f..d99b013 100644
> --- a/lib/Target/AMDGPU/CIInstructions.td
> +++ b/lib/Target/AMDGPU/CIInstructions.td
> @@ -156,7 +156,7 @@ defm FLAT_ATOMIC_SWAP : FLAT_ATOMIC <
>    flat<0x30, 0x40>, "flat_atomic_swap", VGPR_32
>  >;
>  defm FLAT_ATOMIC_CMPSWAP : FLAT_ATOMIC <
> -  flat<0x31, 0x41>, "flat_atomic_cmpswap", VGPR_32, VReg_64
> +  flat<0x31, 0x41>, "flat_atomic_cmpswap", VReg_64
>  >;
>  defm FLAT_ATOMIC_ADD : FLAT_ATOMIC <
>    flat<0x32, 0x42>, "flat_atomic_add", VGPR_32
> @@ -322,6 +322,7 @@ def : FlatAtomicPat <FLAT_ATOMIC_SMIN_RTN, atomic_min_global, i32>;
>  def : FlatAtomicPat <FLAT_ATOMIC_UMIN_RTN, atomic_umin_global, i32>;
>  def : FlatAtomicPat <FLAT_ATOMIC_OR_RTN, atomic_or_global, i32>;
>  def : FlatAtomicPat <FLAT_ATOMIC_SWAP_RTN, atomic_swap_global, i32>;
> +def : FlatAtomicPat <FLAT_ATOMIC_CMPSWAP_RTN, SIcmp_swap, v2i32>;
>  def : FlatAtomicPat <FLAT_ATOMIC_XOR_RTN, atomic_xor_global, i32>;
>
>  } // End Predicates = [isCIVI]
> diff --git a/lib/Target/AMDGPU/SIISelLowering.cpp b/lib/Target/AMDGPU/SIISelLowering.cpp
> index a8f6ed5..4b6d118 100644
> --- a/lib/Target/AMDGPU/SIISelLowering.cpp
> +++ b/lib/Target/AMDGPU/SIISelLowering.cpp
> @@ -263,6 +263,12 @@ SITargetLowering::SITargetLowering(TargetMachine &TM,
>    setOperationAction(ISD::FDIV, MVT::f32, Custom);
>    setOperationAction(ISD::FDIV, MVT::f64, Custom);
>
> +  // GCN CMP_SWAP needs input marshalling, and output demarshalling
> +  setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Custom);
> +  // We can't return success/failure, only the old value,
> +  // let LLVM add the comparison
> +  setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Expand);
> +
>    setTargetDAGCombine(ISD::FADD);
>    setTargetDAGCombine(ISD::FSUB);
>    setTargetDAGCombine(ISD::FMINNUM);
> @@ -1168,6 +1174,7 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
>      SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
>      return LowerGlobalAddress(MFI, Op, DAG);
>    }
> +  case ISD::ATOMIC_CMP_SWAP: return LowerATOMIC_CMP_SWAP(Op, DAG);
>    case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
>    case ISD::INTRINSIC_VOID: return LowerINTRINSIC_VOID(Op, DAG);
>    }
> @@ -1680,6 +1687,40 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
>    }
>  }
>
> +SDValue SITargetLowering::LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const
> +{
> +  SDLoc DL(Op);
> +
> +  AtomicSDNode *Swap = cast<AtomicSDNode>(Op);
> +  assert(Swap && Swap->isCompareAndSwap());
> +
> +  EVT MemVT = Swap->getMemoryVT();
> +  EVT ValVT = EVT::getVectorVT(*DAG.getContext(), MemVT, 2);
> +  SDValue Cmp = Op.getOperand(2);
> +  SDValue New = Op.getOperand(3);
> +
> +  // "src comes from the first data-vgpr, cmp from the second."
> +  SDValue Val = DAG.getNode(ISD::BUILD_VECTOR, DL, ValVT, New, Cmp);
> +
> +  SDValue Ops[] = {Swap->getChain(), Swap->getBasePtr(), Val};
> +  ArrayRef<EVT> VTs = { MVT::v2i32, MVT::Other };
> +  SDVTList VTList = { VTs.data(), VTs.size() };
> +
> +  SDValue CmpSwap = DAG.getMemIntrinsicNode(AMDGPUISD::CMP_SWAP, DL, VTList,
> +                                            Ops, MemVT, Swap->getMemOperand());
> +  // TODO: What about ordering? synchscope?
> +
> +  // Extract returned old value
> +  SDValue Zero = DAG.getConstant(0, DL, MVT::i32);
> +  SDValue Old = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32,
> +                            CmpSwap.getValue(0), Zero);
> +
> +  // Merge return value and Chain
> +  SDValue Ret = DAG.getNode(ISD::MERGE_VALUES, DL, Op->getVTList(),
> +                            Old, CmpSwap.getValue(1));
> +  return Ret;
> +}
> +
>  SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
>    SDLoc DL(Op);
>    LoadSDNode *Load = cast<LoadSDNode>(Op);
> diff --git a/lib/Target/AMDGPU/SIISelLowering.h b/lib/Target/AMDGPU/SIISelLowering.h
> index 842fbeb..2f87de1 100644
> --- a/lib/Target/AMDGPU/SIISelLowering.h
> +++ b/lib/Target/AMDGPU/SIISelLowering.h
> @@ -29,6 +29,7 @@ class SITargetLowering final : public AMDGPUTargetLowering {
>    SDValue lowerImplicitZextParam(SelectionDAG &DAG, SDValue Op,
>                                   MVT VT, unsigned Offset) const;
>
> +  SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
>    SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;
>    SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
>    SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const;
> diff --git a/lib/Target/AMDGPU/SIInstrInfo.td b/lib/Target/AMDGPU/SIInstrInfo.td
> index 37ee35c..eb0e6ce 100644
> --- a/lib/Target/AMDGPU/SIInstrInfo.td
> +++ b/lib/Target/AMDGPU/SIInstrInfo.td
> @@ -114,6 +114,15 @@ def SItbuffer_store : SDNode<"AMDGPUISD::TBUFFER_STORE_FORMAT",
>    [SDNPMayStore, SDNPMemOperand, SDNPHasChain]
>  >;
>
> +def SIcmp_swap : SDNode<"AMDGPUISD::CMP_SWAP",
> +  SDTypeProfile<1, 2,
> +    [SDTCisVT<0, v2i32>, // return value (should be i32)
> +     SDTCisVT<1, i64>,   // address
> +     SDTCisVT<2, v2i32>  // src followed by cmp
> +    ]>,
> +  [SDNPMayLoad, SDNPMayStore, SDNPMemOperand, SDNPHasChain]
> +>;
> +
>  def SIload_input : SDNode<"AMDGPUISD::LOAD_INPUT",
>    SDTypeProfile<1, 3, [SDTCisVT<0, v4f32>, SDTCisVT<1, v4i32>, SDTCisVT<2, i16>,
>                         SDTCisVT<3, i32>]>
> diff --git a/lib/Target/AMDGPU/SIInstructions.td b/lib/Target/AMDGPU/SIInstructions.td
> index 2f7f908..77bb8ca 100644
> --- a/lib/Target/AMDGPU/SIInstructions.td
> +++ b/lib/Target/AMDGPU/SIInstructions.td
> @@ -1012,7 +1012,7 @@ defm BUFFER_ATOMIC_SWAP : MUBUF_Atomic <
>    mubuf<0x30, 0x40>, "buffer_atomic_swap", VGPR_32, i32, atomic_swap_global
>  >;
>  defm BUFFER_ATOMIC_CMPSWAP : MUBUF_Atomic <
> -  mubuf<0x31, 0x41>, "buffer_atomic_cmpswap", VReg_64, v2i32, null_frag
> +  mubuf<0x31, 0x41>, "buffer_atomic_cmpswap", VReg_64, v2i32, SIcmp_swap
>  >;
>  defm BUFFER_ATOMIC_ADD : MUBUF_Atomic <
>    mubuf<0x32, 0x42>, "buffer_atomic_add", VGPR_32, i32, atomic_add_global
> diff --git a/test/CodeGen/AMDGPU/global_atomics.ll b/test/CodeGen/AMDGPU/global_atomics.ll
> index a92ee89..340e808 100644
> --- a/test/CodeGen/AMDGPU/global_atomics.ll
> +++ b/test/CodeGen/AMDGPU/global_atomics.ll
> @@ -758,6 +758,95 @@ entry:
>    ret void
>  }
>
> +; CMP_SWAP
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_offset:
> +; GCN: buffer_atomic_cmpswap v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:16{{$}}
> +define void @atomic_cmpxchg_i32_offset(i32 addrspace(1)* %out, i32 %in, i32 %old) {
> +entry:
> +  %gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
> +  %0 = cmpxchg volatile i32 addrspace(1)* %gep, i32 %old, i32 %in seq_cst seq_cst
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_ret_offset:
> +; GCN: buffer_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]{{:[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:16 glc{{$}}
> +; GCN: buffer_store_dword v[[RET]]
> +define void @atomic_cmpxchg_i32_ret_offset(i32 addrspace(1)* %out, i32 addrspace(1)* %out2, i32 %in, i32 %old) {
> +entry:
> +  %gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
> +  %0 = cmpxchg volatile i32 addrspace(1)* %gep, i32 %old, i32 %in seq_cst seq_cst
> +  %1 = extractvalue { i32, i1 } %0, 0
> +  store i32 %1, i32 addrspace(1)* %out2
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_addr64_offset:
> +; SI: buffer_atomic_cmpswap v[{{[0-9]+\:[0-9]+}}], v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64 offset:16{{$}}
> +define void @atomic_cmpxchg_i32_addr64_offset(i32 addrspace(1)* %out, i32 %in, i64 %index, i32 %old) {
> +entry:
> +  %ptr = getelementptr i32, i32 addrspace(1)* %out, i64 %index
> +  %gep = getelementptr i32, i32 addrspace(1)* %ptr, i32 4
> +  %0 = cmpxchg volatile i32 addrspace(1)* %gep, i32 %old, i32 %in seq_cst seq_cst
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_ret_addr64_offset:
> +; SI: buffer_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]:{{[0-9]+}}], v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64 offset:16 glc{{$}}
> +; VI: flat_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]:{{[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] glc{{$}}
> +; GCN: buffer_store_dword v[[RET]]
> +define void @atomic_cmpxchg_i32_ret_addr64_offset(i32 addrspace(1)* %out, i32 addrspace(1)* %out2, i32 %in, i64 %index, i32 %old) {
> +entry:
> +  %ptr = getelementptr i32, i32 addrspace(1)* %out, i64 %index
> +  %gep = getelementptr i32, i32 addrspace(1)* %ptr, i32 4
> +  %0 = cmpxchg volatile i32 addrspace(1)* %gep, i32 %old, i32 %in seq_cst seq_cst
> +  %1 = extractvalue { i32, i1 } %0, 0
> +  store i32 %1, i32 addrspace(1)* %out2
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32:
> +; GCN: buffer_atomic_cmpswap v[{{[0-9]+:[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0{{$}}
> +define void @atomic_cmpxchg_i32(i32 addrspace(1)* %out, i32 %in, i32 %old) {
> +entry:
> +  %0 = cmpxchg volatile i32 addrspace(1)* %out, i32 %old, i32 %in seq_cst seq_cst
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_ret:
> +; GCN: buffer_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 glc
> +; GCN: buffer_store_dword v[[RET]]
> +define void @atomic_cmpxchg_i32_ret(i32 addrspace(1)* %out, i32 addrspace(1)* %out2, i32 %in, i32 %old) {
> +entry:
> +  %0 = cmpxchg volatile i32 addrspace(1)* %out, i32 %old, i32 %in seq_cst seq_cst
> +  %1 = extractvalue { i32, i1 } %0, 0
> +  store i32 %1, i32 addrspace(1)* %out2
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_addr64:
> +; SI: buffer_atomic_cmpswap v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64{{$}}
> +; VI: flat_atomic_cmpswap v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]{{$}}
> +define void @atomic_cmpxchg_i32_addr64(i32 addrspace(1)* %out, i32 %in, i64 %index, i32 %old) {
> +entry:
> +  %ptr = getelementptr i32, i32 addrspace(1)* %out, i64 %index
> +  %0 = cmpxchg volatile i32 addrspace(1)* %ptr, i32 %old, i32 %in seq_cst seq_cst
> +  ret void
> +}
> +
> +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_ret_addr64:
> +; SI: buffer_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]:{{[0-9]+}}], v[{{[0-9]+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 addr64 glc{{$}}
> +; VI: flat_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]:{{[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] glc{{$}}
> +; GCN: buffer_store_dword v[[RET]]
> +define void @atomic_cmpxchg_i32_ret_addr64(i32 addrspace(1)* %out, i32 addrspace(1)* %out2, i32 %in, i64 %index, i32 %old) {
> +entry:
> +  %ptr = getelementptr i32, i32 addrspace(1)* %out, i64 %index
> +  %0 = cmpxchg volatile i32 addrspace(1)* %ptr, i32 %old, i32 %in seq_cst seq_cst
> +  %1 = extractvalue { i32, i1 } %0, 0
> +  store i32 %1, i32 addrspace(1)* %out2
> +  ret void
> +}
> +
>  ; FUNC-LABEL: {{^}}atomic_xor_i32_offset:
>  ; GCN: buffer_atomic_xor v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:16{{$}}
>  define void @atomic_xor_i32_offset(i32 addrspace(1)* %out, i32 %in) {
> --
> 2.5.5
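[One shape the first option from question 2 could take inside the
patch's LowerATOMIC_CMP_SWAP: append SLC as a trailing target constant
so the existing inputs keep their order. This is an untested sketch,
not part of the patch; it assumes the MUBUF/FLAT patterns can be
taught to bind the extra operand to their slc field, which is exactly
the open question, and isAmdHsaOS() may not be the right subtarget
predicate.]

  // Untested sketch: SLC as an explicit trailing operand on the node,
  // chosen during lowering. Chain, address and data keep their current
  // positions, so only the patterns need to learn about the new operand.
  bool NeedSLC = Subtarget->isAmdHsaOS();  // assumed predicate
  SDValue SLC = DAG.getTargetConstant(NeedSLC ? 1 : 0, DL, MVT::i1);
  SDValue Ops[] = {Swap->getChain(), Swap->getBasePtr(), Val, SLC};
  SDValue CmpSwap = DAG.getMemIntrinsicNode(AMDGPUISD::CMP_SWAP, DL,
                                            VTList, Ops, MemVT,
                                            Swap->getMemOperand());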
On Mon, 2016-03-28 at 06:54 -0700, Tom Stellard wrote:
> On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote:
> > Hi Tom, Matt,
> >
> > I'm working on a project that needs a few coherent atomic
> > operations (HSA mode: load, store, compare-and-swap) for
> > std::atomic_uint in HCC.
> >
> > [questions 1-3 trimmed; quoted in full above]
>
> Have you seen this patch: http://reviews.llvm.org/D17280

Thanks, I subscribed to that one. It answers the first question (it
can't be done), but it does not address the SLC problem. Anyway, I
moved my questions there.

thanks,
Jan

> -Tom
>
> [quoted patch trimmed; reproduced in full above]

--
Jan Vesely <jan.vesely at rutgers.edu>
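[On the DAG-combine part of question 2, whether an atomic op can be
recognized during selection: a MemSDNode carries its atomic ordering,
so a check along these lines might avoid splitting the SelectMUBUF
functions. Untested sketch; isAtomicMemNode is an invented helper, not
an existing API.]

  // Untested sketch: anything ordered more strongly than NotAtomic came
  // from an atomic operation, so a shared SelectMUBUF path could force
  // SLC for exactly those nodes instead of growing per-atomic variants.
  static bool isAtomicMemNode(const SDNode *N) {
    if (const MemSDNode *M = dyn_cast<MemSDNode>(N))
      return M->getOrdering() != AtomicOrdering::NotAtomic;
    return false;
  }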