Jan Vesely
2014-Oct-03 16:32 UTC
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
Hi Tom, Matt,

I'm running into strange issues with the cos test (piglit generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c).

I have been seeing random failures (incorrect results) for some time and tried to investigate. The weird part is that the failures are not 100% reproducible: sometimes the tests pass, or partly pass (it's usually the float8 and float16 subtests that fail). The failure is always the same,
"Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
although the position may vary, even if the same value was computed earlier in the results array.

The first patch of this series does not change the behavior (or the instruction dump). However, using the ADDC instruction results in a hang on every cos test:
"ring 0 stalled for more than 10000msec"
"GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"
although the actual test results follow the same pattern as before (random failures, mostly in the float8/16 tests). I can even get a test pass with a hang on every subtest.

Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs and makes the failures fully reproducible in every subtest, triggered on the first occurrence of what should have been -0.925879.

The GPU is AMD TURKS (HD 7570, 1002:675d).

I tried digging through the manual, but it only mentions that ADDC is a vec and trans inst. Is there any errata document that might give a hint?

thanks, jan PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch On Wed, 2014-09-24 at 20:27 -0400, Jan Vesely wrote:> v2: tighten the sub64 tests > v3: rename to CARRY/BORROW > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu> > > --- > lib/Target/R600/AMDGPUISelLowering.h | 2 + > lib/Target/R600/AMDGPUInstrInfo.td | 6 ++ > lib/Target/R600/AMDGPUSubtarget.h | 8 ++ > lib/Target/R600/EvergreenInstructions.td | 3 + > lib/Target/R600/R600ISelLowering.cpp | 39 +++++++- > test/CodeGen/R600/add.ll | 154 +++++++++++++++++-------------- > test/CodeGen/R600/sub.ll | 18 ++-- > test/CodeGen/R600/uaddo.ll | 17 +++- > test/CodeGen/R600/usubo.ll | 23 ++++- > 9 files changed, 189 insertions(+), 81 deletions(-) > > diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h > index 911576b..6eaf001 100644 > --- a/lib/Target/R600/AMDGPUISelLowering.h > +++ b/lib/Target/R600/AMDGPUISelLowering.h > @@ -205,6 +205,8 @@ enum { > RSQ_CLAMPED, > LDEXP, > DOT4, > + CARRY, > + BORROW, > BFE_U32, // Extract range of bits with zero extension to 32-bits. > BFE_I32, // Extract range of bits with sign extension to 32-bits. > BFI, // (src0 & src1) | (~src0 & src2) > diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td > index 3d70791..1600c4a 100644 > --- a/lib/Target/R600/AMDGPUInstrInfo.td > +++ b/lib/Target/R600/AMDGPUInstrInfo.td > @@ -91,6 +91,12 @@ def AMDGPUumin : SDNode<"AMDGPUISD::UMIN", SDTIntBinOp, > [SDNPCommutative, SDNPAssociative] > >; > > +// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0 > +def AMDGPUcarry : SDNode<"AMDGPUISD::CARRY", SDTIntBinOp, []>; > + > +// out = (src1 > src0) ? 
1 : 0 > +def AMDGPUborrow : SDNode<"AMDGPUISD::BORROW", SDTIntBinOp, []>; > + > > def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0", > SDTIntToFPOp, []>; > diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h > index 6797972..9f2ba61 100644 > --- a/lib/Target/R600/AMDGPUSubtarget.h > +++ b/lib/Target/R600/AMDGPUSubtarget.h > @@ -168,6 +168,14 @@ public: > return (getGeneration() >= EVERGREEN); > } > > + bool hasCARRY() const { > + return (getGeneration() >= EVERGREEN); > + } > + > + bool hasBORROW() const { > + return (getGeneration() >= EVERGREEN); > + } > + > bool IsIRStructurizerEnabled() const { > return EnableIRStructurizer; > } > diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td > index 8117b60..d3822ef 100644 > --- a/lib/Target/R600/EvergreenInstructions.td > +++ b/lib/Target/R600/EvergreenInstructions.td > @@ -336,6 +336,9 @@ defm CUBE_eg : CUBE_Common<0xC0>; > > def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>; > > +def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>; > +def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>; > + > def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", ctlz_zero_undef, VecALU>; > def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>; > > diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp > index 9b2b689..a28b76a 100644 > --- a/lib/Target/R600/R600ISelLowering.cpp > +++ b/lib/Target/R600/R600ISelLowering.cpp > @@ -89,6 +89,15 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) : > setOperationAction(ISD::SELECT, MVT::v2i32, Expand); > setOperationAction(ISD::SELECT, MVT::v4i32, Expand); > > + // ADD, SUB overflow. 
These need to be Custom because > + // SelectionDAGLegalize::LegalizeOp (LegalizeDAG.cpp) > + // turns Legal into expand > + if (Subtarget->hasCARRY()) > + setOperationAction(ISD::UADDO, MVT::i32, Custom); > + > + if (Subtarget->hasBORROW()) > + setOperationAction(ISD::USUBO, MVT::i32, Custom); > + > // Expand sign extension of vectors > if (!Subtarget->hasBFE()) > setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand); > @@ -154,8 +163,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) : > setTargetDAGCombine(ISD::SELECT_CC); > setTargetDAGCombine(ISD::INSERT_VECTOR_ELT); > > - setOperationAction(ISD::SUB, MVT::i64, Expand); > - > // These should be replaced by UDVIREM, but it does not happen automatically > // during Type Legalization > setOperationAction(ISD::UDIV, MVT::i64, Custom); > @@ -578,6 +585,34 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const > case ISD::SHL_PARTS: return LowerSHLParts(Op, DAG); > case ISD::SRA_PARTS: > case ISD::SRL_PARTS: return LowerSRXParts(Op, DAG); > + case ISD::UADDO: { > + SDLoc DL(Op); > + EVT VT = Op.getValueType(); > + > + SDValue Lo = Op.getOperand(0); > + SDValue Hi = Op.getOperand(1); > + > + SDValue OVF = DAG.getNode(AMDGPUISD::CARRY, DL, VT, Lo, Hi); > + //negate sign > + OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF); > + SDValue Res = DAG.getNode(ISD::ADD, DL, VT, Lo, Hi); > + > + return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF); > + } > + case ISD::USUBO: { > + SDLoc DL(Op); > + EVT VT = Op.getValueType(); > + > + SDValue Arg0 = Op.getOperand(0); > + SDValue Arg1 = Op.getOperand(1); > + > + SDValue OVF = DAG.getNode(AMDGPUISD::BORROW, DL, VT, Arg0, Arg1); > + //negate sign > + OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF); > + SDValue Res = DAG.getNode(ISD::SUB, DL, VT, Arg0, Arg1); > + > + return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF); > + } > case ISD::FCOS: > case ISD::FSIN: 
return LowerTrig(Op, DAG); > case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG); > diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll > index 8cf43d1..fddb951 100644 > --- a/test/CodeGen/R600/add.ll > +++ b/test/CodeGen/R600/add.ll > @@ -1,12 +1,12 @@ > -; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK --check-prefix=FUNC %s > -; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK --check-prefix=FUNC %s > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG --check-prefix=FUNC %s > +; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI --check-prefix=FUNC %s > > ;FUNC-LABEL: @test1: > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > > -;SI-CHECK: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}} > -;SI-CHECK-NOT: [[REG]] > -;SI-CHECK: BUFFER_STORE_DWORD [[REG]], > +;SI: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}} > +;SI-NOT: [[REG]] > +;SI: BUFFER_STORE_DWORD [[REG]], > define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { > %b_ptr = getelementptr i32 addrspace(1)* %in, i32 1 > %a = load i32 addrspace(1)* %in > @@ -17,11 +17,11 @@ define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { > } > > ;FUNC-LABEL: @test2: > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > > define void 
@test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) { > %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1 > @@ -33,15 +33,15 @@ define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) { > } > > ;FUNC-LABEL: @test4: > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > -;SI-CHECK: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > +;SI: V_ADD_I32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} > > define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { > %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1 > @@ -53,22 +53,22 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { > } > > ; FUNC-LABEL: @test8 > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; 
SI-CHECK: S_ADD_I32 > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > define void @test8(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) { > entry: > %0 = add <8 x i32> %a, %b > @@ -77,38 +77,38 @@ entry: > } > > ; FUNC-LABEL: @test16 > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; EG-CHECK: ADD_INT > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > -; SI-CHECK: S_ADD_I32 > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; EG: ADD_INT > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > +; SI: S_ADD_I32 > define void @test16(<16 x i32> addrspace(1)* %out, <16 x i32> %a, <16 x i32> %b) { > entry: > %0 = add <16 x i32> %a, %b > @@ -117,8 +117,12 @@ entry: > } > > ; 
FUNC-LABEL: @add64 > -; SI-CHECK: S_ADD_U32 > -; SI-CHECK: S_ADDC_U32 > +; SI: S_ADD_U32 > +; SI: S_ADDC_U32 > + > +; EG-DAG: ADD_INT > +; EG-DAG: ADDC_UINT > +; EG-DAG: ADD_INT > define void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) { > entry: > %0 = add i64 %a, %b > @@ -132,7 +136,11 @@ entry: > ; to a VGPR before doing the add. > > ; FUNC-LABEL: @add64_sgpr_vgpr > -; SI-CHECK-NOT: V_ADDC_U32_e32 s > +; SI-NOT: V_ADDC_U32_e32 s > + > +; EG-DAG: ADD_INT > +; EG-DAG: ADDC_UINT > +; EG-DAG: ADD_INT > define void @add64_sgpr_vgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) { > entry: > %0 = load i64 addrspace(1)* %in > @@ -143,8 +151,12 @@ entry: > > ; Test i64 add inside a branch. > ; FUNC-LABEL: @add64_in_branch > -; SI-CHECK: S_ADD_U32 > -; SI-CHECK: S_ADDC_U32 > +; SI: S_ADD_U32 > +; SI: S_ADDC_U32 > + > +; EG-DAG: ADD_INT > +; EG-DAG: ADDC_UINT > +; EG-DAG: ADD_INT > define void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) { > entry: > %0 = icmp eq i64 %a, 0 > diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll > index 8678e2b..1225ebd 100644 > --- a/test/CodeGen/R600/sub.ll > +++ b/test/CodeGen/R600/sub.ll > @@ -43,10 +43,13 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { > ; SI: S_SUB_U32 > ; SI: S_SUBB_U32 > > +; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]] > +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]] > +; EG-DAG: SUB_INT {{[* ]*}}[[LO]] > +; EG-DAG: SUBB_UINT > ; EG-DAG: SUB_INT > -; EG-DAG: SETGT_UINT > -; EG-DAG: SUB_INT > -; EG-DAG: ADD_INT > +; EG-DAG: SUB_INT {{[* ]*}}[[HI]] > +; EG-NOT: SUB > define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind { > %result = sub i64 %a, %b > store i64 %result, i64 addrspace(1)* %out, align 8 > @@ -57,10 +60,13 @@ define void @s_sub_i64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind > ; SI: V_SUB_I32_e32 > ; SI: V_SUBB_U32_e32 > > +; EG: 
MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.[XYZW]]] > +; EG: MEM_RAT_CACHELESS STORE_RAW [[HI:T[0-9]+\.[XYZW]]] > +; EG-DAG: SUB_INT {{[* ]*}}[[LO]] > +; EG-DAG: SUBB_UINT > ; EG-DAG: SUB_INT > -; EG-DAG: SETGT_UINT > -; EG-DAG: SUB_INT > -; EG-DAG: ADD_INT > +; EG-DAG: SUB_INT {{[* ]*}}[[HI]] > +; EG-NOT: SUB > define void @v_sub_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %inA, i64 addrspace(1)* noalias %inB) nounwind { > %tid = call i32 @llvm.r600.read.tidig.x() readnone > %a_ptr = getelementptr i64 addrspace(1)* %inA, i32 %tid > diff --git a/test/CodeGen/R600/uaddo.ll b/test/CodeGen/R600/uaddo.ll > index 0b854b5..ce30bbc 100644 > --- a/test/CodeGen/R600/uaddo.ll > +++ b/test/CodeGen/R600/uaddo.ll > @@ -1,5 +1,5 @@ > ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s > -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s > +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s > > declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone > declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone > @@ -8,6 +8,9 @@ declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone > ; SI: ADD > ; SI: ADDC > ; SI: ADDC > + > +; EG: ADDC_UINT > +; EG: ADDC_UINT > define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind > %val = extractvalue { i64, i1 } %uadd, 0 > @@ -20,6 +23,9 @@ define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > > ; FUNC-LABEL: @s_uaddo_i32 > ; SI: S_ADD_I32 > + > +; EG: ADDC_UINT > +; EG: ADD_INT > define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind { > %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) nounwind > %val = extractvalue { i32, i1 } %uadd, 0 > 
> @@ -31,6 +37,9 @@ define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>
>  ; FUNC-LABEL: @v_uaddo_i32
>  ; SI: V_ADD_I32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
>    %a = load i32 addrspace(1)* %aptr, align 4
>    %b = load i32 addrspace(1)* %bptr, align 4
> @@ -45,6 +54,9 @@ define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  ; FUNC-LABEL: @s_uaddo_i64
>  ; SI: S_ADD_U32
>  ; SI: S_ADDC_U32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
>    %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %uadd, 0
> @@ -57,6 +69,9 @@ define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
>  ; FUNC-LABEL: @v_uaddo_i64
>  ; SI: V_ADD_I32
>  ; SI: V_ADDC_U32
> +
> +; EG: ADDC_UINT
> +; EG: ADD_INT
>  define void @v_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
>    %a = load i64 addrspace(1)* %aptr, align 4
>    %b = load i64 addrspace(1)* %bptr, align 4
> diff --git a/test/CodeGen/R600/usubo.ll b/test/CodeGen/R600/usubo.ll
> index c293ad7..d7718e2 100644
> --- a/test/CodeGen/R600/usubo.ll
> +++ b/test/CodeGen/R600/usubo.ll
> @@ -1,10 +1,13 @@
>  ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s
> +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
>
>  declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone
>  declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) nounwind readnone
>
>  ; FUNC-LABEL: @usubo_i64_zext
> +
> +; EG: SUBB_UINT
> +; EG: ADDC_UINT
>  define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>    %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %usub, 0
> @@ -17,6 +20,10 @@ define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
>
>  ; FUNC-LABEL: @s_usubo_i32
>  ; SI: S_SUB_I32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind {
>    %usub = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b) nounwind
>    %val = extractvalue { i32, i1 } %usub, 0
> @@ -28,6 +35,10 @@ define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>
>  ; FUNC-LABEL: @v_usubo_i32
>  ; SI: V_SUBREV_I32_e32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind {
>    %a = load i32 addrspace(1)* %aptr, align 4
>    %b = load i32 addrspace(1)* %bptr, align 4
> @@ -42,6 +53,11 @@ define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32
>  ; FUNC-LABEL: @s_usubo_i64
>  ; SI: S_SUB_U32
>  ; SI: S_SUBB_U32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind {
>    %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind
>    %val = extractvalue { i64, i1 } %usub, 0
> @@ -54,6 +70,11 @@ define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64
>  ; FUNC-LABEL: @v_usubo_i64
>  ; SI: V_SUB_I32
>  ; SI: V_SUBB_U32
> +
> +; EG-DAG: SUBB_UINT
> +; EG-DAG: SUB_INT
> +; EG-DAG: SUB_INT
> +; EG: SUB_INT
>  define void @v_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind {
>    %a = load i64 addrspace(1)* %aptr, align 4
>    %b = load i64 addrspace(1)* %bptr, align 4

--
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dumps.tgz
Type: application/x-compressed-tar
Size: 1166570 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141003/d4c2ded0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141003/d4c2ded0/attachment.sig>
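
The carry/borrow semantics given in the patch comments, and the two flag encodings discussed in the message ("SUB 0" versus SIGN_EXTEND_INREG), can be modelled on the host. What follows is a minimal C++ sketch under stated assumptions, not the backend code: the helper names (carry, borrow, negate_flag, sext_i1, add64, sub64) are hypothetical, and the 64-bit helpers only illustrate one way an i64 add or sub can be assembled from 32-bit halves plus an ADDC_UINT/SUBB_UINT-style flag.

```cpp
#include <cassert>
#include <cstdint>

// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0, as in the AMDGPUcarry comment.
inline uint32_t carry(uint32_t src0, uint32_t src1) {
  return (static_cast<uint64_t>(src0) + src1 > 0xFFFFFFFFull) ? 1u : 0u;
}

// out = (src1 > src0) ? 1 : 0, as in the AMDGPUborrow comment.
inline uint32_t borrow(uint32_t src0, uint32_t src1) {
  return (src1 > src0) ? 1u : 0u;
}

// "SUB 0": 0 - flag maps 1 to 0xFFFFFFFF and 0 to 0.
inline uint32_t negate_flag(uint32_t flag) { return 0u - flag; }

// Sign extension from bit 0, i.e. what a sign-extend-in-register from i1 yields.
inline uint32_t sext_i1(uint32_t flag) {
  return (flag & 1u) ? 0xFFFFFFFFu : 0u;
}

// Hypothetical 64-bit add built from 32-bit halves plus the carry flag.
inline uint64_t add64(uint64_t a, uint64_t b) {
  const uint32_t alo = static_cast<uint32_t>(a);
  const uint32_t ahi = static_cast<uint32_t>(a >> 32);
  const uint32_t blo = static_cast<uint32_t>(b);
  const uint32_t bhi = static_cast<uint32_t>(b >> 32);
  const uint32_t lo = alo + blo;                      // wraps modulo 2^32
  const uint32_t hi = ahi + bhi + carry(alo, blo);    // propagate the carry
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Hypothetical 64-bit sub built from 32-bit halves plus the borrow flag.
inline uint64_t sub64(uint64_t a, uint64_t b) {
  const uint32_t alo = static_cast<uint32_t>(a);
  const uint32_t ahi = static_cast<uint32_t>(a >> 32);
  const uint32_t blo = static_cast<uint32_t>(b);
  const uint32_t bhi = static_cast<uint32_t>(b >> 32);
  const uint32_t lo = alo - blo;                      // wraps modulo 2^32
  const uint32_t hi = ahi - bhi - borrow(alo, blo);   // propagate the borrow
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

In this model negate_flag(x) and sext_i1(x) agree whenever x is exactly 0 or 1, so switching between them changes which instructions are selected rather than the value of the overflow flag.
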
Tom Stellard
2014-Oct-03 17:06 UTC
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
On Fri, Oct 03, 2014 at 12:32:03PM -0400, Jan Vesely wrote:
> Hi Tom, Matt,
>
> I'm running into strange issues with the cos test (piglit
> generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
>
> I have been seeing random failures (incorrect results) for some time and
> tried to investigate. the weird part is that the failures are not 100%
> reproducible, sometimes the tests pass, or partly pass
> (it's usually float8 and float16 subtests that fail).
> Failure is always the same
> "Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
> although the position may vary. even if the same value was computed earlier in the results array
>
> The first patch of this series does not change the behavior (or instruction dump).
> however, using the ADDC instruction results in hang on every cos test
> "ring 0 stalled for more than 10000msec"
> "GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"
>
> although the actual test results follow the same result as before (random failures mostly in float8/16 tests).
> I can even get test pass with hang on every subtest
>
> Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
> and makes the failures fully reproducible in every subtest, triggered on the first
> occurrence of what should have been -0.925879.
>
> the GPU is AMD TURKS (HD 7570 1002:675d)
>
> I tried digging through the manual but it only mentions that ADDC is vec and trans inst.
> Is there any errata document that might give a hint?

It's possible the bug is somewhere else and adding the ADDC instruction changed the program enough to uncover it. Try modifying the packetizer to allow only one instruction per group. I think modifying R600Packetizer::isLegalToPacketizeTogether() to always return false will do this. If you still get lockups even with this, then we can rule out some kind of packetizer bug.

-Tom
b/test/CodeGen/R600/uaddo.ll > > index 0b854b5..ce30bbc 100644 > > --- a/test/CodeGen/R600/uaddo.ll > > +++ b/test/CodeGen/R600/uaddo.ll > > @@ -1,5 +1,5 @@ > > ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s > > -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s > > +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s > > > > declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone > > declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone > > @@ -8,6 +8,9 @@ declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone > > ; SI: ADD > > ; SI: ADDC > > ; SI: ADDC > > + > > +; EG: ADDC_UINT > > +; EG: ADDC_UINT > > define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > > %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind > > %val = extractvalue { i64, i1 } %uadd, 0 > > @@ -20,6 +23,9 @@ define void @uaddo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > > > > ; FUNC-LABEL: @s_uaddo_i32 > > ; SI: S_ADD_I32 > > + > > +; EG: ADDC_UINT > > +; EG: ADD_INT > > define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind { > > %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) nounwind > > %val = extractvalue { i32, i1 } %uadd, 0 > > @@ -31,6 +37,9 @@ define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 > > > > ; FUNC-LABEL: @v_uaddo_i32 > > ; SI: V_ADD_I32 > > + > > +; EG: ADDC_UINT > > +; EG: ADD_INT > > define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind { > > %a = load i32 addrspace(1)* %aptr, align 4 > > %b = load i32 addrspace(1)* %bptr, align 4 > > @@ -45,6 +54,9 @@ define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, 
i32 > > ; FUNC-LABEL: @s_uaddo_i64 > > ; SI: S_ADD_U32 > > ; SI: S_ADDC_U32 > > + > > +; EG: ADDC_UINT > > +; EG: ADD_INT > > define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind { > > %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind > > %val = extractvalue { i64, i1 } %uadd, 0 > > @@ -57,6 +69,9 @@ define void @s_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 > > ; FUNC-LABEL: @v_uaddo_i64 > > ; SI: V_ADD_I32 > > ; SI: V_ADDC_U32 > > + > > +; EG: ADDC_UINT > > +; EG: ADD_INT > > define void @v_uaddo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind { > > %a = load i64 addrspace(1)* %aptr, align 4 > > %b = load i64 addrspace(1)* %bptr, align 4 > > diff --git a/test/CodeGen/R600/usubo.ll b/test/CodeGen/R600/usubo.ll > > index c293ad7..d7718e2 100644 > > --- a/test/CodeGen/R600/usubo.ll > > +++ b/test/CodeGen/R600/usubo.ll > > @@ -1,10 +1,13 @@ > > ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s > > -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s > > +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s > > > > declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone > > declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) nounwind readnone > > > > ; FUNC-LABEL: @usubo_i64_zext > > + > > +; EG: SUBB_UINT > > +; EG: ADDC_UINT > > define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > > %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind > > %val = extractvalue { i64, i1 } %usub, 0 > > @@ -17,6 +20,10 @@ define void @usubo_i64_zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { > > > > ; FUNC-LABEL: @s_usubo_i32 > > ; SI: S_SUB_I32 > > + > > +; EG-DAG: SUBB_UINT > > +; EG-DAG: SUB_INT > > +; 
EG: SUB_INT > > define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind { > > %usub = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b) nounwind > > %val = extractvalue { i32, i1 } %usub, 0 > > @@ -28,6 +35,10 @@ define void @s_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 > > > > ; FUNC-LABEL: @v_usubo_i32 > > ; SI: V_SUBREV_I32_e32 > > + > > +; EG-DAG: SUBB_UINT > > +; EG-DAG: SUB_INT > > +; EG: SUB_INT > > define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind { > > %a = load i32 addrspace(1)* %aptr, align 4 > > %b = load i32 addrspace(1)* %bptr, align 4 > > @@ -42,6 +53,11 @@ define void @v_usubo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 > > ; FUNC-LABEL: @s_usubo_i64 > > ; SI: S_SUB_U32 > > ; SI: S_SUBB_U32 > > + > > +; EG-DAG: SUBB_UINT > > +; EG-DAG: SUB_INT > > +; EG-DAG: SUB_INT > > +; EG: SUB_INT > > define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind { > > %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind > > %val = extractvalue { i64, i1 } %usub, 0 > > @@ -54,6 +70,11 @@ define void @s_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 > > ; FUNC-LABEL: @v_usubo_i64 > > ; SI: V_SUB_I32 > > ; SI: V_SUBB_U32 > > + > > +; EG-DAG: SUBB_UINT > > +; EG-DAG: SUB_INT > > +; EG-DAG: SUB_INT > > +; EG: SUB_INT > > define void @v_usubo_i64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind { > > %a = load i64 addrspace(1)* %aptr, align 4 > > %b = load i64 addrspace(1)* %bptr, align 4 > > -- > Jan Vesely <jan.vesely at rutgers.edu> > > -- > Jan Vesely <jan.vesely at rutgers.edu>
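For readers following the quoted patch, here is a minimal host-side sketch of the arithmetic it describes. It is not code from the patch: the function names are made up for illustration, and the 64-bit expansion (low-half add, carry, high-half add) is only an assumption inferred from the EG-DAG checks in add.ll and sub.ll. The carry and borrow definitions follow the comments in AMDGPUInstrInfo.td, and the "0 - flag" step mirrors the "negate sign" SUB in the UADDO/USUBO lowering.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;
using std::uint64_t;

// CARRY, per the patch comment: out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0
uint32_t carry(uint32_t src0, uint32_t src1) {
  return (static_cast<uint64_t>(src0) + src1 > 0xFFFFFFFFull) ? 1u : 0u;
}

// BORROW, per the patch comment: out = (src1 > src0) ? 1 : 0
uint32_t borrow(uint32_t src0, uint32_t src1) {
  return (src1 > src0) ? 1u : 0u;
}

// 64-bit add built from 32-bit halves (assumed expansion).
uint64_t add64(uint64_t a, uint64_t b) {
  uint32_t aLo = static_cast<uint32_t>(a), aHi = static_cast<uint32_t>(a >> 32);
  uint32_t bLo = static_cast<uint32_t>(b), bHi = static_cast<uint32_t>(b >> 32);
  uint32_t lo = aLo + bLo;             // low half, wraps modulo 2^32
  uint32_t hi = aHi + bHi + carry(aLo, bLo);  // high half plus carry-in
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// 64-bit subtract built from 32-bit halves (assumed expansion).
uint64_t sub64(uint64_t a, uint64_t b) {
  uint32_t aLo = static_cast<uint32_t>(a), aHi = static_cast<uint32_t>(a >> 32);
  uint32_t bLo = static_cast<uint32_t>(b), bHi = static_cast<uint32_t>(b >> 32);
  uint32_t lo = aLo - bLo;             // low half, wraps modulo 2^32
  uint32_t hi = aHi - bHi - borrow(aLo, bLo); // high half minus borrow
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

struct OverflowResult {
  uint32_t value;     // wrapped 32-bit result
  uint32_t overflow;  // 0, or 0xFFFFFFFF after the "0 - flag" negation
};

// UADDO sketch: result plus the negated carry flag.
OverflowResult uaddo(uint32_t a, uint32_t b) {
  return {a + b, 0u - carry(a, b)};
}

// USUBO sketch: result plus the negated borrow flag.
OverflowResult usubo(uint32_t a, uint32_t b) {
  return {a - b, 0u - borrow(a, b)};
}

// Compares the sketches against native 64-bit arithmetic on edge cases.
void selfCheck() {
  const uint64_t samples[] = {0, 1, 0xFFFFFFFFull, 0x100000000ull,
                              0xFFFFFFFFFFFFFFFFull};
  for (uint64_t a : samples)
    for (uint64_t b : samples) {
      assert(add64(a, b) == a + b);
      assert(sub64(a, b) == a - b);
    }
}
```

Calling selfCheck() exercises the wrap-around cases. The overflow field is all ones rather than 1 because of the negation step, which is the part the thread proposes replacing with SIGN_EXTEND_INREG.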
Jan Vesely
2014-Oct-10 00:01 UTC
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
On Fri, 2014-10-03 at 10:06 -0700, Tom Stellard wrote:
> On Fri, Oct 03, 2014 at 12:32:03PM -0400, Jan Vesely wrote:
> > Hi Tom, Matt,
> >
> > I'm running into strange issues with the cos test (piglit
> > generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
> >
> > I have been seeing random failures (incorrect results) for some time and
> > tried to investigate. the weird part is that the failures are not 100%
> > reproducible, sometimes the tests pass, or partly pass
> > (it's usually float8 and float16 subtests that fail).
> > Failure is always the same
> > "Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
> > although the position may vary. even if the same value was computed earlier in the results array
> >
> > The first patch of this series does not change the behavior (or instruction dump).
> > however, using the ADDC instruction results in hang on every cos test
> > "ring 0 stalled for more than 10000msec"
> > "GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"
> >
> > although the actual test results follow the same result as before (random failures mostly in float8/16 tests).
> > I can even get test pass with hang on every subtest
> >
> > Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
> > and makes the failures fully reproducible in every subtest, triggered on the first
> > occurrence of what should have been -0.925879.
> >
> > the GPU is AMD TURKS (HD 7570 1002:675d)
> >
> > I tried digging through the manual but it only mentions that ADDC is vec and trans inst.
> > Is there any errata document that might give a hint?
>
> It's possible the bug is somewhere else and adding the addc
> instruction changed the program enough to uncover it. Try modifying the
> packetizer to only allow one instruction per group. I think modifying
> R600Packetizer::isLegalToPacketizeTogether() to always return false will
> do this.

Thanks, this helped a lot. Returning false fixes both the incorrect results and the hangs.
Playing around with the packetizer, I found that both the hang and the incorrect results are caused by FMA being incorrectly moved to the Trans slot. Patch sent.

thanks,
jan

> If you still get lockups even with this, then we can rule out some kind of packetizer bug.
>
> -Tom
>
> > thanks,
> > jan
> >
> > PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch

--
Jan Vesely <jan.vesely at rutgers.edu>
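The debugging trick discussed above (make the legality check always return false so every instruction ends up alone in its group) can be illustrated with a toy model. This is only a sketch: it is not LLVM's R600Packetizer, the names packetize, LegalFn, alwaysLegal and neverLegal are hypothetical, and the default of five slots per group is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Group = std::vector<std::string>;

// Hypothetical stand-in for a check like isLegalToPacketizeTogether():
// may the second instruction share a group with the first?
using LegalFn = std::function<bool(const std::string &, const std::string &)>;

// Greedy toy packetizer: append each instruction to the current group if the
// group still has a free slot and the predicate accepts it against every
// instruction already in the group; otherwise start a new group.
std::vector<Group> packetize(const std::vector<std::string> &instrs,
                             const LegalFn &isLegal,
                             std::size_t maxSlots = 5) {
  std::vector<Group> groups;
  for (const std::string &instr : instrs) {
    bool fits = !groups.empty() && groups.back().size() < maxSlots;
    if (fits) {
      for (const std::string &member : groups.back()) {
        if (!isLegal(member, instr)) {
          fits = false;
          break;
        }
      }
    }
    if (fits)
      groups.back().push_back(instr);
    else
      groups.push_back(Group{instr});
  }
  return groups;
}

// Normal operation in this toy: anything may be bundled with anything.
bool alwaysLegal(const std::string &, const std::string &) { return true; }

// The debugging variant: nothing may be bundled, one instruction per group.
bool neverLegal(const std::string &, const std::string &) { return false; }
```

With neverLegal, packetize() yields as many groups as there are instructions. If a miscompile or hang disappears in that mode, slot assignment or bundling becomes the prime suspect, which matches how the FMA/Trans slot problem was narrowed down in this thread.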