Hi,

I've prepared patches for both LLVM and Clang to change the datalayout
for R600. This may seem like a bold move, but I think it is warranted.
R600/SI is a strange architecture in that it uses 64-bit pointers but
does not support 64-bit arithmetic, except for load/store operations
that roughly map onto getelementptr.

The current datalayout for r600 includes n32:64, which is odd because
r600 cannot actually do any 64-bit arithmetic natively. This causes
particular problems in the optimizer with the following kernel:

__kernel void if_eq(__global unsigned long* out, unsigned arg0)
{
    int i = 0;
    for (i = 0; i < arg0; i++) {
        out[i] = i;
    }
}

Clang decides that instead of adding a sext of i32 %i to i64 before the
getelementptr, it would be best to just go ahead and promote the
variable i to i64, which would be all good if i64 were actually a
native type.

By changing the native types to n32 *only*, clang emits better code for
r600, such as:

; Function Attrs: nounwind
define void @if_eq(i64 addrspace(1)* nocapture %out, i32 %arg0) #0 {
entry:
  %cmp4 = icmp eq i32 %arg0, 0
  br i1 %cmp4, label %for.end, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.05 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
  %conv = sext i32 %i.05 to i64
  %arrayidx = getelementptr inbounds i64 addrspace(1)* %out, i64 %conv
  store i64 %conv, i64 addrspace(1)* %arrayidx, align 8, !tbaa !2
  %inc = add nsw i32 %i.05, 1
  %exitcond = icmp eq i32 %inc, %arg0
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret void
}

Another upside is that i64 addition on r600 can even be enabled, since
the lowering code need only be called for arithmetic and not for
pointer computations, for which it already works. In the future, r600
will require more patches for this issue, as the old IR generated by
clang was perfectly valid but crashes llc.
For now, I think this patch is a good solution because it generates
better code and allows me to compile programs much longer than my
previous 3-line record :) Not so much a band-aid as something else that
needed to be done.

Regards,

Jon Pry
jonpry at gmail.com
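The difference between the two shapes clang can emit is easy to see in plain C++. This is my own sketch, not compiler output, and the function names are invented: one version widens the induction variable itself (what n32:64 encourages), the other keeps a 32-bit counter and sign-extends only at the address computation (what n32 alone produces). For non-negative counters the two store to exactly the same elements.

```cpp
#include <cstdint>

// Hypothetical sketch of the "promoted" shape: the induction variable
// itself is 64-bit, so every increment and compare is a 64-bit op.
void fill_promoted(uint64_t *out, uint32_t arg0) {
  for (int64_t i = 0; i < static_cast<int64_t>(arg0); ++i)
    out[i] = static_cast<uint64_t>(i);
}

// Hypothetical sketch of the "narrow" shape: a 32-bit counter, with a
// sign-extend only where the 64-bit address is formed (the sext feeding
// the getelementptr in the IR above).
void fill_narrow(uint64_t *out, uint32_t arg0) {
  for (int32_t i = 0; static_cast<uint32_t>(i) < arg0; ++i)
    out[static_cast<int64_t>(i)] = static_cast<uint64_t>(i);
}
```

On hardware without native 64-bit adds, the second shape keeps all loop-carried arithmetic in 32 bits; only the addressing path ever sees 64-bit values.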
Mostly, n64 is there for historical reasons, as the AMDGPU backend
originally derived from the AMDIL backend. The AMDIL backend did
support 64-bit native types on some hardware chips of the r6XX-derived
architectures, so for simplicity software efficiently emulated them on
the ones that did not.

Also, GCN does support 64-bit on many operations, so I wouldn't remove
it as a native type there.

Basically, what you want, for clang to emit better code, is to use
per-pointer address spaces. In the address spaces where you want 32-bit
computation, you specify it in the data layout, and in the address
spaces where you need 64-bit, you do the same. LLVM produces the right
code in these situations.

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Jon Pry
> Sent: Tuesday, December 31, 2013 1:56 PM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] [Patch][RFC] Change R600 data layout
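Micah's per-address-space suggestion maps directly onto datalayout syntax: each pointer address space can carry its own size. The fragment below is purely illustrative and made up by me; the actual strings used by the backend are longer.

```llvm
; Illustrative only: 32-bit default pointers, 64-bit pointers in
; addrspace(1) (global), and only i32 marked as a native integer.
; GEPs into addrspace(1) still compute 64-bit addresses, while the
; optimizer is not encouraged to widen ordinary integer arithmetic.
target datalayout = "e-p:32:32-p1:64:64-i64:64-n32"
```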
On Dec 31, 2013, at 4:55 PM, Jon Pry <jonpry at gmail.com> wrote:

> The current datalayout for r600 includes n32:64, which is odd
> because r600 cannot actually do any 64bit arith natively.

This isn't entirely true. 64-bit operations do exist for a small number
of (mostly bitwise) operations, just not the normal arithmetic add and
multiply. Most importantly, there are 64-bit loads and stores. I've
seen various places that use the native integer type to decide a
good-sized integer for something like memcpy. It would be beneficial to
use a 64-bit type for a series of loads/stores for global pointers.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Tue, Dec 31, 2013 at 6:04 PM, Micah Villmow
<micah.villmow at smachines.com> wrote:

> Basically what you want to for clang to emit better code is to use per
> pointer address spaces. So in the address spaces where you want 32bit
> computation, you specify it in the data layout, and in the address
> spaces where you need 64bit, you do the same. LLVM produces the right
> code in these situations.

I don't believe this is a viable option. The address space in question
is (1), global, which is set to p64 as it must be. The address space
computation is not an issue; it's the senseless promotion of the
indexer to 64 bits, which then requires the i++ to be a 64-bit add, and
so on.

> This isn't entirely true. 64-bit operations do exist for a small number
> of (mostly bitwise) operations, just not the normal arithmetic add and
> multiply. Most importantly, there are 64-bit loads and stores. I've seen
> various places that use the native integer type to decide a good sized
> integer for something like memcpy. It would be beneficial to use a
> 64-bit type for a series of load / store for global pointers.

I agree that it would be nice if user code could somehow tell that
64-bit load/store was optimal. IMHO the crux of the issue is that once
clang promotes something to i64, there is no hope of optimizing it back
out. It's impossible to tell that the indexer is bounded to 32-bit
limits, so the 64-bit additions just have to stay. I.e., information is
destroyed when clang promotes for us.

I'd also like to point out that the patch works really well, especially
combined with my previous one. I am able to compile all clang and
clover OpenCL tests for r600/si now, whereas before ~90% of them
crashed. Not that my 9 lines of code are responsible for all that much;
just a little polish on the edges.
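Jon's "information is destroyed" point can be made concrete. A sketch of my own (invented names, not from the patches): sign-extending after the increment and before it agree for every value except at the 32-bit boundary, so a backend that sees only a bare i64 add, with no knowledge that the operand started life as a bounded i32, cannot prove the narrowing legal.

```cpp
#include <cstdint>

// Models: sext i32 (add i32 %i, 1) -- add in 32 bits, then widen.
// (Uses unsigned arithmetic for the wrap, then converts; conversion of
// an out-of-range value to int32_t is two's complement on all
// mainstream compilers, and guaranteed since C++20.)
int64_t add_then_extend(uint32_t i) {
  return static_cast<int32_t>(i + 1u);
}

// Models: add i64 (sext i32 %i), 1 -- widen first, then add in 64 bits.
int64_t extend_then_add(uint32_t i) {
  return static_cast<int64_t>(static_cast<int32_t>(i)) + 1;
}
```

For any i below INT32_MAX the two match, but at i == INT32_MAX they diverge (wrap to -2^31 versus 2^31), which is exactly the case the optimizer cannot rule out after the fact.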
On Tue, Dec 31, 2013 at 04:55:51PM -0500, Jon Pry wrote:
> For now, I think this patch is a good solution because it makes better
> code and allows me to compile programs much longer than my previous 3
> line record :)

I'm assuming you are using a Southern Islands GPU. Is this correct?
What errors are you seeing without this patch, and with which tests?

SI supports 64-bit operations natively, so I don't think we should be
removing n64 from the DataLayout. I think you could achieve the same
results by adding a target-specific DAG combine that recognizes some
form of the (i64 add (i64 sext a), (i64 1)) pattern and replaces it
with a 32-bit add. There used to be a generic LLVM IR optimization that
did something like this (I'm not sure what it was called, maybe "value
range propagation"), so reviving it would be another option.

It appears you are using tests from my OpenCL demos repo. If you are
interested, there are more mature and comprehensive tests in piglit:
http://piglit.freedesktop.org/ (take a look at the tests/cl directory).
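Tom's DAG combine has a legality condition worth spelling out: the narrow form is only equivalent when the 32-bit add provably cannot wrap. Below is a toy scalar model of that check, with hypothetical names; the real combine would match SelectionDAG nodes in the target's PerformDAGCombine and use known-bits or range information rather than an explicit bound.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Toy model of: (i64 add (i64 sext a), C)  ==>  (i64 sext (i32 add a, C))
// Returns the narrowed result when a known upper bound on `a` proves the
// 32-bit add cannot overflow; nullopt means the i64 add must stay.
// (Sketch only; handles positive constants C and assumes a >= 0.)
std::optional<int64_t> narrowSextAdd(int32_t a, int32_t c, int32_t knownMaxA) {
  if (c <= 0 || a < 0 || a > knownMaxA)
    return std::nullopt;  // outside what this sketch reasons about
  if (knownMaxA > std::numeric_limits<int32_t>::max() - c)
    return std::nullopt;  // the 32-bit add could wrap: not provably safe
  return static_cast<int64_t>(a + c);  // add narrow, then (implicitly) sext
}
```

When such a bound is available, for example because the value feeds a trip count compared against a 32-bit limit, the rewrite recovers the narrow arithmetic without changing what the target declares as native.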
-Tom

> Regards,
>
> Jon Pry
> jonpry at gmail.com

> From 0a2572c1f1bd322d1517e15135033be88afc6cd7 Mon Sep 17 00:00:00 2001
> From: Jon Pry <jonpry at gmail.com>
> Date: Tue, 31 Dec 2013 14:20:21 -0500
> Subject: [PATCH] R600 - Fix pointer arithmetic
>
> ---
>  lib/Target/R600/AMDGPUTargetMachine.cpp | 2 +-
>  lib/Target/R600/SIISelLowering.cpp      | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
> index 1279665..0e19439 100644
> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> @@ -58,7 +58,7 @@ static std::string computeDataLayout(const AMDGPUSubtarget &ST) {
>    }
>
>    Ret += "-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256"
> -         "-v512:512-v1024:1024-v2048:2048-n32:64";
> +         "-v512:512-v1024:1024-v2048:2048-n32";
>
>    return Ret;
>  }
> diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
> index 4fb8444..add3dda 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -76,7 +76,7 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
>    setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v16i32, Expand);
>    setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v16f32, Expand);
>
> -  setOperationAction(ISD::ADD, MVT::i64, Legal);
> +  setOperationAction(ISD::ADD, MVT::i64, Custom);
>    setOperationAction(ISD::ADD, MVT::i32, Legal);
>    setOperationAction(ISD::ADDC, MVT::i32, Legal);
>    setOperationAction(ISD::ADDE, MVT::i32, Legal);
> --
> 1.7.10.4

> From a9ebe3817733d64547eee399d75b16421681b1af Mon Sep 17 00:00:00 2001
> From: Jon Pry <jonpry at gmail.com>
> Date: Tue, 31 Dec 2013 14:20:40 -0500
> Subject: [PATCH] R600 - Change datalayout
>
> ---
>  lib/Basic/Targets.cpp      | 6 +++---
>  test/CodeGen/target-data.c | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/lib/Basic/Targets.cpp b/lib/Basic/Targets.cpp
> index d698090..8c762e2 100644
> --- a/lib/Basic/Targets.cpp
> +++ b/lib/Basic/Targets.cpp
> @@ -1419,7 +1419,7 @@ static const char *DescriptionStringR600 =
>    "-i64:64"
>    "-v16:16-v24:32-v32:32-v48:64-v96:128"
>    "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048"
> -  "-n32:64";
> +  "-n32";
>
>  static const char *DescriptionStringR600DoubleOps =
>    "e"
> @@ -1427,7 +1427,7 @@ static const char *DescriptionStringR600DoubleOps =
>    "-i64:64"
>    "-v16:16-v24:32-v32:32-v48:64-v96:128"
>    "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048"
> -  "-n32:64";
> +  "-n32";
>
>  static const char *DescriptionStringSI =
>    "e"
> @@ -1435,7 +1435,7 @@ static const char *DescriptionStringSI =
>    "-i64:64"
>    "-v16:16-v24:32-v32:32-v48:64-v96:128"
>    "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048"
> -  "-n32:64";
> +  "-n32";
>
>  class R600TargetInfo : public TargetInfo {
>    /// \brief The GPU profiles supported by the R600 target.
> diff --git a/test/CodeGen/target-data.c b/test/CodeGen/target-data.c
> index a4db2e0..9adde60 100644
> --- a/test/CodeGen/target-data.c
> +++ b/test/CodeGen/target-data.c
> @@ -108,15 +108,15 @@
>
>  // RUN: %clang_cc1 -triple r600-unknown -o - -emit-llvm %s | \
>  // RUN:   FileCheck %s -check-prefix=R600
> -// R600: target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"
> +// R600: target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32"
>
>  // RUN: %clang_cc1 -triple r600-unknown -target-cpu cayman -o - -emit-llvm %s \
>  // RUN:   | FileCheck %s -check-prefix=R600D
> -// R600D: target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"
> +// R600D: target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32"
>
>  // RUN: %clang_cc1 -triple r600-unknown -target-cpu hawaii -o - -emit-llvm %s \
>  // RUN:   | FileCheck %s -check-prefix=R600SI
> -// R600SI: target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:32:32-p5:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"
> +// R600SI: target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:32:32-p5:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32"
>
>  // RUN: %clang_cc1 -triple aarch64-unknown -o - -emit-llvm %s | \
>  // RUN:   FileCheck %s -check-prefix=AARCH64
> --
> 1.7.10.4