Hi everybody,
I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks.
Compiling the Histogram64.cl program, I get several backend errors.
I isolated one of them in the following kernel program:
__kernel void kernel_function(__global int *input) {
  __local char localArray[16];
  for(unsigned int index = 0; index < 16; ++index)
    localArray[index] = 0;
  input[0] = localArray[get_local_id(0)];
}
fatal error: error in backend: Cannot select:
0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9]
  0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8]
    0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7]
      0x5813540: i32 = TargetGlobalAddress<[16 x i8] addrspace(4)* @kernel_function.localArray> 0 [ID=4]
    0x5813640: i32,ch = load 0x57fa148, 0x5810dc0, 0x58105c0<LD4[%retval.i]> [ORD=110] [ID=5]
      0x5810dc0: i32 = FrameIndex<0> [ORD=110] [ID=1]
      0x58105c0: i32 = undef [ORD=110] [ID=2]
  0x58105c0: i32 = undef [ORD=110] [ID=2]
The command I am using is:
clang kernels/fatal_error_test.cl -O0 -include ocldef.h \
  -include builtin_functions_ptx.cl \
  -D__x86_64__ \
  -ccc-host-triple ptx32 \
  -Xclang -target-feature -Xclang +ptx23 \
  -Xclang -target-feature -Xclang +compute20
Any ideas?
Best regards
Alberto
On Mon, Nov 14, 2011 at 8:57 AM, Alberto Magni <alberto.magni86 at gmail.com> wrote:
> Any ideas?

Unfortunately, this sample will not work at this time. First, the backend does
not support i8 types yet. Second, at higher optimization levels, LLVM turns
this loop into a memset intrinsic, which is also not yet implemented. :(
Hopefully I'll get some time soon to work on this, and other deficiencies.
Patches are always welcome, too.

--
Thanks,
Justin Holewinski
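As a rough illustration of the second point: a loop that zeroes a byte array element by element has the same effect as a single memset call, which is why an optimizer can recognize the pattern and substitute the intrinsic. The sketch below is plain host C++, not OpenCL or LLVM code, and the names zeroWithLoop and zeroWithMemset are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero a 16-byte buffer one element at a time,
// mirroring the loop in the kernel from the original message.
std::array<char, 16> zeroWithLoop() {
    std::array<char, 16> buf;
    buf.fill(1);  // start non-zero so the zeroing is observable
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = 0;
    return buf;
}

// Hypothetical helper: the same effect expressed as one memset call,
// which is roughly the form an optimizer may rewrite the loop into.
std::array<char, 16> zeroWithMemset() {
    std::array<char, 16> buf;
    buf.fill(1);
    std::memset(buf.data(), 0, buf.size());
    return buf;
}
```

Both helpers return identical buffers, so a backend that can lower the loop form but not the memset form will only fail once the rewrite has happened.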
Justin,
Add this to your TargetLowering constructor; it fixes the mem* issue:
  maxStoresPerMemcpy = 4096;
  maxStoresPerMemmove = 4096;
  maxStoresPerMemset = 4096;
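For context, these fields act as thresholds: when expanding a memcpy, memmove, or memset inline would take more stores than the limit allows, the code generator is expected to fall back to a library call instead. Raising the limits to 4096 therefore pushes small operations toward plain stores. The following is a toy model of that decision in standalone C++, not LLVM's actual code. ToyLowering, storesNeeded, and expandsInline are hypothetical names, the default of 8 is only illustrative, and the real heuristic weighs more factors (alignment, legal types).

```cpp
#include <cstdint>

// Toy stand-in for the three limits set in the snippet above.
// This is NOT llvm::TargetLowering; it only models the threshold idea.
struct ToyLowering {
    unsigned maxStoresPerMemset = 8;   // illustrative value for the toy only
    unsigned maxStoresPerMemcpy = 8;
    unsigned maxStoresPerMemmove = 8;
};

// Count the stores needed to cover `bytes`, greedily using the widest
// store first and then halving the width. Assumes `widestStoreBytes`
// is a power of two and at least 1.
unsigned storesNeeded(std::uint64_t bytes, unsigned widestStoreBytes) {
    unsigned count = 0;
    for (unsigned width = widestStoreBytes; width >= 1 && bytes > 0; width /= 2) {
        count += static_cast<unsigned>(bytes / width);
        bytes %= width;
    }
    return count;
}

// Hypothetical decision: expand a memset inline only if it fits the limit;
// otherwise the caller would have to emit a library call.
bool expandsInline(const ToyLowering &tl, std::uint64_t bytes,
                   unsigned widestStoreBytes) {
    return storesNeeded(bytes, widestStoreBytes) <= tl.maxStoresPerMemset;
}
```

In this model a 64-byte memset with 4-byte stores needs 16 stores, so it is rejected under a limit of 8 but accepted once the limit is raised to 4096.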
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Justin Holewinski
Sent: Monday, November 14, 2011 7:12 AM
To: Alberto Magni
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] PTX backend fatal error
Unfortunately, this sample will not work at this time. First, the backend does
not support i8 types yet. Second, at higher optimization levels, LLVM turns
this loop into a memset intrinsic, which is also not yet implemented. :(
Hopefully I'll get some time soon to work on this, and other deficiencies.
Patches are always welcome, too.