Hi everybody,

I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks.
Compiling the Histogram64.cl program I get several backend errors.

I isolated one of them in the following kernel program:

__kernel void kernel_function(__global int *input) {
  __local char localArray[16];
  for(unsigned int index = 0; index < 16; ++index)
    localArray[index] = 0;
  input[0] = localArray[get_local_id(0)];
}

fatal error: error in backend: Cannot select:
0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9]
  0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8]
    0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7]
      0x5813540: i32 = TargetGlobalAddress<[16 x i8] addrspace(4)* @kernel_function.localArray> 0 [ID=4]
    0x5813640: i32,ch = load 0x57fa148, 0x5810dc0, 0x58105c0<LD4[%retval.i]> [ORD=110] [ID=5]
      0x5810dc0: i32 = FrameIndex<0> [ORD=110] [ID=1]
      0x58105c0: i32 = undef [ORD=110] [ID=2]
  0x58105c0: i32 = undef [ORD=110] [ID=2]

The command I am using is:

clang kernels/fatal_error_test.cl -O0 -include ocldef.h -include builtin_functions_ptx.cl -D__x86_64__ -ccc-host-triple ptx32 -Xclang -target-feature -Xclang +ptx23 -Xclang -target-feature -Xclang +compute20

Any ideas?

Best regards,
Alberto

On Mon, Nov 14, 2011 at 8:57 AM, Alberto Magni <alberto.magni86 at gmail.com> wrote:

> Any ideas ?

Unfortunately, this sample will not work at this time. First, the backend
does not support i8 types yet. Second, at higher optimization levels, LLVM
turns this loop into a memset intrinsic, which is also not yet implemented. :(

Hopefully I'll get some time soon to work on this and the other deficiencies.
Patches are always welcome, too.

> Best regards
>
> Alberto
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Thanks,
Justin Holewinski
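
For readers unfamiliar with the memset point in the reply above, here is a minimal host-side C++ sketch of the equivalence the optimizer relies on: a loop that stores the same byte into every element of a buffer has the same effect as a single memset call. The function names (zero_with_loop, zero_with_memset, same_result) are hypothetical and only illustrate the idea; this is plain C++, not OpenCL and not code from the PTX backend.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero a buffer with an explicit loop, the way the
// kernel in the original report zeroes localArray.
void zero_with_loop(char *buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = 0;
}

// Hypothetical helper: the form an optimizer may rewrite such a loop into.
void zero_with_memset(char *buf, std::size_t n) {
    std::memset(buf, 0, n);
}

// Fills two 16-byte buffers with a non-zero pattern, zeroes one with each
// helper, and reports whether the results are byte-for-byte identical.
bool same_result() {
    char a[16];
    char b[16];
    std::memset(a, 0x7f, sizeof a);
    std::memset(b, 0x7f, sizeof b);
    zero_with_loop(a, sizeof a);
    zero_with_memset(b, sizeof b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Because the two forms are interchangeable, a backend that cannot lower the memset form will fail on code that never called memset explicitly.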

Justin,

Add this to your TargetLowering constructor; it fixes the mem* issue:

  maxStoresPerMemcpy = 4096;
  maxStoresPerMemmove = 4096;
  maxStoresPerMemset = 4096;

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Justin Holewinski
Sent: Monday, November 14, 2011 7:12 AM
To: Alberto Magni
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] PTX backend fatal error

> Second, at higher optimization levels, LLVM turns this loop into a memset
> intrinsic, which is also not yet implemented. :(
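
As a rough illustration of why raising these limits helps, the sketch below models the decision in simplified form: a memset of known size is expanded into individual stores only when the number of stores needed does not exceed the configured limit. Everything here is an assumption made for illustration: MemOpLimits, storesNeeded and expandsInline are hypothetical names, the default of 8 is arbitrary, and the widest store width is assumed to be a power of two. LLVM's real lowering also takes alignment and legal types into account.

```cpp
#include <cstdint>

// Hypothetical stand-in for the three limits named in the message above.
// This is not LLVM's TargetLowering class; the default of 8 is illustrative.
struct MemOpLimits {
    unsigned maxStoresPerMemset = 8;
    unsigned maxStoresPerMemcpy = 8;
    unsigned maxStoresPerMemmove = 8;
};

// Counts the stores needed to cover `bytes` bytes, greedily using the widest
// store first and halving the width down to one byte.
// Assumes widestStoreBytes is a power of two.
unsigned storesNeeded(std::uint64_t bytes, unsigned widestStoreBytes) {
    unsigned count = 0;
    for (unsigned width = widestStoreBytes; width >= 1; width /= 2) {
        count += static_cast<unsigned>(bytes / width);
        bytes %= width;
    }
    return count;
}

// True when the operation fits under the limit and can be expanded inline;
// false when it would be left as a call that the backend must support.
bool expandsInline(std::uint64_t bytes, unsigned widestStoreBytes,
                   unsigned limit) {
    return storesNeeded(bytes, widestStoreBytes) <= limit;
}
```

Under this model, a 16-byte memset that can only use one-byte stores needs 16 stores, so it is not expanded with a limit of 8 but is expanded with a limit of 4096. A very large limit therefore keeps small fixed-size mem* operations away from the unimplemented call path, at the cost of longer store sequences.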