Dear colleagues,

I'm checking whether we can replace nvopencc with LLVM NVPTX in our project.
It turns out NVPTX won't work with code that nvopencc can handle (please
see the log below). So are atomic intrinsics not supported, or am I
making the call in the wrong way?

Thanks,
- Dima.

SOURCE
=======

dmikushin@hp2:~> cat kernelgen_monitor.ll
; ModuleID = '/opt/kernelgen/include/kernelgen_monitor.cu'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx64-unknown-unknown"

%struct.kernelgen_callback_t = type { i32, i32, %"struct.kernelgen::kernel_t"*, i32, i32, %struct.kernelgen_callback_data_t* }
%"struct.kernelgen::kernel_t" = type opaque
%struct.kernelgen_callback_data_t = type opaque

define ptx_kernel void @_Z17kernelgen_monitorPi(i32* %callback) nounwind {
entry:
  %callback.addr = alloca i32*, align 8
  store i32* %callback, i32** %callback.addr, align 8
  %0 = load i32** %callback.addr, align 8
  %1 = bitcast i32* %0 to %struct.kernelgen_callback_t*
  %lock = getelementptr inbounds %struct.kernelgen_callback_t* %1, i32 0, i32 0
  %call = call ptx_device i32 @_Z12__iAtomicCASPiii(i32* %lock, i32 1, i32 0)
  br label %while.cond

while.cond:                                       ; preds = %while.body, %entry
  %2 = load i32** %callback.addr, align 8
  %3 = bitcast i32* %2 to %struct.kernelgen_callback_t*
  %lock1 = getelementptr inbounds %struct.kernelgen_callback_t* %3, i32 0, i32 0
  %call2 = call ptx_device i32 @_Z12__iAtomicCASPiii(i32* %lock1, i32 1, i32 1)
  %tobool = icmp ne i32 %call2, 0
  %lnot = xor i1 %tobool, true
  br i1 %lnot, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  br label %while.cond

while.end:                                        ; preds = %while.cond
  ret void
}

declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32)

CODEGEN
========

dmikushin@hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20
//
// Generated by LLVM NVPTX Back-End
//

.version 3.0
.target sm_20, texmode_independent
.address_size 32

.func (.param .b32 func_retval0) _Z12__iAtomicCASPiii
(
	.param .b32 _Z12__iAtomicCASPiii_param_0,
	.param .b32 _Z12__iAtomicCASPiii_param_1,
	.param .b32 _Z12__iAtomicCASPiii_param_2
)
;

Not Implemented
UNREACHABLE executed at /tmp/rpmbuild_debug/BUILD/llvm/build/include/llvm/Target/TargetLowering.h:1249!
0  libLLVM-3.2svn.so 0x00007f47738b8f5f
1  libLLVM-3.2svn.so 0x00007f47738b9525
2  libpthread.so.0   0x00007f47726135d0
3  libc.so.6         0x00007f4771931945 gsignal + 53
4  libc.so.6         0x00007f4771932f21 abort + 385
5  libLLVM-3.2svn.so 0x00007f47738a24c1 llvm::report_fatal_error(llvm::Twine const&) + 0
6  libLLVM-3.2svn.so 0x00007f47735cd390
7  libLLVM-3.2svn.so 0x00007f47737fe2ba llvm::TargetLowering::LowerCallTo(llvm::SDValue, llvm::Type*, bool, bool, bool, bool, unsigned int, llvm::CallingConv::ID, bool, bool, bool, llvm::SDValue, std::vector<llvm::TargetLowering::ArgListEntry, std::allocator<llvm::TargetLowering::ArgListEntry> >&, llvm::SelectionDAG&, llvm::DebugLoc) const + 2120
8  libLLVM-3.2svn.so 0x00007f4773807199 llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::MachineBasicBlock*) + 2913
9  libLLVM-3.2svn.so 0x00007f477381b3af llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 9681
10 libLLVM-3.2svn.so 0x00007f477382abee llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&) + 1044
11 libLLVM-3.2svn.so 0x00007f477382ad6d llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 105
12 libLLVM-3.2svn.so 0x00007f4773844925 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 59
13 libLLVM-3.2svn.so 0x00007f477384540c llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 2620
14 libLLVM-3.2svn.so 0x00007f47738459c6 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 896
15 libLLVM-3.2svn.so 0x00007f4773175bfe llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 82
16 libLLVM-3.2svn.so 0x00007f47733ac299 llvm::FPPassManager::runOnFunction(llvm::Function&) + 331
17 libLLVM-3.2svn.so 0x00007f47733ac474 llvm::FPPassManager::runOnModule(llvm::Module&) + 86
18 libLLVM-3.2svn.so 0x00007f47733abf6d llvm::MPPassManager::runOnModule(llvm::Module&) + 381
19 libLLVM-3.2svn.so 0x00007f47733ad6eb llvm::PassManagerImpl::run(llvm::Module&) + 111
20 libLLVM-3.2svn.so 0x00007f47733ad74d llvm::PassManager::run(llvm::Module&) + 33
21 llc               0x000000000040eed6 main + 2835
22 libc.so.6         0x00007f477191dbc6 __libc_start_main + 230
23 llc               0x000000000040cc09
Stack dump:
0.	Program arguments: llc -march=nvptx -mcpu=sm_20
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@_Z17kernelgen_monitorPi'
Aborted

dmikushin@hp2:~> cd ~/rpmbuild/BUILD/llvm/ && svn info
Path: .
Working Copy Root Path: /tmp/rpmbuild_debug/BUILD/llvm
URL: http://llvm.org/svn/llvm-project/llvm/trunk
Repository Root: http://llvm.org/svn/llvm-project
Repository UUID: 91177308-0d34-0410-b5e6-96231b3b80d8
Revision: 156703
Node Kind: directory
Schedule: normal
Last Changed Author: foad
Last Changed Rev: 156703
Last Changed Date: 2012-05-12 12:30:16 +0400 (Sat, 12 May 2012)

> -----Original Message-----
> From: Dmitry N. Mikushin [mailto:maemarcus at gmail.com]
> Sent: Wednesday, May 16, 2012 5:44 AM
> To: LLVM-Dev
> Cc: Justin Holewinski
> Subject: NVPTX: __iAtomicCAS support ?
>
> Dear colleagues,
>
> I'm looking if we can replace nvopencc with LLVM NVPTX in our project.
> It turns NVPTX won't work with the code nvopencc can handle (please
> see the log below). So are atomic intrinsics not supported or am I
> doing call in a wrong way?

There are really two issues here.

First, the error you are seeing is because calls are disabled in the back-end until an outstanding LLVM core patch is committed. Hopefully, we'll be able to push that in soon.

Second, __iAtomicCAS() is a CUDA-C built-in function; the implementation is provided by a library linked with the LLVM IR before the NVPTX back-end sees it. You will need to provide your own implementations for such functions.

Thanks, Justin!

Clang does not implement atomic intrinsics, but fortunately it supports inline asm:

__inline__ __attribute__((always_inline)) __attribute__((device))
int __iAtomicCAS(int *p, int compare, int val)
{
	int *global, result;
	// Convert the generic pointer to a global-space address, then do the CAS.
	// global and result are written by the asm, so they are outputs ("=");
	// "memory" keeps the compiler from reordering accesses around the atomic.
	asm volatile(
		"cvta.to.global.u64 %0, %2;\n\t"
		"atom.global.cas.b32 %1, [%0], %3, %4;"
		: "=l"(global), "=r"(result)
		: "l"(p), "r"(compare), "r"(val)
		: "memory");
	return result;
}

This helped to work around this particular problem.

- D.

2012/5/16 Justin Holewinski <jholewinski at nvidia.com>:
> There are really two issues here.
>
> First, the error you are seeing is because calls are disabled in the back-end until an outstanding LLVM core patch is committed. Hopefully, we'll be able to push that in soon.
>
> Second, __iAtomicCAS() is a CUDA-C built-in function; the implementation is provided by a library linked with the LLVM IR before the NVPTX back-end sees it. You will need to provide your own implementations for such functions.

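For readers following the thread, here is a rough host-side sketch of what the monitor kernel in the IR appears to do with the CAS. It is only a model: std::atomic stands in for the device atom.cas instruction, and the names model_atomic_cas, monitor_release and monitor_should_spin are hypothetical helpers, not part of kernelgen or CUDA. It assumes the built-in has the usual compare-and-swap semantics (store val if *p equals compare, and always return the old value).

```cpp
#include <atomic>

// Host-side stand-in for __iAtomicCAS(p, compare, val) (assumed semantics):
// if *p == compare, store val; in every case return the value *p held before.
int model_atomic_cas(std::atomic<int>* p, int compare, int val) {
    int expected = compare;
    // On success `expected` still equals the old value; on failure
    // compare_exchange_strong overwrites it with the value actually observed.
    p->compare_exchange_strong(expected, val);
    return expected;
}

// Mirrors the first call in the `entry` block: flip the lock from 1 to 0.
int monitor_release(std::atomic<int>* lock) {
    return model_atomic_cas(lock, 1, 0);
}

// Mirrors one evaluation of the `while.cond` block: CAS(lock, 1, 1) never
// changes the value, so it acts as an atomic read; the loop keeps spinning
// while that read returns 0.
bool monitor_should_spin(std::atomic<int>* lock) {
    return model_atomic_cas(lock, 1, 1) == 0;
}
```

In this model the kernel cannot leave the loop by itself: after monitor_release the lock is 0, and monitor_should_spin stays true until some other agent stores a non-zero value into the lock.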