Hans de Goede via llvm-dev
2016-Jan-11 14:35 UTC
[llvm-dev] Some llvm questions (for a tgsi backend)
Hi All, Quick self intro: I'm a Foss developer working for Red Hat's graphics team. I'm currently working on a tgsi backend for llvm with the purpose of being able to compile opencl to tgsi, and to integrate this into mesa's nouveau driver / clover state tracker to add opencl support for Nvidia cards to mesa. I'm currently at the point where I can compile a simple opencl program to something which sort of looks like tgsi. You can find my latest work on this here: http://cgit.freedesktop.org/~jwrdegoede/llvm http://cgit.freedesktop.org/~jwrdegoede/clang I've a little test program of which I have 3 versions now, 1 raw gallium calls + a tgsi kernel 2 opencl calls to clover + a tgsi kernel 3 opencl calls to clover + an opencl kernel 1 and 2 have been tested on a kepler card, 3 has been tested with pocl. My goal for this week is to get the tgsi backend to produce code which I can copy and paste into 2 and then have it working on a kepler card. The test program looks like this: __kernel void test_kern(__global uint *vals, __global uint *buf) { uint id = get_global_id(0); buf[32 * id] -= vals[id]; } The llvm ir looks like this: bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl ; ModuleID = '/home/hans/foo.cl' target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32" target triple = "tgsi--" ; Function Attrs: nounwind define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 { entry: %call = tail call i32 @_Z13get_global_idj(i32 0) #2 %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7 %mul = shl i32 %call, 5 %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7 %sub = sub i32 %1, %0 store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7 ret void } declare i32 @_Z13get_global_idj(i32) #1 attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #2 = { nounwind } !opencl.kernels = !{!0} !llvm.ident = !{!6} !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5} !1 = !{!"kernel_arg_addr_space", i32 1, i32 1} !2 = !{!"kernel_arg_access_qual", !"none", !"none"} !3 = !{!"kernel_arg_type", !"uint*", !"uint*"} !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"} !5 = !{!"kernel_arg_type_qual", !"", !""} !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"} !7 = !{!8, !8, i64 0} !8 = !{!"int", !9, i64 0} !9 = !{!"omnipotent char", !10, i64 0} !10 = !{!"Simple C/C++ TBAA"} And the generated "tgsi" looks like this: .text .file "/home/hans/foo.cl" .globl test_kern test_kern: BGNSUB MOVis TEMP1x, 0 CAL _Z13get_global_idj SHLs TEMP1y, TEMP1x, 7 LOADiis TEMP1z, [4] UADDs TEMP1y, TEMP1z, TEMP1y SHLs TEMP1x, TEMP1x, 2 LOADiis TEMP1z, [0] UADDs TEMP1x, TEMP1z, TEMP1x LOADgis TEMP1x, [TEMP1x] INEGs TEMP1x, TEMP1x LOADgis TEMP1z, [TEMP1y] UADDs TEMP1x, TEMP1x, TEMP1z STOREgis [TEMP1y], TEMP1x RET ENDSUB Working tgsi for this would look like this: COMP DCL SV[0], THREAD_ID[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 0, 0, 0, 0 } IMM UINT32 { 4, 0, 0, 0 } IMM UINT32 { 128, 0, 0, 0 } BGNSUB LOAD TEMP[0].xy, RINPUT, IMM[0] UMUL TEMP[1].x, SV[0], IMM[1] UADD TEMP[0].x, TEMP[0], TEMP[1] UMUL TEMP[1].x, SV[0], IMM[2] UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx LOAD TEMP[1].x, RGLOBAL, TEMP[0] LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1] RET ENDSUB; So my questions (I'm still quite green when it comes to llvm): 1) As you can see a proper tgsi program needs a header to declare which registers (etc) it is using, in which class-method should I implement this ? 2) Immediates need to be declared with a specific value and then addressed as IMM[x], how would I go about this ? 3) The get_global_id call needs to be translated into simply using the SV[0] "register", how would I go about this ? 4) The global and input load / stores are not handled correctly, I see that the LOAD instructions get postfixed with a i reps. g for input / global how would I go about modifying the code emitter (AsmPrinter?) to change "LOADi" into "LOAD <dest> RINPUT <offset>"? 5) Talking about the lowecase suffixes to the instructions, these should not be part of the output, how do I filter these? 6) And finally, the current llvm-tgsi output uses e.g. TEMP1y where as for the destination it should use TEMP[1].y and for the sources it should use TEMP[1].xxxx (so include proper swizzling info). Lots of questions, sorry about that. Feel free to point me to some relvant parts of the docs, I've tried to find answers myself but I've gotten a bit lost in the docs. Regards, Hans
Matt Arsenault via llvm-dev
2016-Jan-19 02:21 UTC
[llvm-dev] Some llvm questions (for a tgsi backend)
On 01/11/2016 06:35 AM, Hans de Goede via llvm-dev wrote:> Hi All, > > Quick self intro: I'm a Foss developer working for Red Hat's graphics > team. > I'm currently working on a tgsi backend for llvm with the purpose of > being > able to compile opencl to tgsi, and to integrate this into mesa's nouveau > driver / clover state tracker to add opencl support for Nvidia cards > to mesa. > > I'm currently at the point where I can compile a simple opencl program > to something which sort of looks like tgsi. > > You can find my latest work on this here: > http://cgit.freedesktop.org/~jwrdegoede/llvm > http://cgit.freedesktop.org/~jwrdegoede/clang > > I've a little test program of which I have 3 versions now, > 1 raw gallium calls + a tgsi kernel > 2 opencl calls to clover + a tgsi kernel > 3 opencl calls to clover + an opencl kernel > > 1 and 2 have been tested on a kepler card, 3 has been > tested with pocl. My goal for this week is to get > the tgsi backend to produce code which I can copy > and paste into 2 and then have it working on a kepler card. > > The test program looks like this: > > __kernel void test_kern(__global uint *vals, __global uint *buf) > { > uint id = get_global_id(0); > > buf[32 * id] -= vals[id]; > } > > The llvm ir looks like this: > > bin/clang -x cl -c -emit-llvm -target tgsi-- -include > /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl > > ; ModuleID = '/home/hans/foo.cl' > target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32" > target triple = "tgsi--" > > ; Function Attrs: nounwind > define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 > addrspace(1)* nocapture %buf) #0 { > entry: > %call = tail call i32 @_Z13get_global_idj(i32 0) #2 > %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 > %call > %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7 > %mul = shl i32 %call, 5 > %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 > %mul > %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7 > %sub = sub i32 %1, %0 > store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7 > ret void > } > > declare i32 @_Z13get_global_idj(i32) #1 > > attributes #0 = { nounwind "disable-tail-calls"="false" > "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" > "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" > "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" > "unsafe-fp-math"="false" "use-soft-float"="false" } > attributes #1 = { "disable-tail-calls"="false" > "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" > "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" > "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" > "unsafe-fp-math"="false" "use-soft-float"="false" } > attributes #2 = { nounwind } > > !opencl.kernels = !{!0} > !llvm.ident = !{!6} > > !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, > !2, !3, !4, !5} > !1 = !{!"kernel_arg_addr_space", i32 1, i32 1} > !2 = !{!"kernel_arg_access_qual", !"none", !"none"} > !3 = !{!"kernel_arg_type", !"uint*", !"uint*"} > !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"} > !5 = !{!"kernel_arg_type_qual", !"", !""} > !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git > 9376f992e00569bd08a4ecf3a1d06d8b93c97681) > (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"} > !7 = !{!8, !8, i64 0} > !8 = !{!"int", !9, i64 0} > !9 = !{!"omnipotent char", !10, i64 0} > !10 = !{!"Simple C/C++ TBAA"} > > And the generated "tgsi" looks like this: > > .text > .file "/home/hans/foo.cl" > .globl test_kern > test_kern: > BGNSUB > MOVis TEMP1x, 0 > CAL _Z13get_global_idj > SHLs TEMP1y, TEMP1x, 7 > LOADiis TEMP1z, [4] > UADDs TEMP1y, TEMP1z, TEMP1y > SHLs TEMP1x, TEMP1x, 2 > LOADiis TEMP1z, [0] > UADDs TEMP1x, TEMP1z, TEMP1x > LOADgis TEMP1x, [TEMP1x] > INEGs TEMP1x, TEMP1x > LOADgis TEMP1z, [TEMP1y] > UADDs TEMP1x, TEMP1x, TEMP1z > STOREgis [TEMP1y], TEMP1x > RET > ENDSUB > > Working tgsi for this would look like this: > > COMP > DCL SV[0], THREAD_ID[0] > DCL TEMP[0], LOCAL > DCL TEMP[1], LOCAL > IMM UINT32 { 0, 0, 0, 0 } > IMM UINT32 { 4, 0, 0, 0 } > IMM UINT32 { 128, 0, 0, 0 } > > BGNSUB > LOAD TEMP[0].xy, RINPUT, IMM[0] > UMUL TEMP[1].x, SV[0], IMM[1] > UADD TEMP[0].x, TEMP[0], TEMP[1] > UMUL TEMP[1].x, SV[0], IMM[2] > UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx > LOAD TEMP[1].x, RGLOBAL, TEMP[0] > LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy > UADD TEMP[1].x, TEMP[0], -TEMP[1] > STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1] > RET > ENDSUB; > > So my questions (I'm still quite green when it comes to llvm): > > 1) As you can see a proper tgsi program needs a header > to declare which registers (etc) it is using, in which > class-method should I implement this ?Module level declarations need to be handled in the AsmPrinter. There are various hooks for printing things like this at different points in the module.> > 2) Immediates need to be declared with a specific > value and then addressed as IMM[x], how would I go about > this ?This is very painful, and I'm not really sure what the best way to deal with this is. I would recommend against what AMDIL did for this, which was jam these into a map in a class attached to the MachineModuleInfo updated every time an immediate operand is added. I would probably implement this as a late custom pass which looks for any immediate operands, and create new symbols as new immediates are encountered.> > 3) The get_global_id call needs to be translated into > simply using the SV[0] "register", how would I go about > this ?You can implement this with an intrinsic, which you can then lower to a read from a register. For an example, look at how amdgcn_dispatch_ptr is implemented.> > 4) The global and input load / stores are not handled > correctly, I see that the LOAD instructions get postfixed > with a i reps. g for input / global how would I go about > modifying the code emitter (AsmPrinter?) to change "LOADi" > into "LOAD <dest> RINPUT <offset>"? > > 5) Talking about the lowecase suffixes to the instructions, > these should not be part of the output, how do I filter these?However you have defined the AsmString for the instructions is including these. You are probably defining your AsmString in some simple way that includes the full instruction record name as part of it. Also note that it isn't really necessary to have different instructions for different immediate/register operands, although many targets still do this.> > 6) And finally, the current llvm-tgsi output uses e.g. > TEMP1y where as for the destination it should use TEMP[1].y > and for the sources it should use TEMP[1].xxxx (so include > proper swizzling info).You need to implement a custom printOperand in your InstPrinter if you want to change how registers are formatted.> > Lots of questions, sorry about that. Feel free to point me > to some relvant parts of the docs, I've tried to find answers > myself but I've gotten a bit lost in the docs. > > Regards, > > Hans > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev