Level Zero's SPIR-V barrier function is not working on my Intel(R) UHD
Graphics P630 [0x9bf6]
The kernel I'm looking at implements a reduction (sum) within a workgroup
(local). It sums over the elements of an array of floats and stores the
intermediate results in local memory. The required synchronization within the
workgroup is implemented via:

    call void @_Z7barrierj(i32 3)
    declare spir_func void @_Z7barrierj(i32) #0
    attributes #0 = { readnone }

However, the kernel doesn't work correctly: its result is slightly off
from the correct value. This is a typical symptom of a barrier that is
not working correctly.
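
For checking the kernel's result on the host, a minimal C sketch of a serial reference could look like the following. The kernel itself is not shown in this post, so the tree-reduction scheme, the power-of-two work-group size `WG_SIZE`, and the function name `workgroup_sum_reference` are all assumptions made for illustration, not the actual kernel.

```c
#include <stddef.h>

/* Hypothetical work-group size (power of two); the post does not state one. */
#define WG_SIZE 64

/*
 * Serially emulates a tree reduction (sum) over one work-group.
 * `scratch` plays the role of local memory. Each pass of the stride loop
 * corresponds to one phase that a barrier would separate on the device.
 * Within a phase, reads touch indices >= stride and writes touch indices
 * < stride, so running the work-items one after another gives the same
 * result as running them in parallel with a working barrier.
 */
float workgroup_sum_reference(const float *in, size_t n)
{
    float scratch[WG_SIZE];

    /* Load phase: each emulated work-item copies one element, or 0 if out of range. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        scratch[lid] = (lid < n) ? in[lid] : 0.0f;
    /* On the device, a barrier would be needed here. */

    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
        /* On the device, a barrier would be needed here as well. */
    }

    return scratch[0];
}
```

Because floating-point addition is not associative, a reference that adds in the same order as the kernel helps separate ordinary rounding-order differences from a genuine synchronization problem.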
The kernel gets translated to SPIR-V and is then loaded and launched with
Intel Level Zero/compute-runtime, and that's when the barrier is not working.
I don't know at which stage it gets messed up.
I have tried passing '1' (CLK_LOCAL_MEM_FENCE) and '2' (CLK_GLOBAL_MEM_FENCE)
to the barrier function, and also not issuing the call at all: the outcome is
exactly the same. This suggests that the barrier is not being executed at all.
This is what Clang generates for the following statement:

    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

    tail call spir_func void @_Z7barrierj(i32 3) #4
    declare spir_func void @_Z7barrierj(i32) local_unnamed_addr #2
    attributes #2 = { convergent "frame-pointer"="none"
                      "no-trapping-math"="true"
                      "stack-protector-buffer-size"="8" }
    attributes #4 = { convergent nounwind }

I don't think the additional attributes matter here.
Any ideas?
Thanks,
Frank