James Price via llvm-dev
2016-Nov-09 18:13 UTC
[llvm-dev] Optimisation passes introducing address space casts
Hi,

I’ve recently encountered an issue where the `instcombine` pass replaces an `llvm.memcpy` between two distinct address spaces with an `addrspacecast` instruction.

As an example, see the trivial OpenCL kernel attached. I’m compiling like this:

    clang -cc1 -triple spir64-unknown-unknown -x cl -O0 -emit-llvm array_init.cl -o before.ll

This yields an `llvm.memcpy` that copies the array initialiser data from the global variable (in `addrspace(2)`) to the `alloca` result (in `addrspace(0)`). I then apply the `instcombine` pass via:

    opt -S -instcombine before.ll -o after.ll

This results in the memcpy being nuked, and the `addrspace(2)` data is now accessed directly via an `addrspacecast` to `addrspace(0)`.

It seems to me that this sort of optimisation is only valid if the two address spaces are guaranteed to alias for the given target triple (which, for SPIR, they are not).

This particular optimisation comes from lines ~290-300 of InstCombineLoadStoreAlloca.cpp, although I suspect this isn’t the only place where it can happen.

Adding a check to perform this replacement only when the two address spaces are equal fixes the issue for me, but this is probably too conservative, since many targets with flat address spaces would benefit from this optimisation. It feels like passes should query the target about whether two address spaces alias before introducing an `addrspacecast`, but I’m not familiar enough with LLVM internals to know whether this information is easy to make available (if it isn’t already).

Is there something we can do here to avoid this sort of optimisation causing problems for targets with segmented address spaces?

James

-------------- next part --------------
A non-text attachment was scrubbed...
Name: array_init.cl
Type: application/octet-stream
Size: 76 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161109/aa5c9258/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: before.ll
Type: application/octet-stream
Size: 2039 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161109/aa5c9258/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: after.ll
Type: application/octet-stream
Size: 1653 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161109/aa5c9258/attachment-0002.obj>
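James’s workaround and his suggested refinement amount to two different guards on the same fold. The following is a toy Python model of those two guards, assuming a hypothetical `Target` description that lists the (source, destination) address-space pairs it declares safe to cast between; neither `Target` nor `may_fold_with_cast` exists in LLVM, and Mehdi’s reply questions whether "alias" is the right criterion at all.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """Hypothetical target description (not an LLVM API): the
    (src_as, dst_as) pairs for which an addrspacecast is harmless."""
    castable_pairs: FrozenSet[Tuple[int, int]] = frozenset()


def may_fold_with_cast(src_as: int, dst_as: int,
                       target: Optional[Target] = None) -> bool:
    """Decide whether a memcpy from a constant global in src_as into an
    alloca in dst_as may be replaced by (a cast of) the global itself."""
    # James's workaround: fold only when no cast is needed at all.
    if src_as == dst_as:
        return True
    # His suggested refinement: ask the target instead of refusing outright.
    return target is not None and (src_as, dst_as) in target.castable_pairs


# SPIR-like target: segmented address spaces, nothing declared castable.
spir_like = Target()
# A flat-address-space target declaring addrspace(2) -> addrspace(0) harmless.
flat_like = Target(frozenset({(2, 0)}))
print(may_fold_with_cast(2, 0, spir_like))  # False: keep the memcpy
print(may_fold_with_cast(2, 0, flat_like))  # True
```

The equal-address-space check alone is what fixes the SPIR case; the target query only widens the fold again for targets that opt in.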
Mehdi Amini via llvm-dev
2016-Nov-09 21:11 UTC
Hi,

> On Nov 9, 2016, at 10:13 AM, James Price via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
> It seems to me that this sort of optimisation is only valid if it is guaranteed that the two address spaces alias for the given target triple (which for SPIR, they do not).

I’m not sure “alias” is the right consideration here.

> [...]
> Is there something we can do here to avoid this sort of optimisation causing problems for targets with segmented address spaces?

This is a bug; I believe we can’t assume any memory layout.

The memcpy is supposed to be equivalent to a sequence of loads and stores.
Here we are just failing to preserve the property that the load is performed through addrspace(2). The fix is not “trivial”, though: the transformation can only be performed if the address of the alloca does not escape, and some rewriting of the uses is needed, for instance to propagate the address space through GEPs.

— Mehdi
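Mehdi’s two conditions (the alloca’s address must not escape, and the uses must be rewritten so loads still go through the source address space) can be sketched as a check-then-rewrite over the alloca’s uses. This is a toy Python model, not LLVM code: `Inst`, the opcode strings and the helper names are hypothetical. It treats any use other than a load, GEP or bitcast as an escape, which is more conservative than a real escape analysis, and it assumes the other preconditions instcombine already checks for this fold hold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inst:
    """Toy pointer user (hypothetical, not LLVM's Instruction class)."""
    opcode: str                      # "load", "gep", "bitcast", "call", ...
    addrspace: int = 0               # AS of the pointer it produces or loads through
    users: List["Inst"] = field(default_factory=list)


# Uses that only derive a new pointer from the alloca's address.
POINTER_DERIVING = {"gep", "bitcast"}


def only_loaded_through(users: List[Inst]) -> bool:
    """True if every transitive use is a load reached via GEPs/bitcasts.
    Anything else (call argument, storing the pointer, ...) is an escape."""
    for u in users:
        if u.opcode == "load":
            continue
        if u.opcode in POINTER_DERIVING and only_loaded_through(u.users):
            continue
        return False
    return True


def rewrite_address_space(users: List[Inst], src_as: int) -> None:
    """Move derived pointers and loads into src_as, so each load is still
    performed through the memcpy's source address space."""
    for u in users:
        u.addrspace = src_as
        if u.opcode in POINTER_DERIVING:
            rewrite_address_space(u.users, src_as)


def try_fold_memcpy_from_global(alloca_users: List[Inst],
                                src_as: int, dst_as: int) -> bool:
    if src_as == dst_as:
        return True                  # no cast needed: uses can take the global directly
    if not only_loaded_through(alloca_users):
        return False                 # address escapes: keep the memcpy
    rewrite_address_space(alloca_users, src_as)
    return True


# alloca -> GEP -> load: foldable, and the GEP and load now use addrspace(2).
load = Inst("load")
gep = Inst("gep", users=[load])
print(try_fold_memcpy_from_global([gep], src_as=2, dst_as=0), gep.addrspace, load.addrspace)
# True 2 2
```

Rewriting instead of casting is the design point: no `addrspacecast` is ever introduced, so the model makes no assumption about the target’s memory layout.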
James Price via llvm-dev
2017-Jan-02 22:12 UTC
Hi Mehdi,

Thanks for the reply. I’ve finally got round to trying to fix this based on your suggestion. I’ve got something that mostly works, but I wanted to double-check something about the regression tests before I post a patch.

> The memcpy is supposed to be equivalent to a sequence of load and store. Here we are just failing to keep the property that the load is performed through addrspace(2).

Based on this comment, I am suspicious of the validity of a couple of existing instcombine regression tests in `memcpy-from-global.ll`. Specifically, there are two tests that look like this:

    define void @test3_addrspacecast() {
      %A = alloca %T
      %a = bitcast %T* %A to i8*
      call void @llvm.memcpy.p0i8.p1i8.i64(i8* %a, i8 addrspace(1)* addrspacecast (%T* @G to i8 addrspace(1)*), i64 124, i32 4, i1 false)
      call void @bar(i8* %a) readonly
    ; CHECK-LABEL: @test3_addrspacecast(
    ; CHECK-NEXT: call void @bar(i8* getelementptr inbounds (%T, %T* @G, i64 0, i32 0))
      ret void
    }

Here, there is a global variable in `addrspace(0)`, which is passed as the source operand to `llvm.memcpy` via an `addrspacecast` to `addrspace(1)`. The memcpy destination operand (an `alloca` in `addrspace(0)`) is then passed to a function. The test asserts that the memcpy should be removed and the global passed directly to the function, but doesn’t this lose the property that the load should be performed through `addrspace(1)`, as per your comment above?

I have fixed this as part of the patch I’m working on, but since this implies that a couple of existing regression tests would be incorrect, I wanted to double-check that I’m not misinterpreting something.

CCing Matt Arsenault, who originally added those specific tests in r207054.
James
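Reading the `test3_addrspacecast` test hinges on the intrinsic name: the suffixes of an overloaded intrinsic encode its operand types, so in `llvm.memcpy.p0i8.p1i8.i64` the destination is an `i8*` in `addrspace(0)`, the source an `i8*` in `addrspace(1)`, and the length an `i64`. A small Python sketch that extracts the two address spaces from such a name follows; the helper is hypothetical and only handles the `i8` pointee that `llvm.memcpy` uses with typed pointers (or the bare `p<N>` form used with opaque pointers).

```python
import re
from typing import Tuple

# Pointer operands mangle as "p<AS>" followed, with typed pointers, by the
# pointee type: "p1i8" is i8 addrspace(1)*. With opaque pointers it is "p1".
_PTR_SUFFIX = re.compile(r"p(\d+)(?:i8)?")


def memcpy_address_spaces(name: str) -> Tuple[int, int]:
    """Return (dest_as, src_as) encoded in an llvm.memcpy intrinsic name."""
    prefix = "llvm.memcpy."
    if not name.startswith(prefix):
        raise ValueError(f"not an llvm.memcpy intrinsic: {name}")
    # The first two overloaded types are the destination and source pointers.
    dest, src = name[len(prefix):].split(".")[:2]
    spaces = []
    for part in (dest, src):
        m = _PTR_SUFFIX.fullmatch(part)
        if m is None:
            raise ValueError(f"unexpected pointer suffix: {part}")
        spaces.append(int(m.group(1)))
    return spaces[0], spaces[1]


print(memcpy_address_spaces("llvm.memcpy.p0i8.p1i8.i64"))  # (0, 1)
```

Since the source is `addrspace(1)` while `@G` itself lives in `addrspace(0)`, the test’s expected output passes `@G` without the cast the memcpy’s loads went through, which is what James’s question turns on.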