Doerfert, Johannes via llvm-dev
2020-Jan-28 17:44 UTC
[llvm-dev] Global removal pass - potential for improvement?
Hi Karl, Roman, On 01/28, Roman Lebedev via llvm-dev wrote:> On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > I was looking into how the global optimization pass fares against > > things like what's reported in > > https://bugs.llvm.org/show_bug.cgi?id=44676I need to take a closer look but I would have expected BasicAA to be able to determine that `do_log` and `R` cannot alias. In the -Os version (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` clobbers the read from `do_log` which prevents us from removing the load/store pair. My reasoning would have been that we know the size of `do_log` to be less than the size accessed via `R`. What exactly goes wrong or if my logic is flawed needs to be examined. I would start looking at the debug generated by the code parts touched here: https://reviews.llvm.org/D66157> > Looking at this, I think it would be pretty trivial to optimize that > > down given that there are already threading assumptions made: > > https://godbolt.org/z/u6ZqoBOptimizing more aggressively based on forward process guarantees will get us in more trouble than we are already in. I don't have the link handy but as far as I remember the proposed solution was to have a forward process guarantee function attribute. I would recommend we look into that first before we start more aggressive optimizations which will cause problems for a lot of (non C/C++) folks.> > Is this something I can look into?Sure :)> > Another thing is that currently *all* external calls break this > > optimization, including calls to intrinsics that probably shouldn't: > > https://godbolt.org/z/pK7Cew > I think during load propagation, there is a legality check "here's a > load, and here's a store. > Is there anything in between that may have clobbered that memory location?".Right now we only have `__attribute__((pure/const))` but we want to expose all LLVM-IR attributes to the user soon [0] which will allow way more fine-grained control. Intrinsics are a different story again.> For calls, there are some attributes that are helpful here: > https://llvm.org/docs/LangRef.html#function-attributes > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > be annotated with readonly attribute, > thus signalling that it won't clobber that memory location?While target specific intrinsics are a bit more complicated we see the problem often with generic intrinsic already. We proposed the other day [1] to change the default semantics of non-target specific intrinsics such that you have to opt-in for certain effects. For the above example you want `llvm.x86.flags.write` to be `writeonly` and `inaccesiblememonly`. Also `nosync`, `willreturn`, ... Cheers, Johannes [0] https://www.youtube.com/watch?v=elmio6AoyK0 [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/e867acde/attachment.sig>
Karl Rehm via llvm-dev
2020-Jan-28 18:06 UTC
[llvm-dev] Global removal pass - potential for improvement?
> > I need to take a closer look but I would have expected BasicAA to be > able to determine that `do_log` and `R` cannot alias. In the -Os version > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > clobbers the read from `do_log` which prevents us from removing the > load/store pair. My reasoning would have been that we know the size of > `do_log` to be less than the size accessed via `R`. What exactly goes > wrong or if my logic is flawed needs to be examined. I would start > looking at the debug generated by the code parts touched here: > https://reviews.llvm.org/D66157 >Well, if I recall correctly there's actually a problem with AA being *too* aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw There's a bug report about this somewhere, don't remember where though. On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov> wrote:> Hi Karl, Roman, > > On 01/28, Roman Lebedev via llvm-dev wrote: > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > > <llvm-dev at lists.llvm.org> wrote: > > > I was looking into how the global optimization pass fares against > > > things like what's reported in > > > https://bugs.llvm.org/show_bug.cgi?id=44676 > > I need to take a closer look but I would have expected BasicAA to be > able to determine that `do_log` and `R` cannot alias. In the -Os version > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > clobbers the read from `do_log` which prevents us from removing the > load/store pair. My reasoning would have been that we know the size of > `do_log` to be less than the size accessed via `R`. What exactly goes > wrong or if my logic is flawed needs to be examined. I would start > looking at the debug generated by the code parts touched here: > https://reviews.llvm.org/D66157 > > > > > Looking at this, I think it would be pretty trivial to optimize that > > > down given that there are already threading assumptions made: > > > https://godbolt.org/z/u6ZqoB > > Optimizing more aggressively based on forward process guarantees will > get us in more trouble than we are already in. I don't have the link > handy but as far as I remember the proposed solution was to have a > forward process guarantee function attribute. I would recommend we look > into that first before we start more aggressive optimizations which will > cause problems for a lot of (non C/C++) folks. > > > > > Is this something I can look into? > > Sure :) > > > > > Another thing is that currently *all* external calls break this > > > optimization, including calls to intrinsics that probably shouldn't: > > > https://godbolt.org/z/pK7Cew > > I think during load propagation, there is a legality check "here's a > > load, and here's a store. > > Is there anything in between that may have clobbered that memory > location?". > > Right now we only have `__attribute__((pure/const))` but we want to > expose all LLVM-IR attributes to the user soon [0] which will allow way > more fine-grained control. Intrinsics are a different story again. > > > > For calls, there are some attributes that are helpful here: > > https://llvm.org/docs/LangRef.html#function-attributes > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > > be annotated with readonly attribute, > > thus signalling that it won't clobber that memory location? > > While target specific intrinsics are a bit more complicated we see the > problem often with generic intrinsic already. We proposed the other day > [1] to change the default semantics of non-target specific intrinsics > such that you have to opt-in for certain effects. > > For the above example you want `llvm.x86.flags.write` to be `writeonly` and > `inaccesiblememonly`. Also `nosync`, `willreturn`, ... > > Cheers, > Johannes > > > [0] https://www.youtube.com/watch?v=elmio6AoyK0 > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/3849fd45/attachment.html>
Doerfert, Johannes via llvm-dev
2020-Jan-28 18:23 UTC
[llvm-dev] Global removal pass - potential for improvement?
On 01/28, Karl Rehm wrote:> > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > Well, if I recall correctly there's actually a problem with AA being *too* > aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw > There's a bug report about this somewhere, don't remember where though.That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH What I described should fire if you see the allocation. Alternatively, we can add a new "max object size" attribute [0] which gives a size upper bound for the pointed to memory. In either case, if you know the object pointed to by A is maximal N bytes of size, and the object pointed to by B is at most M bytes of size, and N < M, A and B do not alias. As of now, we use `dereferenceable` to determine the minimum size of the underlying object* and require to see the definition (I think) to obtain a maximum size. * It's actually a lower bound on the minimum size based on the pointer but that is conservatively fine too. [0] https://www.youtube.com/watch?v=HVvvCSSLiTw> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov> > wrote: > > > Hi Karl, Roman, > > > > On 01/28, Roman Lebedev via llvm-dev wrote: > > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > > > <llvm-dev at lists.llvm.org> wrote: > > > > I was looking into how the global optimization pass fares against > > > > things like what's reported in > > > > https://bugs.llvm.org/show_bug.cgi?id=44676 > > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > > > > > Looking at this, I think it would be pretty trivial to optimize that > > > > down given that there are already threading assumptions made: > > > > https://godbolt.org/z/u6ZqoB > > > > Optimizing more aggressively based on forward process guarantees will > > get us in more trouble than we are already in. I don't have the link > > handy but as far as I remember the proposed solution was to have a > > forward process guarantee function attribute. I would recommend we look > > into that first before we start more aggressive optimizations which will > > cause problems for a lot of (non C/C++) folks. > > > > > > > > Is this something I can look into? > > > > Sure :) > > > > > > > > Another thing is that currently *all* external calls break this > > > > optimization, including calls to intrinsics that probably shouldn't: > > > > https://godbolt.org/z/pK7Cew > > > I think during load propagation, there is a legality check "here's a > > > load, and here's a store. > > > Is there anything in between that may have clobbered that memory > > location?". > > > > Right now we only have `__attribute__((pure/const))` but we want to > > expose all LLVM-IR attributes to the user soon [0] which will allow way > > more fine-grained control. Intrinsics are a different story again. > > > > > > > For calls, there are some attributes that are helpful here: > > > https://llvm.org/docs/LangRef.html#function-attributes > > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > > > be annotated with readonly attribute, > > > thus signalling that it won't clobber that memory location? > > > > While target specific intrinsics are a bit more complicated we see the > > problem often with generic intrinsic already. We proposed the other day > > [1] to change the default semantics of non-target specific intrinsics > > such that you have to opt-in for certain effects. > > > > For the above example you want `llvm.x86.flags.write` to be `writeonly` and > > `inaccesiblememonly`. Also `nosync`, `willreturn`, ... > > > > Cheers, > > Johannes > > > > > > [0] https://www.youtube.com/watch?v=elmio6AoyK0 > > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html > >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/bf0be1e5/attachment.sig>
Possibly Parallel Threads
- Global removal pass - potential for improvement?
- Global removal pass - potential for improvement?
- Global removal pass - potential for improvement?
- Questions about jump threading optimization and what we can do
- Questions about jump threading optimization and what we can do