Doerfert, Johannes via llvm-dev
2020-Jan-28 18:23 UTC
[llvm-dev] Global removal pass - potential for improvement?
On 01/28, Karl Rehm wrote:> > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > Well, if I recall correctly there's actually a problem with AA being *too* > aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw > There's a bug report about this somewhere, don't remember where though.That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH What I described should fire if you see the allocation. Alternatively, we can add a new "max object size" attribute [0] which gives a size upper bound for the pointed to memory. In either case, if you know the object pointed to by A is maximal N bytes of size, and the object pointed to by B is at most M bytes of size, and N < M, A and B do not alias. As of now, we use `dereferenceable` to determine the minimum size of the underlying object* and require to see the definition (I think) to obtain a maximum size. * It's actually a lower bound on the minimum size based on the pointer but that is conservatively fine too. [0] https://www.youtube.com/watch?v=HVvvCSSLiTw> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov> > wrote: > > > Hi Karl, Roman, > > > > On 01/28, Roman Lebedev via llvm-dev wrote: > > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > > > <llvm-dev at lists.llvm.org> wrote: > > > > I was looking into how the global optimization pass fares against > > > > things like what's reported in > > > > https://bugs.llvm.org/show_bug.cgi?id=44676 > > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > > > > > Looking at this, I think it would be pretty trivial to optimize that > > > > down given that there are already threading assumptions made: > > > > https://godbolt.org/z/u6ZqoB > > > > Optimizing more aggressively based on forward process guarantees will > > get us in more trouble than we are already in. I don't have the link > > handy but as far as I remember the proposed solution was to have a > > forward process guarantee function attribute. I would recommend we look > > into that first before we start more aggressive optimizations which will > > cause problems for a lot of (non C/C++) folks. > > > > > > > > Is this something I can look into? > > > > Sure :) > > > > > > > > Another thing is that currently *all* external calls break this > > > > optimization, including calls to intrinsics that probably shouldn't: > > > > https://godbolt.org/z/pK7Cew > > > I think during load propagation, there is a legality check "here's a > > > load, and here's a store. > > > Is there anything in between that may have clobbered that memory > > location?". > > > > Right now we only have `__attribute__((pure/const))` but we want to > > expose all LLVM-IR attributes to the user soon [0] which will allow way > > more fine-grained control. Intrinsics are a different story again. > > > > > > > For calls, there are some attributes that are helpful here: > > > https://llvm.org/docs/LangRef.html#function-attributes > > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > > > be annotated with readonly attribute, > > > thus signalling that it won't clobber that memory location? > > > > While target specific intrinsics are a bit more complicated we see the > > problem often with generic intrinsic already. We proposed the other day > > [1] to change the default semantics of non-target specific intrinsics > > such that you have to opt-in for certain effects. > > > > For the above example you want `llvm.x86.flags.write` to be `writeonly` and > > `inaccesiblememonly`. Also `nosync`, `willreturn`, ... > > > > Cheers, > > Johannes > > > > > > [0] https://www.youtube.com/watch?v=elmio6AoyK0 > > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html > >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/bf0be1e5/attachment.sig>
Karl Rehm via llvm-dev
2020-Jan-28 18:42 UTC
[llvm-dev] Global removal pass - potential for improvement?
I'm thinking about this a bit more, and I don't know if it's something that can be used in a generic backend like LLVM. Should accessing a smaller object from a bigger pointer be considered UB? In contrast, sing the assumption that bigger types can not be dereferenced from smaller types is a good idea. If this is already added to BasicAA, I wonder why it does not optimize this out? On Tue, Jan 28, 2020 at 1:23 PM Doerfert, Johannes <jdoerfert at anl.gov> wrote:> On 01/28, Karl Rehm wrote: > > > > > > I need to take a closer look but I would have expected BasicAA to be > > > able to determine that `do_log` and `R` cannot alias. In the -Os > version > > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > > clobbers the read from `do_log` which prevents us from removing the > > > load/store pair. My reasoning would have been that we know the size of > > > `do_log` to be less than the size accessed via `R`. What exactly goes > > > wrong or if my logic is flawed needs to be examined. I would start > > > looking at the debug generated by the code parts touched here: > > > https://reviews.llvm.org/D66157 > > > > > > > Well, if I recall correctly there's actually a problem with AA being > *too* > > aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw > > There's a bug report about this somewhere, don't remember where though. > > That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH > > What I described should fire if you see the allocation. Alternatively, > we can add a new "max object size" attribute [0] which gives a size > upper bound for the pointed to memory. In either case, if you know the > object pointed to by A is maximal N bytes of size, and the object > pointed to by B is at most M bytes of size, and N < M, A and B do not > alias. As of now, we use `dereferenceable` to determine the minimum size > of the underlying object* and require to see the definition (I think) to > obtain a maximum size. > > * It's actually a lower bound on the minimum size based on the pointer > but that is conservatively fine too. > > [0] https://www.youtube.com/watch?v=HVvvCSSLiTw > > > > On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov> > > wrote: > > > > > Hi Karl, Roman, > > > > > > On 01/28, Roman Lebedev via llvm-dev wrote: > > > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > > > > <llvm-dev at lists.llvm.org> wrote: > > > > > I was looking into how the global optimization pass fares against > > > > > things like what's reported in > > > > > https://bugs.llvm.org/show_bug.cgi?id=44676 > > > > > > I need to take a closer look but I would have expected BasicAA to be > > > able to determine that `do_log` and `R` cannot alias. In the -Os > version > > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > > clobbers the read from `do_log` which prevents us from removing the > > > load/store pair. My reasoning would have been that we know the size of > > > `do_log` to be less than the size accessed via `R`. What exactly goes > > > wrong or if my logic is flawed needs to be examined. I would start > > > looking at the debug generated by the code parts touched here: > > > https://reviews.llvm.org/D66157 > > > > > > > > > > > Looking at this, I think it would be pretty trivial to optimize > that > > > > > down given that there are already threading assumptions made: > > > > > https://godbolt.org/z/u6ZqoB > > > > > > Optimizing more aggressively based on forward process guarantees will > > > get us in more trouble than we are already in. I don't have the link > > > handy but as far as I remember the proposed solution was to have a > > > forward process guarantee function attribute. I would recommend we look > > > into that first before we start more aggressive optimizations which > will > > > cause problems for a lot of (non C/C++) folks. > > > > > > > > > > > Is this something I can look into? > > > > > > Sure :) > > > > > > > > > > > Another thing is that currently *all* external calls break this > > > > > optimization, including calls to intrinsics that probably > shouldn't: > > > > > https://godbolt.org/z/pK7Cew > > > > I think during load propagation, there is a legality check "here's a > > > > load, and here's a store. > > > > Is there anything in between that may have clobbered that memory > > > location?". > > > > > > Right now we only have `__attribute__((pure/const))` but we want to > > > expose all LLVM-IR attributes to the user soon [0] which will allow way > > > more fine-grained control. Intrinsics are a different story again. > > > > > > > > > > For calls, there are some attributes that are helpful here: > > > > https://llvm.org/docs/LangRef.html#function-attributes > > > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > > > > be annotated with readonly attribute, > > > > thus signalling that it won't clobber that memory location? > > > > > > While target specific intrinsics are a bit more complicated we see the > > > problem often with generic intrinsic already. We proposed the other day > > > [1] to change the default semantics of non-target specific intrinsics > > > such that you have to opt-in for certain effects. > > > > > > For the above example you want `llvm.x86.flags.write` to be > `writeonly` and > > > `inaccesiblememonly`. Also `nosync`, `willreturn`, ... > > > > > > Cheers, > > > Johannes > > > > > > > > > [0] https://www.youtube.com/watch?v=elmio6AoyK0 > > > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html > > > > > -- > > Johannes Doerfert > Researcher > > Argonne National Laboratory > Lemont, IL 60439, USA > > jdoerfert at anl.gov >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/b5f496b4/attachment.html>
Doerfert, Johannes via llvm-dev
2020-Jan-28 18:51 UTC
[llvm-dev] Global removal pass - potential for improvement?
It's not about the pointer per se but the object it points to. If we have 2 objects of which we know the minimal/maximal size respectively we can sometimes conclude the objects must be different. Accesses to different objects do not alias. We already use that logic but we don't have the upper bound size as an attribute. Hence, it only applies if we know the exact size, basically is we see the definition. I might have explained it in the YouTube link I included before. I don't know what you mean by llvm being a generic backend but what I describe above and in the tutorial should make it in as soon as the code is cleaned up. It's on my GitHub already. -- written from my phone ________________________________ From: Karl Rehm <klrehm123 at gmail.com> Sent: Tuesday, January 28, 2020 12:42:33 PM To: Doerfert, Johannes <jdoerfert at anl.gov> Cc: Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Global removal pass - potential for improvement? I'm thinking about this a bit more, and I don't know if it's something that can be used in a generic backend like LLVM. Should accessing a smaller object from a bigger pointer be considered UB? In contrast, sing the assumption that bigger types can not be dereferenced from smaller types is a good idea. If this is already added to BasicAA, I wonder why it does not optimize this out? On Tue, Jan 28, 2020 at 1:23 PM Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>> wrote: On 01/28, Karl Rehm wrote:> > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > Well, if I recall correctly there's actually a problem with AA being *too* > aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw > There's a bug report about this somewhere, don't remember where though.That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH What I described should fire if you see the allocation. Alternatively, we can add a new "max object size" attribute [0] which gives a size upper bound for the pointed to memory. In either case, if you know the object pointed to by A is maximal N bytes of size, and the object pointed to by B is at most M bytes of size, and N < M, A and B do not alias. As of now, we use `dereferenceable` to determine the minimum size of the underlying object* and require to see the definition (I think) to obtain a maximum size. * It's actually a lower bound on the minimum size based on the pointer but that is conservatively fine too. [0] https://www.youtube.com/watch?v=HVvvCSSLiTw> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>> > wrote: > > > Hi Karl, Roman, > > > > On 01/28, Roman Lebedev via llvm-dev wrote: > > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev > > > <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: > > > > I was looking into how the global optimization pass fares against > > > > things like what's reported in > > > > https://bugs.llvm.org/show_bug.cgi?id=44676 > > > > I need to take a closer look but I would have expected BasicAA to be > > able to determine that `do_log` and `R` cannot alias. In the -Os version > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R` > > clobbers the read from `do_log` which prevents us from removing the > > load/store pair. My reasoning would have been that we know the size of > > `do_log` to be less than the size accessed via `R`. What exactly goes > > wrong or if my logic is flawed needs to be examined. I would start > > looking at the debug generated by the code parts touched here: > > https://reviews.llvm.org/D66157 > > > > > > > > Looking at this, I think it would be pretty trivial to optimize that > > > > down given that there are already threading assumptions made: > > > > https://godbolt.org/z/u6ZqoB > > > > Optimizing more aggressively based on forward process guarantees will > > get us in more trouble than we are already in. I don't have the link > > handy but as far as I remember the proposed solution was to have a > > forward process guarantee function attribute. I would recommend we look > > into that first before we start more aggressive optimizations which will > > cause problems for a lot of (non C/C++) folks. > > > > > > > > Is this something I can look into? > > > > Sure :) > > > > > > > > Another thing is that currently *all* external calls break this > > > > optimization, including calls to intrinsics that probably shouldn't: > > > > https://godbolt.org/z/pK7Cew > > > I think during load propagation, there is a legality check "here's a > > > load, and here's a store. > > > Is there anything in between that may have clobbered that memory > > location?". > > > > Right now we only have `__attribute__((pure/const))` but we want to > > expose all LLVM-IR attributes to the user soon [0] which will allow way > > more fine-grained control. Intrinsics are a different story again. > > > > > > > For calls, there are some attributes that are helpful here: > > > https://llvm.org/docs/LangRef.html#function-attributes > > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can > > > be annotated with readonly attribute, > > > thus signalling that it won't clobber that memory location? > > > > While target specific intrinsics are a bit more complicated we see the > > problem often with generic intrinsic already. We proposed the other day > > [1] to change the default semantics of non-target specific intrinsics > > such that you have to opt-in for certain effects. > > > > For the above example you want `llvm.x86.flags.write` to be `writeonly` and > > `inaccesiblememonly`. Also `nosync`, `willreturn`, ... > > > > Cheers, > > Johannes > > > > > > [0] https://www.youtube.com/watch?v=elmio6AoyK0 > > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html > >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov<mailto:jdoerfert at anl.gov> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/729741a5/attachment-0001.html>
Apparently Analagous Threads
- Global removal pass - potential for improvement?
- Global removal pass - potential for improvement?
- Questions about jump threading optimization and what we can do
- Questions about jump threading optimization and what we can do
- Questions about jump threading optimization and what we can do