thr3ads.net - llvm dev - [llvm-dev] Global removal pass - potential for improvement? [Jan 2020]

If this information is useful, please help other people find it:
Share via:

Doerfert, Johannes via llvm-dev

2020-Jan-28 17:44 UTC

[llvm-dev] Global removal pass - potential for improvement?

Hi Karl, Roman,

On 01/28, Roman Lebedev via llvm-dev wrote:> On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > I was looking into how the global optimization pass fares against
> > things like what's reported in
> > https://bugs.llvm.org/show_bug.cgi?id=44676
I need to take a closer look but I would have expected BasicAA to be
able to determine that `do_log` and `R` cannot alias. In the -Os version
(lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
clobbers the read from `do_log` which prevents us from removing the
load/store pair. My reasoning would have been that we know the size of
`do_log` to be less than the size accessed via `R`. What exactly goes
wrong or if my logic is flawed needs to be examined. I would start
looking at the debug generated by the code parts touched here:
https://reviews.llvm.org/D66157

> > Looking at this, I think it would be pretty trivial to optimize that
> > down given that there are already threading assumptions made:
> > https://godbolt.org/z/u6ZqoB
Optimizing more aggressively based on forward process guarantees will
get us in more trouble than we are already in. I don't have the link
handy but as far as I remember the proposed solution was to have a
forward process guarantee function attribute. I would recommend we look
into that first before we start more aggressive optimizations which will
cause problems for a lot of (non C/C++) folks.

> > Is this something I can look into?
Sure :)

> > Another thing is that currently *all* external calls break this
> > optimization, including calls to intrinsics that probably
shouldn't:
> > https://godbolt.org/z/pK7Cew
> I think during load propagation, there is a legality check "here's
a
> load, and here's a store.
> Is there anything in between that may have clobbered that memory
location?".
Right now we only have `__attribute__((pure/const))` but we want to
expose all LLVM-IR attributes to the user soon [0] which will allow way
more fine-grained control. Intrinsics are a different story again.

> For calls, there are some attributes that are helpful here:
> https://llvm.org/docs/LangRef.html#function-attributes
> So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can
> be annotated with readonly attribute,
> thus signalling that it won't clobber that memory location?
While target specific intrinsics are a bit more complicated we see the
problem often with generic intrinsic already. We proposed the other day
[1] to change the default semantics of non-target specific intrinsics
such that you have to opt-in for certain effects.

For the above example you want `llvm.x86.flags.write` to be `writeonly` and
`inaccesiblememonly`. Also `nosync`, `willreturn`, ...

Cheers,
  Johannes


[0] https://www.youtube.com/watch?v=elmio6AoyK0
[1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/e867acde/attachment.sig>

Karl Rehm via llvm-dev

2020-Jan-28 18:06 UTC

head link

[llvm-dev] Global removal pass - potential for improvement?

>
> I need to take a closer look but I would have expected BasicAA to be
> able to determine that `do_log` and `R` cannot alias. In the -Os version
> (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> clobbers the read from `do_log` which prevents us from removing the
> load/store pair. My reasoning would have been that we know the size of
> `do_log` to be less than the size accessed via `R`. What exactly goes
> wrong or if my logic is flawed needs to be examined. I would start
> looking at the debug generated by the code parts touched here:
> https://reviews.llvm.org/D66157
>
Well, if I recall correctly there's actually a problem with AA being *too*
aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
There's a bug report about this somewhere, don't remember where though.


On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at anl.gov>
wrote:
> Hi Karl, Roman,
>
> On 01/28, Roman Lebedev via llvm-dev wrote:
> > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > > I was looking into how the global optimization pass fares against
> > > things like what's reported in
> > > https://bugs.llvm.org/show_bug.cgi?id=44676
>
> I need to take a closer look but I would have expected BasicAA to be
> able to determine that `do_log` and `R` cannot alias. In the -Os version
> (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> clobbers the read from `do_log` which prevents us from removing the
> load/store pair. My reasoning would have been that we know the size of
> `do_log` to be less than the size accessed via `R`. What exactly goes
> wrong or if my logic is flawed needs to be examined. I would start
> looking at the debug generated by the code parts touched here:
> https://reviews.llvm.org/D66157
>
>
> > > Looking at this, I think it would be pretty trivial to optimize
that
> > > down given that there are already threading assumptions made:
> > > https://godbolt.org/z/u6ZqoB
>
> Optimizing more aggressively based on forward process guarantees will
> get us in more trouble than we are already in. I don't have the link
> handy but as far as I remember the proposed solution was to have a
> forward process guarantee function attribute. I would recommend we look
> into that first before we start more aggressive optimizations which will
> cause problems for a lot of (non C/C++) folks.
>
>
> > > Is this something I can look into?
>
> Sure :)
>
>
> > > Another thing is that currently *all* external calls break this
> > > optimization, including calls to intrinsics that probably
shouldn't:
> > > https://godbolt.org/z/pK7Cew
> > I think during load propagation, there is a legality check
"here's a
> > load, and here's a store.
> > Is there anything in between that may have clobbered that memory
> location?".
>
> Right now we only have `__attribute__((pure/const))` but we want to
> expose all LLVM-IR attributes to the user soon [0] which will allow way
> more fine-grained control. Intrinsics are a different story again.
>
>
> > For calls, there are some attributes that are helpful here:
> > https://llvm.org/docs/LangRef.html#function-attributes
> > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can
> > be annotated with readonly attribute,
> > thus signalling that it won't clobber that memory location?
>
> While target specific intrinsics are a bit more complicated we see the
> problem often with generic intrinsic already. We proposed the other day
> [1] to change the default semantics of non-target specific intrinsics
> such that you have to opt-in for certain effects.
>
> For the above example you want `llvm.x86.flags.write` to be `writeonly` and
> `inaccesiblememonly`. Also `nosync`, `willreturn`, ...
>
> Cheers,
>   Johannes
>
>
> [0] https://www.youtube.com/watch?v=elmio6AoyK0
> [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/3849fd45/attachment.html>

Doerfert, Johannes via llvm-dev

2020-Jan-28 18:23 UTC

head link

[llvm-dev] Global removal pass - potential for improvement?

On 01/28, Karl Rehm wrote:> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
> 
> Well, if I recall correctly there's actually a problem with AA being
*too*
> aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
> There's a bug report about this somewhere, don't remember where
though.
That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH

What I described should fire if you see the allocation. Alternatively,
we can add a new "max object size" attribute [0] which gives a size
upper bound for the pointed to memory. In either case, if you know the
object pointed to by A is maximal N bytes of size, and the object
pointed to by B is at most M bytes of size, and N < M, A and B do not
alias. As of now, we use `dereferenceable` to determine the minimum size
of the underlying object* and require to see the definition (I think) to
obtain a maximum size.

* It's actually a lower bound on the minimum size based on the pointer
  but that is conservatively fine too.

[0] https://www.youtube.com/watch?v=HVvvCSSLiTw

> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at
anl.gov>
> wrote:
> 
> > Hi Karl, Roman,
> >
> > On 01/28, Roman Lebedev via llvm-dev wrote:
> > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> > > <llvm-dev at lists.llvm.org> wrote:
> > > > I was looking into how the global optimization pass fares
against
> > > > things like what's reported in
> > > > https://bugs.llvm.org/show_bug.cgi?id=44676
> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
> >
> > > > Looking at this, I think it would be pretty trivial to
optimize that
> > > > down given that there are already threading assumptions
made:
> > > > https://godbolt.org/z/u6ZqoB
> >
> > Optimizing more aggressively based on forward process guarantees will
> > get us in more trouble than we are already in. I don't have the
link
> > handy but as far as I remember the proposed solution was to have a
> > forward process guarantee function attribute. I would recommend we
look
> > into that first before we start more aggressive optimizations which
will
> > cause problems for a lot of (non C/C++) folks.
> >
> >
> > > > Is this something I can look into?
> >
> > Sure :)
> >
> >
> > > > Another thing is that currently *all* external calls break
this
> > > > optimization, including calls to intrinsics that probably
shouldn't:
> > > > https://godbolt.org/z/pK7Cew
> > > I think during load propagation, there is a legality check
"here's a
> > > load, and here's a store.
> > > Is there anything in between that may have clobbered that memory
> > location?".
> >
> > Right now we only have `__attribute__((pure/const))` but we want to
> > expose all LLVM-IR attributes to the user soon [0] which will allow
way
> > more fine-grained control. Intrinsics are a different story again.
> >
> >
> > > For calls, there are some attributes that are helpful here:
> > > https://llvm.org/docs/LangRef.html#function-attributes
> > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe
can
> > > be annotated with readonly attribute,
> > > thus signalling that it won't clobber that memory location?
> >
> > While target specific intrinsics are a bit more complicated we see the
> > problem often with generic intrinsic already. We proposed the other
day
> > [1] to change the default semantics of non-target specific intrinsics
> > such that you have to opt-in for certain effects.
> >
> > For the above example you want `llvm.x86.flags.write` to be
`writeonly` and
> > `inaccesiblememonly`. Also `nosync`, `willreturn`, ...
> >
> > Cheers,
> >   Johannes
> >
> >
> > [0] https://www.youtube.com/watch?v=elmio6AoyK0
> > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
> >
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/bf0be1e5/attachment.sig>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jan 2020 - Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

Possibly Parallel Threads