thr3ads.net - llvm dev - [llvm-dev] Global removal pass - potential for improvement? [Jan 2020]

If this information is useful, please help other people find it:
Share via:

Doerfert, Johannes via llvm-dev

2020-Jan-28 18:23 UTC

[llvm-dev] Global removal pass - potential for improvement?

On 01/28, Karl Rehm wrote:> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
> 
> Well, if I recall correctly there's actually a problem with AA being
*too*
> aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
> There's a bug report about this somewhere, don't remember where
though.
That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH

What I described should fire if you see the allocation. Alternatively,
we can add a new "max object size" attribute [0] which gives a size
upper bound for the pointed to memory. In either case, if you know the
object pointed to by A is maximal N bytes of size, and the object
pointed to by B is at most M bytes of size, and N < M, A and B do not
alias. As of now, we use `dereferenceable` to determine the minimum size
of the underlying object* and require to see the definition (I think) to
obtain a maximum size.

* It's actually a lower bound on the minimum size based on the pointer
  but that is conservatively fine too.

[0] https://www.youtube.com/watch?v=HVvvCSSLiTw

> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at
anl.gov>
> wrote:
> 
> > Hi Karl, Roman,
> >
> > On 01/28, Roman Lebedev via llvm-dev wrote:
> > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> > > <llvm-dev at lists.llvm.org> wrote:
> > > > I was looking into how the global optimization pass fares
against
> > > > things like what's reported in
> > > > https://bugs.llvm.org/show_bug.cgi?id=44676
> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
> >
> > > > Looking at this, I think it would be pretty trivial to
optimize that
> > > > down given that there are already threading assumptions
made:
> > > > https://godbolt.org/z/u6ZqoB
> >
> > Optimizing more aggressively based on forward process guarantees will
> > get us in more trouble than we are already in. I don't have the
link
> > handy but as far as I remember the proposed solution was to have a
> > forward process guarantee function attribute. I would recommend we
look
> > into that first before we start more aggressive optimizations which
will
> > cause problems for a lot of (non C/C++) folks.
> >
> >
> > > > Is this something I can look into?
> >
> > Sure :)
> >
> >
> > > > Another thing is that currently *all* external calls break
this
> > > > optimization, including calls to intrinsics that probably
shouldn't:
> > > > https://godbolt.org/z/pK7Cew
> > > I think during load propagation, there is a legality check
"here's a
> > > load, and here's a store.
> > > Is there anything in between that may have clobbered that memory
> > location?".
> >
> > Right now we only have `__attribute__((pure/const))` but we want to
> > expose all LLVM-IR attributes to the user soon [0] which will allow
way
> > more fine-grained control. Intrinsics are a different story again.
> >
> >
> > > For calls, there are some attributes that are helpful here:
> > > https://llvm.org/docs/LangRef.html#function-attributes
> > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe
can
> > > be annotated with readonly attribute,
> > > thus signalling that it won't clobber that memory location?
> >
> > While target specific intrinsics are a bit more complicated we see the
> > problem often with generic intrinsic already. We proposed the other
day
> > [1] to change the default semantics of non-target specific intrinsics
> > such that you have to opt-in for certain effects.
> >
> > For the above example you want `llvm.x86.flags.write` to be
`writeonly` and
> > `inaccesiblememonly`. Also `nosync`, `willreturn`, ...
> >
> > Cheers,
> >   Johannes
> >
> >
> > [0] https://www.youtube.com/watch?v=elmio6AoyK0
> > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
> >
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/bf0be1e5/attachment.sig>

Karl Rehm via llvm-dev

2020-Jan-28 18:42 UTC

head link

[llvm-dev] Global removal pass - potential for improvement?

I'm thinking about this a bit more, and I don't know if it's
something that
can be used in a generic backend like LLVM. Should accessing a smaller
object from a bigger pointer be considered UB?
In contrast, sing the assumption that bigger types can not be dereferenced
from smaller types is a good idea. If this is already added to BasicAA, I
wonder why it does not optimize this out?

On Tue, Jan 28, 2020 at 1:23 PM Doerfert, Johannes <jdoerfert at anl.gov>
wrote:
> On 01/28, Karl Rehm wrote:
> > >
> > > I need to take a closer look but I would have expected BasicAA to
be
> > > able to determine that `do_log` and `R` cannot alias. In the -Os
> version
> > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to
`R`
> > > clobbers the read from `do_log` which prevents us from removing
the
> > > load/store pair. My reasoning would have been that we know the
size of
> > > `do_log` to be less than the size accessed via `R`. What exactly
goes
> > > wrong or if my logic is flawed needs to be examined. I would
start
> > > looking at the debug generated by the code parts touched here:
> > > https://reviews.llvm.org/D66157
> > >
> >
> > Well, if I recall correctly there's actually a problem with AA
being
> *too*
> > aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
> > There's a bug report about this somewhere, don't remember
where though.
>
> That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH
>
> What I described should fire if you see the allocation. Alternatively,
> we can add a new "max object size" attribute [0] which gives a
size
> upper bound for the pointed to memory. In either case, if you know the
> object pointed to by A is maximal N bytes of size, and the object
> pointed to by B is at most M bytes of size, and N < M, A and B do not
> alias. As of now, we use `dereferenceable` to determine the minimum size
> of the underlying object* and require to see the definition (I think) to
> obtain a maximum size.
>
> * It's actually a lower bound on the minimum size based on the pointer
>   but that is conservatively fine too.
>
> [0] https://www.youtube.com/watch?v=HVvvCSSLiTw
>
>
> > On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at
anl.gov>
> > wrote:
> >
> > > Hi Karl, Roman,
> > >
> > > On 01/28, Roman Lebedev via llvm-dev wrote:
> > > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> > > > <llvm-dev at lists.llvm.org> wrote:
> > > > > I was looking into how the global optimization pass
fares against
> > > > > things like what's reported in
> > > > > https://bugs.llvm.org/show_bug.cgi?id=44676
> > >
> > > I need to take a closer look but I would have expected BasicAA to
be
> > > able to determine that `do_log` and `R` cannot alias. In the -Os
> version
> > > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to
`R`
> > > clobbers the read from `do_log` which prevents us from removing
the
> > > load/store pair. My reasoning would have been that we know the
size of
> > > `do_log` to be less than the size accessed via `R`. What exactly
goes
> > > wrong or if my logic is flawed needs to be examined. I would
start
> > > looking at the debug generated by the code parts touched here:
> > > https://reviews.llvm.org/D66157
> > >
> > >
> > > > > Looking at this, I think it would be pretty trivial to
optimize
> that
> > > > > down given that there are already threading assumptions
made:
> > > > > https://godbolt.org/z/u6ZqoB
> > >
> > > Optimizing more aggressively based on forward process guarantees
will
> > > get us in more trouble than we are already in. I don't have
the link
> > > handy but as far as I remember the proposed solution was to have
a
> > > forward process guarantee function attribute. I would recommend
we look
> > > into that first before we start more aggressive optimizations
which
> will
> > > cause problems for a lot of (non C/C++) folks.
> > >
> > >
> > > > > Is this something I can look into?
> > >
> > > Sure :)
> > >
> > >
> > > > > Another thing is that currently *all* external calls
break this
> > > > > optimization, including calls to intrinsics that
probably
> shouldn't:
> > > > > https://godbolt.org/z/pK7Cew
> > > > I think during load propagation, there is a legality check
"here's a
> > > > load, and here's a store.
> > > > Is there anything in between that may have clobbered that
memory
> > > location?".
> > >
> > > Right now we only have `__attribute__((pure/const))` but we want
to
> > > expose all LLVM-IR attributes to the user soon [0] which will
allow way
> > > more fine-grained control. Intrinsics are a different story
again.
> > >
> > >
> > > > For calls, there are some attributes that are helpful here:
> > > > https://llvm.org/docs/LangRef.html#function-attributes
> > > > So in this case, i guess `@llvm.x86.flags.write` intrinsic
maybe can
> > > > be annotated with readonly attribute,
> > > > thus signalling that it won't clobber that memory
location?
> > >
> > > While target specific intrinsics are a bit more complicated we
see the
> > > problem often with generic intrinsic already. We proposed the
other day
> > > [1] to change the default semantics of non-target specific
intrinsics
> > > such that you have to opt-in for certain effects.
> > >
> > > For the above example you want `llvm.x86.flags.write` to be
> `writeonly` and
> > > `inaccesiblememonly`. Also `nosync`, `willreturn`, ...
> > >
> > > Cheers,
> > >   Johannes
> > >
> > >
> > > [0] https://www.youtube.com/watch?v=elmio6AoyK0
> > > [1]
http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
> > >
>
> --
>
> Johannes Doerfert
> Researcher
>
> Argonne National Laboratory
> Lemont, IL 60439, USA
>
> jdoerfert at anl.gov
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/b5f496b4/attachment.html>

Doerfert, Johannes via llvm-dev

2020-Jan-28 18:51 UTC

head link

[llvm-dev] Global removal pass - potential for improvement?

It's not about the pointer per se but the object it points to. If we have 2
objects of which we know the minimal/maximal size respectively we can sometimes
conclude the objects must be different. Accesses to different objects do not
alias.

We already use that logic but we don't have the upper bound size as an
attribute. Hence, it only applies if we know the exact size, basically is we see
the definition. I might have explained it in the YouTube link I included before.

I don't know what you mean by llvm being a generic backend but what I
describe above and in the tutorial should make it in as soon as the code is
cleaned up. It's on my GitHub already.

--
written from my phone

________________________________
From: Karl Rehm <klrehm123 at gmail.com>
Sent: Tuesday, January 28, 2020 12:42:33 PM
To: Doerfert, Johannes <jdoerfert at anl.gov>
Cc: Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev at lists.llvm.org
<llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Global removal pass - potential for improvement?

I'm thinking about this a bit more, and I don't know if it's
something that can be used in a generic backend like LLVM. Should accessing a
smaller object from a bigger pointer be considered UB?
In contrast, sing the assumption that bigger types can not be dereferenced from
smaller types is a good idea. If this is already added to BasicAA, I wonder why
it does not optimize this out?

On Tue, Jan 28, 2020 at 1:23 PM Doerfert, Johannes <jdoerfert at
anl.gov<mailto:jdoerfert at anl.gov>> wrote:
On 01/28, Karl Rehm wrote:> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
>
> Well, if I recall correctly there's actually a problem with AA being
*too*
> aggressive for this reason: https://gcc.godbolt.org/z/CXN8Gw
> There's a bug report about this somewhere, don't remember where
though.
That seems to be related purely to TBAA: https://gcc.godbolt.org/z/ZLSzHH

What I described should fire if you see the allocation. Alternatively,
we can add a new "max object size" attribute [0] which gives a size
upper bound for the pointed to memory. In either case, if you know the
object pointed to by A is maximal N bytes of size, and the object
pointed to by B is at most M bytes of size, and N < M, A and B do not
alias. As of now, we use `dereferenceable` to determine the minimum size
of the underlying object* and require to see the definition (I think) to
obtain a maximum size.

* It's actually a lower bound on the minimum size based on the pointer
  but that is conservatively fine too.

[0] https://www.youtube.com/watch?v=HVvvCSSLiTw

> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <jdoerfert at
anl.gov<mailto:jdoerfert at anl.gov>>
> wrote:
>
> > Hi Karl, Roman,
> >
> > On 01/28, Roman Lebedev via llvm-dev wrote:
> > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev
> > > <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>> wrote:
> > > > I was looking into how the global optimization pass fares
against
> > > > things like what's reported in
> > > > https://bugs.llvm.org/show_bug.cgi?id=44676
> >
> > I need to take a closer look but I would have expected BasicAA to be
> > able to determine that `do_log` and `R` cannot alias. In the -Os
version
> > (lower right here https://gcc.godbolt.org/z/KLaeiH), the write to `R`
> > clobbers the read from `do_log` which prevents us from removing the
> > load/store pair. My reasoning would have been that we know the size of
> > `do_log` to be less than the size accessed via `R`. What exactly goes
> > wrong or if my logic is flawed needs to be examined. I would start
> > looking at the debug generated by the code parts touched here:
> > https://reviews.llvm.org/D66157
> >
> >
> > > > Looking at this, I think it would be pretty trivial to
optimize that
> > > > down given that there are already threading assumptions
made:
> > > > https://godbolt.org/z/u6ZqoB
> >
> > Optimizing more aggressively based on forward process guarantees will
> > get us in more trouble than we are already in. I don't have the
link
> > handy but as far as I remember the proposed solution was to have a
> > forward process guarantee function attribute. I would recommend we
look
> > into that first before we start more aggressive optimizations which
will
> > cause problems for a lot of (non C/C++) folks.
> >
> >
> > > > Is this something I can look into?
> >
> > Sure :)
> >
> >
> > > > Another thing is that currently *all* external calls break
this
> > > > optimization, including calls to intrinsics that probably
shouldn't:
> > > > https://godbolt.org/z/pK7Cew
> > > I think during load propagation, there is a legality check
"here's a
> > > load, and here's a store.
> > > Is there anything in between that may have clobbered that memory
> > location?".
> >
> > Right now we only have `__attribute__((pure/const))` but we want to
> > expose all LLVM-IR attributes to the user soon [0] which will allow
way
> > more fine-grained control. Intrinsics are a different story again.
> >
> >
> > > For calls, there are some attributes that are helpful here:
> > > https://llvm.org/docs/LangRef.html#function-attributes
> > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe
can
> > > be annotated with readonly attribute,
> > > thus signalling that it won't clobber that memory location?
> >
> > While target specific intrinsics are a bit more complicated we see the
> > problem often with generic intrinsic already. We proposed the other
day
> > [1] to change the default semantics of non-target specific intrinsics
> > such that you have to opt-in for certain effects.
> >
> > For the above example you want `llvm.x86.flags.write` to be
`writeonly` and
> > `inaccesiblememonly`. Also `nosync`, `willreturn`, ...
> >
> > Cheers,
> >   Johannes
> >
> >
> > [0] https://www.youtube.com/watch?v=elmio6AoyK0
> > [1] http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html
> >
--

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov<mailto:jdoerfert at anl.gov>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200128/729741a5/attachment-0001.html>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Jan 2020 - Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

[llvm-dev] Global removal pass - potential for improvement?

Seemingly Similar Threads