Hal Finkel via llvm-dev
2016-Jan-14 18:32 UTC
[llvm-dev] High memory use and LVI/Correlated Value Propagation
----- Original Message -----
> From: "Daniel Berlin via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 10:38:10 AM
> Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value Propagation
>
> On Thu, Jan 14, 2016 at 5:57 AM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > On Wed, Jan 13, 2016 at 04:28:03PM -0800, Daniel Berlin wrote:
> > > On Wed, Jan 13, 2016 at 4:23 PM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > On Wed, Jan 13, 2016 at 03:38:24PM -0800, Philip Reames wrote:
> > > > > I don't think that arbitrarily limiting the complexity of the search is the
> > > > > right approach. There are numerous ways the LVI infrastructure could be
> > > > > made more memory efficient. Fixing the existing code to be memory efficient
> > > > > is the right approach. Only once there's no more low-hanging fruit should
> > > > > we even consider clamping the search.
> > > >
> > > > Memory efficiency is only half of the problem. E.g. groonga's expr.c
> > > > needs 4 minutes to build on my laptop, a 2.7GHz i7. That doesn't sound
> > > > reasonable for -O2. Unlike the GVN issue, the cases I have run into do
> > > > finish after a(n unreasonable) while, so at least it is not trivially
> > > > superlinear.
> > >
> > > Okay, so rather than artificially limit stuff, we should see if we can fix
> > > the efficiency of the algorithms.
> > >
> > > CVP is an O(N * lattice height) problem. It sounds like it is being more
> > > than that for you.
> >
> > I assume you mean something like #BB * #variables?
>
> No. ;)
> I mean the theoretical complexity of performing the optimization CVP performs (regardless of how it is currently implemented).
>
> What CVP does is doable in linear time in number of variables * lattice height. It has some memory cost to do this.
>
> Assume for a second the lattice height is fixed (IE we do not have completely symbolic ranges).
>
> It is possible, in linear time, to build an SSA-like form for the ranges it is computing.
> It is then possible, in linear time, to propagate those ranges to all affected variables.
>
> This is O(N) + O(N).
>
> Doing so may change the ranges (they only get *better*, however).
> So you may need to do this until they fall all the way down the lattice.
>
> This means you may repeat this lattice height number of times.

What is the height of the lattice here? I understand that it goes from top -> defined -> overdefined, and that's three, but defined is not itself one state. Because it is tracking integer ranges, those ranges could undergo 2^#bits refinements in the worst case (thus making the lattice height quite large), no?

Thanks again,
Hal

> CVP as currently implemented tries to be very lazy. The cost of being very lazy is that it has a worse theoretical complexity in the worst case, but it is faster in cases where there is not a lot to do.

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
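To make the 2^#bits concern concrete, here is a minimal sketch in C++ of a single-interval lattice element. It is hypothetical (not LLVM's actual LazyValueInfo code): if each refinement may shrink the interval by as little as one value, the descending chain through the "defined" states of an n-bit variable can be on the order of 2^n steps long.

#include <algorithm>
#include <cstdio>

struct RangeElement {
  unsigned Lo, Hi; // one "defined" state: the half-open range [Lo, Hi)

  // Intersect with [NewLo, NewHi); returns true on a strict refinement,
  // i.e. one step down the lattice.
  bool refine(unsigned NewLo, unsigned NewHi) {
    unsigned L = std::max(Lo, NewLo), H = std::min(Hi, NewHi);
    if (L == Lo && H == Hi)
      return false;
    Lo = L;
    Hi = H;
    return true;
  }
};

int main() {
  // Worst case for an 8-bit variable: each fixpoint round shaves a single
  // value off the range, so the descending chain through the "defined"
  // states has length about 2^8, not 3.
  RangeElement X{0, 256};
  unsigned Steps = 0;
  while (X.Lo + 1 < X.Hi && X.refine(X.Lo + 1, X.Hi))
    ++Steps;
  std::printf("strict refinements: %u\n", Steps); // prints 255
}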
Daniel Berlin via llvm-dev
2016-Jan-14 19:05 UTC
[llvm-dev] High memory use and LVI/Correlated Value Propagation
On Thu, Jan 14, 2016 at 10:32 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> [...]
>
> > This means you may repeat this lattice height number of times.
>
> What is the height of the lattice here?

The current LVI is a bit hard to analyze; I'm not sure it's really as fixed-height as one would think.

> I understand that it goes from top -> defined -> overdefined, and that's
> three, but defined is not itself one state. Because it is tracking integer
> ranges, those ranges could undergo 2^#bits refinements in the worst case
> (thus making the lattice height quite large), no?

If it tracks ranges on a per-bit basis, the max refinements for a given range is O(bits * 3), assuming things do not go the wrong way on the lattice.
IE either they are underdefined, or they are 1, or they are 0, or they are overdefined. The lattice is

    underdefined
       /    \
      0      1
       \    /
    overdefined

The only way to end up with 2**n refinements is if things can go the other direction on the lattice.

Note: This is the standard fixpoint algorithm (and what LVI is emulating, AFAIK). There are range-constraint-based solvers that are significantly more complicated, but that's not what LVI does. (See https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf, and the paper I cite below, which does a variant of that on LLVM in pretty much linear time.)

Also, the worst case here is one where every variable is affected by a single range.

(It's also possible to lazily construct the SSA form, and only update changed variables, in the same time bound. It's probably complicated enough to implement that it may not be worth it.)

In practice, you can do even complicated ranges very fast:

http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf

(The time is basically linear in practice, even for large programs; code is on GitHub.)

Ignore the part where they use it in the end for something else that we don't care about, etc. They are also solving for a much more complex set of ranges than we are talking about for CVP, and for all the variables, so they use a real range constraint solver in the end instead of a fixpoint with a fixed-height lattice.
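A minimal C++ sketch of the per-bit lattice described above, with hypothetical names (this is not LVI's actual representation): each bit sits on its own height-3 lattice and only moves downward, so a whole N-bit value can strictly change only O(N) times, matching the O(bits * 3) bound.

#include <array>
#include <cstddef>

enum class BitState { Underdefined, Zero, One, Overdefined };

// Meet for a single bit. The result is never above either input, so any
// one bit can strictly change at most twice over the whole analysis.
BitState meet(BitState A, BitState B) {
  if (A == B || B == BitState::Underdefined)
    return A;
  if (A == BitState::Underdefined)
    return B;
  return BitState::Overdefined; // Zero meet One collapses
}

template <std::size_t NumBits> struct BitLatticeValue {
  // Value-initialized: every bit starts Underdefined (the top element).
  std::array<BitState, NumBits> Bits{};

  // Merge in new facts; returns true if anything strictly changed. Each of
  // the NumBits bits can change at most twice, so a variable is strictly
  // refined at most O(NumBits) times -- not 2^NumBits.
  bool refine(const BitLatticeValue &Other) {
    bool Changed = false;
    for (std::size_t I = 0; I != NumBits; ++I) {
      BitState M = meet(Bits[I], Other.Bits[I]);
      Changed |= (M != Bits[I]);
      Bits[I] = M;
    }
    return Changed;
  }
};

LVI as written tracks a single constant-bounded range per value rather than per-bit facts, which is why the 2^#bits descending-chain concern applies to it but not to this per-bit scheme.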
Hal Finkel via llvm-dev
2016-Jan-14 20:05 UTC
[llvm-dev] High memory use and LVI/Correlated Value Propagation
----- Original Message -----
> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 1:05:56 PM
> Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value Propagation
>
> On Thu, Jan 14, 2016 at 10:32 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > [...]
> >
> > What is the height of the lattice here?
>
> The current LVI is a bit hard to analyze; I'm not sure it's really as fixed-height as one would think.
>
> > I understand that it goes from top -> defined -> overdefined, and that's
> > three, but defined is not itself one state. Because it is tracking integer
> > ranges, those ranges could undergo 2^#bits refinements in the worst case
> > (thus making the lattice height quite large), no?
>
> If it tracks ranges on a per-bit basis, the max refinements for a given range is O(bits * 3), assuming things do not go the wrong way on the lattice.
>
> IE either they are underdefined, or they are 1, or they are 0, or they are overdefined.
>
> The lattice is
>
>     underdefined
>        /    \
>       0      1
>        \    /
>     overdefined

As a separate matter, we *should* also do this (i.e. have a fixed-point known-bits analysis), but we don't.

> The only way to end up with 2**n refinements is if things can go the other direction on the lattice.
>
> Note: This is the standard fixpoint algorithm (and what LVI is emulating, AFAIK). There are range-constraint-based solvers that are significantly more complicated, but that's not what LVI does.

Right. LVI handles only single constant-bounded ranges for each variable independently.

> (See https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf, and the paper I cite below, which does a variant of that on LLVM in pretty much linear time.)

Interesting; thanks for the references! For both, the analysis collapses SCCs in the control flow and, while the SCC analysis is quadratic, SCCs are generally small so they don't observe a problem in practice.

 -Hal

> Also, the worst case here is one where every variable is affected by a single range.
>
> (It's also possible to lazily construct the SSA form, and only update changed variables, in the same time bound. It's probably complicated enough to implement that it may not be worth it.)
>
> In practice, you can do even complicated ranges very fast:
>
> http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf
>
> (The time is basically linear in practice, even for large programs; code is on GitHub.)
>
> Ignore the part where they use it in the end for something else that we don't care about, etc. They are also solving for a much more complex set of ranges than we are talking about for CVP, and for all the variables, so they use a real range constraint solver in the end instead of a fixpoint with a fixed-height lattice.

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
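Tying the thread together, here is a minimal C++ sketch of the standard monotone worklist fixpoint being discussed. The names are hypothetical (this is not LVI/CVP's actual implementation): because each transfer function only moves a variable's state down a finite-height lattice, every variable is requeued at most height-many times, giving the O(N * lattice height) bound.

#include <algorithm>
#include <cstdio>
#include <deque>
#include <vector>

struct Var {
  unsigned Lo = 0, Hi = 256;   // start at top: the full 8-bit range
  std::vector<Var *> Inputs;   // vars this one is a copy/merge of
  std::vector<Var *> Users;    // vars to revisit if this one changes

  // Transfer: the union of the inputs' ranges, intersected with the
  // current range. Monotone: the state can only shrink.
  bool refine() {
    if (Inputs.empty())
      return false;
    unsigned NewLo = ~0u, NewHi = 0;
    for (Var *In : Inputs) {
      NewLo = std::min(NewLo, In->Lo);
      NewHi = std::max(NewHi, In->Hi);
    }
    NewLo = std::max(NewLo, Lo);
    NewHi = std::min(NewHi, Hi);
    if (NewLo == Lo && NewHi == Hi)
      return false;
    Lo = NewLo;
    Hi = NewHi;
    return true;
  }
};

void solve(std::vector<Var *> &Vars) {
  std::deque<Var *> Worklist(Vars.begin(), Vars.end());
  while (!Worklist.empty()) {
    Var *V = Worklist.front();
    Worklist.pop_front();
    if (V->refine())          // strictly moved down the lattice,
      for (Var *U : V->Users) // so its users may shrink too
        Worklist.push_back(U);
  }
}

int main() {
  // A tiny chain A -> B -> C; A's range is seeded (e.g. from a dominating
  // branch condition) and the fixpoint propagates it to the users.
  Var A, B, C;
  A.Lo = 10; A.Hi = 20;
  B.Inputs = {&A}; A.Users = {&B};
  C.Inputs = {&B}; B.Users = {&C};
  std::vector<Var *> Vars = {&A, &B, &C};
  solve(Vars);
  std::printf("C: [%u, %u)\n", C.Lo, C.Hi); // prints C: [10, 20)
}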