> from what I understand of the analysis, to come to its conclusions it assumes
> that there is no overflow. That doesn't make it useless for removing integer
> overflow checks: you can successively walk variables, and if you can prove that
> there is no overflow of a variable X given your analysis of previously seen
> variables, then X can safely be added to the set of analysed variables. Rinse
> and repeat. However this probably doesn't work for variables inside loops.

I'd also be interested to hear more from the range analysis authors about this.

Remaining faithful to the C/C++ overflow rules while operating on LLVM IR may be very tricky, particularly with respect to integer promotions and similar constructs, which may have no representation in the IR (even before optimization).

An analysis that is not sound with respect to the original C/C++ semantics would be of little use to anyone.

John
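A minimal sketch of the "rinse and repeat" scheme quoted above, assuming a hypothetical overflow oracle supplied by the range analysis; the type and function names below are invented for illustration and do not come from the implementation under discussion:

#include <functional>
#include <set>
#include <vector>

struct Variable;  // opaque stand-in for whatever the analysis tracks

// The oracle answers: "given that every variable in 'proven' stays inside its
// computed range, is 'v' guaranteed not to overflow?"  It would be supplied
// by the range analysis itself; here it is only a placeholder.
using OverflowOracle =
    std::function<bool(const Variable *, const std::set<const Variable *> &)>;

std::set<const Variable *>
findOverflowFreeVars(const std::vector<const Variable *> &vars,
                     const OverflowOracle &provesNoOverflow) {
  std::set<const Variable *> proven;
  bool changed = true;
  // "Rinse and repeat": keep admitting variables whose overflow-freedom
  // follows from the variables already admitted, until a fixed point.
  while (changed) {
    changed = false;
    for (const Variable *v : vars) {
      if (!proven.count(v) && provesNoOverflow(v, proven)) {
        proven.insert(v);
        changed = true;
      }
    }
  }
  return proven;
}

As the quoted text notes, a variable whose overflow-freedom depends on itself through a loop never becomes provable this way, which is why the scheme breaks down for loop-carried variables.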
Hi, guys, thank you for all the feedback. I will try to answer your questions below. But if you think this might not be a good GSoC project, do not hesitate to tell me; I can try to write a different proposal. Nevertheless, I would like to hear from you what you think is important to have in the range analysis. Reading your e-mails, I see that there are still many things we do not do that would be useful to the community.

> can you speed up program runtime significantly using this analysis?

No, not yet. We have not used the analysis to implement any optimization yet. So far we can only flag that variables are dead code, but I have not tried to remove this code. And for the C programs we have analyzed, the number of variables that are dead code after LLVM optimizes the program is small. We found something like 200 dead variables in the whole of gcc, for instance. Notice, however, that the other optimizations had not caught those.

> I took a brief look at your report. I'm not sure what others in the
> LLVM community think (and I'd like their input), but I suspect that
> 10-15 seconds for analyzing GCC means that users would want your
> analysis enabled at optimization levels higher than -O2.

It is a whole-program (inter-procedural) analysis, with function inlining enabled to give us a limited form of context sensitivity. If I run it intra-procedurally, it takes negligible time per function. I think its runtime is similar to that of other LLVM passes.

> When I asked for memory usage, I meant the amount of RAM used by the
> analysis. As far as I can tell, your paper only reports the number of
> constraints, so it doesn't really tell me how much RAM the analysis
> uses.

I just measured it. For SPEC 2006 gcc it took 455MB.

> One other thing I've been wondering about is whether your analysis is
> able to find value ranges for the results of load instructions.

No. We do not use pointer analysis.

> Finally, do you think it would be possible in the future to use
> parallelism to improve the performance of your analysis?

We did not think about it. I think that the algorithm, in the way it is now, is pretty sequential. We could parallelize the processing of strongly connected components that do not depend on each other, but I speculate that there are not many of these.

> I recommend that your proposal mention that you'll try your analysis on
> larger and more diverse codes

Yes, we want to find larger benchmarks too.

> one of the big problems with Douglas's original range analysis was that
> it couldn't handle modulo arithmetic.

We still have this problem. Jorge Navas, who is using our analysis, is working on an implementation that deals with wrapping arithmetic, but we do not have access to his code yet.

> An analysis that is not sound with respect to the original C/C++ semantics
> would be of little use to anyone.

You can use our analysis to remove overflow checks. If we output that a variable is bound to the range [c1, c2], where c1 > -inf and c2 < +inf, then you are guaranteed to be inside these bounds. The imprecision happens when we output that something is bound to, say, [c1, +inf], c1 > -inf, because in this case the upper bound of the variable could wrap around, and then it may change the lower limit c1. In our experiments, about 75% of the non-trivial limits that we output are of the form [c1, c2], c1 > -inf and c2 < +inf, and could be used to eliminate overflow tests.
For instance, in gcc we output 40,043 good limits, 22,369 limits [c, +inf], 785 limits [-inf, c], and 116,637 limits [-inf, +inf].

> Also, the tricky bit of range analysis is handling loops. I suggested
> to Douglas that he exploit LLVM's SCEV infrastructure to discover and
> exploit how induction variables are evolving rather than trying to
> create his own poor man's version using interval arithmetic. How are
> loops handled now?

Yes, Douglas told me about this. He actually wrote a small pass that catches induction variables using SCEV. We still get tighter intervals with our range analysis than with SCEV, but we miss some induction variables that SCEV handles. Integrating SCEV into our analysis is high on our list of priorities.
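For readers unfamiliar with the SCEV route being discussed, here is a rough sketch of how a range analysis might query ScalarEvolution for an induction variable's bounds. Header paths and exact signatures vary between LLVM versions, so treat this as an assumption-laden illustration rather than working integration code:

#include <cassert>
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Ask ScalarEvolution what it knows about the signed range of an integer
// value V (e.g. an induction variable).  A range analysis could intersect
// this with the interval it computed on its own.
ConstantRange signedRangeOf(ScalarEvolution &SE, Value *V) {
  assert(V->getType()->isIntegerTy() && "expects an integer-typed value");
  unsigned Bits = V->getType()->getIntegerBitWidth();
  if (!SE.isSCEVable(V->getType()))
    return ConstantRange(Bits, /*isFullSet=*/true);  // know nothing
  const SCEV *S = SE.getSCEV(V);   // symbolic form, e.g. {start,+,step}
  return SE.getSignedRange(S);     // conservative signed bounds for S
}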
On 4/2/12 11:35 AM, John Regehr wrote:
>> from what I understand of the analysis, to come to its conclusions it assumes
>> that there is no overflow. That doesn't make it useless for removing integer
>> overflow checks: you can successively walk variables, and if you can prove that
>> there is no overflow of a variable X given your analysis of previously seen
>> variables, then X can safely be added to the set of analysed variables. Rinse
>> and repeat. However this probably doesn't work for variables inside loops.
> I'd also be interested to hear more from the range analysis authors about
> this.
>
> Remaining faithful to the C/C++ overflow rules while operating on LLVM IR
> may be very tricky, particularly with respect to integer promotions and
> similar constructs, which may have no representation in the IR (even
> before optimization).
>
> An analysis that is not sound with respect to the original C/C++ semantics
> would be of little use to anyone.

I actually disagree on the last point. For static array bounds checking with SAFECode, we want to assume that all integer operations can overflow (including those with the nsw attribute). We don't care about the other integer semantic rules from C (like the promotion rules) or whether overflow is language-defined or not (for those who don't know, unsigned ints have defined overflow in C; signed ints do not). All we want to know is whether a pointer p is within the bounds of a set of valid objects, and we just need to take integer overflow into account because it can happen on the hardware.

-- John T.
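A small constructed example (not taken from SAFECode) of why the bounds question has to be answered against hardware wraparound rather than against idealized integers; the signed overflow below is deliberate and is exactly the behaviour a bounds checker must account for:

#include <cstdio>

int buf[16] = {0};

// Suppose an earlier check has established 0 <= x <= 100.
int index_for(int x) {
  // Over the mathematical integers, y is in [0, 3e9] and y / 200000000 is in
  // [0, 15], so buf[y / 200000000] looks provably in bounds and a checker
  // that trusts "no signed wrap" could drop its runtime check.
  int y = x * 30000000;   // deliberate signed overflow: undefined behaviour
                          // in C/C++, wraps on real hardware for x >= 72
  return y / 200000000;
}

int main() {
  int idx = index_for(100);
  // On a typical machine idx is -6, so buf[idx] would be out of bounds even
  // though the wrap-oblivious reasoning above "proved" it safe.
  std::printf("index = %d, in bounds? %s\n", idx,
              (idx >= 0 && idx < 16) ? "yes" : "no");
  return 0;
}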
> I actually disagree on the last point. For static array bounds checking with
> SAFECode, we want to assume that all integer operations can overflow
> (including those with the nsw attribute). We don't care about the other
> integer semantic rules from C (like the promotion rules) or whether overflow
> is language-defined or not (for those who don't know, unsigned ints have
> defined overflow in C; signed ints do not). All we want to know is whether a
> pointer p is within the bounds of a set of valid objects, and we just need to
> take integer overflow into account because it can happen on the hardware.

I'd have to see some specific examples to figure out if what you're saying makes sense. All of my experience with integer overflows in C/C++ has led me to the conclusion that playing games with the semantics usually gets you into hot water.

John
> No, not yet. We have not used the analysis to implement any
> optimization yet. So far we can only flag that variables are
> dead code, but I have not tried to remove this code. And for the C
> programs we have analyzed, the number of variables that are
> dead code after LLVM optimizes the program is small.

Even if only a few variables become dead, it might still be the case that many dynamic branches can be eliminated. I'd encourage you to try this, particularly on the Ada codes that Duncan mentioned and the output of the integer overflow checking tool that I mentioned.

One transformation that we found to be quite useful when implementing a range analysis for C was a debugging mode that inserted range assertions into the generated code. This was an easy way to spot implementation errors that otherwise would have been quite hard to detect.

Do you know how your results compare to GCC's existing value range propagation pass?

Anyway, I think this sounds like an interesting project and potentially a useful one.

John
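To make the debugging-mode idea concrete, here is a hand-written illustration of what emitted range assertions might look like; the function and the claimed ranges are invented for the example and are not output from any existing tool:

#include <cassert>
#include <cstdio>

// Original function:  int scale(int n) { return n * 4; }
// If the analysis claims n is in [0, 1000] on entry and the result is in
// [0, 4000], a debug build could check those claims at run time:
int scale_with_range_asserts(int n) {
  assert(n >= 0 && n <= 1000 && "range analysis claimed n in [0, 1000]");
  int result = n * 4;
  assert(result >= 0 && result <= 4000 &&
         "range analysis claimed result in [0, 4000]");
  return result;
}

int main() {
  std::printf("%d\n", scale_with_range_asserts(250));  // fires no assertion
  return 0;
}

Any test input that trips one of these assertions points directly at a place where the computed interval is wrong.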
Hi Victor,

> > one of the big problems with Douglas's original range analysis was that
> > it couldn't handle modulo arithmetic.
>
> We still have this problem. Jorge Navas, who is using our analysis, is
> working on an implementation that deals with wrapping arithmetic, but
> we do not have access to his code yet.

Can you please clarify whether the pass only looks at operations with the nsw flag (and bails out on others), or just assumes that there is no wrapping behaviour anywhere? It used to just assume that nothing wrapped, which made all stated results about bit shrinking, dead variables found, etc. pointless, since they were based on an incorrect analysis.

> > An analysis that is not sound with respect to the original C/C++ semantics
> > would be of little use to anyone.
>
> You can use our analysis to remove overflow checks. If we output that a variable
> is bound to the range [c1, c2], where c1 > -inf and c2 < +inf, then you
> are guaranteed to be inside these bounds.

Is this really true? For example, suppose you know that X is in the range [1, +inf], and now you calculate Y = 1 / X; then you get that Y is in the range [0, 1]. But there is no guarantee that Y is really in that range: since X might have overflowed, it might for example be equal to -1, in which case Y would be equal to -1 too, outside the range [0, 1].

In short, I doubt you can conclude that a variable Y is really in a range [c1, c2] just from the information that c1 > -inf and c2 < +inf. I think you also need to look at the history of how you got there.

Ciao, Duncan.
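Duncan's scenario written out as a self-contained program; the additions go through unsigned arithmetic purely so that the hardware wraparound can be shown without relying on undefined behaviour, and the unsigned-to-int conversions assume the usual two's-complement behaviour:

#include <cstdio>

int main() {
  // The analysis believes x only ever grows from 1, so it reports the range
  // [1, +inf]; interval arithmetic then puts y = 1 / x inside [0, 1].
  int x = 1;
  x = (int)((unsigned)x + 2147483647u);
  x = (int)((unsigned)x + 2147483647u);  // mathematically 4294967295, but the
                                         // 32-bit value has wrapped to -1
  int y = 1 / x;                         // -1: outside the predicted [0, 1]
  std::printf("x = %d, y = 1 / x = %d\n", x, y);
  return 0;
}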