> from what I understand of the analysis, to come to its conclusions it assumes
> that there is no overflow. That doesn't make it useless for removing integer
> overflow checks: you can successively walk variables, and if you can prove that
> there is no overflow of a variable X given your analysis of previously seen
> variables, then X can safely be added to the set of analysed variables. Rinse
> and repeat. However this probably doesn't work for variables inside loops.

I'd also be interested to hear more from the range analysis authors about this.

Remaining faithful to the C/C++ overflow rules while operating on LLVM IR may be very tricky, particularly with respect to integer promotions and similar constructs, which may have no representation in the IR (even before optimization).

An analysis that is not sound with respect to the original C/C++ semantics would be of little use to anyone.

John
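A minimal sketch of the "rinse and repeat" scheme quoted above, assuming a hypothetical overflow oracle supplied by the range analysis; the type and function names below are invented for illustration and do not come from the implementation under discussion:

#include <functional>
#include <set>
#include <vector>

struct Variable;  // opaque stand-in for whatever the analysis tracks

// The oracle answers: "given that every variable in 'proven' stays inside its
// computed range, is 'v' guaranteed not to overflow?"  It would be supplied
// by the range analysis itself; here it is only a placeholder.
using OverflowOracle =
    std::function<bool(const Variable *, const std::set<const Variable *> &)>;

std::set<const Variable *>
findOverflowFreeVars(const std::vector<const Variable *> &vars,
                     const OverflowOracle &provesNoOverflow) {
  std::set<const Variable *> proven;
  bool changed = true;
  // "Rinse and repeat": keep admitting variables whose overflow-freedom
  // follows from the variables already admitted, until a fixed point.
  while (changed) {
    changed = false;
    for (const Variable *v : vars) {
      if (!proven.count(v) && provesNoOverflow(v, proven)) {
        proven.insert(v);
        changed = true;
      }
    }
  }
  return proven;
}

As the quoted text notes, a variable whose overflow-freedom depends on itself through a loop never becomes provable this way, which is why the scheme breaks down for loop-carried variables.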
Hi, guys, thank you for all the feedback. I will try to answer your questions below. But if you think this might not be a good GSoC project, do not hesitate to tell me; I can try to write a different proposal. Nevertheless, I would like to hear from you what you think is important to have in the range analysis. Reading your e-mails, I see that there are still many things we do not do that would be useful to the community.

> can you speed up program runtime significantly using this analysis?

No, not yet. We have not used the analysis to implement any optimization yet. So far we can only flag that variables are dead code, but I have not tried to remove this code. And for the C programs we have analyzed, the number of variables that are dead code after LLVM optimizes the program is small. We found something like 200 dead variables in the whole of gcc, for instance. Notice, however, that the other optimizations had not caught those.

> I took a brief look at your report. I'm not sure what others in the
> LLVM community think (and I'd like their input), but I suspect that
> 10-15 seconds for analyzing GCC means that users would want your
> analysis enabled at optimization levels higher than -O2.

It is a whole-program (inter-procedural) analysis, with function inlining enabled to give us a limited form of context sensitivity. If I run it intra-procedurally, it takes negligible time per function. I think its runtime is similar to that of other LLVM passes.

> When I asked for memory usage, I meant the amount of RAM used by the
> analysis. As far as I can tell, your paper only reports the number of
> constraints, so it doesn't really tell me how much RAM the analysis
> uses.

I just measured it. For SPEC 2006 gcc it took 455MB.

> One other thing I've been wondering about is whether your analysis is
> able to find value ranges for the results of load instructions.

No. We do not use pointer analysis.

> Finally, do you think it would be possible in the future to use
> parallelism to improve the performance of your analysis?

We did not think about it. I think that the algorithm, in the way it is now, is pretty sequential. We could parallelize the processing of strongly connected components that do not depend on each other, but I speculate that there are not many of these.

> I recommend that your proposal mention that you'll try your analysis on
> larger and more diverse codes

Yes, we want to find larger benchmarks too.

> one of the big problems with Douglas's original range analysis was that
> it couldn't handle modulo arithmetic.

We still have this problem. Jorge Navas, who is using our analysis, is working on an implementation that deals with wrapping arithmetic, but we do not have access to his code yet.

> An analysis that is not sound with respect to the original C/C++ semantics
> would be of little use to anyone.

You can use our analysis to remove overflow checks. If we output that a variable is bound to the range [c1, c2], where c1 > -inf and c2 < +inf, then you are guaranteed to be inside these bounds. The imprecision happens when we output that something is bound to, say, [c1, +inf], c1 > -inf, because in this case the upper bound of the variable could wrap around, and then it may change the lower limit c1. In our experiments, about 75% of the non-trivial limits that we output are of the form [c1, c2], c1 > -inf and c2 < +inf, and could be used to eliminate overflow tests.
For instance, in gcc we output 40,043 good limits, 22,369 limits [c, +inf], 785 limits [-inf, c], and 116,637 limits [-inf, +inf].

> Also, the tricky bit of range analysis is handling loops. I suggested
> to Douglas that he exploit LLVM's SCEV infrastructure to discover and
> exploit how induction variables are evolving rather than trying to
> create his own poor man's version using interval arithmetic. How are
> loops handled now?

Yes, Douglas told me about this. He actually wrote a small pass that catches induction variables using SCEV. We still get tighter intervals with our range analysis than with SCEV, but we miss some induction variables that SCEV handles. Integrating SCEV into our analysis is high on our list of priorities.
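For readers unfamiliar with the SCEV route being discussed, here is a rough sketch of how a range analysis might query ScalarEvolution for an induction variable's bounds. Header paths and exact signatures vary between LLVM versions, so treat this as an assumption-laden illustration rather than working integration code:

#include <cassert>
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Ask ScalarEvolution what it knows about the signed range of an integer
// value V (e.g. an induction variable).  A range analysis could intersect
// this with the interval it computed on its own.
ConstantRange signedRangeOf(ScalarEvolution &SE, Value *V) {
  assert(V->getType()->isIntegerTy() && "expects an integer-typed value");
  unsigned Bits = V->getType()->getIntegerBitWidth();
  if (!SE.isSCEVable(V->getType()))
    return ConstantRange(Bits, /*isFullSet=*/true);  // know nothing
  const SCEV *S = SE.getSCEV(V);   // symbolic form, e.g. {start,+,step}
  return SE.getSignedRange(S);     // conservative signed bounds for S
}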
On 4/2/12 11:35 AM, John Regehr wrote:
>> from what I understand of the analysis, to come to its conclusions it assumes
>> that there is no overflow. That doesn't make it useless for removing integer
>> overflow checks: you can successively walk variables, and if you can prove that
>> there is no overflow of a variable X given your analysis of previously seen
>> variables, then X can safely be added to the set of analysed variables. Rinse
>> and repeat. However this probably doesn't work for variables inside loops.
> I'd also be interested to hear more from the range analysis authors about
> this.
>
> Remaining faithful to the C/C++ overflow rules while operating on LLVM IR
> may be very tricky, particularly with respect to integer promotions and
> similar constructs, which may have no representation in the IR (even
> before optimization).
>
> An analysis that is not sound with respect to the original C/C++ semantics
> would be of little use to anyone.

I actually disagree on the last point. For static array bounds checking with SAFECode, we want to assume that all integer operations can overflow (including those with the nsw attribute). We don't care about the other integer semantic rules from C (like the promotion rules) or whether overflow is language-defined or not (for those who don't know, unsigned ints have defined overflow in C; signed ints do not). All we want to know is whether a pointer p is within the bounds of a set of valid objects, and we just need to take integer overflow into account because it can happen on the hardware.

-- John T.
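A small constructed example (not taken from SAFECode) of why the bounds question has to be answered against hardware wraparound rather than against idealized integers; the signed overflow below is deliberate and is exactly the behaviour a bounds checker must account for:

#include <cstdio>

int buf[16] = {0};

// Suppose an earlier check has established 0 <= x <= 100.
int index_for(int x) {
  // Over the mathematical integers, y is in [0, 3e9] and y / 200000000 is in
  // [0, 15], so buf[y / 200000000] looks provably in bounds and a checker
  // that trusts "no signed wrap" could drop its runtime check.
  int y = x * 30000000;   // deliberate signed overflow: undefined behaviour
                          // in C/C++, wraps on real hardware for x >= 72
  return y / 200000000;
}

int main() {
  int idx = index_for(100);
  // On a typical machine idx is -6, so buf[idx] would be out of bounds even
  // though the wrap-oblivious reasoning above "proved" it safe.
  std::printf("index = %d, in bounds? %s\n", idx,
              (idx >= 0 && idx < 16) ? "yes" : "no");
  return 0;
}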
> I actually disagree on the last point. For static array bounds checking with
> SAFECode, we want to assume that all integer operations can overflow
> (including those with the nsw attribute). We don't care about the other
> integer semantic rules from C (like the promotion rules) or whether overflow
> is language-defined or not (for those who don't know, unsigned ints have
> defined overflow in C; signed ints do not). All we want to know is whether a
> pointer p is within the bounds of a set of valid objects, and we just need to
> take integer overflow into account because it can happen on the hardware.

I'd have to see some specific examples to figure out if what you're saying makes sense. All of my experience with integer overflows in C/C++ has led me to the conclusion that playing games with the semantics usually gets you into hot water.

John
> No, not yet. We have not used the analysis to implement any
> optimization yet. So far we can only flag that variables are
> dead code, but I have not tried to remove this code. And for the C
> programs we have analyzed, the number of variables that are
> dead code after LLVM optimizes the program is small.

Even if only a few variables become dead, it might still be the case that many dynamic branches can be eliminated. I'd encourage you to try this, particularly on the Ada codes that Duncan mentioned and the output of the integer overflow checking tool that I mentioned.

One transformation that we found to be quite useful when implementing a range analysis for C was a debugging mode that inserted range assertions into the generated code. This was an easy way to spot implementation errors that otherwise would have been quite hard to detect.

Do you know how your results compare to GCC's existing value range propagation pass?

Anyway, I think this sounds like an interesting project and potentially a useful one.

John
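To make the debugging-mode idea concrete, here is a hand-written illustration of what emitted range assertions might look like; the function and the claimed ranges are invented for the example and are not output from any existing tool:

#include <cassert>
#include <cstdio>

// Original function:  int scale(int n) { return n * 4; }
// If the analysis claims n is in [0, 1000] on entry and the result is in
// [0, 4000], a debug build could check those claims at run time:
int scale_with_range_asserts(int n) {
  assert(n >= 0 && n <= 1000 && "range analysis claimed n in [0, 1000]");
  int result = n * 4;
  assert(result >= 0 && result <= 4000 &&
         "range analysis claimed result in [0, 4000]");
  return result;
}

int main() {
  std::printf("%d\n", scale_with_range_asserts(250));  // fires no assertion
  return 0;
}

Any test input that trips one of these assertions points directly at a place where the computed interval is wrong.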
Hi Victor,

> > one of the big problems with Douglas's original range analysis was that
> > it couldn't handle modulo arithmetic.
>
> We still have this problem. Jorge Navas, who is using our analysis, is
> working on an implementation that deals with wrapping arithmetic, but
> we do not have access to his code yet.

Can you please clarify whether the pass only looks at operations with the nsw flag (and bails out on others), or just assumes that there is no wrapping behaviour anywhere? It used to just assume that nothing wrapped, which made all stated results about bit shrinking, dead variables found, etc. pointless, since they were based on an incorrect analysis.

> > An analysis that is not sound with respect to the original C/C++ semantics
> > would be of little use to anyone.
>
> You can use our analysis to remove overflow checks. If we output that a variable
> is bound to the range [c1, c2], where c1 > -inf and c2 < +inf, then you
> are guaranteed to be inside these bounds.

Is this really true? For example, suppose you know that X is in the range [1, +inf], and now you calculate Y = 1 / X; then you get that Y is in the range [0, 1]. But there is no guarantee that Y is really in that range: since X might have overflowed, it might for example be equal to -1, in which case Y would be equal to -1 too, outside the range [0, 1].

In short, I doubt you can conclude that a variable Y is really in a range [c1, c2] just from the information that c1 > -inf and c2 < +inf. I think you also need to look at the history of how you got there.

Ciao, Duncan.
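Duncan's scenario written out as a self-contained program; the additions go through unsigned arithmetic purely so that the hardware wraparound can be shown without relying on undefined behaviour, and the unsigned-to-int conversions assume the usual two's-complement behaviour:

#include <cstdio>

int main() {
  // The analysis believes x only ever grows from 1, so it reports the range
  // [1, +inf]; interval arithmetic then puts y = 1 / x inside [0, 1].
  int x = 1;
  x = (int)((unsigned)x + 2147483647u);
  x = (int)((unsigned)x + 2147483647u);  // mathematically 4294967295, but the
                                         // 32-bit value has wrapped to -1
  int y = 1 / x;                         // -1: outside the predicted [0, 1]
  std::printf("x = %d, y = 1 / x = %d\n", x, y);
  return 0;
}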