Well put, Chandler! On Sun, Apr 7, 2013 at 6:23 PM, Chandler Carruth <chandlerc at google.com>wrote:> I think this entire conversation is going a bit off the rails. Let's try > to stay focused on the specific request, and why there (may) be problems > with it. > > On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally < > cameron.mcinally at nyu.edu> wrote: > >> To be clear, you're asking to turn off a set of optimizations. That >>> is, you're asking to make code in general run slower, so that you can >>> get a particular behavior on some CPUs in unusual cases. >>> >> >> I respectfully disagree. I am asking for an *option* to turn off a set of >> optimizations; not turn off optimizations in general. I would like to make >> it easy for a compiler implementor to choose the desired behaviour. I >> whole-heartedly believe that both behaviours (undefined and trap) have >> merit. >> > > I think you're both misconstruing what this would involve. > > You're actually asking for the formal model of the LLVM IR to be > *parameterized*. In one mode, an instruction would produce undefined > behavior on division, and in another mode it would produce a trap. Then you > are asking for the optimizer stack to support either semantic model, and > produce efficient code regardless. > > This is completely intractable for LLVM to support. It would make both the > optimizers and the developers of LLVM crazy to have deep parameterization > of the fundamental semantic model for integer division. > > The correct way to support *exactly* reproducing the architectural > peculiarities of the x86-64 integer divide instruction is to add a > target-specific intrinsic which does this. It will have defined behavior > (of trapping in some cases) as you want, and you can emit this in your FE > easily. However, even this has the risk of incurring a high maintenance > burden. If you want much in the way of optimizations of this intrinsic, > you'll have to go through the optimizer and teach each pass about your > intrinsic. Some of these will be easy, but some will be hard and there will > be a *lot* of them. =/ > > > Cameron, you (and others interested) will certainly need to provide all of > the patches and work to support this if you think this is an important use > case, as the existing developers have found other trade-offs and solutions. > And even then, if it requires really substantial changes to the optimizer, > I'm not sure it's worth pursuing this in LLVM. My primary concerns are > two-fold. First, I think that the amount of work required to recover the > optimizations which could theoretically apply to both of these operations > will be massive. Second, I fear that after having done this work, you will > immediately find the need to remove some other undefined behavior from the > IR which happens to have defined behavior on x86-64. >Alas, I must have been shortsighted. For my purposes, I had envisioned using this target-specific intrinsic only when undefined behaviour was imminent. That information is available before the IR and it would work-around the constant folder. I did not anticipate needing optimizations around that intrinsic, since it would ultimately trap. Supporting the intrinsic as a proper alternative to the integer division operator(s) sounds like a lot of work. I do not believe that the reward is worth the effort, at least for my purposes. Others may feel different.> Fundamentally, the idea of undefined behavior is at the core of the design > of LLVM's optimizers. It is leveraged everywhere, and without it many > algorithms that are fast would become slow, transformations that are cheap > would become expensive, passes that operate locally would be forced to > operate across ever growing scopes in order to be certain the optimizations > applied in this specific case. Trying to remove undefined behavior from > LLVM seems unlikely to be a productive pursuit. >Fair enough.> More productive (IMO) is to emit explicit guards against the undefined > behavior in your language, much as -fsanitize does for undefined behavior > in C++. Then work to build a mode where a specific target can take > advantage of target specific trapping behaviors to emit these guards more > efficiently. This will allow LLVM's optimizers to continue to function in > the world they were designed for, and with a set of rules that we know how > to build efficient optimizers around, and your source programs can operate > in a world with checked behavior rather than undefined behavior. As a > useful side-effect, you can defer the target-specific optimizations until > you have benchmarks (internally is fine!) and can demonstrate the > performance problems (if any). >Regrettably, this implementation does not suit my needs. The constant folding would still occur and I would like to produce the actual division, since the instruction is non-maskable on x86. Others may have a better use for this implementation though, so I don't want to shoot the idea down for everyone.> Cameron, you may disagree, but honestly if you were to convince folks here > I think it would have happened already. I'm not likely to continue the > theoretical debate about whether LLVM's stance on UB (as I've described > above) is a "good" or "bad" stance. Not that I wouldn't enjoy the debate > (especially at a bar some time), but because I fear it isn't a productive > way to spend the time of folks on this list. So let's try to stick to the > technical discussion of strategies, costs, and tradeoffs. >Oh, no. Your analysis was thorough and I can sympathize with it. The seams of C/C++ and the x86 architecture are foggy. I understand that my interpretation of their interactions is not gospel. Thanks again for the thoughtful reply! -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130408/b4aa72d9/attachment.html>
On Sun, Apr 7, 2013 at 9:28 PM, Cameron McInally <cameron.mcinally at nyu.edu>wrote:> More productive (IMO) is to emit explicit guards against the undefined >> behavior in your language, much as -fsanitize does for undefined behavior >> in C++. Then work to build a mode where a specific target can take >> advantage of target specific trapping behaviors to emit these guards more >> efficiently. This will allow LLVM's optimizers to continue to function in >> the world they were designed for, and with a set of rules that we know how >> to build efficient optimizers around, and your source programs can operate >> in a world with checked behavior rather than undefined behavior. As a >> useful side-effect, you can defer the target-specific optimizations until >> you have benchmarks (internally is fine!) and can demonstrate the >> performance problems (if any). >> > > Regrettably, this implementation does not suit my needs. >I'm curious why you think so... I think it would very closely match your needs:> The constant folding would still occur >No? At least, I don't see why. I don't know what your frontend is doing (but my impression is that it is not Clang? If I'm wrong, let me know...) but the common idiom is to emit nothing as a constant other than immediate values, and let LLVM's optimizers figure it out. A consequence of this pattern combined with inserting guards is that (at most) the optimizer will fold directly to the trap. You should be able to control the exact structure in the FE though. and I would like to produce the actual division, since the instruction is> non-maskable on x86. >Yes, and this is the point I was driving at with the comments about a target specific optimization. It is reasonable for the x86 backend to recognize the pattern of testing for a zero divisor and trapping if it occurs, and transform that into an unconditional divide relying on the hardware trap instead. I think it is not unreasonable to have this strategy result in one of two generated code patterns in the overwhelming majority of cases: 1) An unconditional trap because the optimizer proved divide by zero 2) A direct divide instruction which relies on the x86 trapping behavior. That said, it remains a fairly significant amount of work to arrive at this place. Just trying to point out a reasonable approach to arriving there. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130407/a17842d2/attachment.html>
On Mon, Apr 8, 2013 at 1:34 AM, Chandler Carruth <chandlerc at google.com>wrote:> On Sun, Apr 7, 2013 at 9:28 PM, Cameron McInally <cameron.mcinally at nyu.edu > > wrote: > >> More productive (IMO) is to emit explicit guards against the undefined >>> behavior in your language, much as -fsanitize does for undefined behavior >>> in C++. Then work to build a mode where a specific target can take >>> advantage of target specific trapping behaviors to emit these guards more >>> efficiently. This will allow LLVM's optimizers to continue to function in >>> the world they were designed for, and with a set of rules that we know how >>> to build efficient optimizers around, and your source programs can operate >>> in a world with checked behavior rather than undefined behavior. As a >>> useful side-effect, you can defer the target-specific optimizations until >>> you have benchmarks (internally is fine!) and can demonstrate the >>> performance problems (if any). >>> >> >> Regrettably, this implementation does not suit my needs. >> > > I'm curious why you think so... I think it would very closely match your > needs: >It's my current understanding that the actual division would be constant folded away and I would be left with only the guard. Later, the guard would be proved always true. I could, of course, be mistaken. I checked out Clang with -fsanitize=integer-divide-by-zero this weekend and saw similar in the assembly. I also noticed that the division operands are still around in the assembly. If it would be possible to recreate the div instructions, I would be thrilled.> > >> The constant folding would still occur >> > > No? At least, I don't see why. > > I don't know what your frontend is doing (but my impression is that it is > not Clang? If I'm wrong, let me know...) >It is not Clang. Our compiler has proprietary frontends, optimizer, vectorizer, and more. You may have noticed that I don't often share LLVM IR on the list. Our IR has tons of proprietary changes to it. But, I digress.> but the common idiom is to emit nothing as a constant other than immediate > values, and let LLVM's optimizers figure it out. >Our pre-LLVM optimizer may be a culprit here. With the original test case I provided, our inliner will inline both foo(...) and bar(...). In main(), we end up calling CreateSDiv in the IRBuilder with two constants, i.e. (6/0). Would Clang snap these constants into temporaries? Or, does Clang maintain the original source structure? Sorry for these simple questions, I have no insight to Clang at all.> A consequence of this pattern combined with inserting guards is that (at > most) the optimizer will fold directly to the trap. You should be able to > control the exact structure in the FE though. >> and I would like to produce the actual division, since the instruction is >> non-maskable on x86. >> > > Yes, and this is the point I was driving at with the comments about a > target specific optimization. It is reasonable for the x86 backend to > recognize the pattern of testing for a zero divisor and trapping if it > occurs, and transform that into an unconditional divide relying on the > hardware trap instead. I think it is not unreasonable to have this strategy > result in one of two generated code patterns in the overwhelming majority > of cases: >I'm going to be nitpicky over this. It's really not my intention to be unreasonable. I hope you can understand.> 1) An unconditional trap because the optimizer proved divide by zero >This isn't desirable for me. I am afraid an unconditional trap in the assembly will just confuse a user. I suspect that a user will trigger this code and immediately report a compiler bug that we've inserted a rogue trap.> 2) A direct divide instruction which relies on the x86 trapping behavior. >This would be great. Would I be able to accurately recreate the actual constant divide that was folded away? Originally, I had suspected that the operands would be long gone or at least intractable. Again, this may sound unreasonable, but I would like to keep the division intact so that a user can see where their code went wrong in the assembly. Sorry in advance if I'm misunderstanding. Thanks again, Chandler. -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130408/5375d098/attachment.html>
Chandler Carruth <chandlerc at google.com> writes:> Yes, and this is the point I was driving at with the comments about a > target specific optimization. It is reasonable for the x86 backend to > recognize the pattern of testing for a zero divisor and trapping if it > occurs, and transform that into an unconditional divide relying on the > hardware trap instead. I think it is not unreasonable to have this > strategy result in one of two generated code patterns in the > overwhelming majority of cases: > > 1) An unconditional trap because the optimizer proved divide by zero > 2) A direct divide instruction which relies on the x86 trapping > behavior. > > That said, it remains a fairly significant amount of work to arrive at > this place. Just trying to point out a reasonable approach to arriving > there.I agree that this is potentially a good approach. I also agree that it is a significant amount of work. Given infinite resources, I think your solution makes the most sense. Given limited resources, I'm not sure. But Cameron has said he won't push anything upstream that people aren't comfortable with. -David