Hey again,

Thank you for your opinions. I will take them into consideration. A few comments...

On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:

> If the performance penalty is unclear to you, that means you haven't
> measured it. Until you measure, you have absolutely no business
> complaining about a potential performance problem. Measure, and then
> come back with numbers.

Unfortunately, I am restricted from publicly sharing performance results
without going through an extensive, expensive legal process. Not fun!

Some thoughts, though... In order to test the performance of this Clang
feature, I would have to build it into my frontend, and that's not
cost-effective for me, for the following reason. It seems to me, a priori,
that the code currently generated by Clang would indeed carry a performance
penalty on an in-order processor without branch prediction. Take Xeon Phi,
for example. Albeit a small penalty. Please correct me if my assumptions
are incorrect. Our team's culture dictates that "an instruction is an
instruction", hence a performance problem. I understand that "performance
problem" will have different definitions among different tribes.

> > Although, I've been contemplating x86-64's behaviour for this case when
> > floating point traps are disabled. Ideally, the compiler should preserve
> > that behaviour, which might make this software implementation messy.
> > Especially if different processors have different implementations. The
> > simplest solution... let the hardware behave as it should.
>
> To be clear, you're asking to turn off a set of optimizations. That
> is, you're asking to make code in general run slower, so that you can
> get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of
optimizations, not to turn off optimizations in general. I would like to
make it easy for a compiler implementor to choose the desired behaviour. I
whole-heartedly believe that both behaviours (undefined and trap) have
merit.

To digress in the interest of light-heartedness, this reminds me of the old
joke: "My program's performance improved 20x, but the results aren't
correct!" :)

> >> You might need to do this in the processor-specific backend to avoid
> >> other undefined-behavior-based optimizations—that is, recognize
> >> "if (x == 0) goto err_handler; else y/x;" and replace it with
> >> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
> >
> > I believe that the constant folder would remove the constant division
> > by zero and the conditional before the backend could have its say. We
> > would be left with only the jump to the error handler. That may
> > complicate things.
>
> If the compiler can prove x==0, then yes, you'd be left with just a
> jump to the error handler. That's more efficient than handling a
> hardware trap, so it's what you ought to want.

I would like a trap, i.e. x86-64's expected behaviour. I would also not
like a branch on non-constant integer divisions. As a reminder, this
discussion originated in the constant folder; the non-constant behaviour
works just fine.

Thanks again,
Cameron
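To make the trade-off under discussion concrete, here is a minimal C sketch
of the two lowerings being compared. The function and handler names are
illustrative, not Clang's actual output; the point is only the extra
compare-and-branch that a software guard adds to every division, which is
nearly free with branch prediction but measurable on an in-order core.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical error handler, standing in for a real runtime's. */
    static void div_by_zero_handler(void) {
        fprintf(stderr, "integer division by zero\n");
        abort();
    }

    /* Guarded lowering: one extra compare and branch per division. */
    int guarded_div(int y, int x) {
        if (x == 0)
            div_by_zero_handler();
        return y / x;
    }

    /* Bare lowering: just the divide. On x86-64, idiv itself raises a
     * #DE fault when x == 0, so no explicit guard is emitted. */
    int bare_div(int y, int x) {
        return y / x;
    }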
I think this entire conversation is going a bit off the rails. Let's try to
stay focused on the specific request, and why there may be problems with it.

On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:

> > To be clear, you're asking to turn off a set of optimizations. That
> > is, you're asking to make code in general run slower, so that you can
> > get a particular behavior on some CPUs in unusual cases.
>
> I respectfully disagree. I am asking for an *option* to turn off a set of
> optimizations, not to turn off optimizations in general. I would like to
> make it easy for a compiler implementor to choose the desired behaviour.
> I whole-heartedly believe that both behaviours (undefined and trap) have
> merit.

I think you're both misconstruing what this would involve.

You're actually asking for the formal model of the LLVM IR to be
*parameterized*. In one mode, an instruction would produce undefined
behavior on division by zero, and in another mode it would produce a trap.
Then you are asking for the optimizer stack to support either semantic
model, and produce efficient code regardless.

This is completely intractable for LLVM to support. It would make both the
optimizers and the developers of LLVM crazy to have deep parameterization
of the fundamental semantic model for integer division.

The correct way to support *exactly* reproducing the architectural
peculiarities of the x86-64 integer divide instruction is to add a
target-specific intrinsic which does this. It will have defined behavior
(of trapping in some cases) as you want, and you can emit this in your FE
easily. However, even this has the risk of incurring a high maintenance
burden. If you want much in the way of optimizations of this intrinsic,
you'll have to go through the optimizer and teach each pass about your
intrinsic. Some of these will be easy, but some will be hard, and there
will be a *lot* of them. =/

Cameron, you (and others interested) will certainly need to provide all of
the patches and work to support this if you think this is an important use
case, as the existing developers have found other trade-offs and solutions.
And even then, if it requires really substantial changes to the optimizer,
I'm not sure it's worth pursuing in LLVM. My primary concerns are two-fold.
First, I think that the amount of work required to recover the
optimizations which could theoretically apply to both of these operations
will be massive. Second, I fear that after having done this work, you will
immediately find the need to remove some other undefined behavior from the
IR which happens to have defined behavior on x86-64.

Fundamentally, the idea of undefined behavior is at the core of the design
of LLVM's optimizers. It is leveraged everywhere, and without it many
algorithms that are fast would become slow, transformations that are cheap
would become expensive, and passes that operate locally would be forced to
operate across ever-growing scopes in order to be certain the optimizations
apply in a specific case. Trying to remove undefined behavior from LLVM
seems unlikely to be a productive pursuit.

More productive (IMO) is to emit explicit guards against the undefined
behavior in your language, much as -fsanitize does for undefined behavior
in C++. Then work to build a mode where a specific target can take
advantage of target-specific trapping behaviors to emit these guards more
efficiently. This will allow LLVM's optimizers to continue to function in
the world they were designed for, with a set of rules that we know how to
build efficient optimizers around, while your source programs operate in a
world with checked behavior rather than undefined behavior. As a useful
side effect, you can defer the target-specific optimizations until you have
benchmarks (internal benchmarks are fine!) and can demonstrate the
performance problems (if any).

Cameron, you may disagree, but honestly, if you were going to convince
folks here, I think it would have happened already. I'm not likely to
continue the theoretical debate about whether LLVM's stance on UB (as I've
described it above) is a "good" or "bad" stance. Not because I wouldn't
enjoy the debate (especially at a bar some time), but because I fear it
isn't a productive way to spend the time of folks on this list. So let's
try to stick to the technical discussion of strategies, costs, and
trade-offs.
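The intrinsic suggestion can be made concrete with a sketch. No such LLVM
intrinsic exists today, so the snippet below uses GCC/Clang-style x86-64
inline assembly purely as a stand-in for the contract such an intrinsic
would pin down: the hardware divide always executes, and faults on a zero
divisor, regardless of what the optimizer can prove.

    /* Sketch only: inline asm modeling a hypothetical trapping-divide
     * intrinsic. The idiv is opaque to the optimizer, so it is never
     * folded away, and it raises a #DE fault when x == 0. */
    static inline int x86_trapping_sdiv(int y, int x) {
        int quot;
        __asm__ volatile("cltd\n\t"     /* sign-extend eax into edx:eax */
                         "idivl %2"     /* edx:eax / x; traps if x == 0 */
                         : "=a"(quot)   /* quotient comes back in eax */
                         : "a"(y), "r"(x)
                         : "edx", "cc"); /* remainder in edx discarded */
        return quot;
    }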
Well put, Chandler!

On Sun, Apr 7, 2013 at 6:23 PM, Chandler Carruth <chandlerc at google.com> wrote:

> I think you're both misconstruing what this would involve.
>
> You're actually asking for the formal model of the LLVM IR to be
> *parameterized*. In one mode, an instruction would produce undefined
> behavior on division by zero, and in another mode it would produce a
> trap. Then you are asking for the optimizer stack to support either
> semantic model, and produce efficient code regardless.
>
> This is completely intractable for LLVM to support. It would make both
> the optimizers and the developers of LLVM crazy to have deep
> parameterization of the fundamental semantic model for integer division.
>
> The correct way to support *exactly* reproducing the architectural
> peculiarities of the x86-64 integer divide instruction is to add a
> target-specific intrinsic which does this. It will have defined behavior
> (of trapping in some cases) as you want, and you can emit this in your
> FE easily. However, even this has the risk of incurring a high
> maintenance burden. If you want much in the way of optimizations of this
> intrinsic, you'll have to go through the optimizer and teach each pass
> about your intrinsic. Some of these will be easy, but some will be hard,
> and there will be a *lot* of them. =/
>
> Cameron, you (and others interested) will certainly need to provide all
> of the patches and work to support this if you think this is an
> important use case, as the existing developers have found other
> trade-offs and solutions. And even then, if it requires really
> substantial changes to the optimizer, I'm not sure it's worth pursuing
> in LLVM. My primary concerns are two-fold. First, I think that the
> amount of work required to recover the optimizations which could
> theoretically apply to both of these operations will be massive. Second,
> I fear that after having done this work, you will immediately find the
> need to remove some other undefined behavior from the IR which happens
> to have defined behavior on x86-64.

Alas, I must have been shortsighted. For my purposes, I had envisioned
using this target-specific intrinsic only when undefined behaviour was
imminent. That information is available before the IR is generated, and it
would work around the constant folder. I did not anticipate needing
optimizations around the intrinsic, since it would ultimately trap.

Supporting the intrinsic as a proper alternative to the integer division
operator(s) sounds like a lot of work. I do not believe that the reward is
worth the effort, at least for my purposes. Others may feel differently.

> Fundamentally, the idea of undefined behavior is at the core of the
> design of LLVM's optimizers. It is leveraged everywhere, and without it
> many algorithms that are fast would become slow, transformations that
> are cheap would become expensive, and passes that operate locally would
> be forced to operate across ever-growing scopes in order to be certain
> the optimizations apply in a specific case. Trying to remove undefined
> behavior from LLVM seems unlikely to be a productive pursuit.

Fair enough.

> More productive (IMO) is to emit explicit guards against the undefined
> behavior in your language, much as -fsanitize does for undefined
> behavior in C++. Then work to build a mode where a specific target can
> take advantage of target-specific trapping behaviors to emit these
> guards more efficiently. This will allow LLVM's optimizers to continue
> to function in the world they were designed for, with a set of rules
> that we know how to build efficient optimizers around, while your source
> programs operate in a world with checked behavior rather than undefined
> behavior. As a useful side effect, you can defer the target-specific
> optimizations until you have benchmarks (internal benchmarks are fine!)
> and can demonstrate the performance problems (if any).

Regrettably, this implementation does not suit my needs. The constant
folding would still occur, and I would like to produce the actual division,
since the trap it raises is non-maskable on x86. Others may have a better
use for this implementation, though, so I don't want to shoot the idea down
for everyone.

> Cameron, you may disagree, but honestly, if you were going to convince
> folks here, I think it would have happened already. I'm not likely to
> continue the theoretical debate about whether LLVM's stance on UB (as
> I've described it above) is a "good" or "bad" stance. Not because I
> wouldn't enjoy the debate (especially at a bar some time), but because I
> fear it isn't a productive way to spend the time of folks on this list.
> So let's try to stick to the technical discussion of strategies, costs,
> and trade-offs.

Oh, no. Your analysis was thorough, and I can sympathize with it. The seams
between C/C++ and the x86 architecture are foggy, and I understand that my
interpretation of their interactions is not gospel.

Thanks again for the thoughtful reply!
-Cameron
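For illustration only, here is one way to see the effect Cameron is after
at the source level. The volatile qualifier is used here purely as a sketch
of "make the divisor opaque so the constant folder cannot remove the
divide"; it is a workaround for demonstration, not the intrinsic-based
design discussed above.

    /* Sketch: the divisor is provably zero in the source, but the
     * volatile load is opaque to the optimizer, so the idiv is really
     * emitted and the non-maskable #DE fault fires at run time. This
     * remains undefined behavior in C; only the emitted code changes. */
    int trap_on_known_zero(int y) {
        volatile int divisor = 0;
        return y / divisor;
    }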
Chandler Carruth <chandlerc at google.com> writes:

> You're actually asking for the formal model of the LLVM IR to be
> *parameterized*. In one mode, an instruction would produce undefined
> behavior on division, and in another mode it would produce a trap.
> Then you are asking for the optimizer stack to support either semantic
> model, and produce efficient code regardless.
>
> This is completely intractable for LLVM to support. It would make both
> the optimizers and the developers of LLVM crazy to have deep
> parameterization of the fundamental semantic model for integer
> division.

You're making this *way* too complicated. We have plenty of examples of
options to turn off optimizations, and most compilers provide options to
preserve traps. The division would still semantically have undefined
behavior; the implementation would simply make that behavior a trap instead
of returning some random garbage value. Sure, it *may* change how other
code is optimized, but that is the choice the implementor makes when
choosing to use the option to preserve traps. LLVM developers not concerned
with preserving traps need not give it a second thought.

> The correct way to support *exactly* reproducing the architectural
> peculiarities of the x86-64 integer divide instruction is to add a
> target-specific intrinsic which does this. It will have defined
> behavior (of trapping in some cases) as you want, and you can emit
> this in your FE easily.

You can't do that if the FE doesn't see the constant expression;
optimization may reveal it later.

> Cameron, you (and others interested) will certainly need to provide
> all of the patches and work to support this if you think this is an
> important use case, as the existing developers have found other
> trade-offs and solutions.

Certainly. A software test+trap is theoretically possible, and the target
optimizer could theoretically get rid of it, but I share Cameron's concern
about the work required to turn theory into reality.

He's not asking to redefine the LLVM IR. He's asking for an option to
control the implementation of that IR. Preserving traps is a real-world
need, and LLVM itself doesn't currently provide a way to do it. It seems
like it should have one, and not rely on particular frontends. It's
supposed to be an independent set of libraries, right?

-David
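David's "software test+trap" can be sketched in a few lines of C. The
function name is illustrative; the idea is that the guard delivers the same
signal a faulting x86-64 idiv would produce, so trap-preserving behavior is
available even on targets whose divide instruction does not trap, and a
backend could fold the guard away where the hardware traps natively.

    #include <signal.h>

    /* Minimal sketch of software test + trap: guard the division and
     * raise SIGFPE explicitly, mimicking the #DE -> SIGFPE delivery
     * that a hardware divide-by-zero produces on POSIX x86-64. */
    int div_preserving_traps(int y, int x) {
        if (x == 0)
            raise(SIGFPE);
        return y / x;
    }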