Hey Jeffrey, Thanks for the suggestion. A few comments... On Sun, Apr 7, 2013 at 12:31 PM, Jeffrey Yasskin <jyasskin at googlers.com>wrote: ...> > If you can find a way to implement -fsanitize=undefined to use the FP > trap instead of a branch when checking floating division by 0, I > suspect such a patch would stand a good chance of being accepted. > (Although I'm not intimately familiar with the implementation, so > there could be some subtlety that would prevent it.)This would work well for the constant expression division case, since the division by zero expression would be folded away leaving only a call to trap. The performance penalty for using -fsanitize on other non-constant division expressions is still unclear to me; and concerning. Although, I've been contemplating x86-64's behaviour for this case when floating point traps are disabled. Ideally, the compiler should preserve that behaviour, which might make this software implementation messy. Especially if different processors have different implementations. The simplest solution... let the hardware behave as it should.> You might need to > do this in the processor-specific backend to avoid other > undefined-behavior-based optimizations—that is, recognize "if (x == 0) > goto err_handler; else y/x;" and replace it with > "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;". >I believe that the constant folder would remove the constant division by zero and conditional before the backend could have its say. We would be left with only the jump to the error handler. That may complicate things. Please correct me if I am misunderstanding. Thanks again, Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130407/7a236fd4/attachment.html>
On Sun, Apr 7, 2013 at 7:24 PM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:> Hey Jeffrey, > > Thanks for the suggestion. A few comments... > > On Sun, Apr 7, 2013 at 12:31 PM, Jeffrey Yasskin <jyasskin at googlers.com> > wrote: > ... >> >> If you can find a way to implement -fsanitize=undefined to use the FP >> trap instead of a branch when checking floating division by 0, I >> suspect such a patch would stand a good chance of being accepted. >> (Although I'm not intimately familiar with the implementation, so >> there could be some subtlety that would prevent it.) > > > This would work well for the constant expression division case, since the > division by zero expression would be folded away leaving only a call to > trap. The performance penalty for using -fsanitize on other non-constant > division expressions is still unclear to me; and concerning.If the performance penalty is unclear to you, that means you haven't measured it. Until you measure, you have absolutely no business complaining about a potential performance problem. Measure, and then come back with numbers.> Although, I've been contemplating x86-64's behaviour for this case when > floating point traps are disabled. Ideally, the compiler should preserve > that behaviour, which might make this software implementation messy. > Especially if different processors have different implementations. The > simplest solution... let the hardware behave as it should.To be clear, you're asking to turn off a set of optimizations. That is, you're asking to make code in general run slower, so that you can get a particular behavior on some CPUs in unusual cases.>> You might need to >> do this in the processor-specific backend to avoid other >> undefined-behavior-based optimizations—that is, recognize "if (x == 0) >> goto err_handler; else y/x;" and replace it with >> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;". > > > I believe that the constant folder would remove the constant division by > zero and conditional before the backend could have its say. We would be left > with only the jump to the error handler. That may complicate things.If the compiler can prove x==0, then yes, you'd be left with just a jump to the error handler. That's more efficient than handling a hardware trap, so it's what you ought to want. In cases of non-constant division by zero, the branch wouldn't usually get removed, and the backend would get a chance to optimize it.
Hey again, Thank you for your opinions. I will take them into consideration. A few comments... On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <jyasskin at google.com> wrote: ...> If the performance penalty is unclear to you, that means you haven't > measured it. Until you measure, you have absolutely no business > complaining about a potential performance problem. Measure, and then > come back with numbers.Unfortunately, I am restricted from publicly sharing performance results without going through an extensive, expensive legal process. Not fun! Some thoughts though... In order to test the performance of this Clang feature, I would have to build it into my frontend. That's not cost effective for me for the following reason. It seems to me, a priori, that the code currently generated by Clang would indeed have a performance penalty on an inorder processor, without branch prediction. Take Xeon Phi for example. Albeit, a small penalty. Please correct me if my assumptions are incorrect. Our team's culture dictates that "an instruction is an instruction", hence a performance problem. I understand that "performance problem" will have different definitions among different tribes.> Although, I've been contemplating x86-64's behaviour for this case when > > floating point traps are disabled. Ideally, the compiler should preserve > > that behaviour, which might make this software implementation messy. > > Especially if different processors have different implementations. The > > simplest solution... let the hardware behave as it should. > > To be clear, you're asking to turn off a set of optimizations. That > is, you're asking to make code in general run slower, so that you can > get a particular behavior on some CPUs in unusual cases. >I respectfully disagree. I am asking for an *option* to turn off a set of optimizations; not turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I whole-heartedly believe that both behaviours (undefined and trap) have merit. To digress in the interest of light-heartedness, this reminds me of the old joke "my program's performance improved 20x!, but the results aren't correct". :)> >> You might need to > >> do this in the processor-specific backend to avoid other > >> undefined-behavior-based optimizations—that is, recognize "if (x == 0) > >> goto err_handler; else y/x;" and replace it with > >> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;". > > > > > > I believe that the constant folder would remove the constant division by > > zero and conditional before the backend could have its say. We would be > left > > with only the jump to the error handler. That may complicate things. > > If the compiler can prove x==0, then yes, you'd be left with just a > jump to the error handler. That's more efficient than handling a > hardware trap, so it's what you ought to want. >I would like a trap. I.e. x86-64's expected behaviour. I would also not like a branch on non-constant integer divisions. As a reminder, this discussion originated in the constant folder. The non-constant behaviour works just fine. Thanks again, Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130407/06d19ff4/attachment.html>
Jeffrey Yasskin <jyasskin at google.com> writes:>> This would work well for the constant expression division case, since the >> division by zero expression would be folded away leaving only a call to >> trap. The performance penalty for using -fsanitize on other non-constant >> division expressions is still unclear to me; and concerning. > > If the performance penalty is unclear to you, that means you haven't > measured it. Until you measure, you have absolutely no business > complaining about a potential performance problem. Measure, and then > come back with numbers.Oh come on. He's expression concern, not saying it's a definite problem or complaining.>> Although, I've been contemplating x86-64's behaviour for this case when >> floating point traps are disabled. Ideally, the compiler should preserve >> that behaviour, which might make this software implementation messy. >> Especially if different processors have different implementations. The >> simplest solution... let the hardware behave as it should. > > To be clear, you're asking to turn off a set of optimizations. That > is, you're asking to make code in general run slower, so that you can > get a particular behavior on some CPUs in unusual cases.No. He's asking to turn off a particular optimization case for a certain _known_ constant expression. Furthermore, he's asking for an option, not to blindly turn it off for everyone.> If the compiler can prove x==0, then yes, you'd be left with just a > jump to the error handler. That's more efficient than handling a > hardware trap, so it's what you ought to want.Not necessarily. See my example of using traps to aid debugging. A jump to a handler would disrupt machine state. Perhaps not greatly but it is an extra complication users don't expect. -David