David Chisnall via llvm-dev
2016-Apr-09 06:46 UTC
[llvm-dev] Implementing a proposed InstCombine optimization
It’s definitely one that would need some target hooks, and is probably not actually worth doing without analysing the producers and consumers of the value. If the source and destination values need to be in floating point registers, the cost of FPR<->GPR moves is likely to be a lot higher than the cost of the subtract, even if the xor is free. If the results are going to end up in integer registers or memory, then the xor version is probably cheaper (though, even there, it may be better for register pressure to keep the results in FPRs).

I’d expect that most uses of this pattern are immediately followed by a branch on the result. On some architectures, that can become a branch on a floating point condition code, but on others it’s going to be a move to GPR, which means that you lose the win entirely.

David

> On 9 Apr 2016, at 02:44, Alex Rosenberg via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> This doesn't seem like a good idea to me. There are many architectures where those bitcasts are free operations and the xor will be executed in a shorter pipe than any FP op would. Cell SPU, for example.
>
> This could introduce new FP exceptions. It's also likely to be much worse on platforms with no FPU like early MIPS.
>
> Alex
>
> On Apr 7, 2016, at 9:43 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours) and either canonicalize NaNs or don’t support denormals. This is actually important because this kind of bitmath on floats is very commonly used as part of algorithms for complex math functions that need to get precise bit patterns from the source (similarly for the transformation of masking off the sign bit -> fabs). It’s also important because if the float happens to “really” be an integer, it’s highly likely we’ll end up zero-flushing it and losing the data.
>>
>> Example:
>>
>>     a = load float
>>     b = bitcast a to int
>>     c = xor b, signbit
>>     d = bitcast c to float
>>     store d
>>
>> Personally I would feel this is safe if and only if the float is coming from an arithmetic operation. In that case, we know that doing another arithmetic operation on it should be safe, since it’s already canonicalized and can’t be a denorm [if the platform doesn’t support them].
>>
>> I say this coming only a few weeks after our team spent literally dozens of human-hours tracking down an extremely obscure bug involving a GL conformance test in which ints were cast to floats, manipulated with float instructions, then sent back to int, resulting in the ints being flushed to zero and the test failing.
>>
>> -escha
>>
>>> On Apr 7, 2016, at 9:09 AM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hi Carlos -
>>>
>>> That sounds like a good patch.
>>>
>>> Warning - following the link below may remove some of the educational joy for the immediate task at hand:
>>> http://reviews.llvm.org/D13076
>>>
>>> ...but I wouldn't worry too much, there's plenty more opportunity where that came from. :)
>>>
>>> Feel free to post follow-up questions here or via a patch review on Phabricator:
>>> http://llvm.org/docs/Phabricator.html
>>>
>>> On Thu, Apr 7, 2016 at 7:17 AM, Carlos Liam via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> Hi,
>>>
>>> I'm interested in implementing an InstCombine optimization that I discovered and verified with Alive-NJ (with the help of the authors of Alive-NJ). The optimization is described in Alive-NJ format as follows:
>>>
>>>     Name: xor->fsub
>>>     Pre: isSignBit(C)
>>>     %x = bitcast %A
>>>     %y = xor %x, C
>>>     %z = bitcast %y
>>>     =>
>>>     %z = fsub -0.0, %A
>>>
>>> Effectively the optimization targets code that casts a float to an int with the same width, XORs the sign bit, and casts back to float, and replaces it with a subtraction from -0.0.
>>>
>>> I am not very familiar with C++ or the LLVM codebase so I would greatly appreciate some help in writing a patch adding this optimization.
>>>
>>> Thanks in advance.
>>>
>>> - CL
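For readers who want the pattern under discussion in source form, the following is a minimal C++ sketch of the two sides of the proposed rewrite: the bitcast/xor/bitcast idiom and the subtraction from -0.0. It is an illustration only, not code from the thread or from LLVM. The helper names (`bitsOf`, `floatOf`, `flipSignBit`, `negateViaXor`, `negateViaFsub`) are hypothetical, and the sketch assumes `float` is a 32-bit IEEE-754 value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; none of these are LLVM APIs.
// Assumption: float is a 32-bit IEEE-754 binary32 value.

// Reinterpret a float as its raw 32-bit pattern (the "bitcast to int").
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a raw 32-bit pattern as a float (the "bitcast to float").
float floatOf(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// The integer step of the idiom: flip bit 31 and leave every other bit
// untouched, so NaN payloads and denormal mantissas pass through unchanged.
std::uint32_t flipSignBit(std::uint32_t bits) {
  return bits ^ 0x80000000u;
}

// Left-hand side of the proposed rewrite: bitcast, xor the sign bit, bitcast.
float negateViaXor(float a) {
  return floatOf(flipSignBit(bitsOf(a)));
}

// Right-hand side of the proposed rewrite: an FP subtraction from -0.0.
// On hardware that canonicalizes NaNs or flushes denormals, this form may
// not preserve the input's bit pattern, which is the concern raised above.
float negateViaFsub(float a) {
  return -0.0f - a;
}
```

For ordinary values the two forms agree, for example `negateViaXor(1.5f)` and `negateViaFsub(1.5f)` both yield `-1.5f`. The difference the thread is concerned with only shows up for inputs such as NaNs and denormals, and only on targets whose FP arithmetic does not preserve those bit patterns.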
via llvm-dev
2016-Apr-09 15:04 UTC
[llvm-dev] Implementing a proposed InstCombine optimization
Personal feeling: LLVM should not assume anything about relative costs of floats or ints (if we want to, there should be some sort of target hooks involved). There are some targets where float costs 100 times more than int, and there are some where int costs lots more than float, so I don’t think it’s obvious exactly what is and isn’t canonical for anything where one is choosing between float and int ops, even ignoring issues of correctness.

-escha
Sanjay Patel via llvm-dev
2016-Apr-10 22:40 UTC
[llvm-dev] Implementing a proposed InstCombine optimization
Yes, I agree with that assessment now. And given the FP correctness issues raised, I don't see any hope for D18874 as-is.

Creating an fneg intrinsic (or IR instruction?) was proposed in D18874, and I think that's been considered before (but rejected?). I don't understand what effects that would have.

Now that we have raised the question of FP correctness, I think we need to answer the question: what can target-independent IR passes assume about the underlying LLVM IR FP machine? We'd like to be flexible enough to handle a target that doesn't support denorms. Are there other considerations? Is it safe to do *any* FP transforms in InstCombine?
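To make the "target that doesn't support denorms" case concrete, here is a small C++ sketch that models the hazard escha described earlier in the thread, using integer arithmetic only. It is a simplified model under stated assumptions, not a description of any particular target: `flushToZeroModel` is a hypothetical stand-in for an FP unit that replaces denormal inputs with a zero of the same sign, and all function names are invented for this example. The layout assumed is 32-bit IEEE-754.

```cpp
#include <cstdint>

// Hypothetical model for illustration; not an LLVM API and not a
// description of any specific hardware.
// Assumption: 32-bit IEEE-754 layout (1 sign, 8 exponent, 23 mantissa bits).

// A pattern is a denormal when its exponent field is zero and its
// mantissa is nonzero. Small integers stored in float registers look
// exactly like this.
bool isDenormalPattern(std::uint32_t bits) {
  return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

// Model of a flush-to-zero FP unit reading an operand: denormals become
// a zero with the same sign, everything else is left alone.
std::uint32_t flushToZeroModel(std::uint32_t bits) {
  return isDenormalPattern(bits) ? (bits & 0x80000000u) : bits;
}

// Negation as bitmath: only the sign bit changes.
std::uint32_t negateBitwise(std::uint32_t bits) {
  return bits ^ 0x80000000u;
}

// Negation as arithmetic on the modeled flush-to-zero unit: the operand
// is flushed first, then its sign is flipped.
std::uint32_t negateArithmeticModel(std::uint32_t bits) {
  return flushToZeroModel(bits) ^ 0x80000000u;
}
```

In this model the integer 42, which is a denormal when viewed as a float, survives `negateBitwise` as `0x8000002a` but comes out of `negateArithmeticModel` as `0x80000000`, so the payload is lost. A normal value such as `0x3f800000` (1.0f) gives `0xbf800000` either way.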