Sanjoy Das via llvm-dev
2019-Mar-18 18:02 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
On Sun, Mar 17, 2019 at 1:47 PM Craig Topper <craig.topper at gmail.com> wrote:
> Can we move HasValueHandle out of the byte used for SubclassOptionalData and move it to the flags at the bottom of Value by shrinking NumUserOperands to 27?

I like this approach because it is less work for me. :)

But I agree with Sanjay below that this only kicks the can slightly
further down the road (solutions (2) and (3) also have the same
problem). Let's see if we can agree on a more future-proof solution.

-- Sanjoy

>
> ~Craig
>
>
> On Sat, Mar 16, 2019 at 12:51 PM Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags,
>> but we've already used up the 7 bits available in
>> Value::SubclassOptionalData (the "backing storage" for
>> FPMathOperator::getFastMathFlags()). These are the possibilities I
>> can think of:
>>
>> 1. Increase the size of FPMathOperator. This gives us some additional
>> bits for FTZ and other fastmath flags we'd want to add in the future.
>> Obvious downside is that it increases LLVM's memory footprint.
>>
>> 2. Steal some low bits from pointers already present in Value and
>> expose them as part of SubclassOptionalData. We can at least steal 3
>> bits from the first two words in Value which are both pointers. The
>> LSB of the first pointer needs to be 0, otherwise we could steal 4
>> bits.
>>
>> 3. Allow only specific combinations in FastMathFlags. In practice, I
>> don't think folks are equally interested in all the 2^N combinations
>> present in FastMathFlags, so we could compromise and allow only the
>> most "typical" 2^7 combinations (e.g. we could fold nonan and noinf
>> into a single bit, under the assumption that users want to
>> enable-disable them as a unit). I'm unsure if establishing the most
>> typical 2^7 combinations will be straightforward though.
>>
>> 4. Function level attributes. Instead of wasting precious
>> instruction-level space, we could move all FP math attributes onto the
>> containing function. I'm not sure if this will work for all frontends
>> and it also raises annoying tradeoffs around inlining and other
>> inter-procedural passes.
>>
>>
>> My gut feeling is to go with (2). It should be semantically
>> invisible, have no impact on memory usage, and the ugly bit
>> manipulation can be abstracted away. What do you think? Any other
>> possibilities I missed?
>>
>>
>> Why I need an FTZ flag: some ARM Neon vector instructions have FTZ
>> semantics, which means we can't vectorize instructions when compiling
>> for Neon unless we know the user is okay with FTZ. Today we pretend
>> that the "fast" variant of FastMathFlags implies FTZ
>> (https://reviews.llvm.org/rL266363), which is not ideal. Moreover
>> (this is the immediate reason), for XLA CPU I'm trying to generate FP
>> instructions without nonan and noinf, which breaks vectorization on
>> ARM Neon for this reason. An explicit bit for FTZ will let me
>> generate FP operations tagged with FTZ and all fast math flags except
>> nonan and noinf, and still have them vectorize on Neon.
>>
>> -- Sanjoy
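
For readers unfamiliar with the trick behind option (2), here is a minimal sketch of keeping flag bits in the low bits of an aligned pointer. It is not LLVM code: PointerWithFlags and FakeType are hypothetical names made up for illustration, and the sketch assumes the pointee is at least 8-byte aligned so that 3 low bits are always zero. LLVM's own llvm::PointerIntPair packages the same idea; how it would be wired into Value is exactly what the thread leaves open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM class: stores a pointer and a few flag
// bits in one machine word by reusing the low bits that alignment keeps zero.
template <typename T, unsigned NumBits>
class PointerWithFlags {
  static_assert(NumBits > 0 && NumBits < 8, "only a few low bits are usable");
  static_assert(alignof(T) >= (1u << NumBits),
                "pointee alignment must leave NumBits low bits free");

  static constexpr std::uintptr_t FlagMask =
      (std::uintptr_t(1) << NumBits) - 1;

  std::uintptr_t Word = 0; // pointer in the high bits, flags in the low bits

public:
  explicit PointerWithFlags(T *Ptr = nullptr) { setPointer(Ptr); }

  // Mask the flag bits off before handing the pointer back.
  T *getPointer() const { return reinterpret_cast<T *>(Word & ~FlagMask); }

  unsigned getFlags() const { return static_cast<unsigned>(Word & FlagMask); }

  // Replace the pointer while preserving the current flag bits.
  void setPointer(T *Ptr) {
    auto Raw = reinterpret_cast<std::uintptr_t>(Ptr);
    assert((Raw & FlagMask) == 0 && "pointer is not sufficiently aligned");
    Word = Raw | (Word & FlagMask);
  }

  // Replace the flag bits while preserving the current pointer.
  void setFlags(unsigned Flags) {
    assert((Flags & ~FlagMask) == 0 && "flags do not fit in the stolen bits");
    Word = (Word & ~FlagMask) | Flags;
  }
};

// Hypothetical stand-in for a pointee type with 8-byte alignment.
struct alignas(8) FakeType {
  int Id;
};
```

The wrapper stays one word wide, which is why this option has no memory cost. The price is a mask on every pointer read, and the number of spare bits is capped by the pointee's alignment, so it runs out of room just as the existing 7 bits did.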
Michael Berg via llvm-dev
2019-Mar-18 19:23 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
Another thing to consider: The current bitcode reader/writer handles backwards compatibility with the previous IR version with some mapping done to preserve context. If we change the bitcode layout we effectively have a new version of IR, bringing up the notion once more of compatibility with a prior version.
It is just another item to add to the work list...

Regards,
Michael
Sanjoy Das via llvm-dev
2019-Mar-18 20:45 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
Hi Michael,

On Mon, Mar 18, 2019 at 12:23 PM Michael Berg <michael_c_berg at apple.com> wrote:
>
> Another thing to consider: The current bitcode reader/writer handles backwards compatibility with the previous IR version with some mapping done to preserve context. If we change the bitcode layout we effectively have a new version of IR, bringing up the notion once more of compatibility with a prior version.
> It is just another item to add to the work list...

That's good to keep in mind, though I don't quite understand why this
would be non-trivial. It seems like we already have a split between
bitc::FastMathMap and FastMathFlags with an explicit encode/decode
step. Why would a different storage scheme for FastMathFlags
influence reading/writing bitcode?

-- Sanjoy
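
To make the encode/decode point concrete, here is a rough sketch of how a serialized flag layout can be kept separate from the in-memory one. Every name in it (the wire namespace, InMemoryFlags, encodeFlags, decodeFlags) and every bit position is hypothetical; this is not the actual bitc::FastMathMap enum or LLVM's reader/writer code, only an illustration of the split described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical on-disk bit assignments. Once files containing these bits
// exist, the positions must stay fixed; new flags only take unused bits.
namespace wire {
enum : std::uint64_t {
  NoNaNs = 1u << 0,
  NoInfs = 1u << 1,
  FlushToZero = 1u << 2, // hypothetical newly added flag
};
} // namespace wire

// Hypothetical in-memory representation. Its layout can change freely
// (packed bits, stolen pointer bits, ...) without touching the wire format.
struct InMemoryFlags {
  bool NoNaNs = false;
  bool NoInfs = false;
  bool FlushToZero = false;
};

// Writer side: translate each in-memory flag to its fixed wire bit.
inline std::uint64_t encodeFlags(const InMemoryFlags &F) {
  std::uint64_t Bits = 0;
  if (F.NoNaNs)
    Bits |= wire::NoNaNs;
  if (F.NoInfs)
    Bits |= wire::NoInfs;
  if (F.FlushToZero)
    Bits |= wire::FlushToZero;
  return Bits;
}

// Reader side: translate wire bits back. A record written before the new
// flag existed simply has that bit clear, so the flag decodes as "off".
inline InMemoryFlags decodeFlags(std::uint64_t Bits) {
  InMemoryFlags F;
  F.NoNaNs = (Bits & wire::NoNaNs) != 0;
  F.NoInfs = (Bits & wire::NoInfs) != 0;
  F.FlushToZero = (Bits & wire::FlushToZero) != 0;
  return F;
}
```

Under this arrangement, changing where the flags live in memory touches neither function's wire-side constants, and older records decode with the new flag off. Compatibility work would only arise if the serialized bit assignments themselves had to change.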