Sanjay Patel via llvm-dev
2019-Mar-18 16:31 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
We knew back in https://reviews.llvm.org/D39304 that the day when we'd need another FMF bit was coming ...it was just a question of 'when'. :)

I'm guessing that an FTZ bit won't be the last new bit needed if we consider permutations between strict FP and fast-math. Even without that, denormals-as-zero (DAZ) might also be useful? So rather than continuing to carve these out bit-by-bit, it's worth considering a more general solution: instruction-level metadata.

IIUC, the main argument for making FMF part of the instruction was that per-instruction metadata gets expensive if we're applying it to a significant chunk of the instructions. But let's think about that - even the most FP-heavy code tops out around 10% FP math ops out of the total instruction count. Typical FP benchmark code is only 2-5% FP ops. The rest is the same load/store/control-flow/ALU stuff found in integer code.

I'm not exactly sure yet what it would take to do the experiment, but it seems worth exploring moving the existing FMF to metadata.

One point in favor of this approach is that we already have an "MD_fpmath" enum. It's currently only used to convey reduced-precision requirements to the AMDGPU backend. We could extend that to include arbitrary FMF settings.

A couple of related points for FMF-as-metadata:
1. It might encourage fixing a hack added for reciprocals: we use a function-level attribute for those (grep for "reciprocal-estimates"). IIRC, that was just a quicker fix than using MD_fpmath. The existing squished boolean FMF can't convey the more general settings that we need for reciprocal optimizations.
2. These don't require new bits, but FMF isn't applied correctly today as-is:
https://reviews.llvm.org/D48085
https://bugs.llvm.org/show_bug.cgi?id=38086
https://bugs.llvm.org/show_bug.cgi?id=39535
https://reviews.llvm.org/D51701
...so we need to make FMF changes regardless of FTZ.

On Sun, Mar 17, 2019 at 2:47 PM Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Can we move HasValueHandle out of the byte used for SubClassOptionalData and move it to the flags at the bottom of Value by shrinking NumUserOperands to 27?
>
> ~Craig
>
> On Sat, Mar 16, 2019 at 12:51 PM Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags, but we've already used up the 7 bits available in Value::SubclassOptionalData (the "backing storage" for FPMathOperator::getFastMathFlags()). These are the possibilities I can think of:
>>
>> 1. Increase the size of FPMathOperator. This gives us some additional bits for FTZ and other fast-math flags we'd want to add in the future. The obvious downside is that it increases LLVM's memory footprint.
>>
>> 2. Steal some low bits from pointers already present in Value and expose them as part of SubclassOptionalData. We can steal at least 3 bits from the first two words in Value, which are both pointers. The LSB of the first pointer needs to be 0, otherwise we could steal 4 bits.
>>
>> 3. Allow only specific combinations in FastMathFlags. In practice, I don't think folks are equally interested in all the 2^N combinations present in FastMathFlags, so we could compromise and allow only the most "typical" 2^7 combinations (e.g. we could merge nonan and noinf into a single bit, under the assumption that users want to enable and disable them as a unit). I'm unsure if establishing the most typical 2^7 combinations will be straightforward, though.
>>
>> 4. Function-level attributes. Instead of wasting precious instruction-level space, we could move all FP math attributes onto the containing function. I'm not sure if this will work for all frontends, and it also raises annoying tradeoffs around inlining and other inter-procedural passes.
>>
>> My gut feeling is to go with (2). It should be semantically invisible, have no impact on memory usage, and the ugly bit manipulation can be abstracted away. What do you think? Any other possibilities I missed?
>>
>> Why I need an FTZ flag: some ARM Neon vector instructions have FTZ semantics, which means we can't vectorize instructions when compiling for Neon unless we know the user is okay with FTZ. Today we pretend that the "fast" variant of FastMathFlags implies FTZ (https://reviews.llvm.org/rL266363), which is not ideal. Moreover (this is the immediate reason), for XLA CPU I'm trying to generate FP instructions without nonan and noinf, which breaks vectorization on ARM Neon for this reason. An explicit bit for FTZ will let me generate FP operations tagged with FTZ and all fast math flags except nonan and noinf, and still have them vectorize on Neon.
>>
>> -- Sanjoy
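To make the bit-stealing idea in option (2) above concrete, here is a minimal, self-contained sketch of the general technique of packing flag bits into the unused low (alignment) bits of a pointer word. It only illustrates the mechanism under the assumption of at-least-4-byte-aligned allocations; it is not Value's actual field layout, and the TaggedPtr/Node names are hypothetical. In LLVM this pattern is already abstracted by llvm::PointerIntPair, which is presumably what "the ugly bit manipulation can be abstracted away" would look like in practice.

  #include <cassert>
  #include <cstdint>

  // Stand-in for one of the pointer-typed words at the start of Value.
  struct Node { Node *Next = nullptr; };

  // Hypothetical wrapper keeping a Node* plus two stolen flag bits in one word.
  class TaggedPtr {
    uintptr_t Bits = 0;                        // pointer and flags share storage
    static constexpr uintptr_t FlagMask = 0x3; // low 2 bits are free when
                                               // alignof(Node) >= 4
  public:
    Node *getPointer() const { return reinterpret_cast<Node *>(Bits & ~FlagMask); }
    unsigned getFlags() const { return static_cast<unsigned>(Bits & FlagMask); }

    void setPointer(Node *N) {
      uintptr_t P = reinterpret_cast<uintptr_t>(N);
      assert((P & FlagMask) == 0 && "pointer not sufficiently aligned");
      Bits = P | (Bits & FlagMask);            // preserve the stolen bits
    }
    void setFlags(unsigned F) { Bits = (Bits & ~FlagMask) | (F & FlagMask); }
  };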
Sanjoy Das via llvm-dev
2019-Mar-18 17:56 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
On Mon, Mar 18, 2019 at 9:31 AM Sanjay Patel <spatel at rotateright.com> wrote:

> We knew back in https://reviews.llvm.org/D39304 that the day when we'd need another FMF bit was coming ...it was just a question of 'when'. :)
>
> I'm guessing that an FTZ bit won't be the last new bit needed if we consider permutations between strict FP and fast-math. Even without that, denormals-as-zero (DAZ) might also be useful? So rather than continuing to carve these out bit-by-bit, it's worth considering a more general solution: instruction-level metadata.
>
> IIUC, the main argument for making FMF part of the instruction was that per-instruction metadata gets expensive if we're applying it to a significant chunk of the instructions. But let's think about that - even the most FP-heavy code tops out around 10% FP math ops out of the total instruction count. Typical FP benchmark code is only 2-5% FP ops. The rest is the same load/store/control-flow/ALU stuff found in integer code.

If this is true, what do you think about option (1)? It might be simpler to increase the size of FPMathOperator by a word, giving us 64 more bits of fastmath flags. We could also have this extra word in only those instances of FPMathOperator that have a non-zero FastMathFlags (this would force us to remove setFastMathFlags since we'd need to know the contents of FastMathFlags at Instruction construction time).

-- Sanjoy
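A rough sketch of the "pay for the extra word only when flags are non-zero" variant of option (1), purely to illustrate why setFastMathFlags would have to go away in that scheme. The class names (WideFMF, FPOpPlain, FPOpWithFlags) are hypothetical and are not the real Instruction hierarchy.

  #include <cstdint>

  // Hypothetical 64-bit fast-math flag word (vs. today's 7 bits).
  struct WideFMF {
    uint64_t Bits = 0;
    bool any() const { return Bits != 0; }
  };

  // Stand-in for an FP operation created with no fast-math flags.
  class FPOpPlain {
    // opcode, operands, etc. would live here
  };

  // The extra word exists only for operations created with non-empty flags.
  // Because "has the extra word" is decided by which class gets allocated,
  // the flags must be known at construction time; a later setFastMathFlags()
  // could not grow an already-allocated FPOpPlain.
  class FPOpWithFlags : public FPOpPlain {
    WideFMF Flags;
  public:
    explicit FPOpWithFlags(WideFMF F) : Flags(F) {}
    WideFMF getFastMathFlags() const { return Flags; }
  };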
Sanjay Patel via llvm-dev
2019-Mar-19 00:15 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
I don't have any objections to increasing the size of FPMathOperator, but I also don't know what perf impact that would have.

I made this comment in D39304:
"I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated."
That could just be me not understanding the class hierarchy?

On Mon, Mar 18, 2019 at 11:56 AM Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> On Mon, Mar 18, 2019 at 9:31 AM Sanjay Patel <spatel at rotateright.com> wrote:
>> IIUC, the main argument for making FMF part of the instruction was that per-instruction metadata gets expensive if we're applying it to a significant chunk of the instructions. But let's think about that - even the most FP-heavy code tops out around 10% FP math ops out of the total instruction count. Typical FP benchmark code is only 2-5% FP ops. The rest is the same load/store/control-flow/ALU stuff found in integer code.
>
> If this is true, what do you think about option (1)? It might be simpler to increase the size of FPMathOperator by a word, giving us 64 more bits of fastmath flags. We could also have this extra word in only those instances of FPMathOperator that have a non-zero FastMathFlags (this would force us to remove setFastMathFlags since we'd need to know the contents of FastMathFlags at Instruction construction time).
>
> -- Sanjoy
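For readers unfamiliar with the class hierarchy being discussed: Operator (and FPMathOperator) is a cast-only "view" class that is never constructed; code reaches it via cast<>/dyn_cast<> on an existing Instruction or ConstantExpr, so it has no storage of its own to which a field could be added. Below is a much-simplified model of that constraint; the *Like names are hypothetical and this is not the real llvm/IR/Operator.h definition.

  // Simplified model of why FPMathOperator cannot simply grow a new field.
  struct ValueLike {
    unsigned char SubclassOptionalData : 7; // today's 7 FMF bits live here
  };
  struct UserLike : ValueLike { /* operand list, etc. */ };
  struct InstructionLike : UserLike { /* opcode, parent block, ... */ };

  // Analogue of llvm::Operator: never instantiated, only used as the target
  // of a cast over an existing instruction-like object.  Since no object of
  // this type is ever allocated, a data member added here would have no
  // storage behind it.
  struct OperatorLike : UserLike {
    OperatorLike() = delete;
    ~OperatorLike() = delete;

    unsigned getRawFMFBits() const { return SubclassOptionalData; }
  };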
Evandro Menezes via llvm-dev
2019-Mar-19 16:43 UTC
[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags
As for me, I lean toward Sanjay's proposal and Sanjoy's #4, as both seem to me to be more future-proof and to mimic the behavior of GCC more accurately.

On another note, do y'all have any thoughts about changing the FP math semantics to FTZ and DAZ for the whole program? Some, if not all, current targets support such FP modes through bits in their FP unit control register, or similar. As Hal once pointed out to me, the way that GCC works is a bit unnerving, as any DSO that changes the FP mode to use such semantics affects all modules, even those which were written without this change in mind.

Perhaps the initialization code that changes the FP mode could be suppressed for DSOs, thus leaving this run-time change in the hands of the program developer rather than the library developer. Although this raises some questions as well: GCC accomplishes this in libgcc, whereas, should the same behavior be copied by LLVM, it would likely reside in compiler-rt.

Cheers,

-- Evandro Menezes

On 03/18/19 11:31, Sanjay Patel via llvm-dev wrote:

> We knew back in https://reviews.llvm.org/D39304 that the day when we'd need another FMF bit was coming ...it was just a question of 'when'. :)
>
> I'm guessing that an FTZ bit won't be the last new bit needed if we consider permutations between strict FP and fast-math. Even without that, denormals-as-zero (DAZ) might also be useful? So rather than continuing to carve these out bit-by-bit, it's worth considering a more general solution: instruction-level metadata.
>
> IIUC, the main argument for making FMF part of the instruction was that per-instruction metadata gets expensive if we're applying it to a significant chunk of the instructions. But let's think about that - even the most FP-heavy code tops out around 10% FP math ops out of the total instruction count. Typical FP benchmark code is only 2-5% FP ops. The rest is the same load/store/control-flow/ALU stuff found in integer code.
>
> I'm not exactly sure yet what it would take to do the experiment, but it seems worth exploring moving the existing FMF to metadata.
>
> One point in favor of this approach is that we already have an "MD_fpmath" enum. It's currently only used to convey reduced-precision requirements to the AMDGPU backend. We could extend that to include arbitrary FMF settings.
>
> A couple of related points for FMF-as-metadata:
> 1. It might encourage fixing a hack added for reciprocals: we use a function-level attribute for those (grep for "reciprocal-estimates"). IIRC, that was just a quicker fix than using MD_fpmath. The existing squished boolean FMF can't convey the more general settings that we need for reciprocal optimizations.
> 2. These don't require new bits, but FMF isn't applied correctly today as-is:
> https://reviews.llvm.org/D48085
> https://bugs.llvm.org/show_bug.cgi?id=38086
> https://bugs.llvm.org/show_bug.cgi?id=39535
> https://reviews.llvm.org/D51701
> ...so we need to make FMF changes regardless of FTZ.
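To make the whole-program FTZ/DAZ mode concrete: on x86, FTZ and DAZ are bits in the MXCSR control register, and setting them affects every SSE floating-point operation executed in the process from that point on. The sketch below uses the standard SSE intrinsics to flip those bits; it only illustrates the mechanism being described here (GCC's -ffast-math arranges the equivalent from a startup object shipped with libgcc) and is not LLVM or compiler-rt code.

  #include <pmmintrin.h> // _MM_SET_DENORMALS_ZERO_MODE
  #include <xmmintrin.h> // _MM_SET_FLUSH_ZERO_MODE

  // Process-wide change: every SSE FP operation executed afterwards, in any
  // module loaded into this process, sees FTZ/DAZ semantics.
  static void enableFtzDaz() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         // flush denormal results to zero
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); // treat denormal inputs as zero
  }

  int main() {
    enableFtzDaz();
    // From here on, code in other DSOs that never asked for fast-math also
    // runs with FTZ/DAZ, which is exactly the cross-module concern raised above.
    return 0;
  }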