thr3ads.net - search: "rl266363"

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 16

3

...Why I need an FTZ flag: some ARM Neon vector instructions have FTZ semantics, which means we can't vectorize instructions when compiling for Neon unless we know the user is okay with FTZ. Today we pretend that the "fast" variant of FastMathFlags implies FTZ (https://reviews.llvm.org/rL266363), which is not ideal. Moreover (this is the immediate reason), for XLA CPU I'm trying to generate FP instructions without nonan and noinf, which breaks vectorization on ARM Neon for this reason. An explicit bit for FTZ will let me generate FP operations tagged with FTZ and all fast math flags...
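To make the FTZ semantics mentioned in this snippet concrete, here is a minimal sketch in Python that models flush-to-zero for single-precision multiplication. It is an illustration only, not LLVM or Neon code. The helper names (`to_f32`, `flush_to_zero`, `mul_ieee`, `mul_ftz`) are hypothetical, and the model assumes that both subnormal inputs and subnormal results are replaced by a zero of the same sign; real hardware may differ in the details.

```python
import math
import struct

# Smallest positive normal binary32 value (2**-126); anything smaller
# in magnitude (and nonzero) is subnormal.
F32_MIN_NORMAL = 2.0 ** -126


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x: float) -> float:
    """Replace a subnormal binary32 value with a zero of the same sign."""
    x = to_f32(x)
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def mul_ieee(a: float, b: float) -> float:
    """binary32 multiply with gradual underflow (subnormals preserved)."""
    return to_f32(to_f32(a) * to_f32(b))


def mul_ftz(a: float, b: float) -> float:
    """binary32 multiply under the assumed FTZ model (inputs and result flushed)."""
    return flush_to_zero(flush_to_zero(a) * flush_to_zero(b))


if __name__ == "__main__":
    a, b = 2.0 ** -100, 2.0 ** -30  # exact product 2**-130 is subnormal in binary32
    print(mul_ieee(a, b))  # tiny nonzero subnormal
    print(mul_ftz(a, b))   # flushed to zero
```

The two functions agree on normal values and differ only near underflow, which is why a vectorizer cannot silently swap a scalar IEEE operation for a vector instruction with FTZ behaviour unless the operation is known to tolerate it.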

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 18

2

...on vector instructions have FTZ semantics, which means we can't vectorize instructions when compiling for Neon unless we know the user is okay with FTZ. Today we pretend that the "fast" variant of FastMathFlags implies FTZ (https://reviews.llvm.org/rL266363), which is not ideal. Moreover (this is the immediate reason), for XLA CPU I'm trying to generate FP instructions without nonan and noinf, which breaks vectorization on ARM Neon for this reason. An explicit bit for FTZ will let me generate FP operations tag...

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 18

3

...on vector instructions have FTZ semantics, which means we can't vectorize instructions when compiling for Neon unless we know the user is okay with FTZ. Today we pretend that the "fast" variant of FastMathFlags implies FTZ (https://reviews.llvm.org/rL266363), which is not ideal. Moreover (this is the immediate reason), for XLA CPU I'm trying to generate FP instructions without nonan and noinf, which breaks vectorization on ARM Neon for this reason. An explicit bit for FTZ will let me generate FP operations tag...