On Fri, Sep 18, 2015 at 10:34 PM, Tim Northover <t.p.northover at gmail.com> wrote:> AArch64's fmadd instruction is fused, which means it can produce a > different result to the two operations executed separately. The C and > C++ standards do not allow such changes.Sorry, sloppy language on my part. I was aware of fmadd, but I was really asking about turning sequences like: fmul s0, s0, s2 fadd s0, s1, s0 into a fmadd: fmadd s0, s0, s2, s1> We support them via various flags (-ffast-math is the obvious one, > though an argument could be made for supporting -mfused-madd and > -mnofused-madd as well). But in the backend we definitely have to > check *somthing* before doing the substitution.Support in what way? I don't see any patterns or machine combiners to do the above replacement. Did I miss something? If I didn't miss something, is there interest in adding this to the AArch64 machine combiners assuming it was guarded by the right flag?> You might want to get together with Ana Pazos, who just asked similar > questions today.I see that on cfe-dev now. Thanks for pointing that out.> Personally, I'd be sad to see Clang's default execution become less > conforming to the standard. But it's pretty undeniable that > "-std=gnu99/gnu11/..." do allow it by default.Agreed. Thanks for the reply! Cheers, Meador
On Fri, Sep 18, 2015 at 11:18:49PM -0500, Meador Inge via llvm-dev wrote:> On Fri, Sep 18, 2015 at 10:34 PM, Tim Northover <t.p.northover at gmail.com> wrote: > > > AArch64's fmadd instruction is fused, which means it can produce a > > different result to the two operations executed separately. The C and > > C++ standards do not allow such changes. > > Sorry, sloppy language on my part. I was aware of fmadd, but I was > really asking about turning sequences like: > > fmul s0, s0, s2 > fadd s0, s1, s0 > > into a fmadd: > > fmadd s0, s0, s2, s1...which is exactly the transform Tim is talking about. FMA is required to provide a correctly rounded result *without* intermediate rounding. The mul+add sequence on other the side are required to perform that rounding. There are algorithms known to break under such optimisations, which is why it is not enabled by default. The logic for producing FMA is not target specific otherwise. Joerg
>> We support them via various flags (-ffast-math is the obvious one, >> though an argument could be made for supporting -mfused-madd and >> -mnofused-madd as well). But in the backend we definitely have to >> check *somthing* before doing the substitution. > > Support in what way? I don't see any patterns or machine combiners > to do the above replacement. Did I miss something?(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly covered before target-specific code gets involved, in the generic DAG logic at the moment. ISD::FMA nodes should only be formed from paired mul/add based on "AllowFPOpFusion" checks (and other similar explicit permissions). So I think the status quo is that generic code does any permissible fusion, and AArch64 code really doesn't have any freedom to fuse more operations on top of that.> If I didn't miss something, is there interest in adding this to the AArch64 > machine combiners assuming it was guarded by the right flag?If you can find a missing-but-allowed case, adding it to the generic handling would probably be better than making it AArch64 only. Cheers. Tim.
On Sun, Sep 20, 2015 at 9:37 PM, Tim Northover <t.p.northover at gmail.com> wrote:> (Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly > covered before target-specific code gets involved, in the generic DAG > logic at the moment. ISD::FMA nodes should only be formed from paired > mul/add based on "AllowFPOpFusion" checks (and other similar explicit > permissions).Ah, I see it in the DAG combiner now. Thanks.> So I think the status quo is that generic code does any permissible > fusion, and AArch64 code really doesn't have any freedom to fuse more > operations on top of that.Yeah. I originally got thrown off because the integer version is AArch64 specific. There doesn't seem to be any generic handling for that.> If you can find a missing-but-allowed case, adding it to the generic > handling would probably be better than making it AArch64 only.The cases that I was looking into all seem to be handled. Thanks! -- Meador