Thanks for the explanation, Tim!

gcc 4.8.1 *does* generate an fma for your code example for an x86 target that supports fma. I'd bet that the HW vendors' compilers do the same, but I don't have any of those installed at the moment to test that theory. So is this a bug in those compilers? Do you know how they justify it?

I see section 6.5 "Expressions" in the C standard, and I can see that paragraph 8 of 6.5 (6.5p8) would seem to agree with you, assuming that a "floating expression" is a subset of "expression"... Is there any other part of the standard that you know of that I can reference?

This is made a little weirder by the fact that gcc and clang have a 'fast' setting for fp-contract, but the C standard that I'm looking at states that it is just an "on-off-switch".

On Wed, Dec 18, 2013 at 11:17 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> > llvm.org/bugs/show_bug.cgi?id=17188
> > llvm.org/bugs/show_bug.cgi?id=17211
>
> Ah, thanks. That makes a lot more sense now.
>
> > Correct - clang is different than gcc, icc, msvc, xlc, etc. on this.
> > Still haven't seen any explanation for how this is better though...
>
> That would be because it follows what C tells us a compiler has to do
> by default but provides overrides in either direction if you know what
> you're doing.
>
> The key point is that LLVM (currently) has no notion of statement
> boundaries, so it would fuse the operations in this function:
>
>     float foo(float accum, float lhs, float rhs) {
>       float product = lhs * rhs;
>       return accum + product;
>     }
>
> This isn't allowed even under FP_CONTRACT=on (the multiply and add do
> not occur within a single expression), so LLVM can't in good
> conscience enable these optimisations by default.
>
> Cheers.
>
> Tim.
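Whether a multiply and add are contracted matters because the difference is observable. The sketch below is a minimal illustration with hypothetical function names that do not come from this thread. It assumes IEEE 754 float and double with round-to-nearest.

With lhs = rhs = 1 + 2^-13 and accum = -(1 + 2^-12), the exact value of accum + lhs * rhs is 2^-26. Rounding the product to float first discards that term and gives 0.

- The volatile intermediate keeps the two-rounding version from being fused, even under -ffp-contract=fast.
- The `#pragma STDC FP_CONTRACT ON` version is allowed, but not required, to be contracted, so its result depends on the compiler and target.

```c
/* Sketch only: these function names are hypothetical, not from the thread.
   Assumes IEEE 754 float/double with round-to-nearest. */

/* Two separately rounded operations, like the two-statement foo() above.
   The volatile intermediate forces the product to be stored as a float,
   so the compiler cannot fuse the multiply and add into one fma even
   under -ffp-contract=fast. */
float mul_add_two_roundings(float accum, float lhs, float rhs) {
    volatile float product = lhs * rhs;  /* rounding #1: product to float */
    return accum + product;              /* rounding #2: sum to float */
}

/* Reference for what a fused multiply-add would return for the inputs
   used below: a float*float product fits exactly in a double (24 + 24
   significand bits <= 53), so the product itself is not rounded.  In
   general this is NOT a drop-in fmaf() replacement, because the final
   add can round twice (to double, then to float). */
float mul_add_exact_product(float accum, float lhs, float rhs) {
    double product = (double)lhs * (double)rhs;  /* exact */
    return (float)((double)accum + product);
}

/* A single expression: with FP_CONTRACT ON the implementation MAY (but
   need not) evaluate it as one fused operation, so the result depends
   on the compiler and target.  Compilers that do not implement this
   pragma may warn that it is ignored. */
float mul_add_single_expression(float accum, float lhs, float rhs) {
#pragma STDC FP_CONTRACT ON
    return accum + lhs * rhs;
}
```

Consider the call `mul_add_two_roundings(-(1.0f + 0x1p-12f), 1.0f + 0x1p-13f, 1.0f + 0x1p-13f)`. Under these assumptions it returns 0.0f. `mul_add_exact_product` with the same arguments returns 0x1p-26f. `mul_add_single_expression` may return either value. The standard's pragma only accepts ON, OFF or DEFAULT; a 'fast' mode exists only as a compiler option.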
Just to clarify: gcc 4.8.1 generates that fma at -O2, with no FP relaxation or other flags specified.

On Wed, Dec 18, 2013 at 6:02 PM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:
> gcc 4.8.1 *does* generate an fma for your code example for an x86 target
> that supports fma.
Hi all,

Thanks for the info. A few observations from my side:

LLVM:
- cortex-a8, vfpv3: no vmla or vfma instruction emitted
- cortex-a8, vfpv4: no vmla or vfma instruction emitted (this combination is invalid, though, as cortex-a8 does not have vfpv4)
- cortex-a8, vfpv4 with -ffp-contract=fast: vfma instruction emitted (this seems like a bug to me! If cortex-a8 doesn't come with vfpv4, then the vfma instructions generated will be invalid)
- cortex-a15, vfpv4: vmla instruction emitted (which is a NEON instruction)
- cortex-a15, vfpv4 with -ffp-contract=fast: vfma instruction emitted

GCC:
- cortex-a8, vfpv3: vmla instruction emitted
- cortex-a15, vfpv4: vfma instruction emitted

I agree with the point that NEON and VFP instructions shouldn't be used interchangeably. However, if gcc emits the vmla (NEON) instruction for cortex-a8, shouldn't LLVM also emit it? Can someone please clarify this point? The performance gain from the vmla instruction is huge.

I read somewhere that LLVM prefers accuracy over performance. Is this true, and is that why LLVM does not emit vmla instructions for cortex-a8?

On Thu, Dec 19, 2013 at 6:41 AM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:
> Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or
> other flags specified.

--
With regards,
Suyog Sarda