thr3ads.net - llvm dev - [LLVMdev] Multiply-add combining [Aug 2014]


Kevin K O'Brien

2014-Aug-26 15:16 UTC

[LLVMdev] Multiply-add combining

Hi Olivier,
      I think we discussed this last Thursday? My feeling is that each use
of the multiply can be considered separately. If it can be combined, then
we should do so. The multiply should be left in place and removed by a dead
code elimination pass sometime later. This is what TOBEY does. If you want
me to explain the XL method in more detail, come talk to me.

              Kevin
----------------------------------------------
Kevin O'Brien
Manager, Advanced Compiler Technology
IBM T.J Watson Research Center, Yorktown Heights, NY



  From:       Olivier H Sallenave/Watson/IBM
  To:         llvmdev at cs.uiuc.edu
  Cc:         Samuel F Antao/Watson/IBM at IBMUS, Kevin K O'Brien/Watson/IBM at IBMUS
  Date:       08/26/2014 11:12 AM
  Subject:    Multiply-add combining

Hi,

I tried to compile the following using -ffp-contract=fast:

  %mul = fmul double %sub5, %x
  %add = fadd double %add6, %mul
  %sub = fsub double %sub5, %mul

I expected fadd and fsub to be contracted with fmul, which didn't happen.
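
For reference, a minimal C++ sketch of what contraction would compute for the
two uses above. The function names are hypothetical, and the mapping of the
fsub to an FMA with a negated multiplicand is one plausible contracted form,
not a statement of what the backend emits. Here `a` and `b` stand for the fmul
operands (%sub5 and %x) and `c` for the other operand of the fadd or fsub. The
fused versions round once instead of twice, so results can differ in the last
bits.

```cpp
#include <cassert>
#include <cmath>

// Uncontracted: the product is rounded to double, then used by the add/sub.
// The volatile temporary keeps the host compiler from fusing these itself.
double add_separate(double a, double b, double c) {
  volatile double mul = a * b;
  return c + mul;
}

double sub_separate(double a, double b, double c) {
  volatile double mul = a * b;
  return c - mul;
}

// Contracted: each use gets its own fused multiply-add with a single rounding.
// c + a*b  ->  fma(a, b, c)
double add_fused(double a, double b, double c) { return std::fma(a, b, c); }

// c - a*b  ->  fma(-a, b, c)
double sub_fused(double a, double b, double c) { return std::fma(-a, b, c); }
```

With a = b = 1 + 2^-27, the exact product is 1 + 2^-26 + 2^-54, and the last
term is lost when the product is rounded on its own, so the separate and fused
forms give different answers when c cancels the leading terms.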

When looking in DAGCombiner.cpp, it appears the result of the fmul needs to
be used only once, which isn't the case here as it is used by both the fadd
and the fsub:

    // fold (fadd (fmul x, y), z) -> (fma x, y, z)
    if (N0.getOpcode() == ISD::FMUL && N0.hasOneUse())
      return DAG.getNode(ISD::FMA, SDLoc(N), VT, N0.getOperand(0),
          N0.getOperand(1), N1);

This heuristic looks a little conservative. Could we instead check that
every instruction using the result of the fmul is combinable (i.e., each
user is either an fadd or an fsub)?
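
A minimal sketch of the proposed check, written against a toy node type rather
than the real SelectionDAG classes. `Node`, `Opcode`, `hasOneUse` and
`allUsesCombinable` here are hypothetical stand-ins for illustration only; the
actual DAGCombiner code would need to walk the uses of the fmul through the
SelectionDAG API.

```cpp
#include <cassert>
#include <vector>

// Toy opcode set: only what the example needs.
enum class Opcode { FMul, FAdd, FSub, FMA, Other };

// Toy node: an opcode plus the list of nodes that use this node's result.
struct Node {
  Opcode op;
  std::vector<const Node*> users;
};

// The existing condition: the fmul result has exactly one user.
bool hasOneUse(const Node& mul) { return mul.users.size() == 1; }

// The proposed condition: the node is an fmul with at least one user, and
// every user is an fadd or fsub, so each could absorb the multiply and the
// fmul itself would become dead.
bool allUsesCombinable(const Node& mul) {
  if (mul.op != Opcode::FMul || mul.users.empty())
    return false;
  for (const Node* user : mul.users) {
    if (user->op != Opcode::FAdd && user->op != Opcode::FSub)
      return false;
  }
  return true;
}
```

For the IR above, the fmul has two users (the fadd and the fsub), so the
one-use check fails while the all-uses check succeeds.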


Thanks in advance,
Olivier

Hal Finkel

2014-Aug-26 15:30 UTC

[LLVMdev] Multiply-add combining

Kevin, Olivier,

The target-independent heuristic could certainly check all uses; patches
welcome. We could also consider each case separately, as Kevin suggests, but
that might not be optimal on targets with only one floating-point pipeline, so
we'd need to make it opt-in. You should also look at the MachineCombiner
pass (added in r214832, currently only used by the ARM backend, I think), which
tries to solve this problem in a more sophisticated way.
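
To make the "consider each case separately" option concrete, here is a toy
sketch (not LLVM code; `Inst`, `combineEachUse` and `removeDeadMuls` are
hypothetical names) in which every fadd/fsub fed by an fmul is rewritten to an
FMA on its own, and a later dead-code step removes the fmul only if no user is
left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Opcode { FMul, FAdd, FSub, FMA, Other };

// Toy instruction: mulOperand is the index of the fmul feeding it, or -1.
struct Inst {
  Opcode op;
  int mulOperand;
  bool dead;
};

// Step 1: rewrite each fadd/fsub that consumes an fmul into an FMA. The FMA
// takes the fmul's operands directly, so it no longer references the fmul.
std::size_t combineEachUse(std::vector<Inst>& insts) {
  std::size_t combined = 0;
  for (Inst& inst : insts) {
    bool isAddSub = inst.op == Opcode::FAdd || inst.op == Opcode::FSub;
    if (isAddSub && inst.mulOperand >= 0) {
      inst.op = Opcode::FMA;
      inst.mulOperand = -1;
      ++combined;
    }
  }
  return combined;
}

// Step 2: dead code elimination. An fmul with no remaining user is marked dead.
std::size_t removeDeadMuls(std::vector<Inst>& insts) {
  std::vector<bool> used(insts.size(), false);
  for (const Inst& inst : insts) {
    if (!inst.dead && inst.mulOperand >= 0)
      used[static_cast<std::size_t>(inst.mulOperand)] = true;
  }
  std::size_t removed = 0;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (insts[i].op == Opcode::FMul && !insts[i].dead && !used[i]) {
      insts[i].dead = true;
      ++removed;
    }
  }
  return removed;
}
```

When one user cannot be combined, the fmul survives next to the new FMA, so
the multiplication is performed twice. That is the case where an opt-in makes
sense.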

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
