Hi, we noticed that combining instructions at the ISEL level can result in sub-optimal schedules, since an expensive instruction could be folded and lengthen the critical path. A simple illustrative example for this is folding mul-mul-add into mul-madd (fused multiply add), which serializes two multiplications instead of allowing them to execute in parallel. Unfortunately ISEL does not have the information necessary to avoid sub-optimial cases in all instances. The alternative is to post-pone instruction combining to a later pass and decide on whether the combined code sequence performs better. This combiner pass would run at the machine IR level, use the machine trace model information, and roughly implement the following architecture independent algorithm initially targeting mul-add/sub pattern: * Initial Instruction Combiner Algorithm 0. Recognize instruction patterns in a single basic block 1. For each add/sub check if it can be combined to madd/msub 2. For each pattern found, generate the alternative instruction sequence outside the basic block that contains mul - add/sub 3. Evaluate if the new pattern is more efficient: a) In Os substitute when the new pattern has fewer instructions b) Otherwise substitute when neither critical path nor resource length increases based on the machine trace model Do you have any concerns? Or is this the pass you always wanted? :-) Cheers Gerolf
Could you please provide a specific ll test and we will be able to consider what happened in details using debugger 2014-06-03 2:41 GMT+04:00 Gerolf Hoflehner <ghoflehner at apple.com>:> Hi, > > we noticed that combining instructions at the ISEL level can result in > sub-optimal schedules, since an expensive instruction could be folded and > lengthen the critical path. A simple illustrative example for this is > folding mul-mul-add into mul-madd (fused multiply add), which serializes > two multiplications instead of allowing them to execute in parallel. > Unfortunately ISEL does not have the information necessary to avoid > sub-optimial cases in all instances. The alternative is to post-pone > instruction combining to a later pass and decide on whether the combined > code sequence performs better. This combiner pass would run at the machine > IR level, use the machine trace model information, and roughly implement > the following architecture independent algorithm initially targeting > mul-add/sub pattern: > > * Initial Instruction Combiner Algorithm > 0. Recognize instruction patterns in a single basic block > 1. For each add/sub check if it can be combined to madd/msub > 2. For each pattern found, generate the alternative instruction sequence > outside the basic block that contains mul - add/sub > 3. Evaluate if the new pattern is more efficient: > a) In Os substitute when the new pattern has fewer instructions > b) Otherwise substitute when neither critical path nor resource length > increases based on the machine trace model > > Do you have any concerns? Or is this the pass you always wanted? :-) > > Cheers > Gerolf > > > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140603/81ee22fd/attachment.html>