thr3ads.net - llvm dev - [LLVMdev] feedback on late machine combiner pass [Jun 2014]


Gerolf Hoflehner

2014-Jun-02 22:41 UTC

[LLVMdev] feedback on late machine combiner pass

Hi,

we noticed that combining instructions at the ISEL level can result in
sub-optimal schedules, since folding an expensive instruction can
lengthen the critical path. A simple illustrative example is folding
mul-mul-add into mul-madd (fused multiply add), which serializes the two
multiplications instead of allowing them to execute in parallel. Unfortunately,
ISEL does not have the information necessary to avoid sub-optimal cases in all
instances. The alternative is to postpone instruction combining to a later pass
and decide there whether the combined code sequence performs better. This combiner
pass would run at the machine IR level, use the machine trace model information,
and roughly implement the following architecture-independent algorithm, initially
targeting the mul-add/sub pattern:

* Initial Instruction Combiner Algorithm
0. Recognize instruction patterns in a single basic block
1. For each add/sub, check whether it can be combined to madd/msub
2. For each pattern found, generate the alternative instruction sequence outside
   the basic block that contains mul - add/sub
3. Evaluate whether the new pattern is more efficient:
   a) In Os, substitute when the new pattern has fewer instructions
   b) Otherwise, substitute when neither critical path nor resource length
      increases, based on the machine trace model
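
As a rough illustration of step 3, here is a small Python sketch of the
decision rule applied to the mul-mul-add example. It is not the proposed pass
and does not use the machine trace model: the latencies, the functional unit
names, and the helpers Instr, critical_path, resource_length and
should_substitute are all assumptions made up for this example. Resource
length is simplified to "instructions on the busiest unit", and the sequences
are evaluated in isolation rather than in the context of a whole trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: str
    srcs: tuple


# Hypothetical latencies (in cycles) and functional units, for illustration only.
LATENCY = {"mul": 4, "madd": 4, "msub": 4, "add": 1, "sub": 1}
UNIT = {"mul": "mult", "madd": "mult", "msub": "mult", "add": "alu", "sub": "alu"}


def critical_path(seq):
    """Longest latency-weighted dependency chain in a straight-line sequence."""
    ready = {}  # value name -> cycle at which the value becomes available
    longest = 0
    for ins in seq:
        # Values not defined in the sequence are treated as available at cycle 0.
        start = max((ready.get(s, 0) for s in ins.srcs), default=0)
        ready[ins.dest] = start + LATENCY[ins.opcode]
        longest = max(longest, ready[ins.dest])
    return longest


def resource_length(seq):
    """Simplified resource length: instruction count on the busiest unit."""
    counts = {}
    for ins in seq:
        unit = UNIT[ins.opcode]
        counts[unit] = counts.get(unit, 0) + 1
    return max(counts.values(), default=0)


def should_substitute(old, new, optimize_for_size=False):
    """Step 3: a) under Os compare instruction counts, b) otherwise compare costs."""
    if optimize_for_size:
        return len(new) < len(old)
    return (critical_path(new) <= critical_path(old)
            and resource_length(new) <= resource_length(old))


# mul-mul-add: the two multiplies are independent and can run in parallel.
separate = [
    Instr("mul", "t1", ("a", "b")),
    Instr("mul", "t2", ("c", "d")),
    Instr("add", "r", ("t1", "t2")),
]
# mul-madd: the madd has to wait for t1, so the two multiplies are serialized.
fused = [
    Instr("mul", "t1", ("a", "b")),
    Instr("madd", "r", ("c", "d", "t1")),
]

print(critical_path(separate), critical_path(fused))            # 5 8
print(should_substitute(separate, fused))                       # False
print(should_substitute(separate, fused, optimize_for_size=True))  # True
```

With these assumed numbers the combined sequence has the same resource length
but a longer critical path (8 cycles instead of 5), so rule b) keeps the
separate mul-mul-add, while rule a) still picks mul-madd under Os because it
has one instruction less.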

Do you have any concerns? Or is this the pass you always wanted? :-) 

Cheers
Gerolf

Daniil Troshkov

2014-Jun-03 10:29 UTC


[LLVMdev] feedback on late machine combiner pass

Could you please provide a specific .ll test case? Then we will be able to
look at what happens in detail using a debugger.


2014-06-03 2:41 GMT+04:00 Gerolf Hoflehner <ghoflehner at apple.com>:
> Hi,
>
> we noticed that combining instructions at the ISEL level can result in
> sub-optimal schedules, since an expensive instruction could be folded and
> lengthen the critical path. A simple illustrative example for this is
> folding mul-mul-add into mul-madd (fused multiply add), which serializes
> two multiplications instead of allowing them to execute in parallel.
> Unfortunately ISEL does not have the information necessary to avoid
> sub-optimal cases in all instances. The alternative is to postpone
> instruction combining to a later pass and decide on whether the combined
> code sequence performs better. This combiner pass would run at the machine
> IR level, use the machine trace model information, and roughly implement
> the following architecture independent algorithm initially targeting
> mul-add/sub pattern:
>
> * Initial Instruction Combiner Algorithm
> 0. Recognize instruction patterns in a single basic block
> 1. For each add/sub check if it can be combined to madd/msub
> 2. For each pattern found, generate the alternative instruction sequence
> outside the basic block that contains mul - add/sub
> 3. Evaluate if the new pattern is more efficient:
> a) In Os substitute when the new pattern has fewer instructions
> b) Otherwise substitute when neither critical path nor resource length
> increases based on the machine trace model
>
> Do you have any concerns? Or is this the pass you always wanted? :-)
>
> Cheers
> Gerolf
