Geoff Berry via llvm-dev
2017-Sep-27 03:24 UTC
[llvm-dev] [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
On 9/26/2017 6:47 PM, Matthias Braun wrote:> >> On Sep 26, 2017, at 3:33 PM, Geoff Berry <gberry at codeaurora.org >> <mailto:gberry at codeaurora.org>> wrote: >> >> >> >> On 9/26/2017 6:11 PM, Matthias Braun wrote: >>>> On Sep 26, 2017, at 2:39 PM, Geoff Berry via llvm-dev >>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>> >>>> Hi all, >>>> >>>> Mikael reported a machine verification failure in his out-of-tree >>>> target with the MachineCopyPropagation changes to forward registers >>>> (which is currently reverted). The verification in question is: >>>> >>>> *** Bad machine code: Multiple connected components in live interval *** >>>> - function: utils_la_suite_matmul_ref >>>> - interval: %vreg77 >>>> [192r,208B:0)[208B,260r:1)[312r,364r:2)[380r,464B:3) 0 at 192r >>>> 1 at 208B-phi 2 at 312r 3 at 380r >>>> 0: valnos 0 1 3 >>>> 1: valnos 2 >>>> >>>> In this particular case, I believe that it is the greedy allocator >>>> that is creating the multiple components in the %vreg77 live >>>> interval. If you look at the attached debug dump file, just after >>>> the greedy allocator runs, the segment of %vreg77 from the def at >>>> 312B to the use at 380B seems to be separable from the other >>>> segments. The reason the above verification failure is not hit at >>>> that point seems to be related to the FIXME in the following snippet >>>> from ConnectedVNInfoEqClasses::Classify(): >>> That dump seems to be well before greedy runs, isn't it? >> >> I'm not sure what you mean. The attached log contains >> -print-before-all -print-after-all and -debug output starting with the >> coalescer pass. The verification failure is right after the first pass >> of MachineCopyPropagation which runs after the greedy allocator. > The copy propagation seemed to be working on vregs. This was extra > confusing as D30751 seems to be currently reverted from trunk so I > couldn't find references to that code.Sorry, I should have mentioned that as well. This verification error is the last problem keeping me from re-enabling the copy forwarding patch (I can send you my latest rebased version, but I don't think it is relevant to this problem. See below).>> >>> At a first glance the odd thing there is that the operand of >>> fladd_a32_a32_a32 is rewritten from vreg77 to vreg76, but the vreg77 >>> operand of the BUNDLE is not. Maybe you can find out why that is? >> >> Sorry, I should have pointed this out before: because the loop over >> instructions in MachineCopyPropagation is only visiting the BUNDLE >> instructions themselves (i.e. it does not visit the instructions >> inside the BUNDLE) and we don't forward to implicit uses (which all of >> the BUNDLE operands are marked as), we won't currently forward a use >> to a bundled instruction. I believe handling bundles more >> aggressively can be added as a follow-on enhancement unless we think >> not doing has an inherent problem. > I would expect you know the code in D30751 and can take a look into why > only 1 of the instructions is rewritten? > From all I've seen so far the verification code seems to behave as > expected.I don't think the fact that BUNDLEd instructions aren't re-written has anything to do with the verification problem. Let me try to simplify what I think is going on. Just after greedy regalloc, we end up with some code like this: ... %vreg1<def> = ... ... ... = %vreg1 ... %vreg1<def> = %vreg1 ... verifyLiveInterval() accepts this code as valid since it sees the second def as part of the same live interval component because ConnectedVNInfoEqClasses::Classify() sees this second def as a "two-addr" redefinition, even though the def and source operands are not tied. MachineCopyProp (pre-rewrite) runs next and turns this code into: ... %vreg1<def> = ... ... ... = %vreg1 ... %vreg1<def> = *%vreg2* ... verifyLiveInterval() now rejects this code since it sees these two def live ranges as being separate components. My claim is that these two code snippets are equivalent as far as the number of live range components is concerned. Therefore verifyLiveInterval() should have rejected the code just after regalloc greedy (as the FIXME in ConnectedVNInfoEqClasses::Classify hints at), which means the source of this particular problem is in regalloc greedy or before (and not in MachineCopyProp).> - Matthias >-- Geoff Berry Employee of Qualcomm Datacenter Technologies, Inc. Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Matthias Braun via llvm-dev
2017-Sep-27 17:36 UTC
[llvm-dev] [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
> On Sep 26, 2017, at 8:24 PM, Geoff Berry via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On 9/26/2017 6:47 PM, Matthias Braun wrote: >>> On Sep 26, 2017, at 3:33 PM, Geoff Berry <gberry at codeaurora.org <mailto:gberry at codeaurora.org> <mailto:gberry at codeaurora.org <mailto:gberry at codeaurora.org>>> wrote: >>> >>> >>> >>> On 9/26/2017 6:11 PM, Matthias Braun wrote: >>>>> On Sep 26, 2017, at 2:39 PM, Geoff Berry via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Mikael reported a machine verification failure in his out-of-tree target with the MachineCopyPropagation changes to forward registers (which is currently reverted). The verification in question is: >>>>> >>>>> *** Bad machine code: Multiple connected components in live interval *** >>>>> - function: utils_la_suite_matmul_ref >>>>> - interval: %vreg77 [192r,208B:0)[208B,260r:1)[312r,364r:2)[380r,464B:3) 0 at 192r 1 at 208B-phi 2 at 312r 3 at 380r >>>>> 0: valnos 0 1 3 >>>>> 1: valnos 2 >>>>> >>>>> In this particular case, I believe that it is the greedy allocator that is creating the multiple components in the %vreg77 live interval. If you look at the attached debug dump file, just after the greedy allocator runs, the segment of %vreg77 from the def at 312B to the use at 380B seems to be separable from the other segments. The reason the above verification failure is not hit at that point seems to be related to the FIXME in the following snippet from ConnectedVNInfoEqClasses::Classify(): >>>> That dump seems to be well before greedy runs, isn't it? >>> >>> I'm not sure what you mean. The attached log contains -print-before-all -print-after-all and -debug output starting with the coalescer pass. The verification failure is right after the first pass of MachineCopyPropagation which runs after the greedy allocator. >> The copy propagation seemed to be working on vregs. This was extra confusing as D30751 seems to be currently reverted from trunk so I couldn't find references to that code. > > Sorry, I should have mentioned that as well. This verification error is the last problem keeping me from re-enabling the copy forwarding patch (I can send you my latest rebased version, but I don't think it is relevant to this problem. See below). > >>> >>>> At a first glance the odd thing there is that the operand of fladd_a32_a32_a32 is rewritten from vreg77 to vreg76, but the vreg77 operand of the BUNDLE is not. Maybe you can find out why that is? >>> >>> Sorry, I should have pointed this out before: because the loop over instructions in MachineCopyPropagation is only visiting the BUNDLE instructions themselves (i.e. it does not visit the instructions inside the BUNDLE) and we don't forward to implicit uses (which all of the BUNDLE operands are marked as), we won't currently forward a use to a bundled instruction. I believe handling bundles more aggressively can be added as a follow-on enhancement unless we think not doing has an inherent problem. >> I would expect you know the code in D30751 and can take a look into why only 1 of the instructions is rewritten? >> From all I've seen so far the verification code seems to behave as expected. > > I don't think the fact that BUNDLEd instructions aren't re-written has anything to do with the verification problem. Let me try to simplify what I think is going on. Just after greedy regalloc, we end up with some code like this: > > ... > %vreg1<def> = ... > ... > ... = %vreg1 > ... > %vreg1<def> = %vreg1 > ... > > verifyLiveInterval() accepts this code as valid since it sees the second def as part of the same live interval component because ConnectedVNInfoEqClasses::Classify() sees this second def as a "two-addr" redefinition, even though the def and source operands are not tied. > > MachineCopyProp (pre-rewrite) runs next and turns this code into: > ... > %vreg1<def> = ... > ... > ... = %vreg1 > ... > %vreg1<def> = *%vreg2* > ... > > verifyLiveInterval() now rejects this code since it sees these two def live ranges as being separate components. My claim is that these two code snippets are equivalent as far as the number of live range components is concerned. Therefore verifyLiveInterval() should have rejected the code just after regalloc greedy (as the FIXME in ConnectedVNInfoEqClasses::Classify hints at), which means the source of this particular problem is in regalloc greedy or before (and not in MachineCopyProp).Ah I see. And I would agree with your interpretation. - Only tied use/def operands or a subregister def without the undef flag should result in connected liveranges. - Be careful and measure the compile time impact when switching the implementation in ConnectedVNInfoEqClasses; unfortunately there is currently no way to detect this situation just by looking at the VNInfo. - I guess the RAGreedy result is indeed wrong then. If I'm reading this correctly, it basically looks like this (when simplified to the problem at hand): BB2: vreg76 = COPY vreg77<kill> vreg77 = COPY vreg76 vreg77 = someop vreg77<kill> CondJmp BB2 the `vreg77=COPY` should have used a different vreg. My guess would be that the liverange splitting code makes similar assumptions as the ConnectedVNInfoEqClasses. - Matthias -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170927/13b86639/attachment.html>
Geoff Berry via llvm-dev
2017-Sep-28 18:34 UTC
[llvm-dev] [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
On 9/27/2017 1:36 PM, Matthias Braun wrote:> >> On Sep 26, 2017, at 8:24 PM, Geoff Berry via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> On 9/26/2017 6:47 PM, Matthias Braun wrote: >>>> On Sep 26, 2017, at 3:33 PM, Geoff Berry <gberry at codeaurora.org >>>> <mailto:gberry at codeaurora.org><mailto:gberry at codeaurora.org>> wrote: >>>> >>>> >>>> >>>> On 9/26/2017 6:11 PM, Matthias Braun wrote: >>>>>> On Sep 26, 2017, at 2:39 PM, Geoff Berry via llvm-dev >>>>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>>>> <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Mikael reported a machine verification failure in his out-of-tree >>>>>> target with the MachineCopyPropagation changes to forward >>>>>> registers (which is currently reverted). The verification in >>>>>> question is: >>>>>> >>>>>> *** Bad machine code: Multiple connected components in live >>>>>> interval *** >>>>>> - function: utils_la_suite_matmul_ref >>>>>> - interval: %vreg77 >>>>>> [192r,208B:0)[208B,260r:1)[312r,364r:2)[380r,464B:3) 0 at 192r >>>>>> 1 at 208B-phi 2 at 312r 3 at 380r >>>>>> 0: valnos 0 1 3 >>>>>> 1: valnos 2 >>>>>> >>>>>> In this particular case, I believe that it is the greedy allocator >>>>>> that is creating the multiple components in the %vreg77 live >>>>>> interval. If you look at the attached debug dump file, just after >>>>>> the greedy allocator runs, the segment of %vreg77 from the def at >>>>>> 312B to the use at 380B seems to be separable from the other >>>>>> segments. The reason the above verification failure is not hit at >>>>>> that point seems to be related to the FIXME in the following >>>>>> snippet from ConnectedVNInfoEqClasses::Classify(): >>>>> That dump seems to be well before greedy runs, isn't it? >>>> >>>> I'm not sure what you mean. The attached log contains >>>> -print-before-all -print-after-all and -debug output starting with >>>> the coalescer pass. The verification failure is right after the >>>> first pass of MachineCopyPropagation which runs after the greedy >>>> allocator. >>> The copy propagation seemed to be working on vregs. This was extra >>> confusing as D30751 seems to be currently reverted from trunk so I >>> couldn't find references to that code. >> >> Sorry, I should have mentioned that as well. This verification error >> is the last problem keeping me from re-enabling the copy forwarding >> patch (I can send you my latest rebased version, but I don't think it >> is relevant to this problem. See below). >> >>>> >>>>> At a first glance the odd thing there is that the operand of >>>>> fladd_a32_a32_a32 is rewritten from vreg77 to vreg76, but the >>>>> vreg77 operand of the BUNDLE is not. Maybe you can find out why >>>>> that is? >>>> >>>> Sorry, I should have pointed this out before: because the loop over >>>> instructions in MachineCopyPropagation is only visiting the BUNDLE >>>> instructions themselves (i.e. it does not visit the instructions >>>> inside the BUNDLE) and we don't forward to implicit uses (which all >>>> of the BUNDLE operands are marked as), we won't currently forward a >>>> use to a bundled instruction. I believe handling bundles more >>>> aggressively can be added as a follow-on enhancement unless we think >>>> not doing has an inherent problem. >>> I would expect you know the code in D30751 and can take a look into >>> why only 1 of the instructions is rewritten? >>> From all I've seen so far the verification code seems to behave as >>> expected. >> >> I don't think the fact that BUNDLEd instructions aren't re-written has >> anything to do with the verification problem. Let me try to simplify >> what I think is going on. Just after greedy regalloc, we end up with >> some code like this: >> >> ... >> %vreg1<def> = ... >> ... >> ... = %vreg1 >> ... >> %vreg1<def> = %vreg1 >> ... >> >> verifyLiveInterval() accepts this code as valid since it sees the >> second def as part of the same live interval component because >> ConnectedVNInfoEqClasses::Classify() sees this second def as a >> "two-addr" redefinition, even though the def and source operands are >> not tied. >> >> MachineCopyProp (pre-rewrite) runs next and turns this code into: >> ... >> %vreg1<def> = ... >> ... >> ... = %vreg1 >> ... >> %vreg1<def> = *%vreg2* >> ... >> >> verifyLiveInterval() now rejects this code since it sees these two def >> live ranges as being separate components. My claim is that these two >> code snippets are equivalent as far as the number of live range >> components is concerned. Therefore verifyLiveInterval() should have >> rejected the code just after regalloc greedy (as the FIXME in >> ConnectedVNInfoEqClasses::Classify hints at), which means the source >> of this particular problem is in regalloc greedy or before (and not in >> MachineCopyProp). > Ah I see. And I would agree with your interpretation. > > - Only tied use/def operands or a subregister def without the undef flag > should result in connected liveranges. > - Be careful and measure the compile time impact when switching the > implementation in ConnectedVNInfoEqClasses; unfortunately there is > currently no way to detect this situation just by looking at the VNInfo. > - I guess the RAGreedy result is indeed wrong then. If I'm reading this > correctly, it basically looks like this (when simplified to the problem > at hand): > > BB2: > vreg76 = COPY vreg77<kill> > > vreg77 = COPY vreg76 > vreg77 = someop vreg77<kill> > CondJmp BB2 > > the `vreg77=COPY` should have used a different vreg. My guess would be > that the liverange splitting code makes similar assumptions as the > ConnectedVNInfoEqClasses.I think I'm going to try to work around this issue for now (with a big FIXME comment) by not copy forwarding in these cases so I can get my original patch re-enabled. Then we can look into fixing the above issue, though I don't think I'll be able to look into it for some time.> - Matthias-- Geoff Berry Employee of Qualcomm Datacenter Technologies, Inc. Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Seemingly Similar Threads
- [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
- [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
- [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
- [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target
- [MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target