Boris Boesler
2014-Nov-01 09:19 UTC
[LLVMdev] Virtual register def doesn't dominate all uses
Hi Quentin,

On 01.11.2014 at 00:39, Quentin Colombet <qcolombet at apple.com> wrote:

> On Oct 31, 2014, at 11:00 AM, Boris Boesler <baembel at gmx.de> wrote:
>
>> Hi Quentin,
>>
>> I added some debug output (N->dump()) in ::Select(SDNode *N) and compared it to the dot/Graphviz output (-view-legalize-types-dags; the last one with correct code). I found that some SDNodes are not passed to ::Select(SDNode *N); approximately 11 nodes are missing. The first add node (v1 + v2) is missing.
>>
>> Is it normal that not all nodes are passed to ::Select()?
>
> Does not sound right!
>
> They should be selected, unless they are dead (i.e., no uses).
>
> Have you looked at the other DAGs (view-isel-dags in particular)?

Yes, the DAGs in view-isel-dags and view-legalize-types-dags are correct (the add operations are there and their results are used) and the DAGs are the same.

Boris

> -Quentin
>
>> Thanks,
>> Boris
>>
>> On 30.10.2014 at 19:23, Quentin Colombet <qcolombet at apple.com> wrote:
>>
>>> Hi Boris,
>>>
>>> On Oct 29, 2014, at 10:35 AM, Boris Boesler <baembel at gmx.de> wrote:
>>>
>>>> Hi Quentin,
>>>>
>>>> yes, this happens quite late. With the option --debug-pass=Structure it's in or after "Assembly Printer".
>>>> I do have a very simple DAGToDAGISel::Select() method:
>>>>
>>>>   SDNode *MyTargetDAGToDAGISel::Select(SDNode *N)
>>>>   {
>>>>     SDLoc dl(N);
>>>>     // default implementation
>>>>     if (N->isMachineOpcode()) {
>>>>       N->setNodeId(-1);
>>>>       return NULL; // Already selected.
>>>>     }
>>>>     SDNode *res = SelectCode(N);
>>>>     return res;
>>>>   }
>>>>
>>>> Is that too simple? There are no further passes that eliminate anything.
>>>>
>>>> Anyway, I have another test program that could point to my bug:
>>>>
>>>>   int formal_args_3(int p1, int p2, int p3)
>>>>   {
>>>>     int v1 = p1;
>>>>     int v2 = p2;
>>>>     int v3 = p3;
>>>>     int res = v1 + v2;
>>>>     return(res);
>>>>   }
>>>>
>>>> I can compile this test and I get correct code. With the option -view-sched-dags I can verify that all arguments are stored in the stack frame, the local variables are initialized (several store operations into the stack frame) and the return value is evaluated.
>>>>
>>>> But if I use the statement "int res = v1 + v2 + v3;" something strange happens: all arguments are stored in the stack frame and the local variables are initialized. Now the variables v1 and v2 are loaded, but they are not used (no ADD instructions), and a register-to-register MOVE instruction is generated that uses itself as an operand. This register should be stored and should be used as the function result.
>>>>
>>>> Well, the LOAD instruction uses a different register class for its destination register than the ADD instruction uses for its operands, but both classes share some registers. That should not be a problem.
>>>
>>> Like you said, that shouldn't be a problem.
>>>
>>>> Any hints where I can search for my bug?
>>>
>>> Try using -print-machineinstrs and check where the Machine IR diverges from what you expected.
>>> Then, you can use -debug-only <the offending pass> to get more details.
>>>
>>> Cheers,
>>> -Quentin
>>>
>>>> Thanks,
>>>> Boris
>>>>
>>>> On 24.10.2014 at 19:27, Quentin Colombet <qcolombet at apple.com> wrote:
>>>>
>>>>> Hi Boris,
>>>>>
>>>>> I don't see any phis in your machine code whereas the IR had some. This means you are already pretty late in the backend pipeline (i.e., after SSA form has been deconstructed).
>>>>> Do you have any custom pass between instruction selection and the PHIElimination pass?
>>>>>
>>>>> If so, I would look into them.
>>>>>
>>>>> Cheers,
>>>>> -Quentin
>>>>>
>>>>>> On Oct 24, 2014, at 7:53 AM, Boris Boesler <baembel at gmx.de> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> During my backend development I get this error message for some tests:
>>>>>> *** Bad machine code: Virtual register def doesn't dominate all uses. ***
>>>>>>
>>>>>> (C source code, byte-code disassembly and printed machine code at the end of the email)
>>>>>>
>>>>>> The first USE of vreg4 in BB#1 has no previous DEF in BB#0 or BB#1. But why? I can't see how the LLVM byte-code is transformed into the lower-level machine code.
>>>>>>
>>>>>> One possible reason could be that I haven't implemented all operations, e.g. I didn't implement MUL at this stage. Their "state" is LEGAL and not CUSTOM or EXPAND. But it fails with implemented operations as well.
>>>>>>
>>>>>> What did I do wrong? Is an implementation missing for some operations? What did I fail to implement?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Boris
>>>>>>
>>>>>> ----8<----
>>>>>>
>>>>>> C source code:
>>>>>>
>>>>>>   int simple_loop(int end_loop_index)
>>>>>>   {
>>>>>>     int sum = 0;
>>>>>>     for(int i = 0; i < end_loop_index; i++) {
>>>>>>       sum += i;
>>>>>>     }
>>>>>>     return(sum);
>>>>>>   }
>>>>>>
>>>>>> LLVM byte-code disassembly:
>>>>>>
>>>>>>   ; Function Attrs: nounwind readnone
>>>>>>   define i32 @simple_loop(i32 %end_loop_index) #1 {
>>>>>>   entry:
>>>>>>     %cmp4 = icmp sgt i32 %end_loop_index, 0
>>>>>>     br i1 %cmp4, label %for.cond.for.end_crit_edge, label %for.end
>>>>>>
>>>>>>   for.cond.for.end_crit_edge: ; preds = %entry
>>>>>>     %0 = add i32 %end_loop_index, -2
>>>>>>     %1 = add i32 %end_loop_index, -1
>>>>>>     %2 = zext i32 %0 to i33
>>>>>>     %3 = zext i32 %1 to i33
>>>>>>     %4 = mul i33 %3, %2
>>>>>>     %5 = lshr i33 %4, 1
>>>>>>     %6 = trunc i33 %5 to i32
>>>>>>     %7 = add i32 %6, %end_loop_index
>>>>>>     %8 = add i32 %7, -1
>>>>>>     br label %for.end
>>>>>>
>>>>>>   for.end: ; preds = %for.cond.for.end_crit_edge, %entry
>>>>>>     %sum.0.lcssa = phi i32 [ %8, %for.cond.for.end_crit_edge ], [ 0, %entry ]
>>>>>>     ret i32 %sum.0.lcssa
>>>>>>   }
>>>>>>
>>>>>> The emitted blocks are:
>>>>>>
>>>>>>   Function Live Ins: %R0 in %vreg2
>>>>>>
>>>>>>   BB#0: derived from LLVM BB %entry
>>>>>>     Live Ins: %R0
>>>>>>     %vreg2<def> = COPY %R0; IntRegs:%vreg2
>>>>>>     %vreg3<def> = MV 0; SRegs:%vreg3
>>>>>>     CMP %vreg2, 1, %FLAG<imp-def>; IntRegs:%vreg2
>>>>>>     %vreg6<def> = COPY %vreg3; SRegs:%vreg6,%vreg3
>>>>>>     BR_cc <BB#2>, 20, %FLAG<imp-use,kill>
>>>>>>     BR <BB#1>
>>>>>>     Successors according to CFG: BB#1(20) BB#2(12)
>>>>>>
>>>>>>   BB#1: derived from LLVM BB %for.cond.for.end_crit_edge
>>>>>>     Predecessors according to CFG: BB#0
>>>>>>     %vreg4<def> = MV %vreg4; IntRegs:%vreg4
>>>>>>     %vreg5<def> = ADD %vreg4<kill>, -1; IntRegs:%vreg5,%vreg4
>>>>>>     %vreg0<def> = COPY %vreg5<kill>; SRegs:%vreg0 IntRegs:%vreg5
>>>>>>     %vreg6<def> = COPY %vreg0; SRegs:%vreg6,%vreg0
>>>>>>     Successors according to CFG: BB#2
>>>>>>
>>>>>>   BB#2: derived from LLVM BB %for.end
>>>>>>     Predecessors according to CFG: BB#0 BB#1
>>>>>>     %vreg1<def> = COPY %vreg6<kill>; SRegs:%vreg1,%vreg6
>>>>>>     %R0<def> = COPY %vreg1; SRegs:%vreg1
>>>>>>     RETURN %R0<imp-use>
>>>>>>
>>>>>>   # End machine code for function simple_loop.
>>>>>>
>>>>>>   *** Bad machine code: Virtual register def doesn't dominate all uses. ***
>>>>>>   - function: simple_loop
>>>>>>   - basic block: BB#1 for.cond.for.end_crit_edge (0x7fd7cb025250)
>>>>>>   - instruction: %vreg4<def> = MV %vreg4; IntRegs:%vreg4
>>>>>>   LLVM ERROR: Found 1 machine code errors.
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
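The verifier complaint in the original report can be illustrated on a toy model. What follows is a minimal sketch and not LLVM's MachineVerifier: it assumes the instructions are listed in an order where everything earlier dominates everything later (which holds for BB#0 followed by BB#1, since BB#1's only predecessor is BB#0), and the names `Instr`, `findUsesWithoutDef` and `simpleLoopPrefix` are hypothetical, invented only for this illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: the virtual register numbers it
// defines and the ones it reads. (Hypothetical type, not an LLVM class.)
struct Instr {
  std::string text;
  std::vector<int> defs;
  std::vector<int> uses;
};

// Walks the instructions in dominance order and returns the text of every
// instruction that reads a virtual register before any def of it was seen.
std::vector<std::string> findUsesWithoutDef(const std::vector<Instr> &instrs) {
  std::set<int> defined;
  std::vector<std::string> bad;
  for (const Instr &I : instrs) {
    // Uses are checked before this instruction's own defs are recorded,
    // so an instruction cannot satisfy its own operand.
    for (int U : I.uses) {
      if (!defined.count(U)) {
        bad.push_back(I.text);
        break;
      }
    }
    for (int D : I.defs)
      defined.insert(D);
  }
  return bad;
}

// BB#0 followed by BB#1 from the reported dump (physical registers and
// branches omitted, since only virtual registers are modelled).
std::vector<Instr> simpleLoopPrefix() {
  return {
      {"%vreg2 = COPY %R0", {2}, {}},
      {"%vreg3 = MV 0", {3}, {}},
      {"CMP %vreg2, 1", {}, {2}},
      {"%vreg6 = COPY %vreg3", {6}, {3}},
      {"%vreg4 = MV %vreg4", {4}, {4}},
      {"%vreg5 = ADD %vreg4, -1", {5}, {4}},
      {"%vreg0 = COPY %vreg5", {0}, {5}},
  };
}
```

Calling `findUsesWithoutDef(simpleLoopPrefix())` flags only `%vreg4 = MV %vreg4`, the same instruction the verifier names: its sole def of vreg4 is the instruction that also reads vreg4, so no def precedes the use.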
Quentin Colombet
2014-Nov-03 17:49 UTC
[LLVMdev] Virtual register def doesn't dominate all uses
On Nov 1, 2014, at 2:19 AM, Boris Boesler <baembel at gmx.de> wrote:

> Yes, the DAGs in view-isel-dags and view-legalize-types-dags are correct (the add operations are there and their results are used) and the DAGs are the same.

And what about view-sched-dags?
This one should give you what has been selected. So if this is not correct, you do indeed have a problem in instruction selection.
If that is the case, you can use -debug-only=isel to help you figure out what the problem is.

-Quentin
Boris Boesler
2014-Nov-03 21:56 UTC
[LLVMdev] Virtual register def doesn't dominate all uses
Hi Quentin,

>> Yes, the DAGs in view-isel-dags and view-legalize-types-dags are correct (the add operations are there and their results are used) and the DAGs are the same.
>
> And what about view-sched-dags?

The DAG looks like I described below (*).

> This one should give you what has been selected. So if this is not correct, you do indeed have a problem in instruction selection.
> If that is the case, you can use -debug-only=isel to help you figure out what the problem is.

This is the add node (sum + v3) from the dot file (-view-isel-dags):

    Node0x7fef2a033610 [shape=record,shape=Mrecord,label="{{<s0>0|<s1>1}|add [ORD=21] [ID=29]|0x7fef2a033610|{<d0>i32}}"];

The debug output (-debug-only=isel) is:

    ISEL: Starting pattern match on root node: 0x7fef2a033610: i32 = add 0x7fef2a033410, 0x7fef2a032f10 [ORD=21] [ID=29]

    Skipped scope entry (due to false predicate) at index 3, continuing at 2566
    Match failed at index 2575
    Continuing at 2628
    Match failed at index 2639
    Continuing at 2659
    Match failed at index 2663
    Continuing at 2700
    Continuing at 2701
    Continuing at 2702
    Match failed at index 2703
    Continuing at 2817
    Match failed at index 2818
    Continuing at 2901
    Match failed at index 2902
    Continuing at 2985
    Match failed at index 2986
    Continuing at 3100
    Match failed at index 3101
    Continuing at 3215
    Match failed at index 3216
    Continuing at 3330
    Match failed at index 3331
    Continuing at 3445
    Match failed at index 3447
    Continuing at 3591
    Match failed at index 3592
    Continuing at 3706
    Match failed at index 3707
    Continuing at 3790
    Match failed at index 3791
    Continuing at 3874
    Match failed at index 3875
    Continuing at 3958
    Match failed at index 3959
    Continuing at 4046
    Match failed at index 4047
    Continuing at 4081
    Match failed at index 4082
    Continuing at 4116
    Match failed at index 4117
    Continuing at 4179
    Match failed at index 4180
    Continuing at 4209
    Match failed at index 4210
    Continuing at 4230
    Match failed at index 4231
    Continuing at 4261
    Match failed at index 4262
    Continuing at 4309
    Match failed at index 4310
    Continuing at 4322
    Morphed node: 0x7fef2a033610: i32 = MVrr 0x7fef2a033610 [ORD=21]

Does the add operation become a MOVE instruction, or is this a chain of rules?

(*)
>>>>>> But if I use the statement "int res = v1 + v2 + v3;" something strange happens: all arguments are stored in the stack frame and the local variables are initialized. Now the variables v1 and v2 are loaded, but they are not used (no ADD instructions), and a register-to-register MOVE instruction is generated that uses itself as an operand. This register should be stored and should be used as the function result.

Boris
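The last line of the isel log is the notable one: the node 0x7fef2a033610 is morphed into an MVrr whose only operand is 0x7fef2a033610, i.e. the node itself, which appears consistent with the `%vreg4<def> = MV %vreg4` instruction reported at the start of the thread. Spotting such lines in a long log by eye is tedious, so here is a minimal sketch of a checker. It assumes the "Morphed node:" line format exactly as it appears in the log above (other LLVM versions may print operands differently), and the function name `isSelfReferentialMorph` is hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if `line` is a "Morphed node:" line from -debug-only=isel
// output whose result node address also appears among its operands.
bool isSelfReferentialMorph(const std::string &line) {
  const std::string prefix = "Morphed node: ";
  std::size_t start = line.find(prefix);
  if (start == std::string::npos)
    return false;
  start += prefix.size();

  // The node address runs up to the next ':' (e.g. "0x7fef2a033610").
  std::size_t colon = line.find(':', start);
  std::size_t eq = line.find(" = ", start);
  if (colon == std::string::npos || eq == std::string::npos || colon > eq)
    return false;
  const std::string node = line.substr(start, colon - start);

  // After " = " come the opcode, then operands and annotations like [ORD=21].
  std::istringstream rest(line.substr(eq + 3));
  std::string tok;
  rest >> tok; // skip the opcode name
  while (rest >> tok) {
    if (!tok.empty() && tok.back() == ',')
      tok.pop_back(); // operands are comma-separated
    if (tok == node)
      return true;
  }
  return false;
}
```

Feeding each line of the saved log through this function would single out the MVrr line above. A hit does not by itself say which pattern is at fault; it only narrows the search to the rule that produced the self-referencing node.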