I'm debugging some X86 patterns and I want to understand the debug dumps from
isel better.

Here's some example output:

  0x391bc40: i64,ch = load 0x3922c50, 0x391b8d0, 0x38dc530 <0x39053e0:0> <sext i32> alignment=4 srcLineNum= 10
    0x3922c50: <multiple use>
          0x391bc40: <multiple use>
          0x3856ab0: <multiple use>
        0x3914520: i64 = shl 0x391bc40, 0x3856ab0 srcLineNum= 10
        0x38569b0: <multiple use>
      0x391bdf0: i64 = add 0x3914520, 0x38569b0 srcLineNum= 10
      0x39127c0: <multiple use>
    0x3913dd0: i64 = add 0x391bdf0, 0x39127c0 srcLineNum= 10
    0x38dc530: <multiple use>
  0x391bf40: f64,ch = load 0x3922c50, 0x3913dd0, 0x38dc530 <0x3850d30:0> alignment=8 srcLineNum= 10

I think I've figured out that lines with greater indent feed lines with lesser
indent. So for example, the final load is fed by three operands: 0x3922c50,
0x3913dd0 and 0x38dc530. And <multiple use> seems to mean a node that feeds
many nodes.

Is this understanding correct?

Is there any way to tell how these DAGs map onto the final machine instruction
sequence? I'm having a hard time making sense of where some machine
instructions are coming from -- they appear to be superfluous. And I have a
pattern that specifies two distinct memory operands, but the final code
sequence produced uses only one of them.

I'll try to write a small example and send it in a bit.

                                          -Dave
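
A minimal sketch of that reading of the dump, assuming the line format shown
above (`<id>: <types> = <opcode> <operand ids> ...`, with `<multiple use>` as a
placeholder for a node printed elsewhere). The names `parse_dump` and `walk`
are hypothetical helpers, not part of LLVM; the script only re-derives the
operand tree from the text so the "greater indent feeds lesser indent" reading
can be checked mechanically.

```python
import re

# Excerpt copied from the message above; indentation is not needed for parsing.
DUMP = """\
0x3922c50: <multiple use>
0x391bc40: <multiple use>
0x3856ab0: <multiple use>
0x3914520: i64 = shl 0x391bc40, 0x3856ab0 srcLineNum= 10
0x38569b0: <multiple use>
0x391bdf0: i64 = add 0x3914520, 0x38569b0 srcLineNum= 10
0x39127c0: <multiple use>
0x3913dd0: i64 = add 0x391bdf0, 0x39127c0 srcLineNum= 10
0x38dc530: <multiple use>
0x391bf40: f64,ch = load 0x3922c50, 0x3913dd0, 0x38dc530 <0x3850d30:0> alignment=8 srcLineNum= 10
"""

NODE_RE = re.compile(r"^\s*(0x[0-9a-f]+):\s*(.*)$")
DEF_RE = re.compile(r"^([\w,]+) = (\w+)\s*(.*)$")
# Leading run of comma-separated node ids; anything after it is annotation.
OPERANDS_RE = re.compile(r"(?:0x[0-9a-f]+(?:,\s*)?)*")


def parse_dump(text):
    """Map node id -> (opcode, [operand ids]) for fully printed nodes."""
    nodes = {}
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if not m:
            continue
        node_id, rest = m.groups()
        d = DEF_RE.match(rest)
        if d is None:
            continue  # "<multiple use>" placeholder: defined elsewhere
        _types, opcode, tail = d.groups()
        operand_text = OPERANDS_RE.match(tail).group(0)
        nodes[node_id] = (opcode, re.findall(r"0x[0-9a-f]+", operand_text))
    return nodes


def walk(nodes, root, depth=0):
    """Yield (depth, id, opcode), operands first, as the dump prints them."""
    opcode, operands = nodes.get(root, ("<multiple use>", []))
    for op in operands:
        yield from walk(nodes, op, depth + 1)
    yield depth, root, opcode


nodes = parse_dump(DUMP)
for depth, node_id, opcode in walk(nodes, "0x391bf40"):
    print("  " * depth + f"{node_id}: {opcode}")
```

Walking from the final load prints operands above their user with one extra
indent level per step, which is the same shape as the dump.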

On Thursday 02 October 2008 11:37, David Greene wrote:
> I'll try to write a small example and send it in a bit.

Ok, here's what I'm trying to do:

let AddedComplexity = 40 in {
def : Pat<(v2f64 (vector_shuffle
                  (v2f64 (scalar_to_vector (loadf64 addr:$src1))),
                  (v2f64 (scalar_to_vector (loadf64 addr:$src2))),
                  SHUFP_shuffle_mask:$sm)),
          (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
                     (v2f64 (MOVSD2PDrm addr:$src2)),
                     SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity

It turns out you can't actually write a pattern like this with tblgen as-is.
There's a bug where it outputs multiple definitions of some local variables.
I've patched that here and hope to send it upstream once I get approval.

But let's say you _could_ write such a pattern (because I can). The input
DAG looks like this:

      0x391a220: <multiple use>
    0x391c970: v2f64 = scalar_to_vector 0x391a220 srcLineNum= 10
      0x391ac10: <multiple use>
    0x391c8b0: v2f64 = scalar_to_vector 0x391ac10 srcLineNum= 10
    0x3927b10: <multiple use>
  0x3923100: v2f64 = vector_shuffle 0x391c970, 0x391c8b0, 0x3927b10<0,2> srcLineNum= 10

The code that gets produced looks like this:

  %reg1071<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
  %reg1072<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
  %reg1073<def> = SHUFPDrri %reg1071, %reg1072, 0 ; srcLine 10

Note that %reg1026 and %reg1065 are used in both address expressions even
though I specified different names ($src1 and $src2) in the pattern.

Huh? How do I find out who is screwing up? It could be an incorrect pattern,
it could be an incorrectly patched tblgen, or it could be somewhere in
SelectionDAG itself.

                                          -Dave
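
One way to spot this kind of duplication without eyeballing the listing is to
group the loads by their address operands. The sketch below is only an
illustration: `address_groups` is a hypothetical helper, the line format is
taken from the output above (with the `; srcLine` suffix trimmed), and it is
an assumption that the last address operand is a 32-bit displacement printed
as an unsigned value.

```python
import re
from collections import defaultdict

# Machine instructions copied from the message above ("; srcLine 10" trimmed).
MI = """\
%reg1071<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0]
%reg1072<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0]
%reg1073<def> = SHUFPDrri %reg1071, %reg1072, 0
"""

LINE_RE = re.compile(r"^(%reg\d+)<def> = (\w+) (.*)$")


def address_groups(text):
    """Group memory instructions by (opcode, address operand tuple)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        dest, opcode, rest = m.groups()
        if ", Mem:" not in rest:
            continue  # not a memory instruction in this listing
        address = rest.split(", Mem:")[0]
        key = tuple(tok.strip() for tok in address.split(","))
        groups[(opcode, key)].append(dest)
    return groups


duplicates = {k: v for k, v in address_groups(MI).items() if len(v) > 1}
for (opcode, address), dests in duplicates.items():
    print(opcode, address, "->", dests)

# Assumption: the displacement is a 32-bit value printed as unsigned.
disp = 4294967288
signed_disp = disp - 2**32 if disp >= 2**31 else disp
print("displacement as signed 32-bit:", signed_disp)
```

Both MOVSD2PDrm results land in the same group, which matches the observation
that the two loads read from the same address expression.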

On Oct 2, 2008, at 9:37 AM, David Greene wrote:
> I'm debugging some X86 patterns and I want to understand the debug dumps
> from isel better.
>
> Here's some example output:
>
>   0x391bc40: i64,ch = load 0x3922c50, 0x391b8d0, 0x38dc530 <0x39053e0:0> <sext i32> alignment=4 srcLineNum= 10
>     0x3922c50: <multiple use>
>           0x391bc40: <multiple use>
>           0x3856ab0: <multiple use>
>         0x3914520: i64 = shl 0x391bc40, 0x3856ab0 srcLineNum= 10
>         0x38569b0: <multiple use>
>       0x391bdf0: i64 = add 0x3914520, 0x38569b0 srcLineNum= 10
>       0x39127c0: <multiple use>
>     0x3913dd0: i64 = add 0x391bdf0, 0x39127c0 srcLineNum= 10
>     0x38dc530: <multiple use>
>   0x391bf40: f64,ch = load 0x3922c50, 0x3913dd0, 0x38dc530 <0x3850d30:0> alignment=8 srcLineNum= 10
>
> I think I've figured out that lines with greater indent feed lines with
> lesser indent. So for example, the final load is fed by three operands:
> 0x3922c50, 0x3913dd0 and 0x38dc530. And <multiple use> seems to mean a node
> that feeds many nodes.
>
> Is this understanding correct?

I think so, though I don't actually look at the SelectionDAG dump()
output very often.

I highly recommend the viewGraph() output. -view-isel-dags and
-view-sched-dags show the graph before and after selection,
respectively. See the CodeGen docs, where I recently added some
text describing all these options.

Also, you can call viewGraph() from within a debugger to view
the graph at an arbitrary point in the middle of the selection
process. You can even put a breakpoint on the Select
function and view the graph as each individual instruction is
selected.

It can get hairy with really large graphs, but if you're trying
to understand instruction selection, it's often possible to reduce
the testcases to a readable scale while still including the
interesting parts. SelectionDAG's setGraphColor method can also
help when graphs get large.

And FWIW, there are some significant improvements in the
viewGraph() output in TOT :-).

Dan

On Thursday 02 October 2008 15:37, Dan Gohman wrote:
> I highly recommend the viewGraph() output. -view-isel-dags and
> -view-sched-dags show the graph before and after selection,
> respectively. See the CodeGen docs where I recently added some
> text describing all these options.

Yeah, I've been using those but they're real hard to understand with big
graphs.

> Also, you can call viewGraph() from within a debugger, to view
> the graph at arbitrary point in the middle of the selection
> process. You can even put a breakpoint on the Select
> function and view the graph as each individual instruction is
> selected.

That might be more helpful. Thanks for the tip! Which Select function are
you referring to?

> It can get hairy with really large graphs, but if you're trying
> to understand instruction selection, it's often possible to reduce
> the testcases to a readable scale while still including the
> interesting parts. SelectionDAG's setGraphColor method can also
> help when graphs get large.

Unfortunately, the testcase is about as simple as it can get: a loop with a
gather, a multiply and a store. Maybe I can hand-whittle some IR.

> And FWIW, there are some significant improvements in the
> viewGraph() output in TOT :-).

Yeah, I saw that. Unfortunately we won't get it until early next year,
probably. :(

                                          -Dave

On Thursday 02 October 2008 12:42, David Greene wrote:
> But let's say you _could_ write such a pattern (because I can). The input
> DAG looks like this:
>
>       0x391a220: <multiple use>
>     0x391c970: v2f64 = scalar_to_vector 0x391a220 srcLineNum= 10
>       0x391ac10: <multiple use>
>     0x391c8b0: v2f64 = scalar_to_vector 0x391ac10 srcLineNum= 10
>     0x3927b10: <multiple use>
>   0x3923100: v2f64 = vector_shuffle 0x391c970, 0x391c8b0, 0x3927b10<0,2> srcLineNum= 10
>
> The code that gets produced looks like this:
>
>   %reg1071<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
>   %reg1072<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8) [r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
>   %reg1073<def> = SHUFPDrri %reg1071, %reg1072, 0 ; srcLine 10

Actually, it's worse than this. I wanted to check to make sure something
else wasn't causing the problem, but it appears to come from isel. The full
output for the DAG looks like this:

  %reg1059<def> = MOVSX64rm32 %reg1033, 1, %reg0, 4, Mem:LD(4,4) [tmp163 + 0] ; srcLine 10
  %reg1060<def> = MOVSDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r45154 + 0] ; srcLine 10
  %reg1061<def> = MOVSX64rm32 %reg1033, 1, %reg0, 0, Mem:LD(4,4) [iv.161162 + 0] ; srcLine 10
  %reg1062<def> = MOVSDrm %reg1026, 8, %reg1061, 4294967288, Mem:LD(8,8) [r30158 + 0] ; srcLine 10
  %reg1063<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r30158 + 0]LD(8,8) [r45154 + 0] ; srcLine 10
  %reg1064<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r30158 + 0]LD(8,8) [r45154 + 0] ; srcLine 10
  %reg1065<def> = SHUFPDrri %reg1063, %reg1064, 0 ; srcLine 10

Where the <bleep> are these extra dead MOVSDrms coming from? Note that the
extra MOVSDrms at least seem to use the correct addresses.

                                          -Dave
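
The dead loads can also be picked out mechanically by listing every virtual
register that is defined in the listing but never read by a later line. This
is a rough sketch under stated assumptions: `unused_defs` is a hypothetical
helper, the `%regN<def> = OPCODE operands` format is taken from the output
above (with the `; srcLine` suffix trimmed), and "unused" only means unused
within this excerpt.

```python
import re

# Machine instructions copied from the message above ("; srcLine 10" trimmed).
MI = """\
%reg1059<def> = MOVSX64rm32 %reg1033, 1, %reg0, 4, Mem:LD(4,4) [tmp163 + 0]
%reg1060<def> = MOVSDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r45154 + 0]
%reg1061<def> = MOVSX64rm32 %reg1033, 1, %reg0, 0, Mem:LD(4,4) [iv.161162 + 0]
%reg1062<def> = MOVSDrm %reg1026, 8, %reg1061, 4294967288, Mem:LD(8,8) [r30158 + 0]
%reg1063<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r30158 + 0]LD(8,8) [r45154 + 0]
%reg1064<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8) [r30158 + 0]LD(8,8) [r45154 + 0]
%reg1065<def> = SHUFPDrri %reg1063, %reg1064, 0
"""

LINE_RE = re.compile(r"^(%reg\d+)<def> = (\w+) (.*)$")


def unused_defs(text):
    """Return [(register, opcode)] for defs never used as an operand."""
    defs = {}      # register -> defining opcode, in listing order
    used = set()   # every register that appears on a right-hand side
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        dest, opcode, rest = m.groups()
        defs[dest] = opcode
        used.update(re.findall(r"%reg\d+", rest))
    return [(reg, op) for reg, op in defs.items() if reg not in used]


for reg, opcode in unused_defs(MI):
    print(f"{reg} ({opcode}) is defined but not used in this excerpt")
```

For this excerpt it reports %reg1060 and %reg1062, the two MOVSDrm results,
plus %reg1065, which is the shuffle result and is presumably consumed outside
the excerpt. It also makes visible that %reg1061, the second index, feeds only
the dead MOVSDrm, while both MOVSD2PDrm loads use %reg1059.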