Hello llvm-dev,

I'm currently integrating an experimental instruction scheduler into LLVM and have come upon a point of confusion regarding WAW dependencies. I apologize in advance for the convoluted example, but it's one solid case that fails on my scheduler and where I can get LLVM to produce a graph that shows my question clearly.

I have a small piece of code like this, which I'm compiling for x86-64 with -O3:

--------------------------------------------------
unsigned long long x = 0;
for (long long i=0; i<500000; ++i) {
    i += i;
    x -= 123;
    x *= i;
    x += 456;
}
--------------------------------------------------

The optimizer removes the i += i; operation, hardcodes the fact that the loop runs 19 times, and counts down instead of up. This is irrelevant to the problem, but makes the graph clearer.

The loop body produces the following SelectionDAG:

http://i.imgur.com/tmJBZ.png

The JNE_4 near the root of the graph depends on the flag produced by the DEC64_32r, through a CopyToReg node. Other nodes that write to the flags, such as the ADD64rr nodes on the left, are not linked to the DEC64_32r node with a WAW edge. I'm assuming that this is because they are not hardcoded to output to EFLAGS, but rather to a virtual register, and the aforementioned CopyToReg node takes the virtual output from DEC64_32r and puts it in EFLAGS.

The SelectionDAG is then converted to the following ScheduleDAG:

http://i.imgur.com/S1uWQ.png

Here we can see that the JNE_4 and its CopyToReg neighbor are merged into a single SUnit, which is linked to DEC64_32r with a data dependency and to TokenFactor with an order dependency. There's still no indication that TokenFactor or any of the nodes above it may overwrite the flags created by DEC64_32r, again presumably because JNE_4's CopyToReg takes care of moving DEC64_32r's flag output to EFLAGS.

Now we come to the actual problem.
As far as the graph is concerned, there is nothing preventing us from scheduling SU2 (DEC64_32r) before any of the other flag-affecting nodes, such as SU10. One such schedule is 8-13-12-14-3-11-2-7-10-5-6-9-4-1-0 (which is, incidentally, what my scheduler produces). However, when we look at the code generated from such a schedule, we see this:

--------------------------------------------------
.LBB0_1:                                # %bb
                                        # =>This Inner Loop Header: Depth=1
        addq    $-123, %rdx
        leaq    (%rax,%rax), %rsi
        imulq   %rdx, %rsi
        decl    %ecx
        leaq    1(%rax,%rax), %rax
        addq    $456, %rsi              # imm = 0x1C8
        movq    %rsi, %rdx
        jne     .LBB0_1
--------------------------------------------------

In this case, the flags set by "decl %ecx" are overwritten by the ones produced by "addq $456, %rsi", which causes "jne .LBB0_1" to branch on the wrong value. The CopyToReg that was supposed to deliver decl's flags output to jne is nowhere to be seen. I understand that it's usually a conceptual rather than physical instruction and is optimized out, but given the above schedule, it is necessary for code correctness.

Compiling the same piece of code using the list-burr or fast schedulers produces valid code, so they must be able to recognize the hidden dependency. However, I can't see how they are doing it.

Can someone please shed some light on this issue?

Thanks!

Regards,
Max
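
P.S. In case it helps to pin down what I mean by the hidden dependency, here is a rough standalone sketch of the property the emitted code violates. It is not LLVM code and does not reflect how list-burr or fast actually work; the Inst struct and the findClobberedFlagsUse and loopBody helpers are names I made up for illustration. It only walks a linear instruction order and reports the first instruction that reads EFLAGS after some instruction other than its intended producer has written them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of one emitted instruction.
struct Inst {
    int id;             // stable identity, independent of schedule position
    std::string text;   // assembly text, for readability only
    bool defsFlags;     // true if the instruction writes EFLAGS
    int flagsProducer;  // id of the instruction whose flags are read, or -1
};

// Returns the position of the first instruction that reads EFLAGS after they
// were overwritten by something other than its producer, or -1 if none does.
int findClobberedFlagsUse(const std::vector<Inst>& schedule) {
    int lastWriter = -1;  // id of the most recent EFLAGS writer seen so far
    for (std::size_t pos = 0; pos < schedule.size(); ++pos) {
        const Inst& inst = schedule[pos];
        // An instruction reads its inputs before it writes its outputs.
        if (inst.flagsProducer != -1 && inst.flagsProducer != lastWriter)
            return static_cast<int>(pos);
        if (inst.defsFlags)
            lastWriter = inst.id;
    }
    return -1;
}

// The loop body from the listing above. With decNextToBranch == false the
// order matches what my scheduler emitted; with true, decl is moved so that
// it is the last EFLAGS writer before jne.
std::vector<Inst> loopBody(bool decNextToBranch) {
    const Inst add1{0, "addq $-123, %rdx", true, -1};
    const Inst lea1{1, "leaq (%rax,%rax), %rsi", false, -1};
    const Inst imul{2, "imulq %rdx, %rsi", true, -1};
    const Inst dec{3, "decl %ecx", true, -1};
    const Inst lea2{4, "leaq 1(%rax,%rax), %rax", false, -1};
    const Inst add2{5, "addq $456, %rsi", true, -1};
    const Inst mov{6, "movq %rsi, %rdx", false, -1};
    const Inst jne{7, "jne .LBB0_1", false, dec.id};
    if (decNextToBranch)
        return {add1, lea1, imul, lea2, add2, mov, dec, jne};
    return {add1, lea1, imul, dec, lea2, add2, mov, jne};
}
```

Calling findClobberedFlagsUse(loopBody(false)) flags the jne at position 7, because addq $456 is the last flag writer before it. Calling it on loopBody(true) finds no problem. What I'm missing is where the existing schedulers get the equivalent of this constraint from, given that the ScheduleDAG edges don't seem to express it.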