Nagurne, James via llvm-dev
2020-Oct-01 20:02 UTC
[llvm-dev] MachinePipeliner and irregular loops
I'm in the process of reading and understanding the background and implementation of the MachinePipeliner's swing modulo scheduling algorithm. In short, my question is: "Does the swing modulo scheduling algorithm work for simple classes of irregular loops such as 'while (x != null) { x = x->next; }' ". This is an important loop shape to optimize in embedded benchmarks like coremark (core_list_join, reverse, find, etc.). Since the loop control is often load -> conditional branch, targets with high latency loads can often see performance boosts when the load and branch are pipelined to hide the load latency. I understand the ModuloScheduleExpander would need some heavy modification, as it relies on helpers like "createTripCountGreaterCondition" and "adjustTripCount", both of which are nonsensical in an irregular loop. I think that's a separate question at the moment. There are other issues with irregular loops that I'm aware of, such operations sensitive to overexecuction like loads/stores, but there are well-known solutions for those problems, especially on targets that support instruction predication in hardware. Our team has investigated pipelining irregular loops before, and found good results with some relatively straightforward additions to our internal implementation : https://dl.acm.org/doi/abs/10.1145/384196.384216 One common theme that seems to exist in all the SMS papers' examples and in the implementation is that the scheduling DAG does not consider the branch in the schedule. In fact, in the implementation, if the successor of any instruction in the DAG happens to be the branch (usually also in an SUnit object as ExitSU), the algorithm ends up crashing llc with a null-pointer dereference. I first encountered this when dealing with a regular loop of the form: %loop.body: %3 = PHI %1, %loop.preheader, %2, %loop.body ... %2 = decrement %3 branch-conditional (%2 > 0) When calculating ALAP, the decrement's successor (the branch) is not considered, and we end up trying to access an element in ScheduleInfo that doesn't exist. I worked around this problem quite simply: I removed the PHI and decrement, hiding it inside of a pseudo-branch instruction with side effects. Now, however, I'm dealing with an irregular loop, and have run into the same problem with no obvious solution. If the base algorithm should work for some types of single-block irregular loops, I'd be more than happy to investigate and/or work with someone to make the changes and upstream them. Regards, J.B. Nagurne Code Generation Texas Instruments -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201001/a340cca5/attachment.html>