Hi Fernando,> Hi, Roman, > > we have an implementation of an allocator for LLVM that, like > Sarkar's, allows to split live ranges at any program point. You can > find > its description here: > >http://compilers.cs.ucla.edu/fernando/publications/drafts/long_PereiraPalsberg07.pdf>I've read your paper already. It is a very interesting approach! But my impression was that it is rather complex solution and requires quite some bits of implementation. I also looked at your sources. Looks like they introduce a lot of new code into LLVM. IMHO, one of the attractive parts of Sarkar's algorithm is its simplicity. It almost does not require any significant changes in the LLVM code-base and is also conceptually a bit simpler.> we have an implementation of an allocator for LLVM that, like > Sarkar's, allows to split live ranges at any program point.Actually, Sarkar's algorithm can split only at block boundaries, as far as I understand. This can be an obstacle for very long basic blocks. One of the interesting ideas could be to introduce live-range splitting into his algorithm. BTW, since I was also working on Wimmer's linear scan algorithm (which is sort of implemented, works on simple test-cases and requires a lot of code clean-up and refactoring), I also have the LiveInterval splitting code in place and it can be done at any program point.> We have the code implemented in LLVM 2.1, and it passes all the tests > that I have tried.Cool! I'm not that far yet :-(> If you need help testing and debugging your allocator, I can > help you.Thank you very much for offering this. I'll most likely come back to you, once I'm that far.> We have a debugger that was very useful when trying to find errors in > our implementation.I've seen references to it on your page. But this debugger is in Java, or? So, one has to dump the results of register allocation and input problem and it would do some checks, or?> Also, to implement Sarkar's ELS, you will need to > swap live ranges at the boundaries of basic blocks. This is a little > tricky once you have registers of different size, and I can help you> with it.Actually, I already implemented something very similar for Wimmer's linear scan. The main issue was to order the copies in the right way and to introduce stores/loads to handle circular dependencies. Or do you say that register classes of different sizes introduce additional problems? Thanks, Roman Lesen Sie Ihre E-Mails auf dem Handy. www.yahoo.de/go
Fernando Magno Quintao Pereira
2008-Feb-01 08:19 UTC
[LLVMdev] Some questions about live intervals
> Actually, Sarkar's algorithm can split only at block boundaries, as far > as I understand. This can be an obstacle for very long basic blocks. > One of the interesting ideas could be to introduce live-range splitting > into his algorithm.As far as I understand it, it can also split in the interior of basic blocks, because of pre-colored registers, otherwise it would not be optimal.> I've seen references to it on your page. But this debugger is in Java, > or? So, one has to dump the results of register allocation and input > problem and it would do some checks, or?It is implemented in Java. You must dump some output and run the debugger on this output.> Actually, I already implemented something very similar for Wimmer's > linear scan. The main issue was to order the copies in the right way > and to introduce stores/loads to handle circular dependencies. Or do > you say that register classes of different sizes introduce additional > problems?Well, it makes it a little bit more complicated, but only a little bit. A problem that I had was when the lower and higher bits of registers were swapped. For instance, when implementing the parallel copy: BH <= AL || AX <= BX you will need two swaps (or three copies), instead of only one swap if there were no aliasing. best, Fernando
> > Actually, Sarkar's algorithm can split only at block boundaries, as > far as I understand. This can be an obstacle for very long basic > blocks. > > One of the interesting ideas could be to introduce live-range > splitting into his algorithm. > > As far as I understand it, it can also split in the interior of basic> blocks, because of pre-colored registers, otherwise it would not be > optimal.Well, I cannot find anything about in my copy of the paper. The paper neither talk about splitting in the interior of basicc blocks, nor about handling of pre-colored registers. And Sarkar also does not claim the optimality of the algorithm. He claims that it is inherently more scalable and efficient, i.e. linear as compared to O(n^2) for Graph coloring. But of course this is required and I'm going to extend the algorithm to support it by doing splitting before pre-colored uses. BTW, there are some other missing features in the Sarkar's algorithm. For example, it spills only whole intervals, which is very similar to typical GC regallocs, but not very efficient. I think, some of the known methods for live-range splitting around high-pressure regions in GC world, can be also applied here. This should improve the quality of allocation. Actually, I'm wondering if interval-splitting, handling of pre-colored registers, handling of high-pressure regions would slow down the algorithm to a very big extent or not?> > I've seen references to it on your page. But this debugger is in > Java, or? So, one has to dump the results of register allocation and > input problem and it would do some checks, or? > > It is implemented in Java. You must dump some output and run the > debugger on this output.OK. Then my understanding was correct.> > Actually, I already implemented something very similar for Wimmer's > > linear scan. The main issue was to order the copies in the right > way and to introduce stores/loads to handle circular dependencies. Or > do you say that register classes of different sizes introduce > additional problems? > > Well, it makes it a little bit more complicated, but only a little > bit. A problem that I had was when the lower and higher bits of > registers were swapped.Oh, now I see. My version of Wimmer's algorithm was written before LLVM introduced support for sub-regs. Therefore I didn't have this problem ;-) But now I have to cope with it. Anyway, I think it is a minor extension. I guess, I should just take aliasing information into account when ordering register moves. -Roman Heute schon einen Blick in die Zukunft von E-Mails wagen? Versuchen SieĀ“s mit dem neuen Yahoo! Mail. www.yahoo.de/mail
Hi Fernando, For some reason, I discovered your mail just today. Fernando Magno Quintao Pereira wrote:> > Hi, Roman, > >> Well, I cannot find anything about in my copy of the paper. The paper >> neither talk about splitting in the interior of basicc blocks, nor >> about handling of pre-colored registers. And Sarkar also does not claim >> the optimality of the algorithm. He claims that it is inherently more >> scalable and efficient, i.e. linear as compared to O(n^2) for Graph >> coloring. >> >> But of course this is required and I'm going to extend the algorithm to >> support it by doing splitting before pre-colored uses. > > I think in the description of the algorithm, in page seven, step 6, > it has to insert register moves to fix the coloring for each program > point P that is part of the program.Well, as far as I understand this algorithm, it does not do any explicit interval splitting. But it tries to color each live range of the live interval separately and may assign different colors to each of those live ranges. After color assignment is done, the algorithm needs to insert some move intsructions, which is done by step 6. In any case, it does not really mean each program point, but only end-points of live range belonging to an live interval. So, it could be seen as some sort of live interval spliiting, but a very limited one, since it only splits at the end of live ranges of live intervals. And for sure, Sarkar's algorithm does not handle pre-colored regs out of the box.>> BTW, there are some other missing features in the Sarkar's algorithm. >> For example, it spills only whole intervals, which is very similar to >> typical GC regallocs, but not very efficient. I think, some of the >> known methods for live-range splitting around high-pressure regions in >> GC world, can be also applied here. This should improve the quality of >> allocation. >> >> Actually, I'm wondering if interval-splitting, handling of pre-colored >> registers, handling of high-pressure regions would slow down the >> algorithm to a very big extent or not? > > One thing that may happen is that, in order to reload spilled variables, > you will need registers available.As far as I understand, it is guaranteed by the algorithm, no special reservation in advance is neccessary.> So, either you spare two registers, > what would be very restrictive in x86, or you do backtracking, as the > current implementation of LLVM's linear scan does. Backtracking may > slowdown the implementation quite a bit.Agree. But it is not neccessary. The way how move instructions are inserted is very similar to Wimmer's linear scan algorithm. There is no need for any reservation of registers in advance or for any backtracking.>I believe that the other factors will not cause too much slowdown.Well, from my experience with the Wimmer's linear scan, book-keeping during the linear scan allocation becomes more complex and time-consuming, once you introduce interval-splitting. Also checks for interception of two live intervals are executed more frequently and consume a significant part of the execution time. Another rather time-consuming operation was the insertion of register moves to glue the splitted parts together. But this is mainly due to the current LLVM design, since such an operation require for each MBB a set of live-in live intervals. And LLVM currently computes live-ins for each MBB just for physical registers, but not for virtual live intervals. Best, Roman