Chandler Carruth
2013-Jun-12 22:15 UTC
[LLVMdev] RFC - Profile Guided Optimization in LLVM
On Wed, Jun 12, 2013 at 3:10 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> It predates the block frequency interface. It just needs to be hooked up,
> patches welcome. It would also be nice to remove the floating point
> computations from the spill placement code.

Cool, if Diego doesn't beat me to it, I may send you a patch as that seems easy and obviously beneficial.
Jakob Stoklund Olesen
2013-Jun-12 22:25 UTC
[LLVMdev] RFC - Profile Guided Optimization in LLVM
On Jun 12, 2013, at 3:15 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Wed, Jun 12, 2013 at 3:10 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> It predates the block frequency interface. It just needs to be hooked up, patches welcome. It would also be nice to remove the floating point computations from the spill placement code.
>
> Cool, if Diego doesn't beat me to it, I may send you a patch as that seems easy and obviously beneficial.

Sounds good.

The only complication is that the floats are spilling into RAGreedy where they are used in the cost model. I think they can simply be replaced with BlockFrequency everywhere.

If the block frequencies make sense, this should also be unnecessary:

  struct SpillPlacement::Node {
    /// Scale - Inverse block frequency feeding into[0] or out of[1] the bundle.
    /// Ideally, these two numbers should be identical, but inaccuracies in the
    /// block frequency estimates means that we need to normalize ingoing and
    /// outgoing frequencies separately so they are commensurate.
    float Scale[2];

Thanks,
/jakob
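For readers following along, here is a minimal sketch of what "replace the floats with integer frequencies" could look like. Freq, Prob, scale and add are hypothetical names made up for this illustration, not LLVM's BlockFrequency or BranchProbability classes. The point is only that scaling a frequency by a branch probability, and summing frequencies, can be done with integers and no floating point:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an integer block frequency (not LLVM's class).
struct Freq {
  uint64_t value;
};

// Hypothetical branch probability stored as numerator/denominator.
struct Prob {
  uint32_t num;
  uint32_t den;
};

// Compute floor(f.value * p.num / p.den) with integer arithmetic only.
// Splitting the value into quotient and remainder keeps both partial
// products below 2^64, because num <= den and both fit in 32 bits.
Freq scale(Freq f, Prob p) {
  assert(p.den != 0 && p.num <= p.den);
  uint64_t q = f.value / p.den;
  uint64_t r = f.value % p.den;
  return {q * p.num + (r * p.num) / p.den};
}

// Saturating addition: clamp at the maximum instead of wrapping around.
Freq add(Freq a, Freq b) {
  uint64_t sum = a.value + b.value;
  if (sum < a.value)
    sum = std::numeric_limits<uint64_t>::max();
  return {sum};
}
```

With a representation like this, comparing two spill costs is a plain integer comparison. Whether the real cost model in RAGreedy can be converted this directly is exactly the open question in this thread.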
On 2013-06-12 18:15, Chandler Carruth wrote:
>
> On Wed, Jun 12, 2013 at 3:10 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> It predates the block frequency interface. It just needs to be
> hooked up, patches welcome. It would also be nice to remove the
> floating point computations from the spill placement code.
>
> Cool, if Diego doesn't beat me to it, I may send you a patch as that
> seems easy and obviously beneficial.

Unless you're in a hurry, I'd rather tackle this one myself. Particularly considering that I've no idea what you two are yapping about, so it will be a good learning experience.
On 13.06.2013, at 19:12, Diego Novillo <dnovillo at google.com> wrote:
> On 2013-06-12 18:15 , Chandler Carruth wrote:
>>
>> On Wed, Jun 12, 2013 at 3:10 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> It predates the block frequency interface. It just needs to be hooked up, patches welcome. It would also be nice to remove the floating point computations from the spill placement code.
>>
>> Cool, if Diego doesn't beat me to it, I may send you a patch as that seems easy and obviously beneficial.
>
> Unless you're in a hurry, I'd rather tackle this one myself. Particularly considering that I've no idea what you two are yapping about, so it will be a good learning experience.

I didn't want to interfere with you in any way, but I was working on this just this week, though from a completely different background:

In zlib's longest_match() (which happens to contain the hottest loop in deflate()) there's a loop with this layout:

  do {
    if (something)
      continue;
    while (cond && cond && cond && cond && cond) { }
  } while (something_else);

The inner while loop is freezing cold. This is one of the cases where the current estimation based on loop depth completely fails, while the heuristics BlockFrequency is based on get it right. With the old spill weights we spilled parts of the "something_else" code above, which then had to be reloaded on every iteration, just so we could keep more data of the inner loop in registers. That caused a huge slowdown.

I hacked up a patch and saw a whopping 5% improvement on deflate, bringing us a lot closer to GCC on this code. Other benchmarks I tried so far looked neutral or positive, but sometimes I saw a large increase in spilling that bloats code size; that needs to be figured out.

This patch only does the grunt work of getting the BlockFrequency analysis into the spill placement code; it doesn't replace the use of floats in the register allocator. It's possible that some of the downstream calculations need an update to cope with the changed magnitude of the frequencies.

Do you want to take over this effort or should I poke more at it?

- Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: block-frequency-spilling.patch
Type: application/octet-stream
Size: 23162 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130615/4d5b3577/attachment.obj>
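To make the longest_match() example concrete, here is a toy model of why a depth-only weight and a frequency-based weight can disagree on that loop shape. Everything in it is an assumption made for illustration: depthWeight is a made-up "multiply by 10 per loop level" rule rather than the allocator's actual formula, and the trip counts and probabilities in LoopModel are invented, not measured from zlib.

```cpp
#include <cassert>
#include <cstdint>

// Made-up depth-only weight: each extra loop level multiplies by 10.
uint64_t depthWeight(unsigned loopDepth) {
  uint64_t w = 1;
  for (unsigned i = 0; i < loopDepth; ++i)
    w *= 10;
  return w;
}

// Hypothetical profile of a do/while loop containing a rarely entered
// inner while loop. All fields are assumptions for this sketch.
struct LoopModel {
  uint64_t entryFreq;      // frequency assigned to the function entry
  uint32_t outerTrips;     // assumed outer iterations per entry
  uint32_t innerEnterNum;  // probability of reaching the inner loop,
  uint32_t innerEnterDen;  //   expressed as innerEnterNum / innerEnterDen
  uint32_t innerTrips;     // assumed inner iterations once entered
};

// The "something_else" check runs once per outer iteration, including
// iterations that take the `continue`.
uint64_t latchFreq(const LoopModel &m) {
  return m.entryFreq * m.outerTrips;
}

// The inner loop body only runs on the outer iterations that reach it.
uint64_t innerFreq(const LoopModel &m) {
  assert(m.innerEnterDen != 0 && m.innerEnterNum <= m.innerEnterDen);
  return latchFreq(m) * m.innerEnterNum / m.innerEnterDen * m.innerTrips;
}
```

With the invented numbers LoopModel{1024, 32, 1, 64, 4}, the depth rule ranks the inner loop (depth 2) above the latch (depth 1), while the frequency model gives the latch 32768 and the inner loop 2048. That is the inversion described in Ben's message: depth alone favours keeping inner-loop values in registers even when the inner loop is cold.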