On Sun, Jul 14, 2013 at 5:57 PM, Andrew Trick <atrick at apple.com> wrote:
>
> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
> 6) Miscellaneous
> ==========
> Will partitioning degrade performance in theory? I think it depends on
> the definition of performance. If performance means execution time, I
> guess it does not. However, if performance includes code size, I think
> it may have some negative impact. Following are a few scenarios:
>
> - constants generated by the post-IPO passes are not shared across
>   partitions
> - dead functions may be detected during the post-IPO stage, and they
>   may not be deleted.
>
>
> I don't know if it's feasible, but stable linker output, independent of the
> partitioning, is highly desirable. One of the most irritating performance
> regressions to track down involves different versions of the host linker. If
> partitioning decisions are thrown into the mix, this could be annoying. Is
> it possible for the final link to do a better job cleaning up?

While I haven't yet read the rest of the proposal, I'm going to comment
on this in particular. In my view this is an absolute requirement, as
the compiler should produce the same output given the same input every
time, with no deviation.

-eric
On Jul 14, 2013, at 6:38 PM, Eric Christopher <echristo at gmail.com> wrote:

> On Sun, Jul 14, 2013 at 5:57 PM, Andrew Trick <atrick at apple.com> wrote:
>>
>> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>> 6) Miscellaneous
>> ==========
>> Will partitioning degrade performance in theory? I think it depends on
>> the definition of performance. If performance means execution time, I
>> guess it does not. However, if performance includes code size, I think
>> it may have some negative impact. Following are a few scenarios:
>>
>> - constants generated by the post-IPO passes are not shared across
>>   partitions
>> - dead functions may be detected during the post-IPO stage, and they
>>   may not be deleted.
>>
>>
>> I don't know if it's feasible, but stable linker output, independent of the
>> partitioning, is highly desirable. One of the most irritating performance
>> regressions to track down involves different versions of the host linker. If
>> partitioning decisions are thrown into the mix, this could be annoying. Is
>> it possible for the final link to do a better job cleaning up?
>
> While I haven't yet read the rest of the proposal, I'm going to comment
> on this in particular. In my view this is an absolute requirement, as
> the compiler should produce the same output given the same input every
> time, with no deviation.

The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long as that decision is based on the input (not the host system), it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or, more generally, parallel codegen) affects the output, but that may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality.

Sorry for the tangential thought here...
it seems that most of Shuxin’s proposal is actually independent of LTO, even though the prototype and primary goal is enabling LTO.

-Andy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20130714/b1a8d318/attachment.html>
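To make "based on the input (not the host system)" concrete, here is a minimal Python sketch of a partitioner whose result depends only on the set of function names and a fixed partition count. The function name `partition_functions`, the CRC32-based assignment, and the function names in the usage example are all hypothetical; this is not the partitioning heuristic from Shuxin's proposal, just an illustration of which inputs are safe to depend on. It deliberately avoids Python's built-in `hash()` (salted per process for strings) and does not derive the partition count from `os.cpu_count()`.

```python
import zlib


def partition_functions(func_names, num_partitions):
    """Assign functions to partitions using only the input (hypothetical sketch).

    zlib.crc32 gives the same value on every run and every machine, unlike
    Python's built-in hash(), which is randomized per process for str.
    num_partitions is a fixed parameter, not derived from the host's core count.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    partitions = [[] for _ in range(num_partitions)]
    # Iterate in sorted order so the layout inside each partition does not
    # depend on the order in which functions happened to be discovered.
    for name in sorted(func_names):
        idx = zlib.crc32(name.encode("utf-8")) % num_partitions
        partitions[idx].append(name)
    return partitions


if __name__ == "__main__":
    funcs = ["main", "parse_args", "emit_object", "run_pass", "helper"]
    print(partition_functions(funcs, 2))
    # Same functions in a different order: identical partitioning.
    print(partition_functions(list(reversed(funcs)), 2))
```

Changing `num_partitions` reshuffles the assignment, which matches Andrew's caveat: the output stays deterministic, but it still depends on the partitioning heuristic.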
On Jul 14, 2013, at 7:07 PM, Andrew Trick <atrick at apple.com> wrote:
> The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long as that decision is based on the input (not the host system), it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or, more generally, parallel codegen) affects the output, but that may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality.

I also want to chime in on the importance of stable binary outputs. And not just that the same compiler and the same sources produce the same binary, but that minor changes to either should cause only minor changes to the output binary. For software updates, the Apple updater tries to download only the delta to the binaries, so we want those deltas to be as small as possible. In addition, it often happens late in an OS release cycle that some critical bug is found and the fix is in the compiler. To qualify it, we rebuild the whole OS with the new compiler, then compare all the binaries in the OS, making sure only things related to the bug are changed.

> Sorry for the tangential thought here... it seems that most of Shuxin’s proposal is actually independent of LTO, even though the prototype and primary goal is enabling LTO.

This is very insightful, Andrew! Rather than treating this (post-IPO parallelization) as an LTO enhancement, the backend should simply have some threshold (e.g. number of functions) that causes it to start parallelizing the last steps.

On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> There are two camps: one camp advocates compiling partitions via multi-process,
> the other one favors multi-thread.

There is also a variant of multi-threading that is popular at Apple. Our OSs have libdispatch, which makes it easy to queue up chunks of work.
The OS looks at the overall system balance and uses the ideal number of threads to process the work queue.

> The compiler used to generate a single object file from the merged
> IR; now it will generate multiple ones, one for each partition.

I have not studied the MC interface, but why does each partition need to generate a separate object file? Why can’t the first partition to finish create an object file, and as other partitions finish, just append to that object file?

-Nick

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20130717/5494a8ea/attachment.html>
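A minimal Python sketch of two ideas from Nick's message: a function-count threshold that switches codegen from sequential to parallel, and a work queue whose results are appended to a single output. `concurrent.futures.ThreadPoolExecutor` stands in for libdispatch (which is a C API), and `PARALLEL_THRESHOLD` and `codegen_partition` are hypothetical names. The point of the sketch is that if chunks are appended in partition order rather than in completion order, the bytes stay the same no matter how many threads the pool picks or which one finishes first. Real object files carry headers, symbol tables and relocations, so appending in MC would be more involved than this byte concatenation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: minimum number of functions before the backend splits work.
PARALLEL_THRESHOLD = 4


def codegen_partition(funcs):
    """Hypothetical stand-in for running codegen on one partition.

    Returns a fake 'section' of bytes; a real backend would emit machine code.
    """
    return "".join(f"<{f}>" for f in funcs).encode("utf-8")


def compile_module(partitions):
    total = sum(len(p) for p in partitions)
    if total < PARALLEL_THRESHOLD:
        # Small module: compile everything as one partition, sequentially.
        merged = [f for p in partitions for f in p]
        return codegen_partition(merged)
    # Large module: queue one work item per partition. The pool chooses how
    # many threads to run, loosely analogous to a libdispatch queue.
    with ThreadPoolExecutor() as pool:
        # map() returns results in submission order, not completion order.
        chunks = list(pool.map(codegen_partition, partitions))
    output = bytearray()
    for chunk in chunks:
        output += chunk  # append each partition's bytes to a single buffer
    return bytes(output)


if __name__ == "__main__":
    print(compile_module([["a", "b"], ["c"], ["d", "e"]]))
```

In this toy model, both paths produce identical bytes for the same functions, so crossing the threshold changes compile time but not output; a real backend would only get that property if cross-partition effects (such as the unshared constants Shuxin mentions) were also handled.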