Rui Ueyama via llvm-dev
2016-Nov-16 20:44 UTC
[llvm-dev] LLD: time to enable --threads by default
LLD supports multi-threading, and it seems to be working well, as you can see in a recent result <http://llvm.org/viewvc/llvm-project?view=revision&revision=287140>. In short, LLD runs 30% faster with the --threads option and more than 50% faster if you are using --build-id (your mileage may vary depending on your computer). However, I don't think most users even know about that, because --threads is not a default option.

I'm thinking of enabling --threads by default. We now have real users, and they'll be happy about the performance boost.

Any concerns?

I can't think of any problems with that, but I want to write down a few notes:

- We still need to focus on single-thread performance rather than multi-threaded performance, because it is hard to make a slow program faster just by using more threads.

- We shouldn't do "too clever" things with threads. Currently, we use multiple threads in only two places that are highly parallelizable by nature (namely, copying and applying relocations for each input section, and computing the build-id hash). We are using parallel_for_each, which is very simple and easy to understand. I believe this was the right design choice, and I don't think we want to have something like the workqueues/tasks in GNU gold, for example.

- Run benchmarks with --no-threads if you are not focusing on multi-thread performance.
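For readers unfamiliar with the pattern, here is a minimal sketch of the "parallel for each" idea described in the notes above. It is written in Python rather than LLD's C++, and every name in it (`parallel_for_each`, `copy_sections`, the `threads` flag, the `(offset, data)` section layout) is hypothetical and chosen only for illustration; it is not LLD's code. It shows the structure only: because of Python's global interpreter lock, it is not expected to reproduce the speedups quoted in the post.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for_each(items, fn, threads=True):
    """Apply fn to every item, either serially or on a thread pool.

    Hypothetical helper: 'threads' loosely mirrors the idea of a
    --threads / --no-threads switch; it is not LLD's implementation.
    """
    items = list(items)
    if not threads:
        for item in items:
            fn(item)
        return
    with ThreadPoolExecutor() as pool:
        # list() waits for all tasks and re-raises any exception.
        list(pool.map(fn, items))


def copy_sections(sections, size, threads=True):
    """Copy (offset, data) pairs into one output buffer.

    Each section writes to its own disjoint byte range, so the tasks
    are independent of one another and need no locking.
    """
    out = bytearray(size)

    def copy_one(section):
        offset, data = section
        out[offset:offset + len(data)] = data

    parallel_for_each(sections, copy_one, threads)
    return bytes(out)


if __name__ == "__main__":
    demo = [(0, b"ab"), (2, b"cd"), (4, b"ef")]
    print(copy_sections(demo, 6))
```

The point of the design is that there is no task graph or work queue to reason about: the only requirement is that the per-item function touches data no other item touches.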
Rafael Espíndola via llvm-dev
2016-Nov-16 20:52 UTC
[llvm-dev] LLD: time to enable --threads by default
I will do a quick benchmark run.

Other than the observations you have, my only concern is the situation where many lld invocations run in parallel, like in an LLVM build where there are many outputs in bin/. Our task system doesn't know about load, so I worry that it might degrade performance in that case.

Cheers,
Rafael
Renato Golin via llvm-dev
2016-Nov-16 20:55 UTC
[llvm-dev] LLD: time to enable --threads by default
On 16 November 2016 at 20:44, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I'm thinking to enable --threads by default. We now have real users, and
> they'll be happy about the performance boost.

Will it detect single-core computers and disable it? What is the minimum number of threads that can run in that mode?

Is the penalty on dual-core computers less than the gains? If you could have a VM with only two cores, where the OS is running on one and LLD threads are running on both, it'd be good to measure the downgrade.

Rafael's concern is also very real. I/O and memory consumption are important factors on small-footprint systems, though I'd be happy to have a different default per architecture or even carry the burden of forcing a --no-threads option on every run if the benefits are substantial.

If those issues are not a concern, then I'm in favour!

> - We still need to focus on single-thread performance rather than
> multi-threaded one because it is hard to make a slow program faster just by
> using more threads.

Agreed.

> - We shouldn't do "too clever" things with threads. Currently, we are using
> multi-threads only at two places where they are highly parallelizable by
> nature (namely, copying and applying relocations for each input section, and
> computing build-id hash). We are using parallel_for_each, and that is very
> simple and easy to understand. I believe this was a right design choice, and
> I don't think we want to have something like workqueues/tasks in GNU gold,
> for example.

Strongly agreed.

cheers,
--renato
Rui Ueyama via llvm-dev
2016-Nov-16 21:27 UTC
[llvm-dev] LLD: time to enable --threads by default
On Wed, Nov 16, 2016 at 12:55 PM, Renato Golin <renato.golin at linaro.org> wrote:
> Will it detect single-core computers and disable it? What is the
> minimum number of threads that can run in that mode?
>
> Is the penalty on dual core computers less than the gains? If you
> could have a VM with only two cores, where the OS is running on one
> and LLD threads are running on both, it'd be good to measure the
> downgrade.

As a quick test, I ran the benchmark again with "taskset -c 0" to use only one core. LLD still spawns 40 threads because my machine has 40 cores (20 physical cores), so 40 threads ran on one core.

With --no-threads (one thread on a single core), it took 6.66 seconds to self-link. With --threads (40 threads on a single core), it took 6.70 seconds. I guess the difference is mostly within the margin of error, so I think it wouldn't hurt single-core machines.

Rafael may be running his benchmarks and will bring his results.
Rui Ueyama via llvm-dev
2016-Nov-16 21:29 UTC
[llvm-dev] LLD: time to enable --threads by default
On Wed, Nov 16, 2016 at 12:55 PM, Renato Golin <renato.golin at linaro.org> wrote:
> Rafael's concern is also very real. I/O and memory consumption are
> important factors on small footprint systems, though I'd be happy to
> have a different default per architecture or even carry the burden of
> forcing a --no-threads option every run if the benefits are
> substantial.

On such a computer, you don't want to enable threads at all, no? If so, you can build LLVM without LLVM_ENABLE_THREADS.
Rafael Espíndola via llvm-dev
2016-Nov-16 21:46 UTC
[llvm-dev] LLD: time to enable --threads by default
On 16 November 2016 at 15:52, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> I will do a quick benchmark run.

On a Mac Pro (running Linux) the results I got with all cores available:

benchmark             master       patch        change
firefox               7.146418217  5.304271767  1.34729488437x faster
firefox-gc            7.316743822  5.46436812   1.33899174824x faster
chromium              4.265597914  3.972218527  1.07385781648x faster
chromium fast         1.823614026  1.686059427  1.08158348205x faster
the gold plugin       0.340167513  0.318601465  1.06768973269x faster
clang                 0.579914119  0.520784947  1.11353855817x faster
llvm-as               0.03323043   0.041571719  1.251013574x slower
the gold plugin fsds  0.36675887   0.350970944  1.04498356992x faster
clang fsds            0.656180056  0.591607603  1.10914743602x faster
llvm-as fsds          0.030324313  0.040045353  1.32056917497x slower
scylla                3.23378908   2.019191831  1.60152642773x faster

With only 2 cores:

benchmark             master       patch        change
firefox               7.174839911  6.319808477  1.13529388384x faster
firefox-gc            7.345525844  6.493005841  1.13129820362x faster
chromium              4.180752414  4.129515199  1.01240756179x faster
chromium fast         1.847296843  1.78837299   1.0329483018x faster
the gold plugin       0.341725451  0.339943222  1.0052427255x faster
clang                 0.581901114  0.566932481  1.02640284955x faster
llvm-as               0.03381059   0.036671392  1.08461260215x slower
the gold plugin fsds  0.369184003  0.368774353  1.00111084189x faster
clang fsds            0.660120583  0.641040511  1.02976422187x faster
llvm-as fsds          0.031074029  0.035421531  1.13990789543x slower
scylla                3.243011681  2.630991522  1.23261958615x faster

With only 1 core:

benchmark             master       patch        change
firefox               7.174323116  7.301968002  1.01779190649x slower
firefox-gc            7.339104117  7.466171668  1.01731376868x slower
chromium              4.176958448  4.188387233  1.00273615003x slower
chromium fast         1.848922713  1.858714219  1.00529578978x slower
the gold plugin       0.342383846  0.347106743  1.01379415838x slower
clang                 0.582476955  0.600524655  1.03098440178x slower
llvm-as               0.033248459  0.035622988  1.07141771593x slower
the gold plugin fsds  0.369510236  0.376390506  1.01861997133x slower
clang fsds            0.661267753  0.683417482  1.03349585535x slower
llvm-as fsds          0.030574688  0.033052779  1.08105041006x slower
scylla                3.236604638  3.325831407  1.02756801617x slower

Given that we have an improvement even with just two cores available, LGTM.

Cheers,
Rafael
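The "x faster" and "x slower" figures in the results above appear to be the ratio of the larger time to the smaller one, labelled by whether the patched run was quicker than master. A small sketch of that arithmetic follows; the function name `compare` is hypothetical, and the inputs are two of the master/patch pairs from the all-cores run.

```python
def compare(master, patch):
    """Return (ratio, label) for a master/patch timing pair.

    The ratio is always >= 1: the larger time divided by the smaller.
    'faster' means the patched run took less time than master.
    """
    if patch <= master:
        return master / patch, "faster"
    return patch / master, "slower"


if __name__ == "__main__":
    # Timings copied from the all-cores results in this message.
    for name, master, patch in [
        ("firefox", 7.146418217, 5.304271767),
        ("llvm-as", 0.03323043, 0.041571719),
    ]:
        ratio, label = compare(master, patch)
        print(f"{name}: {ratio:.4f}x {label}")
```

Reading the numbers this way makes the llvm-as rows easier to interpret: for a link that takes about 0.03 seconds, the slowdown is a large ratio but only a few milliseconds of absolute time.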
Joerg Sonnenberger via llvm-dev
2016-Nov-17 01:15 UTC
[llvm-dev] LLD: time to enable --threads by default
On Wed, Nov 16, 2016 at 12:44:46PM -0800, Rui Ueyama via llvm-dev wrote:
> I'm thinking to enable --threads by default. We now have real users, and
> they'll be happy about the performance boost.
>
> Any concerns?

What is the total CPU time consumed, not just the real time? When building a large project, linking is often done in parallel with other tasks, so wasting a lot of CPU to save a bit of real time is not necessarily a net win.

Joerg
Rui Ueyama via llvm-dev
2016-Nov-17 01:26 UTC
[llvm-dev] LLD: time to enable --threads by default
Did you see this <http://llvm.org/viewvc/llvm-project?view=revision&revision=287140>? Interpreting these numbers may be tricky because of hyper-threading, though.
Sean Silva via llvm-dev
2016-Nov-23 07:41 UTC
[llvm-dev] LLD: time to enable --threads by default
On Wed, Nov 16, 2016 at 12:44 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> - We shouldn't do "too clever" things with threads. Currently, we are
> using multi-threads only at two places where they are highly parallelizable
> by nature (namely, copying and applying relocations for each input section,
> and computing build-id hash). We are using parallel_for_each, and that is
> very simple and easy to understand. I believe this was a right design
> choice, and I don't think we want to have something like workqueues/tasks
> in GNU gold, for example.

Sorry for the late response.

Copying and applying relocations are actually not as parallelizable as you would imagine in current LLD. The reason is that there is an implicit serialization when mutating the kernel's VA map (which happens any time there is a minor page fault, i.e. the first time you touch a page of an mmap'd input). Since threads share the same VA, there is an implicit serialization across them. Separate processes are needed to avoid this overhead (note that the separate processes would still have the same output file mapped, so, at least with fixed partitioning, there is no need for complex IPC).

For `ld.lld -O0` on a Mac host, I measured <1 GB/s copying speed, even though the machine I was running on had about 50 GB/s of DRAM bandwidth. The VA overhead is therefore on the order of a 50x slowdown for this copying operation in this extreme case, and Amdahl's law indicates that adding multiple threads will give practically no speedup for this copy operation. I've also DTrace'd this and saw massive contention on the VA lock. Linux will be better, but no matter how good it is, this is still a serialization point and Amdahl's law will limit your speedup significantly.

-- Sean Silva
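To make the Amdahl's law argument above concrete, here is a small sketch of the formula. The function name and the specific fractions are assumptions for illustration only: treating the measured 50x gap as "about 1/50 of the copy time is parallelizable work and the rest is serialized on the VA lock" is a simplification of the message, not a measurement.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    p is the fraction of run time that can be spread across threads;
    the remaining (1 - p) stays serialized no matter how many threads run.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Hypothetical split: 2% parallelizable copying, 98% serialized.
    for n in (2, 8, 40):
        print(n, "threads:", round(amdahl_speedup(0.02, n), 4))
    # Upper bound as the thread count grows without limit: 1 / (1 - p).
    print("limit:", round(1.0 / (1.0 - 0.02), 4))
```

Under that assumed split, even 40 threads give a speedup of only about 2%, which is the sense in which a serialization point caps the benefit regardless of core count.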
Rafael Espíndola via llvm-dev
2016-Nov-23 14:31 UTC
[llvm-dev] LLD: time to enable --threads by default
Interesting. It might be worth trying again the idea of creating the output file in anonymous memory and using a write to output it.

Cheers,
Rafael
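As a rough illustration of the idea in this message, the sketch below assembles an output image in an anonymous memory mapping and then emits it with a single write, instead of writing through a file-backed mapping of the output. It is a Python sketch with hypothetical names (`write_via_anonymous_buffer`, the `(offset, data)` section layout), not LLD's implementation, and it makes no claim about whether this approach is actually faster.

```python
import mmap
import os
import tempfile


def write_via_anonymous_buffer(path, sections, size):
    """Assemble 'size' bytes in anonymous memory, then write them out.

    sections is an iterable of (offset, data) pairs with disjoint ranges.
    mmap.mmap(-1, size) creates a mapping that is not backed by any file.
    """
    buf = mmap.mmap(-1, size)
    try:
        for offset, data in sections:
            buf[offset:offset + len(data)] = data
        with open(path, "wb") as f:
            f.write(buf)  # one write of the finished image
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "out.bin")
        write_via_anonymous_buffer(out_path, [(0, b"abcd"), (4, b"WXYZ")], 8)
        print(os.path.getsize(out_path))
```

The trade-off being weighed is an extra copy at the end (the write) against avoiding page faults on a file-backed output mapping while sections are being copied.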