Mehdi Amini via llvm-dev
2016-Nov-17 00:11 UTC
[llvm-dev] LLD: time to enable --threads by default
The current implementation was "copy/pasted" from somewhere (it was explicitly public domain).

> On Nov 16, 2016, at 4:05 PM, Rui Ueyama <ruiu at google.com> wrote:
>
> Can we just copy-and-paste optimized code from somewhere?
>
> On Wed, Nov 16, 2016 at 4:03 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> SHA1 in LLVM is *very* naive, any improvement is welcome there!
> I think Amaury pointed it out originally and he had an alternative implementation IIRC.
>
> --
> Mehdi
>
>> On Nov 16, 2016, at 3:58 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> By the way, while running benchmarks, I found that our SHA1 function seems much slower than the one in gold. gold slowed down by only 1.3 seconds to compute a SHA1 of the output, but we spent 6.0 seconds to do the same thing (I believe). Something doesn't seem right.
>>
>> Here is a table for linking the same binary with -no-threads and -build-id={none,md5,sha1}. The numbers are in seconds.
>>
>>          LLD     gold
>> none     7.82    13.78
>> MD5      9.68    14.56
>> SHA1    13.85    15.05
>>
>> On Wed, Nov 16, 2016 at 1:46 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 16 November 2016 at 15:52, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> > I will do a quick benchmark run.
>>
>> On a mac pro (running linux) the results I got with all cores available:
>>
>> benchmark              master        patch         change
>> firefox                7.146418217   5.304271767   1.34729488437x faster
>> firefox-gc             7.316743822   5.46436812    1.33899174824x faster
>> chromium               4.265597914   3.972218527   1.07385781648x faster
>> chromium fast          1.823614026   1.686059427   1.08158348205x faster
>> the gold plugin        0.340167513   0.318601465   1.06768973269x faster
>> clang                  0.579914119   0.520784947   1.11353855817x faster
>> llvm-as                0.03323043    0.041571719   1.251013574x slower
>> the gold plugin fsds   0.36675887    0.350970944   1.04498356992x faster
>> clang fsds             0.656180056   0.591607603   1.10914743602x faster
>> llvm-as fsds           0.030324313   0.040045353   1.32056917497x slower
>> scylla                 3.23378908    2.019191831   1.60152642773x faster
>>
>> With only 2 cores:
>>
>> benchmark              master        patch         change
>> firefox                7.174839911   6.319808477   1.13529388384x faster
>> firefox-gc             7.345525844   6.493005841   1.13129820362x faster
>> chromium               4.180752414   4.129515199   1.01240756179x faster
>> chromium fast          1.847296843   1.78837299    1.0329483018x faster
>> the gold plugin        0.341725451   0.339943222   1.0052427255x faster
>> clang                  0.581901114   0.566932481   1.02640284955x faster
>> llvm-as                0.03381059    0.036671392   1.08461260215x slower
>> the gold plugin fsds   0.369184003   0.368774353   1.00111084189x faster
>> clang fsds             0.660120583   0.641040511   1.02976422187x faster
>> llvm-as fsds           0.031074029   0.035421531   1.13990789543x slower
>> scylla                 3.243011681   2.630991522   1.23261958615x faster
>>
>> With only 1 core:
>>
>> benchmark              master        patch         change
>> firefox                7.174323116   7.301968002   1.01779190649x slower
>> firefox-gc             7.339104117   7.466171668   1.01731376868x slower
>> chromium               4.176958448   4.188387233   1.00273615003x slower
>> chromium fast          1.848922713   1.858714219   1.00529578978x slower
>> the gold plugin        0.342383846   0.347106743   1.01379415838x slower
>> clang                  0.582476955   0.600524655   1.03098440178x slower
>> llvm-as                0.033248459   0.035622988   1.07141771593x slower
>> the gold plugin fsds   0.369510236   0.376390506   1.01861997133x slower
>> clang fsds             0.661267753   0.683417482   1.03349585535x slower
>> llvm-as fsds           0.030574688   0.033052779   1.08105041006x slower
>> scylla                 3.236604638   3.325831407   1.02756801617x slower
>>
>> Given that we have an improvement even with just two cores available, LGTM.
>>
>> Cheers,
>> Rafael
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
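The two kinds of arithmetic quoted in this message (the extra link time that a build-id hash adds, and the "Nx faster/slower" ratios) can be recomputed from the raw timings. The following is only a minimal sketch: the dictionary layout and the function names `build_id_overhead` and `speedup` are hypothetical names chosen for illustration, not anything from LLD or gold. The numbers are copied from the tables in the thread.

```python
# Link times in seconds, copied from the -build-id table in the thread.
# The dictionary layout is an illustrative assumption.
LINK_TIMES = {
    "LLD": {"none": 7.82, "md5": 9.68, "sha1": 13.85},
    "gold": {"none": 13.78, "md5": 14.56, "sha1": 15.05},
}


def build_id_overhead(linker: str, kind: str) -> float:
    """Extra seconds spent when the given build-id kind is enabled."""
    times = LINK_TIMES[linker]
    return times[kind] - times["none"]


def speedup(master: float, patch: float) -> float:
    """master / patch: above 1 means the patch is faster, below 1 slower."""
    return master / patch


for linker in LINK_TIMES:
    for kind in ("md5", "sha1"):
        overhead = build_id_overhead(linker, kind)
        print(f"{linker:5} {kind:5} +{overhead:.2f} s")

# firefox with all cores available: master vs. patch timings.
print(f"firefox: {speedup(7.146418217, 5.304271767):.3f}x")
```

With these inputs the SHA1 overhead comes out at about 6.03 s for LLD and 1.27 s for gold, which matches the "6.0 seconds" and "1.3 seconds" figures in the message. For the rows marked "slower", the thread reports the inverse ratio (patch / master), for example 0.041571719 / 0.03323043 for llvm-as.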
Rui Ueyama via llvm-dev
2016-Nov-17 00:13 UTC
[llvm-dev] LLD: time to enable --threads by default
I should have asked: do you know if there's an optimized SHA1 implementation that we can use?
Mehdi Amini via llvm-dev
2016-Nov-17 00:16 UTC
[llvm-dev] LLD: time to enable --threads by default
No, otherwise I'd have picked it instead of this one ;-)

The alternative plan was either to:
1) reach out to someone who has written an optimized one and convince him to contribute it to LLVM, or
2) "optimize" the one in tree.

But that is not high enough on my priority list.

--
Mehdi
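Whichever of the two options is taken, it helps to have a repeatable way to measure hash throughput before and after. The sketch below only illustrates that kind of measurement. It times Python's `hashlib` (which is not LLVM's SHA1 and says nothing about its speed), the function name `hash_throughput` is hypothetical, and the zero-filled buffer merely stands in for a linked output file.

```python
import hashlib
import time


def hash_throughput(algorithm: str, data: bytes, repeats: int = 3) -> float:
    """Return the best observed throughput in MB/s for hashing `data`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer on tiny inputs.
    best = max(best, 1e-9)
    return len(data) / best / 1e6


if __name__ == "__main__":
    # 16 MiB of zeros as a stand-in for a linker output file (assumption).
    payload = bytes(16 * 1024 * 1024)
    for name in ("md5", "sha1"):
        print(f"{name}: {hash_throughput(name, payload):.0f} MB/s")
```

Taking the best of several runs rather than the mean reduces noise from other processes, which matters when the differences being compared are on the order of a second or less.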