Mehdi Amini via llvm-dev
2016-Nov-17 00:03 UTC
[llvm-dev] LLD: time to enable --threads by default
SHA1 in LLVM is *very* naive, any improvement is welcome there!
I think Amaury pointed it out originally and he had an alternative implementation IIRC.

--
Mehdi

> On Nov 16, 2016, at 3:58 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> By the way, while running benchmark, I found that our SHA1 function seems much slower than the one in gold. gold slowed down by only 1.3 seconds to compute a SHA1 of output, but we spent 6.0 seconds to do the same thing (I believe). Something doesn't seem right.
>
> Here is a table to link the same binary with -no-threads and -build-id={none,md5,sha1}. The numbers are in seconds.
>
>          LLD     gold
> none     7.82    13.78
> MD5      9.68    14.56
> SHA1     13.85   15.05
>
> On Wed, Nov 16, 2016 at 1:46 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 16 November 2016 at 15:52, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>>> I will do a quick benchmark run.
>>
>> On a mac pro (running linux) the results I got with all cores available:
>>
>> benchmark              master        patch         change
>> firefox                7.146418217   5.304271767   1.34729488437x faster
>> firefox-gc             7.316743822   5.46436812    1.33899174824x faster
>> chromium               4.265597914   3.972218527   1.07385781648x faster
>> chromium fast          1.823614026   1.686059427   1.08158348205x faster
>> the gold plugin        0.340167513   0.318601465   1.06768973269x faster
>> clang                  0.579914119   0.520784947   1.11353855817x faster
>> llvm-as                0.03323043    0.041571719   1.251013574x slower
>> the gold plugin fsds   0.36675887    0.350970944   1.04498356992x faster
>> clang fsds             0.656180056   0.591607603   1.10914743602x faster
>> llvm-as fsds           0.030324313   0.040045353   1.32056917497x slower
>> scylla                 3.23378908    2.019191831   1.60152642773x faster
>>
>> With only 2 cores:
>>
>> benchmark              master        patch         change
>> firefox                7.174839911   6.319808477   1.13529388384x faster
>> firefox-gc             7.345525844   6.493005841   1.13129820362x faster
>> chromium               4.180752414   4.129515199   1.01240756179x faster
>> chromium fast          1.847296843   1.78837299    1.0329483018x faster
>> the gold plugin        0.341725451   0.339943222   1.0052427255x faster
>> clang                  0.581901114   0.566932481   1.02640284955x faster
>> llvm-as                0.03381059    0.036671392   1.08461260215x slower
>> the gold plugin fsds   0.369184003   0.368774353   1.00111084189x faster
>> clang fsds             0.660120583   0.641040511   1.02976422187x faster
>> llvm-as fsds           0.031074029   0.035421531   1.13990789543x slower
>> scylla                 3.243011681   2.630991522   1.23261958615x faster
>>
>> With only 1 core:
>>
>> benchmark              master        patch         change
>> firefox                7.174323116   7.301968002   1.01779190649x slower
>> firefox-gc             7.339104117   7.466171668   1.01731376868x slower
>> chromium               4.176958448   4.188387233   1.00273615003x slower
>> chromium fast          1.848922713   1.858714219   1.00529578978x slower
>> the gold plugin        0.342383846   0.347106743   1.01379415838x slower
>> clang                  0.582476955   0.600524655   1.03098440178x slower
>> llvm-as                0.033248459   0.035622988   1.07141771593x slower
>> the gold plugin fsds   0.369510236   0.376390506   1.01861997133x slower
>> clang fsds             0.661267753   0.683417482   1.03349585535x slower
>> llvm-as fsds           0.030574688   0.033052779   1.08105041006x slower
>> scylla                 3.236604638   3.325831407   1.02756801617x slower
>>
>> Given that we have an improvement even with just two cores available, LGTM.
>>
>> Cheers,
>> Rafael
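The figures quoted in this message can be cross-checked with a little arithmetic. The sketch below is not part of the original thread. It assumes that the hashing cost is simply the difference between a build-id run and the `none` run, and that the "x faster/slower" column is the ratio of the larger time to the smaller one. The names `link_times`, `hash_overhead`, and `speedup` are made up for illustration.

```python
# Link times in seconds, copied from the -build-id table in the message.
link_times = {
    "LLD": {"none": 7.82, "md5": 9.68, "sha1": 13.85},
    "gold": {"none": 13.78, "md5": 14.56, "sha1": 15.05},
}


def hash_overhead(times, kind):
    """Extra seconds spent when build-id hashing of the given kind is on.

    Assumption: the overhead is the plain difference to the 'none' run.
    """
    return round(times[kind] - times["none"], 2)


def speedup(master, patch):
    """Return (ratio, label) in the style of the benchmark tables.

    Assumption: the ratio is always larger time / smaller time, and the
    label says whether the patch is faster or slower than master.
    """
    if patch <= master:
        return master / patch, "faster"
    return patch / master, "slower"


for linker, times in link_times.items():
    print(linker, "md5:", hash_overhead(times, "md5"),
          "sha1:", hash_overhead(times, "sha1"))

# Two rows from the "all cores" run: firefox and llvm-as.
print(speedup(7.146418217, 5.304271767))
print(speedup(0.03323043, 0.041571719))
```

Under those assumptions SHA1 costs LLD about 6.03 s against about 1.27 s for gold, which lines up with the 6.0 s and 1.3 s Rui mentions.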
Rui Ueyama via llvm-dev
2016-Nov-17 00:05 UTC
[llvm-dev] LLD: time to enable --threads by default
Can we just copy-and-paste optimized code from somewhere?

On Wed, Nov 16, 2016 at 4:03 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> SHA1 in LLVM is *very* naive, any improvement is welcome there!
> I think Amaury pointed it out originally and he had an alternative
> implementation IIRC.
Mehdi Amini via llvm-dev
2016-Nov-17 00:11 UTC
[llvm-dev] LLD: time to enable --threads by default
The current implementation was “copy/pasted” from somewhere (it was explicitly public domain).

> On Nov 16, 2016, at 4:05 PM, Rui Ueyama <ruiu at google.com> wrote:
>
> Can we just copy-and-paste optimized code from somewhere?
Joerg Sonnenberger via llvm-dev
2016-Nov-17 01:13 UTC
[llvm-dev] LLD: time to enable --threads by default
On Wed, Nov 16, 2016 at 04:05:38PM -0800, Rui Ueyama via llvm-dev wrote:
> Can we just copy-and-paste optimized code from somewhere?

The NetBSD version is also PD and uses much more aggressive loop unrolling:
https://github.com/jsonn/src/blob/trunk/common/lib/libc/hash/sha1/sha1.c

It's still a bit slower than an optimised assembler version, but
typically good enough.

Joerg
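
To make "more aggressive loop unrolling" concrete, here is a minimal sketch that is not taken from the LLVM or NetBSD sources. It contrasts a textbook SHA-1 compression loop, which branches on the round index and shuffles five variables every round, with a variant unrolled by five, where the variables rotate roles instead of being copied and each 20-round stage has its own loop. Whether this matches the linked file in detail is an assumption, and all function names are hypothetical. Python is used only to show the shape; the speed benefit shows up in compiled code, not here.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def expand(block):
    """Expand one 64-byte block into the 80-word message schedule."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rol(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress_rolled(state, block):
    """Textbook form: one 80-round loop with a branch on the round index."""
    w = expand(block)
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (rol(a, 5) + f + e + k + w[t]) & MASK, a, rol(b, 30), c, d
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e))]


# One (round function, constant) pair per 20-round stage.
STAGES = (
    (lambda x, y, z: (x & y) | (~x & z), 0x5A827999),
    (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1),
    (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x8F1BBCDC),
    (lambda x, y, z: x ^ y ^ z, 0xCA62C1D6),
)


def compress_unrolled(state, block):
    """Unrolled by five: variables rotate roles, so nothing is shuffled."""
    w = expand(block)
    a, b, c, d, e = state
    for stage, (f, k) in enumerate(STAGES):
        for t in range(20 * stage, 20 * stage + 20, 5):
            e = (e + rol(a, 5) + f(b, c, d) + k + w[t]) & MASK
            b = rol(b, 30)
            d = (d + rol(e, 5) + f(a, b, c) + k + w[t + 1]) & MASK
            a = rol(a, 30)
            c = (c + rol(d, 5) + f(e, a, b) + k + w[t + 2]) & MASK
            e = rol(e, 30)
            b = (b + rol(c, 5) + f(d, e, a) + k + w[t + 3]) & MASK
            d = rol(d, 30)
            a = (a + rol(b, 5) + f(c, d, e) + k + w[t + 4]) & MASK
            c = rol(c, 30)
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e))]


def sha1(data, compress):
    """Pad the message and run the chosen compression function per block."""
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = (data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
              + struct.pack(">Q", 8 * len(data)))
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state).hex()


# Both variants should agree with the reference implementation in hashlib.
message = b"LLD: time to enable --threads by default"
print(sha1(message, compress_rolled) == hashlib.sha1(message).hexdigest())
print(sha1(message, compress_unrolled) == hashlib.sha1(message).hexdigest())
```

In C the same idea is usually expressed with one macro per stage, so the compiler sees straight-line code with no per-round branch and no register moves.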