thr3ads.net - llvm dev - [llvm-dev] Improve the performance of JamCRC [Apr 2017]


Scott Smith via llvm-dev

2017-Apr-13 03:49 UTC

[llvm-dev] Improve the performance of JamCRC

LLDB relies heavily on CRC computation when loading shared libraries.  The
existing implementation is quite slow because it processes one byte at a
time, creating a long dependency chain.
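For reference, a byte-at-a-time CRC of the kind described above might look like the sketch below. It assumes JamCRC is the reflected CRC-32 (polynomial 0xEDB88320, initial value 0xFFFFFFFF) without the final bit inversion. The function names are hypothetical, and this is not LLVM's or LLDB's actual code. Each iteration's table index depends on the CRC produced by the previous iteration, which is the serial dependency chain that limits throughput.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
static std::array<uint32_t, 256> makeCrcTable() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
    table[i] = c;
  }
  return table;
}

// Byte-at-a-time JamCRC-style update (assumed: init 0xFFFFFFFF, no final
// inversion). Each step needs the previous CRC value, so the steps are serial.
uint32_t jamcrcBytewise(const unsigned char *data, size_t len,
                        uint32_t crc = 0xFFFFFFFFu) {
  static const std::array<uint32_t, 256> table = makeCrcTable();
  assert(data != nullptr || len == 0);
  for (size_t i = 0; i < len; ++i)
    crc = (crc >> 8) ^ table[(crc ^ data[i]) & 0xFF];
  return crc;
}
```

Because there is no final inversion, the returned value can be passed back in as `crc` to continue over the next chunk of a buffer.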

Unfortunately, the polynomial is not the one implemented by the crc32
instruction in x86 SSE 4.2, but there's another way to make it faster: use
more lookup tables.
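To make the polynomial mismatch concrete: the SSE 4.2 crc32 instruction computes CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78), while JamCRC, as assumed here, uses the zlib CRC-32 polynomial (reflected form 0xEDB88320). The bit-at-a-time sketch below (hypothetical function and constant names) reproduces the standard check values for "123456789": 0xCBF43926 for CRC-32, 0xE3069283 for CRC-32C, and 0x340BC6D9 for the non-inverted JamCRC-style variant. Since the results differ, the hardware instruction cannot simply be substituted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit-at-a-time reflected CRC with initial value 0xFFFFFFFF. finalXor selects
// whether the result is inverted (0xFFFFFFFF, standard CRC-32 / CRC-32C) or
// left as-is (0, JamCRC-style).
uint32_t crcBitwise(const char *s, uint32_t reflectedPoly, uint32_t finalXor) {
  assert(s != nullptr);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0, n = std::strlen(s); i < n; ++i) {
    crc ^= static_cast<unsigned char>(s[i]);
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ reflectedPoly : (crc >> 1);
  }
  return crc ^ finalXor;
}

constexpr uint32_t kCrc32Poly = 0xEDB88320u;  // CRC-32 (zlib; assumed for JamCRC)
constexpr uint32_t kCrc32CPoly = 0x82F63B78u; // CRC-32C (SSE 4.2 crc32)
```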

zlib implements this, but rather than add a dependency on zlib, I added the
relevant code directly; it computes the CRC four bytes at a time in parallel.
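The "more lookup tables" idea is commonly called slicing-by-4. Below is a generic sketch of that technique, not the contents of jamcrc.patch (which is not preserved in the archive). It makes the same JamCRC assumptions as a plain CRC-32 without final inversion, and all names are hypothetical. Table k holds the CRC contribution of a byte followed by k zero bytes, so a 4-byte word is folded in with four lookups that do not depend on each other. Words are assembled byte by byte in little-endian order, which keeps the code correct on big-endian hosts and on unaligned pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace {
// t[k][b] = CRC contribution of byte b followed by k zero bytes
// (reflected CRC-32 polynomial 0xEDB88320).
using SliceTables = std::array<std::array<uint32_t, 256>, 4>;

SliceTables makeSliceTables() {
  SliceTables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
    t[0][i] = c;
  }
  // Each further table pushes the previous entry through one more zero byte.
  for (int k = 1; k < 4; ++k)
    for (uint32_t i = 0; i < 256; ++i)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFF];
  return t;
}

// Assemble a little-endian 32-bit word: safe for unaligned pointers and
// independent of host byte order.
uint32_t load32le(const unsigned char *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}
} // namespace

// Slicing-by-4 JamCRC-style update: four independent table lookups per word,
// then the remaining 0-3 bytes one at a time.
uint32_t jamcrcSlice4(const unsigned char *data, size_t len,
                      uint32_t crc = 0xFFFFFFFFu) {
  static const SliceTables t = makeSliceTables();
  assert(data != nullptr || len == 0);
  while (len >= 4) {
    crc ^= load32le(data);
    // The first byte of the word has 3 bytes after it (t[3]), the last has 0.
    crc = t[3][crc & 0xFF] ^ t[2][(crc >> 8) & 0xFF] ^
          t[1][(crc >> 16) & 0xFF] ^ t[0][crc >> 24];
    data += 4;
    len -= 4;
  }
  while (len--)
    crc = (crc >> 8) ^ t[0][(crc ^ *data++) & 0xFF];
  return crc;
}
```

The result must match the byte-at-a-time version for every input; only the amount of work per loop iteration changes.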

A separate patch changes LLDB to use JamCRC instead of its own
implementation.  This patch speeds up JamCRC, which brings my test (starting
LLDB and breaking at main) from 47 seconds down to 36 seconds.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jamcrc.patch
Type: application/octet-stream
Size: 16782 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/c6d1ef7d/attachment.obj>

Scott Smith via llvm-dev

2017-Apr-13 04:52 UTC


[llvm-dev] Improve the performance of JamCRC

Sorry, that last patch didn't handle endianness correctly.  Here's an
updated patch that uses llvm::support::endian.  It assumes unaligned input,
which is safer, since I don't know whether callers can be expected to pass
aligned input to this function.  It wouldn't take much to process the first
0-3 bytes one at a time, then run through the rest using aligned reads, and
finish up with the remaining 0-3 bytes.  Let me know if you prefer that.
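The alternative layout mentioned above (up to 3 leading bytes one at a time, aligned 4-byte reads for the bulk, then up to 3 trailing bytes) comes down to computing how a buffer splits. A minimal sketch with a hypothetical helper; it assumes a 4-byte word and that converting a pointer to uintptr_t yields its address, which holds on mainstream platforms but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how a buffer is divided for the
// "unaligned head, aligned 4-byte body, short tail" strategy.
struct AlignedSplit {
  size_t head; // 0-3 bytes processed one at a time until 4-byte aligned
  size_t body; // multiple of 4, read as aligned 32-bit words
  size_t tail; // 0-3 leftover bytes processed one at a time
};

AlignedSplit splitForAlignment(const void *data, size_t len) {
  assert(data != nullptr || len == 0);
  uintptr_t addr = reinterpret_cast<uintptr_t>(data);
  size_t head = (4 - (addr & 3)) & 3; // bytes until the next 4-byte boundary
  if (head > len)
    head = len; // a short buffer may end before reaching the boundary
  size_t rest = len - head;
  return AlignedSplit{head, rest & ~size_t(3), rest & 3};
}
```

Whether the aligned body actually beats unaligned loads depends on the target; on x86, unaligned 32-bit loads are generally cheap, which is one reason assuming unaligned input is a reasonable default.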



On Wed, Apr 12, 2017 at 8:49 PM, Scott Smith <scott.smith at purestorage.com> wrote:
> Lldb relies heavily on crc when loading shared libraries.  The existing
> implementation is quite slow as it computes a byte at a time, creating a
> long dependency chain.
>
> Unfortunately the polynomial is not the same as the one implemented by x86
> processors in SSE 4.2, but there's another way to make it faster by using
> more lookup tables.
>
> Zlib implements this, but rather than require zlib, I instead added the
> relevant code to compute four bytes at a time in parallel.
>
> A separate patch changes lldb to rely on JamCRC instead of its own
> implementation.  This patch improves the performance, which brings my test
> (starting lldb, breaking at main) from 47 seconds down to 36 seconds.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jamcrc.patch
Type: application/octet-stream
Size: 16784 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/32c54905/attachment-0001.obj>

Mehdi Amini via llvm-dev

2017-Apr-13 08:00 UTC


[llvm-dev] Improve the performance of JamCRC

Hi Scott,

Usually patches are sent to llvm-commits (unless I missed a specific reason to
send this patch to llvm-dev instead), see:
http://llvm.org/docs/DeveloperPolicy.html#making-and-submitting-a-patch
(we also have a Phabricator instance: http://llvm.org/docs/Phabricator.html)

Best,

— 
Mehdi

> On Apr 12, 2017, at 6:52 PM, Scott Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Sorry, that last patch didn't handle endianness very well.  Here's an
> updated patch that uses llvm::support::endian.  I assume unaligned input,
> which is safer.  I have no idea whether one can expect aligned input to this
> function.  It also wouldn't take much to process the first <=3 bytes one
> at a time, then blast through assuming aligned reads, and then finish up with
> another <=3 bytes.  Let me know if you prefer that.
> 
> On Wed, Apr 12, 2017 at 8:49 PM, Scott Smith <scott.smith at purestorage.com> wrote:
> Lldb relies heavily on crc when loading shared libraries.  The existing
> implementation is quite slow as it computes a byte at a time, creating a long
> dependency chain.
> 
> Unfortunately the polynomial is not the same as the one implemented by x86
> processors in SSE 4.2, but there's another way to make it faster by using
> more lookup tables.
> 
> Zlib implements this, but rather than require zlib, I instead added the
> relevant code to compute four bytes at a time in parallel.
> 
> A separate patch changes lldb to rely on JamCRC instead of its own
> implementation.  This patch improves the performance, which brings my test
> (starting lldb, breaking at main) from 47 seconds down to 36 seconds.
> 
> <jamcrc.patch>
