Scott Smith via llvm-dev
2017-Apr-13 03:49 UTC
[llvm-dev] Improve the performance of JamCRC
LLDB relies heavily on CRC when loading shared libraries. The existing implementation is quite slow because it computes one byte at a time, which creates a long dependency chain.

Unfortunately, the polynomial is not the same as the one implemented by x86 processors in SSE 4.2, but the computation can still be made faster by using more lookup tables.

zlib implements this, but rather than require zlib, I added the relevant code to compute four bytes at a time in parallel.

A separate patch changes LLDB to rely on JamCRC instead of its own implementation. This patch improves the performance of JamCRC, which brings my test (starting lldb, breaking at main) from 47 seconds down to 36 seconds.

Attachment: jamcrc.patch (16782 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/c6d1ef7d/attachment.obj>
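For readers without the attachment, the "more lookup tables" idea in the message above can be sketched roughly as follows. This is a minimal illustration of the four-table (slicing-by-4) technique as zlib does it, not the code from jamcrc.patch. The namespace and function names (sketch, makeTables, updateBytewise, updateBy4) are hypothetical, and the sketch assumes the reflected CRC-32 polynomial 0xEDB88320 with an initial value of 0xFFFFFFFF and no final inversion, which is the usual definition of JamCRC.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is not the code from jamcrc.patch.
namespace sketch {

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// Build four 256-entry tables for the reflected polynomial 0xEDB88320.
// T[0] is the classic byte-at-a-time table; T[N][I] is the CRC state after
// feeding byte I followed by N zero bytes.
inline Tables makeTables() {
  Tables T{};
  for (uint32_t I = 0; I < 256; ++I) {
    uint32_t C = I;
    for (int Bit = 0; Bit < 8; ++Bit)
      C = (C & 1) ? (C >> 1) ^ 0xEDB88320u : (C >> 1);
    T[0][I] = C;
  }
  for (uint32_t I = 0; I < 256; ++I)
    for (size_t N = 1; N < 4; ++N)
      T[N][I] = (T[N - 1][I] >> 8) ^ T[0][T[N - 1][I] & 0xFF];
  return T;
}

inline const Tables &tables() {
  static const Tables T = makeTables();
  return T;
}

// Baseline: one table lookup per byte, each depending on the previous one.
inline uint32_t updateBytewise(uint32_t Crc, const uint8_t *Data, size_t Size) {
  const Tables &T = tables();
  for (size_t I = 0; I < Size; ++I)
    Crc = (Crc >> 8) ^ T[0][(Crc ^ Data[I]) & 0xFF];
  return Crc;
}

// Four bytes per iteration: the four lookups are independent of each other,
// so the dependency chain is one step per word instead of one per byte.
inline uint32_t updateBy4(uint32_t Crc, const uint8_t *Data, size_t Size) {
  const Tables &T = tables();
  while (Size >= 4) {
    // Assemble the word from bytes so the result does not depend on host
    // endianness or on the alignment of Data.
    uint32_t W = Crc ^ (uint32_t(Data[0]) | (uint32_t(Data[1]) << 8) |
                        (uint32_t(Data[2]) << 16) | (uint32_t(Data[3]) << 24));
    Crc = T[3][W & 0xFF] ^ T[2][(W >> 8) & 0xFF] ^ T[1][(W >> 16) & 0xFF] ^
          T[0][W >> 24];
    Data += 4;
    Size -= 4;
  }
  // Fewer than four bytes remain; finish them one at a time.
  return updateBytewise(Crc, Data, Size);
}

} // namespace sketch
```

Under the stated assumptions, calling either function with an initial value of 0xFFFFFFFF on the nine ASCII bytes "123456789" should give 0x340BC6D9, the standard JamCRC check value, so the two loops can be verified against each other.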
Scott Smith via llvm-dev
2017-Apr-13 04:52 UTC
[llvm-dev] Improve the performance of JamCRC
Sorry, that last patch didn't handle endianness very well. Here's an updated patch that uses llvm::support::endian.

I assume unaligned input, which is safer; I have no idea whether one can expect aligned input to this function. It also wouldn't take much to process the first <=3 bytes one at a time, then blast through the middle assuming aligned reads, and then finish up with the last <=3 bytes. Let me know if you prefer that.

Attachment: jamcrc.patch (16784 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/32c54905/attachment-0001.obj>
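The two options mentioned in this message (unaligned-safe reads versus a byte-wise head, an aligned middle, and a byte-wise tail) can be illustrated with a small sketch. It is written in standard C++ only and does not use the llvm::support::endian helpers the patch relies on. The names read32le, Split, and splitForAlignment are hypothetical, chosen for illustration, and are not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not taken from jamcrc.patch.
namespace sketch {

// Option 1: read a little-endian 32-bit value from any address. Assembling
// the value from individual bytes is valid for unaligned pointers and gives
// the same result on little- and big-endian hosts.
inline uint32_t read32le(const uint8_t *P) {
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Option 2: partition the buffer so the middle part starts on a 4-byte
// boundary. Head and Tail are each at most 3 bytes and would be processed
// one byte at a time; Words counts the aligned 4-byte reads in between.
struct Split {
  size_t Head;
  size_t Words;
  size_t Tail;
};

inline Split splitForAlignment(const uint8_t *Data, size_t Size) {
  // Bytes needed to reach the next multiple of 4 (0 if already aligned).
  size_t Head = (4 - reinterpret_cast<uintptr_t>(Data) % 4) % 4;
  if (Head > Size)
    Head = Size; // Buffer ends before the next boundary.
  size_t Rest = Size - Head;
  return {Head, Rest / 4, Rest % 4};
}

} // namespace sketch
```

With the first option the CRC loop never needs to know the alignment of its input. The second only pays off if aligned word loads are measurably cheaper than unaligned ones on the target, which is the open question the message raises.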
Mehdi Amini via llvm-dev
2017-Apr-13 08:00 UTC
[llvm-dev] Improve the performance of JamCRC
Hi Scott,

Usually patches are sent to llvm-commits (unless I missed a specific reason to send this patch to llvm-dev instead); see http://llvm.org/docs/DeveloperPolicy.html#making-and-submitting-a-patch

We also have a Phabricator instance: http://llvm.org/docs/Phabricator.html

Best,

Mehdi