Displaying 20 results from an estimated 400 matches similar to: "[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64"
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I've read up some more on the subject, and it seems the proper way to
do this with GCC is g++ and target attributes. I've refactored the
patch that way, and it indeed uses SSSE3 automatically on supporting
CPUs, regardless of the build host, so this should be ideal both for
home builders and distros.
Getting the code to build right in C++ mode (checksum_sse2.cpp only)
was a bit of an
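For readers who haven't seen the mechanism: a minimal sketch of g++ target attributes via function multi-versioning, where the compiler emits a resolver that picks the best variant on the CPU the program actually runs on. The checksum body below is an illustrative scalar stand-in, not the patch's code.

// Minimal sketch of g++ function multi-versioning (illustrative only, not
// the rsync patch itself). GCC emits a resolver that selects the best
// variant for the CPU the program actually runs on.
#include <cstddef>
#include <cstdint>

__attribute__((target("default")))
uint32_t checksum1(const uint8_t *buf, size_t len) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}

__attribute__((target("ssse3")))
uint32_t checksum1(const uint8_t *buf, size_t len) {
    // The real patch would use SSSE3 intrinsics here; the scalar loop is
    // repeated only to keep this sketch compilable on its own.
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}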
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on?
Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :
"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."
That reads to me like we're fine for SSE2. As stated in my comments,
SSSE3 support must be
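In practice that difference shows up in the predefined macros; a small sketch, assuming GCC on x86:

// Sketch: on x86-64, GCC defines __SSE2__ out of the box, but __SSSE3__
// only appears with -mssse3 (or a target attribute).
#include <cstdio>

int main() {
#if defined(__SSE2__)
    std::puts("SSE2 available at compile time");
#endif
#if defined(__SSSE3__)
    std::puts("SSSE3 available at compile time");
#else
    std::puts("SSSE3 not enabled; pass -mssse3 or use a target attribute");
#endif
    return 0;
}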
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I think this is a great patch but, in my view, an even better way to tackle
the fundamental problem (the performance limitations) is to use a much
faster checksum like xxhash, as has been suggested before:
https://lists.samba.org/archive/rsync/2019-October/031975.html
Cheers,
Filipe
On Mon, 18 May 2020 at 17:08, Jorrit Jongma via rsync <rsync at lists.samba.org>
wrote:
> This drop-in
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I don't disagree that MD5 could (or even should) be replaced so it is
no longer the bottleneck in several real-world cases (including mine).
However, this patch is not about MD5 performance; rather, it targets the rolling
checksum rsync uses to match blocks of existing files on both ends and so
reduce the transfer size.
On Mon, May 18, 2020 at 5:44 PM Filipe Maia via rsync
<rsync at lists.samba.org>
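For context, here is a simplified sketch of the kind of two-sum rolling checksum get_checksum1() computes (illustrative only; CHAR_OFFSET handling is omitted). The point is that sliding the window by one byte is O(1), which is what makes block matching affordable.

// Simplified sketch of a two-sum rolling checksum in the style of rsync's
// get_checksum1() (not the exact rsync source).
#include <cstddef>
#include <cstdint>

uint32_t weak_sum(const uint8_t *buf, size_t len) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += (uint32_t)(len - i) * buf[i];  // same as accumulating s1
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}

// Sliding the window one byte forward only needs the outgoing and incoming
// bytes, so the sum can be "rolled" across the whole file.
uint32_t weak_sum_roll(uint32_t sum, uint8_t out, uint8_t in, size_t len) {
    uint32_t s1 = sum & 0xffff;
    uint32_t s2 = (sum >> 16) & 0xffff;
    s1 = (s1 - out + in) & 0xffff;
    s2 = (s2 - (uint32_t)len * out + s1) & 0xffff;
    return s1 | (s2 << 16);
}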
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Would it perhaps make sense to have a "--disable-sse2/3" command-line
switch in rsync, too, at least for some timeframe until this is
considered "rock solid"?
I dislike having automatic CPU feature switching code in a tool which
needs to be reliable for me. This new optimization may have issues, and
without such a switch it can't easily be worked around without replacing
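To make the request concrete, a hypothetical sketch of such an opt-out (the flag and function names are invented, not an existing rsync option): runtime CPU detection fills a function pointer, and a switch forces the plain C path.

// Hypothetical sketch of a runtime opt-out (invented names, not an actual
// rsync option): detect the CPU once and let a switch force the C path.
#include <cstddef>
#include <cstdint>

static uint32_t checksum1_scalar(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++) s += buf[i];  // stand-in body
    return s;
}

__attribute__((target("ssse3")))
static uint32_t checksum1_ssse3(const uint8_t *buf, size_t len) {
    // Real code would use SSSE3 intrinsics; the scalar body is reused only
    // to keep the sketch self-contained.
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++) s += buf[i];
    return s;
}

static uint32_t (*checksum1_impl)(const uint8_t *, size_t) = checksum1_scalar;

void init_checksum1(bool disable_simd /* e.g. a --disable-sse2/3 style flag */) {
    __builtin_cpu_init();
    checksum1_impl = (!disable_simd && __builtin_cpu_supports("ssse3"))
                         ? checksum1_ssse3
                         : checksum1_scalar;
}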
2002 Mar 15
1
Weak CheckSum Question
Hi all,
I am writing an xdelta-like application as a personal experiment and am busy
implementing the rsync protocol; so far so good. I am using C++ templates and
creating the algorithms so that they operate on any stream, array, etc. through
iterators.
All seems well except that I am getting a lot of false hits with the weak
checksum. When generating checksums with a block size of 1024 on the RedHat 7.1
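False hits from a 32-bit weak sum are expected at that block size; the usual remedy, and what rsync itself does, is to treat the weak checksum purely as a filter and confirm each hit with the strong checksum. A rough sketch with invented names, not rsync's data structures:

// Rough sketch: the weak sum only selects candidates; a strong digest
// comparison weeds out the false positives.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

struct BlockInfo {
    std::vector<uint8_t> strong;  // e.g. MD4/MD5 digest of the block
    size_t index;                 // block number in the remote file
};

bool match_block(const std::unordered_multimap<uint32_t, BlockInfo> &table,
                 uint32_t weak, const uint8_t *strong, size_t strong_len,
                 size_t &out_index) {
    auto range = table.equal_range(weak);
    for (auto it = range.first; it != range.second; ++it) {
        const BlockInfo &b = it->second;
        if (b.strong.size() == strong_len &&
            std::memcmp(b.strong.data(), strong, strong_len) == 0) {
            out_index = b.index;
            return true;   // weak hit confirmed by the strong checksum
        }
    }
    return false;          // weak hit was a false positive
}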
2002 Aug 05
5
[patch] read-devices
Greetings,
I'd like to propose a new option to rsync, which causes it to read
device files as if they were regular files. This includes pipes,
character devices and block devices (I'm not sure about sockets). The
main motivation is cases where you need to synchronize a large amount of
data that is not available as regular files, as in the following scenarios:
* Keep a copy of a block
2013 Jul 31
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Hi Dan,
If you set the node's action to "Custom", you should be able to
interfere in the type legalisation phase (before it gets promoted to a
64-bit MUL) by overriding the "ReplaceNodeResults" function.
You could either expand it to a different libcall directly there, or
replace it with a target-specific node (say XXXISD::MUL32) which
claims to take i64 types but you
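A sketch of what that could look like for a hypothetical target "XXX" (a fragment of a TargetLowering subclass, assuming current SelectionDAG APIs; XXXISD::MUL32 is the node name suggested above):

// Sketch only: mark i32 MUL as Custom and intercept it during type
// legalisation, replacing it with a target node that works on the promoted
// type instead of letting it become a __muldi3 libcall.
//
// In the XXXTargetLowering constructor:
//   setOperationAction(ISD::MUL, MVT::i32, Custom);

void XXXTargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
  if (N->getOpcode() != ISD::MUL || N->getValueType(0) != MVT::i32)
    return;
  SDLoc DL(N);
  SDValue LHS = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, N->getOperand(0));
  SDValue RHS = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, N->getOperand(1));
  SDValue Mul = DAG.getNode(XXXISD::MUL32, DL, MVT::i64, LHS, RHS);
  // Hand back a value of the original type; the legalizer splices it in.
  Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Mul));
}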
2013 Jul 31
2
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks for the information. Let me re-phrase the question, or rather the
issue.
Assume 64-bit register types, but int is 32-bit. I already have the TableGen
descriptions of the 64-bit operations.
How about this modified approach?
Before type legalization, I'd really like to move every MUL i64 to a
subroutine call of my own choice.
This would be a form of customization, but I want this
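One hedged way to get that effect without a custom node is to let the i64 multiply expand to a runtime call and simply repoint the libcall name; a constructor fragment for a hypothetical target, with "__xxx_mul64" an invented symbol:

// Sketch of a TargetLowering constructor fragment: expand i64 MUL (which
// falls back to a runtime call when no cheaper expansion applies) and
// point that call at a routine of our own choosing instead of
// compiler-rt's __muldi3.
setOperationAction(ISD::MUL, MVT::i64, Expand);
setLibcallName(RTLIB::MUL_I64, "__xxx_mul64");
setLibcallCallingConv(RTLIB::MUL_I64, CallingConv::C);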
2013 Jul 31
1
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks Tom. I really appreciate your insight.
I'm able to use Custom lowering to get the 64-bit case to go to a subroutine,
and for the 32-bit case I generate XXXISD::MUL32. I'm not sure then what you
mean about "overriding" ReplaceNodeResults.
For ReplaceNodeResults, I'm doing:
SDValue Res = LowerOperation(SDValue(N, 0), DAG);
for (unsigned I = 0, E =
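For comparison, the usual shape of that override in in-tree targets is roughly the following (a sketch; the surrounding class and its LowerOperation are assumed):

// Sketch of the common pattern: lower the node, then push every value it
// produces so the type legalizer can splice the results back in.
void XXXTargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
  SDValue Res = LowerOperation(SDValue(N, 0), DAG);
  if (!Res.getNode())
    return;  // nothing custom to do for this node
  for (unsigned I = 0, E = Res->getNumValues(); I != E; ++I)
    Results.push_back(Res.getValue(I));
}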
2010 Jun 11
0
[LLVMdev] thinking about timing-test-driven scheduler
On Wed, 2010-06-09 at 17:30 +0200, orthochronous wrote:
> Hi,
>
> I've been thinking about how to implement a framework for attempting
> instruction scheduling of small blocks of code by using (GA/simulated
> annealing/etc) controlled timing-test-evaluations of various
> orderings.
This sounds interesting.
> (I'm particularly interested in small-ish numerical inner
2010 Jun 09
2
[LLVMdev] thinking about timing-test-driven scheduler
Hi,
I've been thinking about how to implement a framework for attempting
instruction scheduling of small blocks of code by using (GA/simulated
annealing/etc) controlled timing-test-evaluations of various
orderings. (I'm particularly interested in small-ish numerical inner loop
code on low-power CPUs like Atom and various ARMs, where the CPU
doesn't have the ability to
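Purely as an illustration of the search loop being proposed (a sketch; the timing harness, which is the hard part, is assumed as a callback, and a real version would only propose dependence-preserving swaps):

// Rough sketch of a simulated-annealing search over instruction orderings,
// scored by a user-supplied timing callback.
#include <cmath>
#include <random>
#include <utility>
#include <vector>

using Order = std::vector<int>;  // indices into the block's instructions

Order anneal(Order current, double (*time_order)(const Order &),
             int steps = 10000, double temp = 1.0, double cooling = 0.999) {
    if (current.size() < 2)
        return current;
    std::mt19937 rng{12345};
    std::uniform_int_distribution<size_t> pick(0, current.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    Order best = current;
    double cost = time_order(current), best_cost = cost;
    for (int i = 0; i < steps; ++i, temp *= cooling) {
        Order cand = current;
        std::swap(cand[pick(rng)], cand[pick(rng)]);  // propose a neighbour
        double c = time_order(cand);                  // measure on real hardware
        if (c < cost || unit(rng) < std::exp((cost - c) / temp)) {
            current = std::move(cand);
            cost = c;
            if (cost < best_cost) { best = current; best_cost = cost; }
        }
    }
    return best;
}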
2009 Sep 24
1
weak checksum
Hi,
I'm curious if anybody knows the exact reason why the weak checksum
calculation is slightly different to the standard adler-32 checksum as seen
for example here http://en.wikipedia.org/wiki/Adler-32 ?
Thanks
Julian
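The short answer, side by side (simplified sketch; the real rsync code is C and also adds a CHAR_OFFSET to each byte): Adler-32 starts its first sum at 1 and reduces modulo the prime 65521, while rsync's weak sum starts at 0 and reduces modulo 2^16, so both the reduction and the rolling update are plain masks.

// Simplified comparison sketch (not the exact sources of either).
#include <cstddef>
#include <cstdint>

// Standard Adler-32: a starts at 1, both sums reduced mod the prime 65521.
uint32_t adler32(const uint8_t *buf, size_t len) {
    const uint32_t MOD = 65521;
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// rsync-style weak sum: the same pair of running sums, but starting at 0
// and reduced mod 2^16, so the reduction is just a mask.
uint32_t rsync_weak(const uint8_t *buf, size_t len) {
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += buf[i];   // the real code adds a small CHAR_OFFSET here
        b += a;
    }
    return ((b & 0xffff) << 16) | (a & 0xffff);
}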
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2004 Aug 02
4
reducing memmoves
Attached is a patch that makes window strides constant when files are
walked with a constant block size. In these cases, it completely
avoids all memmoves.
In my simple local test of rsyncing 10 local files totalling 57MB, memmoved
bytes went from 18MB to zero.
I haven't tested this for a big variety of file cases. I think that this
will always reduce the memmoves involved with walking a large
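The general idea, as an illustrative sketch (not rsync's map_ptr() code): if the window only ever advances by a constant stride that divides the buffer size, every refill reads whole slots at fixed offsets, so there is never a still-needed tail to memmove() to the front.

// Illustrative sketch of constant-stride windowing over a file.
// EOF handling is omitted for brevity.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Window {
    std::FILE *f = nullptr;
    std::vector<unsigned char> buf;  // size is a multiple of the stride
    long buf_off = 0;                // file offset of buf[0]
    size_t valid = 0;                // bytes currently valid in buf
};

// Precondition for the no-memmove property: offset is a multiple of stride,
// and stride divides buf.size().
const unsigned char *map(Window &w, long offset, size_t stride) {
    if (offset < w.buf_off ||
        offset + (long)stride > w.buf_off + (long)w.valid) {
        w.buf_off = offset - (offset % (long)w.buf.size());
        std::fseek(w.f, w.buf_off, SEEK_SET);
        w.valid = std::fread(w.buf.data(), 1, w.buf.size(), w.f);
    }
    return w.buf.data() + (offset - w.buf_off);
}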
2013 Jul 30
3
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
I'll try to run through the scenario:
64-bit register type target (all registers have 64 bits).
All 32-bit integers are getting promoted to 64-bit integers.
Problem:
MUL on i32 is getting promoted to MUL on i64
MUL on i64 is getting expanded to a library call in compiler-rt
The problem is that the 32-bit MUL gets promoted and then converted into a
subroutine call because it is now of type i64, even though
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury.
It:
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
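For readers unfamiliar with RTCD (run-time CPU detection), the pattern these items adjust looks roughly like this (generic sketch with invented names, not Opus's actual code): detect once, dispatch through a function pointer, and bypass the pointer when the compiler already targets the instruction set by default.

// Generic RTCD sketch (invented names): compile-time bypass when the ISA is
// the compiler's default target, function-pointer dispatch otherwise.
#include <cstddef>

float inner_prod_c(const float *a, const float *b, size_t n);    // plain C path
float inner_prod_sse(const float *a, const float *b, size_t n);  // SSE path

#if defined(__SSE__)
// The compiler targets SSE by default (e.g. x86-64): call the SSE version
// directly and skip runtime detection entirely.
static inline float inner_prod(const float *a, const float *b, size_t n) {
    return inner_prod_sse(a, b, n);
}
#else
// Otherwise, go through a pointer filled in by run-time CPU detection at
// library initialisation.
extern float (*inner_prod_ptr)(const float *, const float *, size_t);
static inline float inner_prod(const float *a, const float *b, size_t n) {
    return inner_prod_ptr(a, b, n);
}
#endif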
2013 Jul 30
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
On Tue, Jul 30, 2013 at 01:14:16PM -0600, Dan wrote:
> I'll try to run through the scenario:
>
> 64-bit register type target (all registers have 64 bits).
> All 32-bit integers are getting promoted to 64-bit integers.
>
> Problem:
> MUL on i32 is getting promoted to MUL on i64
> MUL on i64 is getting expanded to a library call in compiler-rt
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Well, don't get too excited: get_checksum1() (the function optimized
here) is not the main performance limiter in this case; it's
get_checksum2() and sum_update(), which will be using MD5. You can
force MD4 instead, but on the slower CPUs I've tested, in practice that
is slower rather than faster, contrary to what you would expect.
While this patch will improve things a little, to