Displaying 20 results from an estimated 400 matches similar to: "[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64"
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I've read up some more on the subject, and it seems the proper way to
do this with GCC is g++ and target attributes. I've refactored the
patch that way, and it indeed uses SSSE3 automatically on supporting
CPUs, regardless of the build host, so this should be ideal both for
home builders and distros.
Getting the code to build right in C++ mode (checksum_sse2.cpp only)
was a bit of an
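For readers who haven't seen the mechanism: a minimal sketch of g++ target attributes via function multi-versioning, where the compiler emits a resolver that picks the best variant on the CPU the program actually runs on. The checksum body below is an illustrative scalar stand-in, not the patch's code.

// Minimal sketch of g++ function multi-versioning (illustrative only, not
// the rsync patch itself). GCC emits a resolver that selects the best
// variant for the CPU the program actually runs on.
#include <cstddef>
#include <cstdint>

__attribute__((target("default")))
uint32_t checksum1(const uint8_t *buf, size_t len) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}

__attribute__((target("ssse3")))
uint32_t checksum1(const uint8_t *buf, size_t len) {
    // The real patch would use SSSE3 intrinsics here; the scalar loop is
    // repeated only to keep this sketch compilable on its own.
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}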
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on?
Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :
"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."
That reads to me like we're fine for SSE2. As stated in my comments,
SSSE3 support must be
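In practice that difference shows up in the predefined macros; a small sketch, assuming GCC on x86:

// Sketch: on x86-64, GCC defines __SSE2__ out of the box, but __SSSE3__
// only appears with -mssse3 (or a target attribute).
#include <cstdio>

int main() {
#if defined(__SSE2__)
    std::puts("SSE2 available at compile time");
#endif
#if defined(__SSSE3__)
    std::puts("SSSE3 available at compile time");
#else
    std::puts("SSSE3 not enabled; pass -mssse3 or use a target attribute");
#endif
    return 0;
}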
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I think this is a great patch but, in my view, an even better way to tackle
the fundamental problem (the performance limitations) is to use a much
faster checksum like xxhash, as has been suggested before:
https://lists.samba.org/archive/rsync/2019-October/031975.html
Cheers,
Filipe
On Mon, 18 May 2020 at 17:08, Jorrit Jongma via rsync <rsync at lists.samba.org>
wrote:
> This drop-in
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I don't disagree that MD5 could (or even should) be replaced so it is
no longer the bottleneck in several real-world cases (including mine).
However, this patch is not about MD5 performance; rather, it targets the rolling
checksum rsync uses to match blocks of existing files on both ends and so
reduce the transfer size.
On Mon, May 18, 2020 at 5:44 PM Filipe Maia via rsync
<rsync at lists.samba.org>
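For context, here is a simplified sketch of the kind of two-sum rolling checksum get_checksum1() computes (illustrative only; CHAR_OFFSET handling is omitted). The point is that sliding the window by one byte is O(1), which is what makes block matching affordable.

// Simplified sketch of a two-sum rolling checksum in the style of rsync's
// get_checksum1() (not the exact rsync source).
#include <cstddef>
#include <cstdint>

uint32_t weak_sum(const uint8_t *buf, size_t len) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += (uint32_t)(len - i) * buf[i];  // same as accumulating s1
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}

// Sliding the window one byte forward only needs the outgoing and incoming
// bytes, so the sum can be "rolled" across the whole file.
uint32_t weak_sum_roll(uint32_t sum, uint8_t out, uint8_t in, size_t len) {
    uint32_t s1 = sum & 0xffff;
    uint32_t s2 = (sum >> 16) & 0xffff;
    s1 = (s1 - out + in) & 0xffff;
    s2 = (s2 - (uint32_t)len * out + s1) & 0xffff;
    return s1 | (s2 << 16);
}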
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Would it perhaps make sense to have a "--disable-sse2/3" command-line
switch in rsync, too, at least for some timeframe until this is
considered "rock solid"?
I dislike having automatic CPU feature switching code in a tool which
needs to be reliable for me. This new optimization may have issues, and
without such a switch it can't easily be worked around without replacing
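To make the request concrete, a hypothetical sketch of such an opt-out (the flag and function names are invented, not an existing rsync option): runtime CPU detection fills a function pointer, and a switch forces the plain C path.

// Hypothetical sketch of a runtime opt-out (invented names, not an actual
// rsync option): detect the CPU once and let a switch force the C path.
#include <cstddef>
#include <cstdint>

static uint32_t checksum1_scalar(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++) s += buf[i];  // stand-in body
    return s;
}

__attribute__((target("ssse3")))
static uint32_t checksum1_ssse3(const uint8_t *buf, size_t len) {
    // Real code would use SSSE3 intrinsics; the scalar body is reused only
    // to keep the sketch self-contained.
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++) s += buf[i];
    return s;
}

static uint32_t (*checksum1_impl)(const uint8_t *, size_t) = checksum1_scalar;

void init_checksum1(bool disable_simd /* e.g. a --disable-sse2/3 style flag */) {
    __builtin_cpu_init();
    checksum1_impl = (!disable_simd && __builtin_cpu_supports("ssse3"))
                         ? checksum1_ssse3
                         : checksum1_scalar;
}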
2002 Mar 15
1
Weak CheckSum Question
Hi all,
I am writing an xdelta-like application as a personal experiment and am busy
implementing the rsync protocol; so far so good. I am using C++ templates and
creating the algorithms so that they operate on any stream, array, etc. through
iterators.
All seems well except that I am getting a lot of false hits with the weak
checksum. When generating checksums with a block size of 1024 on the RedHat 7.1
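False hits from a 32-bit weak sum are expected at that block size; the usual remedy, and what rsync itself does, is to treat the weak checksum purely as a filter and confirm each hit with the strong checksum. A rough sketch with invented names, not rsync's data structures:

// Rough sketch: the weak sum only selects candidates; a strong digest
// comparison weeds out the false positives.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

struct BlockInfo {
    std::vector<uint8_t> strong;  // e.g. MD4/MD5 digest of the block
    size_t index;                 // block number in the remote file
};

bool match_block(const std::unordered_multimap<uint32_t, BlockInfo> &table,
                 uint32_t weak, const uint8_t *strong, size_t strong_len,
                 size_t &out_index) {
    auto range = table.equal_range(weak);
    for (auto it = range.first; it != range.second; ++it) {
        const BlockInfo &b = it->second;
        if (b.strong.size() == strong_len &&
            std::memcmp(b.strong.data(), strong, strong_len) == 0) {
            out_index = b.index;
            return true;   // weak hit confirmed by the strong checksum
        }
    }
    return false;          // weak hit was a false positive
}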
2002 Aug 05
5
[patch] read-devices
Greetings,
I'd like to propose a new option to rsync, which causes it to read
device files as if they were regular files. This includes pipes,
character devices and block devices (I'm not sure about sockets). The
main motivation is cases where you need to synchronize a large amount of
data that is not available as regular files, as in the following scenarios:
* Keep a copy of a block
2013 Jul 31
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Hi Dan,
If you set the node's action to "Custom", you should be able to
interfere in the type legalisation phase (before it gets promoted to a
64-bit MUL) by overriding the "ReplaceNodeResults" function.
You could either expand it to a different libcall directly there, or
replace it with a target-specific node (say XXXISD::MUL32) which
claims to take i64 types but you
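A sketch of what that could look like for a hypothetical target "XXX" (a fragment of a TargetLowering subclass, assuming current SelectionDAG APIs; XXXISD::MUL32 is the node name suggested above):

// Sketch only: mark i32 MUL as Custom and intercept it during type
// legalisation, replacing it with a target node that works on the promoted
// type instead of letting it become a __muldi3 libcall.
//
// In the XXXTargetLowering constructor:
//   setOperationAction(ISD::MUL, MVT::i32, Custom);

void XXXTargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
  if (N->getOpcode() != ISD::MUL || N->getValueType(0) != MVT::i32)
    return;
  SDLoc DL(N);
  SDValue LHS = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, N->getOperand(0));
  SDValue RHS = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, N->getOperand(1));
  SDValue Mul = DAG.getNode(XXXISD::MUL32, DL, MVT::i64, LHS, RHS);
  // Hand back a value of the original type; the legalizer splices it in.
  Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Mul));
}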
2013 Jul 31
2
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks for the information. Let me re-phrase the question, or rather the
issue.
Assume 64-bit register types, but int is 32-bit. I already have the TableGen
descriptions of the 64-bit operations.
How about this modified approach?
Before type legalization, I'd really like to move every MUL i64 to a
subroutine call of my own choice.
This would be a form of customization, but I want this
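One hedged way to get that effect without a custom node is to let the i64 multiply expand to a runtime call and simply repoint the libcall name; a constructor fragment for a hypothetical target, with "__xxx_mul64" an invented symbol:

// Sketch of a TargetLowering constructor fragment: expand i64 MUL (which
// falls back to a runtime call when no cheaper expansion applies) and
// point that call at a routine of our own choosing instead of
// compiler-rt's __muldi3.
setOperationAction(ISD::MUL, MVT::i64, Expand);
setLibcallName(RTLIB::MUL_I64, "__xxx_mul64");
setLibcallCallingConv(RTLIB::MUL_I64, CallingConv::C);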
2013 Jul 31
1
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks Tom. I really appreciate your insight.
I'm able to use Custom lowering to get the 64-bit case to go to a subroutine,
and for the 32-bit case I generate XXXISD::MUL32. I'm not sure then what you
mean about "overriding" ReplaceNodeResults.
For ReplaceNodeResults, I'm doing:
SDValue Res = LowerOperation(SDValue(N, 0), DAG);
for (unsigned I = 0, E =
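For comparison, the usual shape of that override in in-tree targets is roughly the following (a sketch; the surrounding class and its LowerOperation are assumed):

// Sketch of the common pattern: lower the node, then push every value it
// produces so the type legalizer can splice the results back in.
void XXXTargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
  SDValue Res = LowerOperation(SDValue(N, 0), DAG);
  if (!Res.getNode())
    return;  // nothing custom to do for this node
  for (unsigned I = 0, E = Res->getNumValues(); I != E; ++I)
    Results.push_back(Res.getValue(I));
}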
2010 Jun 11
0
[LLVMdev] thinking about timing-test-driven scheduler
On Wed, 2010-06-09 at 17:30 +0200, orthochronous wrote:
> Hi,
>
> I've been thinking about how to implement a framework for attempting
> instruction scheduling of small blocks of code by using (GA/simulated
> annealing/etc) controlled timing-test-evaluations of various
> orderings.
This sounds interesting.
> (I'm particularly interested in small-ish numerical inner
2010 Jun 09
2
[LLVMdev] thinking about timing-test-driven scheduler
Hi,
I've been thinking about how to implement a framework for attempting
instruction scheduling of small blocks of code by using (GA/simulated
annealing/etc) controlled timing-test-evaluations of various
orderings. (I'm particularly interested in small-ish numerical inner loop
code on low-power CPUs like Atom and various ARMs, where the CPU
doesn't have the ability to
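Purely as an illustration of the search loop being proposed (a sketch; the timing harness, which is the hard part, is assumed as a callback, and a real version would only propose dependence-preserving swaps):

// Rough sketch of a simulated-annealing search over instruction orderings,
// scored by a user-supplied timing callback.
#include <cmath>
#include <random>
#include <utility>
#include <vector>

using Order = std::vector<int>;  // indices into the block's instructions

Order anneal(Order current, double (*time_order)(const Order &),
             int steps = 10000, double temp = 1.0, double cooling = 0.999) {
    if (current.size() < 2)
        return current;
    std::mt19937 rng{12345};
    std::uniform_int_distribution<size_t> pick(0, current.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    Order best = current;
    double cost = time_order(current), best_cost = cost;
    for (int i = 0; i < steps; ++i, temp *= cooling) {
        Order cand = current;
        std::swap(cand[pick(rng)], cand[pick(rng)]);  // propose a neighbour
        double c = time_order(cand);                  // measure on real hardware
        if (c < cost || unit(rng) < std::exp((cost - c) / temp)) {
            current = std::move(cand);
            cost = c;
            if (cost < best_cost) { best = current; best_cost = cost; }
        }
    }
    return best;
}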
2009 Sep 24
1
weak checksum
Hi,
I'm curious if anybody knows the exact reason why the weak checksum
calculation is slightly different to the standard adler-32 checksum as seen
for example here http://en.wikipedia.org/wiki/Adler-32 ?
Thanks
Julian
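The short answer, side by side (simplified sketch; the real rsync code is C and also adds a CHAR_OFFSET to each byte): Adler-32 starts its first sum at 1 and reduces modulo the prime 65521, while rsync's weak sum starts at 0 and reduces modulo 2^16, so both the reduction and the rolling update are plain masks.

// Simplified comparison sketch (not the exact sources of either).
#include <cstddef>
#include <cstdint>

// Standard Adler-32: a starts at 1, both sums reduced mod the prime 65521.
uint32_t adler32(const uint8_t *buf, size_t len) {
    const uint32_t MOD = 65521;
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// rsync-style weak sum: the same pair of running sums, but starting at 0
// and reduced mod 2^16, so the reduction is just a mask.
uint32_t rsync_weak(const uint8_t *buf, size_t len) {
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += buf[i];   // the real code adds a small CHAR_OFFSET here
        b += a;
    }
    return ((b & 0xffff) << 16) | (a & 0xffff);
}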
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2004 Aug 02
4
reducing memmoves
Attached is a patch that makes window strides constant when files are
walked with a constant block size. In these cases, it completely
avoids all memmoves.
In my simple local test of rsyncing 10 local files totalling 57MB, memmoved
bytes went from 18MB to zero.
I haven't tested this for a big variety of file cases. I think that this
will always reduce the memmoves involved with walking a large
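The general idea, as an illustrative sketch (not rsync's map_ptr() code): if the window only ever advances by a constant stride that divides the buffer size, every refill reads whole slots at fixed offsets, so there is never a still-needed tail to memmove() to the front.

// Illustrative sketch of constant-stride windowing over a file.
// EOF handling is omitted for brevity.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Window {
    std::FILE *f = nullptr;
    std::vector<unsigned char> buf;  // size is a multiple of the stride
    long buf_off = 0;                // file offset of buf[0]
    size_t valid = 0;                // bytes currently valid in buf
};

// Precondition for the no-memmove property: offset is a multiple of stride,
// and stride divides buf.size().
const unsigned char *map(Window &w, long offset, size_t stride) {
    if (offset < w.buf_off ||
        offset + (long)stride > w.buf_off + (long)w.valid) {
        w.buf_off = offset - (offset % (long)w.buf.size());
        std::fseek(w.f, w.buf_off, SEEK_SET);
        w.valid = std::fread(w.buf.data(), 1, w.buf.size(), w.f);
    }
    return w.buf.data() + (offset - w.buf_off);
}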
2013 Jul 30
3
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
I'll try to run through the scenario:
64-bit register type target (all registers have 64 bits).
All 32-bit integers are getting promoted to 64-bit integers.
Problem:
MUL on i32 is getting promoted to MUL on i64
MUL on i64 is getting expanded to a library call in compiler-rt
The problem is that the 32-bit MUL gets promoted and then converted into a
subroutine call because it is now of type i64, even though
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury.
It:
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
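For readers unfamiliar with RTCD (run-time CPU detection), the pattern these items adjust looks roughly like this (generic sketch with invented names, not Opus's actual code): detect once, dispatch through a function pointer, and bypass the pointer when the compiler already targets the instruction set by default.

// Generic RTCD sketch (invented names): compile-time bypass when the ISA is
// the compiler's default target, function-pointer dispatch otherwise.
#include <cstddef>

float inner_prod_c(const float *a, const float *b, size_t n);    // plain C path
float inner_prod_sse(const float *a, const float *b, size_t n);  // SSE path

#if defined(__SSE__)
// The compiler targets SSE by default (e.g. x86-64): call the SSE version
// directly and skip runtime detection entirely.
static inline float inner_prod(const float *a, const float *b, size_t n) {
    return inner_prod_sse(a, b, n);
}
#else
// Otherwise, go through a pointer filled in by run-time CPU detection at
// library initialisation.
extern float (*inner_prod_ptr)(const float *, const float *, size_t);
static inline float inner_prod(const float *a, const float *b, size_t n) {
    return inner_prod_ptr(a, b, n);
}
#endif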
2013 Jul 30
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
On Tue, Jul 30, 2013 at 01:14:16PM -0600, Dan wrote:
> I'll try to run through the scenario:
>
> 64-bit register type target (all registers have 64 bits).
> All 32-bit integers are getting promoted to 64-bit integers.
>
> Problem:
> MUL on i32 is getting promoted to MUL on i64
> MUL on i64 is getting expanded to a library call in compiler-rt
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Well, don't get too excited: get_checksum1() (the function optimized
here) is not the main performance limiter in this case; it's
get_checksum2() and sum_update(), which will be using MD5. You can
force MD4 instead, but on the slower CPUs I've tested, in practice that
is slower rather than faster, contrary to what you would expect.
While this patch will improve things a little, to