search for: ssse3

Displaying 20 results from an estimated 300 matches for "ssse3".

2015 Jun 09
1
Why is sha256-generic preferred over sha256-ssse3?
The newly-released kernel v2.6.32-504.23.4.el6 includes the back-ported SHA256-SSSE3 driver. Why is the generic version of the SHA256 driver selected at runtime instead of the SSSE3 version on this x86_64 system? Yes, my CPU does support the SSSE3 instruction set, and the use of SHA256 is invoked by the LUKS "cipher=aes-cbc-essiv:sha256" option. On the running system...
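
The selection the poster is asking about is driven by the priority value each registered sha256 implementation advertises to the kernel crypto API; /proc/crypto lists every implementation together with its driver name and priority, and the API normally picks the highest-priority one. The short C++ program below is only an illustration of how to read that list; it assumes the usual "field : value" layout of /proc/crypto and does no error handling.

    #include <fstream>
    #include <iostream>
    #include <string>

    // Print each registered sha256 implementation with its priority.
    // Rough sketch: assumes every "name"/"driver"/"priority" line in
    // /proc/crypto looks like "field        : value".
    int main()
    {
        std::ifstream in("/proc/crypto");
        std::string line, driver;
        bool match = false;
        while (std::getline(in, line)) {
            if (line.rfind("name", 0) == 0)
                match = (line.substr(line.find(':') + 2) == "sha256");
            else if (match && line.rfind("driver", 0) == 0)
                driver = line.substr(line.find(':') + 2);
            else if (match && line.rfind("priority", 0) == 0)
                std::cout << driver << "  priority "
                          << line.substr(line.find(':') + 2) << "\n";
        }
        return 0;
    }
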
2009 Mar 23
2
[LLVMdev] X86InstrFormats.td Question
I'm looking at the instruction formats and I can't grok the comments. For example:

// SSSE3 Instruction Templates:
//
// SS38I - SSSE3 instructions with T8 prefix.
// SS3AI - SSSE3 instructions with TA prefix.
//

Where are these prefix names coming from? I can't find any mention of them in the Intel literature. Also, there's this curious table:

// Prefix byte classes whi...
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...ns.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be manually enabled at build time. Your comment would imply that SSSE3 is enabled out of the box on builds on machines that support it; this is not the case (it certainly isn't on my Ubuntu box). It would be preferred to detect this at runtime, but getting that to work on GCC is (ap...
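
One building block for the runtime detection mentioned above is GCC's __builtin_cpu_supports() (available since GCC 4.8). The standalone sketch below is not part of the patch; it simply reports whether the running CPU has SSSE3.

    #include <cstdio>

    int main()
    {
        // GCC/clang builtin; queries CPUID at runtime, so the binary
        // itself does not have to be built with -mssse3.
        if (__builtin_cpu_supports("ssse3"))
            std::puts("SSSE3 available on this CPU");
        else
            std::puts("no SSSE3; stay on the SSE2/plain C path");
        return 0;
    }

The harder part, which the later PATCHv2 message tackles with g++ target attributes, is getting the SSSE3 code path compiled into a binary whose baseline is plain x86-64/SSE2.
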
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
This drop-in patch increases the performance of the get_checksum1() function on x86-64. On the target slow CPU, performance of the function increased by nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the compiler was told to enable SSSE3 support. The increase was over 200% on the fastest CPU tested in SSSE3 mode. Transfer time improvement with large files existing on both ends but with some bits flipped was measured as 5-10%, with the target machine being CPU-limited (still so due to MD5). This same patch on (my) GitHub for easie...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...mba.org> wrote: > This drop-in patch increases the performance of the get_checksum1() > function on x86-64. > > On the target slow CPU performance of the function increased by nearly > 50% in the x86-64 default SSE2 mode, and by nearly 100% if the > compiler was told to enable SSSE3 support. The increase was over 200% > on the fastest CPU tested in SSSE3 mode. > > Transfer time improvement with large files existing on both ends but > with some bits flipped was measured as 5-10%, with the target machine > being CPU limited (still so due to MD5). > > This sa...
2009 Mar 23
0
[LLVMdev] X86InstrFormats.td Question
On Mar 23, 2009, at 12:57 PM, David A. Greene wrote:
> I'm looking at the instruction formats and I can't grok the comments. For example:
>
> // SSSE3 Instruction Templates:
> //
> // SS38I - SSSE3 instructions with T8 prefix.
> // SS3AI - SSSE3 instructions with TA prefix.
> //
>
> Where are these prefix names coming from? I can't find any mention of them in the Intel literature.
They come from the fact th...
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...> This drop-in patch increases the performance of the get_checksum1() >> function on x86-64. >> >> On the target slow CPU performance of the function increased by nearly >> 50% in the x86-64 default SSE2 mode, and by nearly 100% if the >> compiler was told to enable SSSE3 support. The increase was over 200% >> on the fastest CPU tested in SSSE3 mode. >> >> Transfer time improvement with large files existing on both ends but >> with some bits flipped was measured as 5-10%, with the target machine >> being CPU limited (still so due to MD5...
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I've read up some more on the subject, and it seems the proper way to do this with GCC is g++ and target attributes. I've refactored the patch that way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless of the build host, so this should be ideal both for home builders and distros. Getting the code to build right in c++ mode (checksum_sse2.cpp only) was a bit of an adventure, requiring modifications to mkproto.awk, configure.ac, and Makefile.in. I'...
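
A rough sketch of the target-attribute mechanism being described follows; byte_sum() is a made-up stand-in, not the patch's get_checksum1(). With g++, defining the same function once per target makes the compiler emit a resolver that binds the best version for the running CPU at program start, so a single binary serves both plain x86-64 and SSSE3-capable machines.

    #include <immintrin.h>
    #include <cstdint>
    #include <cstddef>
    #include <cstdio>

    // Generic version: runs on any x86-64 CPU.
    __attribute__((target("default")))
    uint32_t byte_sum(const uint8_t *buf, size_t len)
    {
        uint32_t s = 0;
        for (size_t i = 0; i < len; i++)
            s += buf[i];
        return s;
    }

    // SSSE3 version: selected automatically when the CPU supports it.
    __attribute__((target("ssse3")))
    uint32_t byte_sum(const uint8_t *buf, size_t len)
    {
        // PMADDUBSW (SSSE3) folds 16 bytes into eight 16-bit partial sums.
        __m128i acc = _mm_setzero_si128();
        const __m128i ones8  = _mm_set1_epi8(1);
        const __m128i ones16 = _mm_set1_epi16(1);
        size_t i = 0;
        for (; i + 16 <= len; i += 16) {
            __m128i v    = _mm_loadu_si128((const __m128i *)(buf + i));
            __m128i pair = _mm_maddubs_epi16(v, ones8);
            acc = _mm_add_epi32(acc, _mm_madd_epi16(pair, ones16));
        }
        uint32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, acc);
        uint32_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < len; i++)   // leftover tail bytes
            s += buf[i];
        return s;
    }

    int main()
    {
        uint8_t data[1000];
        for (int i = 0; i < 1000; i++)
            data[i] = (uint8_t)i;
        std::printf("sum = %u\n", byte_sum(data, sizeof data));
        return 0;
    }

A plain "g++ -O2" on a GNU/Linux toolchain is enough; no -mssse3 flag is needed because the attribute scopes the instruction set to that one function.
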
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...ut replacing the binary/package. regards roland Am 19.05.20 um 16:28 schrieb Jorrit Jongma via rsync: > I've read up some more on the subject, and it seems the proper way to > do this with GCC is g++ and target attributes. I've refactored the > patch that way, and it indeed uses SSSE3 automatically on supporting > CPUs, regardless of the build host, so this should be ideal both for > home builders and distros. > > Getting the code to build right in c++ mode (checksum_sse2.cpp only) > was a bit of an adventure, requiring modifications to mkproto.awk, > configure...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...cases (including mine).

> However this patch is not for MD5 performance, rather for the rolling
> checksum rsync uses to match blocks on existing files on both ends to
> reduce transfer size.

Still. You claim in your patch that

| Benchmarks           C          SSE2       SSSE3
| - Intel i7-7700hq    1850 MB/s  2550 MB/s  4050 MB/s

while xxhash [0] claims on a Core i5-3340M @2.7GHz that:

| Version   Speed on 64-bit   Speed on 32-bit
| XXH64     13.8 GB/s         1.9 GB/s

so using xxhash64 for that work would also boost !x86 platforms. However your patch has the benefit that...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote: > What do you base this on? So my memory was wrong. SSE2 is supported by all x86-64bit CPUs. Sorry for that. > would imply that SSSE3 is enabled out of the box on builds on machines > that support it, this is not the case (it certainly isn't on my Ubuntu > box). It would be preferred to detect this at runtime but getting that > to work on GCC is (apparently) a mess, and would probably require > modifications to co...
2020 May 21
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On Tue, May 19, 2020 at 7:29 AM Jorrit Jongma via rsync < rsync at lists.samba.org> wrote: > I've read up some more on the subject, and it seems the proper way to do > this with GCC is g++ and target attributes. I've refactored the patch that > way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless > of the build host, so this should be ideal both for home builders and > distros. > Very cool stuff! Sounds like this should be a nice improvement for x86 systems. I've tweaked your patch just a bit and changed it to where it is disabled...
2013 Sep 28
4
PATCH: modify/add intrinsics code
...se.c and lpc_intrin_sse2.c
2. adds FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2() function to lpc_intrin_sse2.c
3. adds lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions (useful for 24-bit en-/decoding)
4. adds precompute_partition_info_sums_intrin_sse2() / ...ssse3() and disables precompute_partition_info_sums_32bit_asm_ia32_(). SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3 instruction PABSD so it is slightly slower. MSVS 2005 doesn't support SSSE3 and SSE4, and GCC compiles everything with -msse2, so I wrapped SSSE3/SSE4.1 code w...
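
The PABSD remark above is about packed 32-bit absolute value: SSSE3 has a single instruction for it (the _mm_abs_epi32() intrinsic), while SSE2 has to synthesize it from several. The snippet below only illustrates that trade-off; it is not the actual FLAC code, and the patch's exact SSE2 sequence may differ from the substitute shown here.

    #include <immintrin.h>
    #include <cstdint>
    #include <cstdio>

    // SSSE3: one PABSD instruction per vector.
    __attribute__((target("ssse3")))
    __m128i abs_epi32_ssse3(__m128i v)
    {
        return _mm_abs_epi32(v);
    }

    // SSE2 substitute: sign is all-ones for negative lanes, zero otherwise;
    // in two's complement (v XOR sign) - sign == |v|.  PSRAD + PXOR + PSUBD.
    __m128i abs_epi32_sse2(__m128i v)
    {
        __m128i sign = _mm_srai_epi32(v, 31);
        return _mm_sub_epi32(_mm_xor_si128(v, sign), sign);
    }

    int main()
    {
        int32_t out[4];
        __m128i v = _mm_setr_epi32(-5, 7, -123456, 0);
        if (__builtin_cpu_supports("ssse3"))
            _mm_storeu_si128((__m128i *)out, abs_epi32_ssse3(v));
        else
            _mm_storeu_si128((__m128i *)out, abs_epi32_sse2(v));
        std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
        return 0;
    }
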
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Thank you Jorrit for your detailed answer. > On 18 May 2020, at 17:58, Jorrit Jongma via rsync <rsync at lists.samba.org> wrote: > > Well, don't get too excited, get_checksum1() (the function optimized > here) is not the great performance limiter in this case, it's > get_checksum2() and sum_update(), which will be using MD5. Certainly that all other functions using
2010 Sep 08
4
[LLVMdev] MMX vs SSE
...l cases, which should stop various optimization passes from creating MMX instructions that screw up the x87 stack. Right now the MMX instructions are split between X86InstrMMX.td and X86InstrSSE.td, presumably on the historical grounds that some of them weren't introduced until SSE or SSSE3, and require support for that feature to work. I'm thinking it would be cleaner to keep them all in X86InstrMMX. Does anyone have an opinion about this?
2020 May 18
1
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I think you're missing a point here. Two different checksum algorithms are used in concert, the Adler-based one and the MD5 one. I SSE-optimized the Adler-based one. The Adler-based hash is used to _find_ blocks that might have shifted, while the MD5 hash is a strong cryptographic hash used to _verify_ blocks and files. You wouldn't want to replace the MD5 hash with the Adler-based hash,
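
What makes the Adler-based sum suitable for the "find" role is that it rolls: when the search window slides by one byte, the two sums can be updated from the byte leaving and the byte entering instead of being recomputed over the whole window. The sketch below shows that general shape; it is not rsync's exact get_checksum1() implementation.

    #include <cstdint>
    #include <cstdio>

    struct Rolling {
        uint32_t s1 = 0;   // plain sum of the bytes in the window
        uint32_t s2 = 0;   // position-weighted sum (running total of s1)
        uint32_t n  = 0;   // window length
    };

    // Compute both sums for an initial window of n bytes.
    Rolling roll_init(const uint8_t *buf, uint32_t n)
    {
        Rolling r;
        r.n = n;
        for (uint32_t i = 0; i < n; i++) {
            r.s1 += buf[i];
            r.s2 += r.s1;
        }
        return r;
    }

    // Slide the window one byte to the right in O(1); this is what lets
    // the sender test the weak hash at every byte offset of a large file.
    void roll_step(Rolling &r, uint8_t out, uint8_t in)
    {
        r.s1 += in;
        r.s1 -= out;
        r.s2 += r.s1;
        r.s2 -= r.n * out;
    }

    uint32_t roll_digest(const Rolling &r)
    {
        return (r.s1 & 0xffff) | (r.s2 << 16);
    }

    int main()
    {
        uint8_t data[64];
        for (int i = 0; i < 64; i++)
            data[i] = (uint8_t)(i * 37);
        Rolling r = roll_init(data, 16);      // covers data[0..15]
        roll_step(r, data[0], data[16]);      // now covers data[1..16]
        Rolling check = roll_init(data + 1, 16);
        std::printf("rolled=%08x recomputed=%08x\n",
                    roll_digest(r), roll_digest(check));
        return 0;
    }
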
2013 Sep 02
0
Running libvirt on a virtualized server
...cessStop:4349 : Failed to remove cgroup for one-8 - On XCP hypervisor: [root at xenserver2 ~]# egrep '(vmx|svm)' --color=always /proc/cpuinfo flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc nonstop_tsc aperfmperf pni vmx est ssse3 sse4_1 sse4_2 popcnt hypervisor arat tpr_shadow vnmi flexpriority ept vpid flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc nonstop_tsc aperfmperf pni vmx est ssse3 sse4_1 sse4_2 popcnt hypervisor arat tpr_shadow vnmi flexpriority ept...
2015 Feb 04
2
CPU model and missing AES-NI extension
...lags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid And this is what I get in the guest: model name :...
2013 Sep 17
2
Performance and precompute_partition_info_sums_32bit_asm_ia32_()
...on_info_sums_32bit_asm_ia32_(), 3rd column: encoding time in seconds, smaller=better):

no SSE   disabled   53.9
no SSE   enabled    55.2
SSE1     disabled   53.9
SSE1     enabled    55.3
SSE2     disabled   51.9
SSE2     enabled    53.1
SSE3     disabled   51.8
SSE3     enabled    53.2
SSSE3    disabled   45.7
SSSE3    enabled    51.4
SSE41    disabled   46.1
SSE41    enabled    51.6
SSE42    disabled   46.1
SSE42    enabled    51.6

Conclusions:
1) flac is always faster when precompute_partition_info_sums_32bit_asm_ia32_() is disabled.
2) Some C code benefits noticeably fro...
2014 May 13
1
Performance tests of the current version (git-b1b6caf)
...---------------------------
16 bit input
-m option | 32 bit codec          | 64 bit codec
          | GCC 4.8.2 | GCC 4.9.0 | GCC 4.8.2 | GCC 4.9.0
(none)    | 51.6      | 35.5      | ----      | ----
sse2      | 36.3      | 33.7      | 33.0      | 30.8
ssse3     | 34.8      | 33.9      | 31.5      | 30.8
sse4.1    | 34.8      | 33.5      | 33.0      | 29.4
-----------------------------------------------------------
-----------------------------------------------------------
24 bit input
-m option...