similar to: [LLVMdev] assertion when -sse2 on x86-64

Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] assertion when -sse2 on x86-64"

2011 May 25
[LLVMdev] Floating Point Register Allocation in X86 backend
Right. But there are 8 registers on the floating point stack, from ST0 to ST7, and I think LLVM is only using ST0 to ST6 in some code fragments. Could this be because of the assumption that X86::FP registers run from X86::FP0 to X86::FP6? --Aparna
On Wed, May 25, 2011 at 2:28 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> On May 25, 2011, at 11:09 AM, aparna kotha wrote:
2011 May 25
[LLVMdev] Floating Point Register Allocation in X86 backend
Hi Guys, I was working on some floating-point-intensive benchmarks and realized that the floating-point register allocation in LLVM assumes there are only 7 floating-point registers on X86, whereas the hardware has 8. Line 266 of X86FloatingPoint.cpp reads: assert(Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP register!"); Is there any reason for
2011 May 25
[LLVMdev] Floating Point Register Allocation in X86 backend
On May 25, 2011, at 11:09 AM, aparna kotha wrote:
> Hi Guys,
>
> I was working on some floating point intensive benchmarks and realize that the floating point register allocation in llvm assumes that there are only 7 floating point registers in X86, whereas the hardware has 8.
>
> Line number
> 00266 assert(Reg >= X86::FP0 && Reg <= X86::FP6 &&
2011 May 25
[LLVMdev] Floating Point Register Allocation in X86 backend
On May 25, 2011, at 12:08 PM, aparna kotha wrote:
> Right. But there are 8 registers on the floating point stack from ST0 to ST7 and I think llvm is only using ST0 to ST6 in some code fragments. Could this be because of the assumption that X86::FP registers run from X86::FP0 to X86::FP6?
Yes. My guess is that the code converting from FP to ST registers sometimes needs the extra stack slot.
2013 Jul 29
[PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
memcpy() goes from taking 45% to 66% of total function time, which translates to a 30% decrease in NVPutImage runtime.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/nouveau_xv.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c
index 567e30c..5569b7c 100644
--- a/src/nouveau_xv.c
+++
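A minimal scalar sketch of the conversion being optimized, assuming the standard layouts: YV12 stores a full-resolution Y plane followed by separate quarter-size V and U planes, while NV12 keeps the Y plane and stores U and V interleaved in a single plane. The Y plane can be copied unchanged, so the real work is interleaving the chroma. The function names and parameters below are hypothetical and not taken from nouveau_xv.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave one row of planar U and V samples
 * into the packed layout NV12 uses: dst = U0 V0 U1 V1 ...
 * `n` is the number of chroma samples in the row. */
static void interleave_uv_row(uint8_t *dst, const uint8_t *u,
                              const uint8_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = u[i];
        dst[2 * i + 1] = v[i];
    }
}

/* Convert the chroma of a whole frame. In YV12 the V plane precedes the
 * U plane in memory, so the caller passes both plane pointers explicitly.
 * Pitches are in bytes; width/height are the luma dimensions (assumed
 * even), so chroma is (width/2) x (height/2). */
static void yv12_chroma_to_nv12(uint8_t *dst_uv, size_t dst_pitch,
                                const uint8_t *src_u, const uint8_t *src_v,
                                size_t src_pitch, size_t width, size_t height)
{
    for (size_t y = 0; y < height / 2; y++)
        interleave_uv_row(dst_uv + y * dst_pitch,
                          src_u + y * src_pitch,
                          src_v + y * src_pitch,
                          width / 2);
}
```

For a 4x4 frame the chroma planes are 2x2, and the destination UV plane has rows of 4 bytes (2 U/V pairs).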
2013 Jul 31
[PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
On 2013-07-31 19:18 +0200, Ilia Mirkin wrote:
> On Wed, Jul 31, 2013 at 1:16 PM, Sven Joachim <svenjoac at gmx.de> wrote:
>> Unfortunately, immintrin.h is not available on most architectures,
>> leading to build failures as can be seen on
>> https://buildd.debian.org/status/package.php?p=xserver-xorg-video-nouveau.
>
> Sorry :( I thought that immintrin.h
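One common way to avoid this kind of build failure is to include an x86 intrinsics header only when the compiler advertises the instruction set, and to keep a scalar fallback for every other architecture. The sketch below assumes GCC/Clang-style predefined macros (__SSE2__) and uses the SSE2-specific emmintrin.h rather than immintrin.h; it illustrates the technique and is not the fix that was actually applied to the driver. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* only included when the target has SSE2 */
#endif

/* Hypothetical helper: interleave n planar U and V bytes into NV12-style
 * U/V pairs. dst must have room for 2*n bytes. */
static void interleave_uv(uint8_t *dst, const uint8_t *u, const uint8_t *v,
                          size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 U bytes + 16 V bytes per iteration -> 32 interleaved bytes. */
    for (; i + 16 <= n; i += 16) {
        __m128i uu = _mm_loadu_si128((const __m128i *)(u + i));
        __m128i vv = _mm_loadu_si128((const __m128i *)(v + i));
        /* unpacklo/unpackhi_epi8 produce U0 V0 U1 V1 ... */
        _mm_storeu_si128((__m128i *)(dst + 2 * i), _mm_unpacklo_epi8(uu, vv));
        _mm_storeu_si128((__m128i *)(dst + 2 * i + 16), _mm_unpackhi_epi8(uu, vv));
    }
#endif
    /* Scalar tail, and the whole job on targets without SSE2. */
    for (; i < n; i++) {
        dst[2 * i]     = u[i];
        dst[2 * i + 1] = v[i];
    }
}
```

On a non-x86 build the #if blocks drop out entirely, so no x86 header is ever requested and only the portable loop remains.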
2006 Apr 19
[LLVMdev] floating point exception and SSE2 instructions
On Wed, 19 Apr 2006 19:28:34 +0100 Simon Burton <simon at arrowtheory.com> wrote:
> From what I remember, this is a bug in Debian libc:
> some floating point flags are set incorrectly causing SIGFPE.
> Can't find the bug report ATM.
Oh, it just showed up on numpy-discussion: http://sources.redhat.com/bugzilla/show_bug.cgi?id=10
"""
#include
2006 Apr 19
[LLVMdev] floating point exception and SSE2 instructions
On Thu, 20 Apr 2006, Simon Burton wrote:
>> From what I remember, this is a bug in Debian libc:
>> some floating point flags are set incorrectly causing SIGFPE.
>> Can't find the bug report ATM.
>
> Oh, it just showed up on numpy-discussion:
> http://sources.redhat.com/bugzilla/show_bug.cgi?id=10
>
> """
> #include <fenv.h>
> void
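Some background that may help here: on x86, SSE/SSE2 floating-point instructions only deliver SIGFPE when an exception is unmasked in the MXCSR control/status register. The default MXCSR value masks all six exceptions, so, for example, a floating-point division by zero quietly yields infinity. If something in the process unmasks them (the thread suspects libc of setting flags incorrectly), code compiled to SSE2 instructions starts trapping. A minimal check, assuming an x86 target and the standard xmmintrin.h helpers; the function names are hypothetical:

```c
#include <stdbool.h>
#include <xmmintrin.h>   /* x86 only: _mm_getcsr, _MM_MASK_* */

/* Hypothetical helper: true if every SSE floating-point exception
 * (invalid, denormal, divide-by-zero, overflow, underflow, precision)
 * is still masked in MXCSR, i.e. SSE math will not raise SIGFPE. */
static bool sse_fp_exceptions_masked(void)
{
    return (_mm_getcsr() & _MM_MASK_MASK) == _MM_MASK_MASK;
}

/* Hypothetical helper: restore the default by masking all SSE
 * floating-point exceptions again. */
static void sse_mask_fp_exceptions(void)
{
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);
}
```

Calling such a check right before the crashing code would show whether the SIGFPE comes from unmasked SSE exceptions rather than from the generated code itself.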
2006 Apr 19
[LLVMdev] floating point exception and SSE2 instructions
On Wed, 19 Apr 2006 18:21:32 -0500 (CDT) Chris Lattner <sabre at nondot.org> wrote:
> I don't see what this has to do with anything, but...
Me neither.
>> Is there a way I can disable SSE instruction generation in LLVM?
> Yes. Pass -mattr=-sse1,-sse2,-sse3 to lli or llc.
Right, that fixed it. BTW: from the --help:
2009 Jun 10
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
On Tue, Jun 9, 2009 at 2:58 PM, Nicolas Capens <nicolas at capens.net> wrote:
> Please consider committing the attached patch. I believe the SSE2 packsswb,
> packssdw and packuswb intrinsics have an incorrect return type.
If we really wanted to do this, an AutoUpgrade patch would be necessary for backwards compatibility. I'm not sure it's worth bothering.
-Eli
2009 Jun 10
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
On Jun 9, 2009, at 5:56 PM, Eli Friedman wrote:
> On Tue, Jun 9, 2009 at 2:58 PM, Nicolas Capens <nicolas at capens.net> wrote:
>> Please consider committing the attached patch. I believe the SSE2 packsswb,
>> packssdw and packuswb intrinsics have an incorrect return type.
>
> If we really wanted to do this, an AutoUpgrade patch would be
> necessary
2009 Jun 10
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
Hi Eli,
What exactly do you mean by an AutoUpgrade patch? I don't see how this could cause any issues with backward compatibility. People currently using these intrinsics need a bitcast of the result to avoid an assert, and with the patch applied the bitcast is no longer necessary.
Cheers,
Nicolas
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
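For reference, packsswb takes two vectors of eight signed 16-bit integers and returns one vector of sixteen signed 8-bit integers, saturating each element to [-128, 127]; packssdw narrows 32-bit to 16-bit the same way, and packuswb saturates signed 16-bit values to the unsigned range [0, 255]. Because the result holds bytes rather than words, a 16 x i8 return type is presumably what Nicolas considers correct. A scalar model of packsswb, with hypothetical names:

```c
#include <stdint.h>

/* Clamp a signed 16-bit value to the signed 8-bit range (signed saturation). */
static int8_t sat_s16_to_s8(int16_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Scalar model of SSE2 packsswb: 8 + 8 signed words in, 16 signed bytes out.
 * Elements of `a` fill bytes 0..7 of the result, elements of `b` bytes 8..15. */
static void packsswb_ref(int8_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_s16_to_s8(a[i]);
        out[i + 8] = sat_s16_to_s8(b[i]);
    }
}
```

Note that the output is sixteen 8-bit lanes even though both inputs are eight 16-bit lanes: the total width stays 128 bits, but the element type changes.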
2012 Jan 20
[LLVMdev] 128-bit PXOR requires SSE2
On Fri, Jan 20, 2012 at 2:47 PM, Nicolas Capens <nicolas.capens at gmail.com> wrote:
> Hi all,
>
> I think I found a bug in LLVM 3.0: When compiling for a target without
> SSE2 support, there were some 128-bit PXOR instructions in the generated
> code.
>
> I traced it down to the following definition in X86InstrSSE.td:
>
>   def FsFLD0SS : I<0xEF, MRMInitReg,
2013 Nov 22
[LLVMdev] [clang] SSE2 intrinsics (emmintrin.h): _mm_movpi64_pi64 should be _mm_movpi64_epi64?
Hi there,
I've recently encountered a piece of code that uses some SSE2 intrinsics and builds with gcc46 but not with clang: clang can't find _mm_movpi64_epi64(), while gcc46 defines it in its lib/gcc46/gcc/.../4.6.3/include/emmintrin.h:
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_movpi64_epi64 (__m64 __A)
{
  return _mm_set_epi64
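Where a compiler lacks _mm_movpi64_epi64, a small compatibility wrapper can express the same operation (copy the 64-bit MMX value into the low half of an XMM register and zero the upper half) using intrinsics that both GCC and Clang provide. This is a sketch assuming an x86-64 target; the wrapper name is hypothetical, and it is not necessarily how either compiler's header defines the intrinsic.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the MMX __m64 type */

/* Hypothetical stand-in for _mm_movpi64_epi64:
 * low 64 bits = a, high 64 bits = 0. */
static inline __m128i movpi64_epi64_compat(__m64 a)
{
    /* _mm_set_epi64 takes the high element first, then the low one. */
    return _mm_set_epi64(_mm_setzero_si64(), a);
}
```

Code that touches __m64 values should still call _mm_empty() before any x87 floating-point code runs, exactly as with the native intrinsic.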
2020 May 18
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Thank you, Jorrit, for your detailed answer.
> On 18 May 2020, at 17:58, Jorrit Jongma via rsync <rsync at lists.samba.org> wrote:
>
> Well, don't get too excited, get_checksum1() (the function optimized
> here) is not the great performance limiter in this case, it's
> get_checksum2() and sum_update(), which will be using MD5.
Certainly that all other functions using
2020 May 18
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On 2020-05-18 17:55:58 [+0200], Jorrit Jongma via rsync wrote:
> I don't disagree that MD5 could (or even should) be replaced so it is
> no longer the bottleneck in several real-world cases (including mine).
>
> However this patch is not for MD5 performance, rather for the rolling
> checksum rsync uses to match blocks on existing files on both ends to
> reduce transfer size.
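The rolling checksum referred to here is rsync's weak checksum (get_checksum1()): two running sums over a block, where s1 adds up the bytes and s2 adds up the successive values of s1, combined as (s1 & 0xffff) + (s2 << 16). Its key property is that sliding the window by one byte costs O(1) instead of rehashing the whole block. A minimal sketch, assuming CHAR_OFFSET is 0 and treating bytes as signed, as rsync's own C implementation does as far as I know; the struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Running state of the weak checksum over a window of k bytes. */
typedef struct {
    uint32_t s1;  /* sum of the bytes in the window */
    uint32_t s2;  /* sum of the running s1 values (position-weighted sum) */
    size_t k;     /* window length */
} rollsum;

static void rollsum_init(rollsum *r, const signed char *buf, size_t k)
{
    r->s1 = r->s2 = 0;
    r->k = k;
    for (size_t i = 0; i < k; i++) {
        r->s1 += (uint32_t)buf[i];  /* negative bytes wrap modulo 2^32 */
        r->s2 += r->s1;
    }
}

/* Slide the window one byte: drop `out` (oldest byte), append `in`. O(1). */
static void rollsum_roll(rollsum *r, signed char out, signed char in)
{
    r->s1 -= (uint32_t)out;
    r->s2 -= (uint32_t)r->k * (uint32_t)out;
    r->s1 += (uint32_t)in;
    r->s2 += r->s1;
}

static uint32_t rollsum_digest(const rollsum *r)
{
    return (r->s1 & 0xffff) + (r->s2 << 16);
}
```

Because every step is plain unsigned 32-bit arithmetic, wrap-around is harmless: a rolled state and a freshly computed one agree modulo 2^32, so their digests match.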
2020 May 18
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote:
> What do you base this on?
So my memory was wrong. SSE2 is supported by all x86-64 CPUs. Sorry for that.
> would imply that SSSE3 is enabled out of the box on builds on machines
> that support it, this is not the case (it certainly isn't on my Ubuntu
> box). It would be preferred to detect this at runtime but getting that
2020 May 21
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On Tue, May 19, 2020 at 7:29 AM Jorrit Jongma via rsync <rsync at lists.samba.org> wrote:
> I've read up some more on the subject, and it seems the proper way to do
> this with GCC is g++ and target attributes. I've refactored the patch that
> way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless
> of the build host, so this should be ideal both for
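The approach described here lets one binary carry an SSSE3 code path and use it only on CPUs that support it. The sketch below shows the same idea in C, using GCC/Clang's __attribute__((target("ssse3"))) on a single function plus an explicit __builtin_cpu_supports() check at run time. It is a swapped-in C variant, not the code from the rsync patch: in GCC, automatic dispatch between same-named functions with different target attributes has historically been a C++ feature, which is presumably why g++ was needed. Function names are hypothetical, and the SSSE3 body is left scalar to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum1_fn)(const signed char *buf, size_t len);

/* Portable weak checksum (CHAR_OFFSET assumed to be 0). */
static uint32_t checksum1_generic(const signed char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += (uint32_t)buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_SSSE3_PATH 1
/* Compiled with SSSE3 enabled for this function only, whatever -march the
 * build host uses. A real version would use SSSE3 intrinsics here; this
 * placeholder repeats the scalar loop so results stay bit-identical. */
__attribute__((target("ssse3")))
static uint32_t checksum1_ssse3(const signed char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += (uint32_t)buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}
#endif

/* Pick an implementation once, based on the CPU the program runs on. */
static checksum1_fn select_checksum1(void)
{
#ifdef HAVE_SSSE3_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))
        return checksum1_ssse3;
#endif
    return checksum1_generic;
}
```

The caller would store the returned pointer once at startup and call through it afterwards, so the feature check is not repeated per block.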
2008 Nov 26
SSE2 code won't compile in VC
Jean-Marc,
At least VS2005 (what I'm using) won't compile resample_sse.h with _USE_SSE2 defined, because it refuses to cast __m128 to __m128d and vice versa. While there are intrinsics to do the casts, I thought it would be simpler to just use an intrinsic that accomplishes the same thing without all the casting.
Thanks,
--John
@@ -91,7 +91,7 @@ static inline double
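As an illustration of the general point (not John's actual patch, whose contents are cut off here): when single-precision SSE data has to be accumulated in double precision, the conversion intrinsics _mm_cvtps_pd() and _mm_movehl_ps() do the job without any C-style cast between __m128 and __m128d. If a pure bit reinterpretation is what is actually wanted, current SSE2 headers provide _mm_castps_pd() and _mm_castpd_ps() for that. The function name below is hypothetical; an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>

/* Hypothetical helper: sum four floats, accumulating in double precision. */
static double sum4_as_double(const float *x)
{
    __m128  v  = _mm_loadu_ps(x);                    /* x0 x1 x2 x3 */
    __m128d lo = _mm_cvtps_pd(v);                    /* (double)x0, (double)x1 */
    __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(v, v));  /* (double)x2, (double)x3 */
    __m128d s  = _mm_add_pd(lo, hi);                 /* x0+x2, x1+x3 */
    s = _mm_add_sd(s, _mm_unpackhi_pd(s, s));        /* low lane: full total */
    return _mm_cvtsd_f64(s);
}
```

Every step converts or shuffles values explicitly, so the compiler never has to accept a cast between the float and double vector types.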
2014 Mar 11
x86_64 SSE2/SSE41 optim not used
Hi Guys,
In stream_decoder.c, when assigning the LPC restore function, only IA32 processors benefit from the SSE2 and SSE4.1 optimizations. Shouldn't that be the case for x86_64 processors as well?
Thanks,
--
Olivier TRISTAN
uvi.net
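The LPC restore function being selected here rebuilds each output sample from the transmitted residual plus a fixed-point prediction from the previous `order` samples; that inner loop is the computation the SSE2/SSE4.1 variants speed up. A scalar sketch, modelled on what FLAC's generic C version computes as far as I can tell, but with hypothetical names and a 64-bit accumulator to sidestep overflow:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar LPC restore:
 *   data[i] = residual[i] + (sum_j qlp_coeff[j] * data[i-j-1]) >> lp_quantization
 * `data` must be preceded in memory by `order` warm-up samples
 * (data[-1] ... data[-order]). Right-shifting a negative value is
 * implementation-defined in C; mainstream compilers shift arithmetically,
 * which this sketch relies on. */
static void lpc_restore_signal(const int32_t *residual, size_t len,
                               const int32_t *qlp_coeff, unsigned order,
                               int lp_quantization, int32_t *data)
{
    for (size_t i = 0; i < len; i++) {
        int64_t sum = 0;
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * data[(ptrdiff_t)i - (ptrdiff_t)j - 1];
        data[i] = residual[i] + (int32_t)(sum >> lp_quantization);
    }
}
```

For example, with coefficients {2, -1} and no quantization shift the predictor extrapolates linearly (2*x[n-1] - x[n-2]), so warm-up samples 1, 2 and an all-zero residual continue as 3, 4, 5.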