Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] assertion when -sse2 on x86-64"
2011 May 25
2
[LLVMdev] Floating Point Register Allocation in X86 backend
Right. But there are 8 registers on the floating point stack from ST0 to ST7
and I think llvm is only using ST0 to ST6 in some code fragments. Could this
be because of the assumption that X86::FP registers run from X86::FP0 to
X86::FP6?
--Aparna
On Wed, May 25, 2011 at 2:28 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On May 25, 2011, at 11:09 AM, aparna kotha wrote:
2011 May 25
2
[LLVMdev] Floating Point Register Allocation in X86 backend
Hi Guys,
I was working on some floating point intensive benchmarks and realized that
the floating point register allocation in llvm assumes that there are only 7
floating point registers in X86, whereas the hardware has 8.
Line number
00266 assert(Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP register!");
of X86FloatingPoint.cpp.
Is there any reason for
2011 May 25
0
[LLVMdev] Floating Point Register Allocation in X86 backend
On May 25, 2011, at 11:09 AM, aparna kotha wrote:
> Hi Guys,
>
> I was working on some floating point intensive benchmarks and realized that the floating point register allocation in llvm assumes that there are only 7 floating point registers in X86, whereas the hardware has 8.
>
> Line number
> 00266 assert(Reg >= X86::FP0 && Reg <= X86::FP6 &&
2011 May 25
0
[LLVMdev] Floating Point Register Allocation in X86 backend
On May 25, 2011, at 12:08 PM, aparna kotha wrote:
> Right. But there are 8 registers on the floating point stack from ST0 to ST7 and I think llvm is only using ST0 to ST6 in some code fragments. Could this be because of the assumption that X86::FP registers run from X86::FP0 to X86::FP6?
Yes. My guess is that the code converting from FP to ST registers sometimes needs the extra stack slot.
2013 Jul 29
0
[PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
memcpy() goes from taking 45% to 66% of total function time, which
translates to a 30% decrease in NVPutImage runtime.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
src/nouveau_xv.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c
index 567e30c..5569b7c 100644
--- a/src/nouveau_xv.c
+++
2013 Jul 31
0
[PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
On 2013-07-31 19:18 +0200, Ilia Mirkin wrote:
> On Wed, Jul 31, 2013 at 1:16 PM, Sven Joachim <svenjoac at gmx.de> wrote:
>>
>> Unfortunately, immintrin.h is not available on most architectures,
>> leading to build failures as can be seen on
>> https://buildd.debian.org/status/package.php?p=xserver-xorg-video-nouveau.
>
> Sorry :( I thought that immintrin.h
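One way to avoid this kind of build failure is to include the intrinsics header only when the compiler is actually targeting SSE2, and to keep a plain C loop as the fallback. The sketch below is not the code from the patch: `interleave_uv` is a hypothetical helper, and the U-then-V byte order is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, only pulled in when the target has SSE2 */
#endif

/* Hypothetical helper: interleave two chroma planes as u0 v0 u1 v1 ...
 * The U-first ordering is an assumption for this sketch. */
void interleave_uv(uint8_t *dst, const uint8_t *u, const uint8_t *v, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && u != NULL && v != NULL);

#if defined(__SSE2__)
    /* Vector path: 16 bytes from each plane become 32 interleaved bytes. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(u + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(v + i));
        _mm_storeu_si128((__m128i *)(dst + 2 * i), _mm_unpacklo_epi8(a, b));
        _mm_storeu_si128((__m128i *)(dst + 2 * i + 16), _mm_unpackhi_epi8(a, b));
    }
#endif

    /* Scalar path: handles the tail, or everything on non-SSE2 targets. */
    for (; i < n; i++) {
        dst[2 * i] = u[i];
        dst[2 * i + 1] = v[i];
    }
}
```

On architectures without SSE2 the `#if` blocks drop out entirely, so the file still compiles and only the scalar loop runs.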
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
On Wed, 19 Apr 2006 19:28:34 +0100
Simon Burton <simon at arrowtheory.com> wrote:
>
> From what I remember, this is a bug in debian libc:
> some floating point flags are set incorrectly causing SIGFPE.
> Can't find the bug report ATM.
Oh, it just showed up on numpy-discussion:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10
"""
#include
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
On Thu, 20 Apr 2006, Simon Burton wrote:
>> From what I remember, this is a bug in debian libc:
>> some floating point flags are set incorrectly causing SIGFPE.
>> Can't find the bug report ATM.
>
> Oh, it just showed up on numpy-discussion:
> http://sources.redhat.com/bugzilla/show_bug.cgi?id=10
>
> """
> #include <fenv.h>
> void
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
On Wed, 19 Apr 2006 18:21:32 -0500 (CDT)
Chris Lattner <sabre at nondot.org> wrote:
>
>
> I don't see what this has to do with anything, but...
Me neither.
>
> > Is there a way I can disable SSE instruction generation in LLVM ?
>
> Yes. Pass -mattr=-sse1,-sse2,-sse3 to lli or llc.
Right, that fixed it.
BTW:
from the --help:
2009 Jun 10
0
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
On Tue, Jun 9, 2009 at 2:58 PM, Nicolas Capens<nicolas at capens.net> wrote:
> Please consider committing the attached patch. I believe the SSE2 packsswb,
> packssdw and packuswb intrinsics have an incorrect return type.
If we really wanted to do this, an AutoUpgrade patch would be
necessary for backwards-compatibility. I'm not sure it's worth
bothering.
-Eli
2009 Jun 10
1
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
On Jun 9, 2009, at 5:56 PM, Eli Friedman wrote:
> On Tue, Jun 9, 2009 at 2:58 PM, Nicolas Capens<nicolas at capens.net>
> wrote:
>> Please consider committing the attached patch. I believe the SSE2
>> packsswb,
>> packssdw and packuswb intrinsics have an incorrect return type.
>
> If we really wanted to do this, an AutoUpgrade patch would be
> necessary
2009 Jun 10
1
[LLVMdev] [Patch] Fix SSE2 packing intrinsics return type
Hi Eli,
What exactly do you mean by an AutoUpgrade patch?
I don't see how this could cause any issues with backward compatibility.
People currently using these intrinsics need a bitcast of the result to
avoid an assert, and with the patch applied the bitcast is no longer
necessary.
Cheers,
Nicolas
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
2012 Jan 20
0
[LLVMdev] 128-bit PXOR requires SSE2
On Fri, Jan 20, 2012 at 2:47 PM, Nicolas Capens
<nicolas.capens at gmail.com> wrote:
> Hi all,
>
> I think I found a bug in LLVM 3.0: When compiling for a target without
> SSE2 support, there were some 128-bit PXOR instructions in the generated
> code.
>
> I traced it down to the following definition in X86InstrSSE.td:
>
> def FsFLD0SS : I<0xEF, MRMInitReg,
2013 Nov 22
0
[LLVMdev] [clang] SSE2 intrinsics (emmintrin.h): _mm_movpi64_pi64 should be _mm_movpi64_epi64?
Hi there,
I've recently encountered a piece of code that uses some SSE2 intrinsics
and builds with gcc46, but not clang: clang can't find _mm_movpi64_epi64(),
while gcc46 defines it in its lib/gcc46/gcc/.../4.6.3/include/emmintrin.h:
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_movpi64_epi64 (__m64 __A)
{
return _mm_set_epi64
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
Thank you Jorrit for your detailed answer.
> On 18 May 2020, at 17:58, Jorrit Jongma via rsync <rsync at lists.samba.org> wrote:
>
> Well, don't get too excited, get_checksum1() (the function optimized
> here) is not the great performance limiter in this case, it's
> get_checksum2() and sum_update(), which will be using MD5.
Certainly that all other functions using
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On 2020-05-18 17:55:58 [+0200], Jorrit Jongma via rsync wrote:
> I don't disagree that MD5 could (or even should) be replaced so it is
> no longer the bottleneck in several real-world cases (including mine).
>
> However this patch is not for MD5 performance, rather for the rolling
> checksum rsync uses to match blocks on existing files on both ends to
> reduce transfer size.
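For readers unfamiliar with the idea, a rolling checksum can be updated in constant time when the window slides by one byte, which is what makes block matching cheap. The sketch below is a simplified Adler-32-style sum, not rsync's actual get_checksum1(): the names `weak_checksum` and `weak_roll` are hypothetical, and any per-byte offset the real code may add is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified weak checksum: s1 is the byte sum, s2 the sum of running s1
 * values; each is kept to 16 bits and packed into one 32-bit word. */
uint32_t weak_checksum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | (s2 << 16);
}

/* Slide a window of `len` bytes forward by one: drop `out`, append `in`. */
uint32_t weak_roll(uint32_t sum, size_t len, unsigned char out, unsigned char in)
{
    uint32_t s1 = sum & 0xffff;
    uint32_t s2 = sum >> 16;

    s1 = (s1 - out + in) & 0xffff;
    s2 = (s2 - (uint32_t)len * out + s1) & 0xffff;
    return s1 | (s2 << 16);
}
```

Rolling the checksum of "abc" with `out = 'a'` and `in = 'd'` gives the same value as computing the checksum of "bcd" from scratch, without rescanning the window.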
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote:
> What do you base this on?
So my memory was wrong. SSE2 is supported by all x86-64bit CPUs. Sorry
for that.
> would imply that SSSE3 is enabled out of the box on builds on machines
> that support it, this is not the case (it certainly isn't on my Ubuntu
> box). It would be preferred to detect this at runtime but getting that
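As a rough illustration of runtime detection (not taken from the patch), GCC and Clang provide `__builtin_cpu_supports()` on x86 targets. The function name `cpu_has_ssse3` is hypothetical, and the sketch assumes that reporting "no SSSE3" is acceptable on other compilers and architectures.

```c
#include <assert.h>

/* Returns 1 if the running CPU reports SSSE3, 0 otherwise.
 * Off x86, or without the GCC/Clang builtins, it conservatively returns 0. */
int cpu_has_ssse3(void)
{
    int has = 0;

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        /* populate the CPU feature data */
    has = __builtin_cpu_supports("ssse3") != 0;  /* queried at run time, not build time */
#endif

    assert(has == 0 || has == 1);
    return has;
}
```

A caller could check this once at startup and choose between an SSSE3 routine and a plain C one, so the decision depends on the machine running the binary and not on the build host.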
2020 May 21
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
On Tue, May 19, 2020 at 7:29 AM Jorrit Jongma via rsync <
rsync at lists.samba.org> wrote:
> I've read up some more on the subject, and it seems the proper way to do
> this with GCC is g++ and target attributes. I've refactored the patch that
> way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless
> of the build host, so this should be ideal both for
2008 Nov 26
1
SSE2 code won't compile in VC
Jean-Marc,
At least VS2005 (what I'm using) won't compile resample_sse.h with
_USE_SSE2 defined because it refuses to cast __m128 to __m128d and vice
versa. While there are intrinsics to do the casts, I thought it would be
simpler to just use an intrinsic that accomplishes the same thing
without all the casting. Thanks,
--John
@@ -91,7 +91,7 @@ static inline double
2014 Mar 11
2
x86_64 SSE2/SSE41 optim not used
Hi Guys,
In stream_decoder.c, when assigning the lpc restore function,
only IA32 processors benefit from the SSE2 and SSE4.1 optimizations.
Shouldn't this be the case for x86_64 processors as well?
Thanks,
--
Olivier TRISTAN
uvi.net