search for: _den

Displaying 9 results from an estimated 9 matches for "_den".

2004 Aug 06
2
Coredumps when --enable-sse is selected
...Willamette core Pentium 4 (1.6 GHz) system. I've tried both the speex 1.1.5 release and the current CVS (which self-IDs as 1.1.4), and the result is the same. I suspect some funk in the use of the SSE intrinsics macros. Backtrace: #0 0x40024594 in filter_mem2_10 (x=0x805f31c, _num=0x8061fb8, _den=0x8061fe4, y=0x806071c, N=160, ord=10, _mem=0x8062150) at xmmintrin.h:790 #1 0x400248b4 in filter_mem2 (x=0x805f31c, _num=0x8061fb8, _den=0x8061fe4, y=0x806071c, N=1, ord=0, _mem=0x8061fe4) at filters_sse.h:135 #2 0x40019d1e in nb_encode (state=0x805ebd0, vin=0x80582b4, bits=0xbfffe840)...
2004 Aug 06
3
[PATCH] Make SSE Run Time option.
...roach, so it's not too bad (SSE asm is 2x faster than x87). Note that unlike the previous version which had a kludge to work with order 8 (required for wideband), this version only works with order 10, so it will only work for narrowband. void filter_mem2(float *x, float *_num, float *_den, float *y, int N, int ord, float *_mem) { __m128 num[3], den[3], mem[3]; int i; /* Copy numerator, denominator and memory to aligned xmm */ for (i=0;i<2;i++) { mem[i] = _mm_loadu_ps(_mem+4*i); num[i] = _mm_loadu_ps(_num+4*i+1); den[i] = _mm_loadu_ps(_den+4*i+1);...
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
> Personally, I don't think much of PNI. The complex arithmetic stuff they > added sets you up for a lot of permute overhead that is inefficient -- > especially on a processor that is already weak on permute. In my opinion, Actually, the new instructions make it possible to do complex multiplies without the need to permute and separate the add and subtract. The really useful
2004 Aug 06
0
Coredumps when --enable-sse is selected
...> I've tried both speex 1.1.5 release, and the current CVS (which self-IDs as > 1.1.4), and the result is the same. > > I suspect some funk in the use of the SSE intrinsics macros. > > Backtrace: > > #0 0x40024594 in filter_mem2_10 (x=0x805f31c, _num=0x8061fb8, > _den=0x8061fe4, y=0x806071c, N=160, ord=10, > _mem=0x8062150) at xmmintrin.h:790 > #1 0x400248b4 in filter_mem2 (x=0x805f31c, _num=0x8061fb8, _den=0x8061fe4, > y=0x806071c, N=1, ord=0, > _mem=0x8061fe4) at filters_sse.h:135 > #2 0x40019d1e in nb_encode (state=0x805ebd0, vin=0x...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...bw_lpc(float gamma, float *lpc_in, float *lpc_out, int order) { @@ -46,41 +48,548 @@ } } -#ifdef _USE_SSE -#include "filters_sse.h" -#else -void filter_mem2(float *x, float *num, float *den, float *y, int N, int ord, float *mem) + +void filter_mem2(float *x, float *_num, float *_den, float *y, int N, int ord, float *_mem) { - int i,j; - float xi,yi; - for (i=0;i<N;i++) - { - xi=x[i]; - y[i] = num[0]*xi + mem[0]; - yi=y[i]; - for (j=0;j<ord-1;j++) - { - mem[j] = mem[j+1] + num[j+1]*xi - den[j+1]*yi; - } - mem[ord-1] =...
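The scalar loop removed by this diff can be sketched as a standalone function, which is handy as a reference when checking an SSE variant. This is a minimal sketch, not the patch's code: the name `filter_mem2_ref` is hypothetical, and the final state update is an assumed completion of the line the snippet truncates (the usual last step of a transposed direct form II filter).

```c
#include <assert.h>
#include <stddef.h>

/* Scalar pole-zero filter with memory (transposed direct form II).
 * num and den each hold ord+1 coefficients; den[0] is taken to be 1
 * and is never read. mem holds ord state values, updated in place.
 * filter_mem2_ref is a hypothetical name, not a Speex function. */
void filter_mem2_ref(const float *x, const float *num, const float *den,
                     float *y, int N, int ord, float *mem)
{
    assert(ord >= 1);
    for (int i = 0; i < N; i++) {
        float xi = x[i];
        float yi = num[0] * xi + mem[0];
        y[i] = yi;
        /* Shift the state down by one tap while adding the new terms. */
        for (int j = 0; j < ord - 1; j++)
            mem[j] = mem[j + 1] + num[j + 1] * xi - den[j + 1] * yi;
        /* Assumed completion of the truncated line: the last state
         * has no successor to shift in. */
        mem[ord - 1] = num[ord] * xi - den[ord] * yi;
    }
}
```

Feeding an impulse through a first-order filter with `num = {1, 0}` and `den = {1, -0.5}` should give the decaying sequence 1, 0.5, 0.25, which makes a quick sanity check for any vectorised replacement.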
2004 Jul 14
4
aspect ratio ?
Can someone enlighten me on the status of aspect ratio in Theora? The ti structure has aspect_num and _den values, which I assume give the intended display aspect ratio (e.g. 4/3). The sample files on the BitTorrent seem to say both values are 0 for all files. I'd think it should at least be made impossible to have a 0 as the denominator. The library doesn't check the values at all on encode_...
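The check the poster asks for could look roughly like the sketch below. It does not use the libtheora API: the `aspect_ratio` type and the `aspect_normalize` helper are hypothetical, and treating a zero in either field as "unspecified" rather than as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for a reduced aspect ratio. */
typedef struct {
    unsigned num;
    unsigned den;
} aspect_ratio;

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns 1 and stores the ratio in lowest terms when both fields are
 * nonzero. Returns 0 and leaves *out untouched when either field is 0,
 * which this sketch treats as "aspect ratio not specified". */
int aspect_normalize(unsigned num, unsigned den, aspect_ratio *out)
{
    assert(out != NULL);
    if (num == 0 || den == 0)
        return 0;
    unsigned g = gcd_u(num, den);
    out->num = num / g;
    out->den = den / g;
    return 1;
}
```

A caller would run this before dividing by the denominator and fall back to square pixels, or refuse to encode, when it returns 0.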
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...One other thing we noticed is that you tend to do a lot of for loop based copies: from your new filters_sse.h around the asm code for (i=0;i<12;i++) num[i]=den[i]=0; for (i=0;i<12;i++) mem[i]=0; for (i=0;i<ord;i++) { num[i]=_num[i+1]; den[i]=_den[i+1]; } for (i=0;i<ord;i++) mem[i]=_mem[i]; <<< asm code>>> for (i=0;i<ord;i++) _mem[i]=mem[i]; could easily be reduced to memset(num,0,12); memset(den,0,12); memset(mem,0,12); memcpy(num,_num+1,ord); memcpy(den,_den+1,ord); memcpy(mem,...
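One caveat on the suggested replacement: `memset` and `memcpy` take sizes in bytes, so with `float` buffers the counts as quoted (12 and `ord`) would cover only part of each array. A minimal sketch of the same copies with explicit byte counts follows. The function name `load_coeffs`, the `TAPS` constant and the `src_*` parameter names (standing in for `_num`, `_den`, `_mem`) are hypothetical, and the buffers are assumed to be 12 floats long as in the quoted loops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAPS 12 /* buffer length used by the quoted for loops */

/* Zero the local buffers, then copy ord coefficients and ord state
 * values into them. src_num and src_den must hold at least ord+1
 * floats; index 0 is skipped, as in the quoted code. */
void load_coeffs(float *num, float *den, float *mem,
                 const float *src_num, const float *src_den,
                 const float *src_mem, int ord)
{
    assert(ord >= 0 && ord <= TAPS);
    /* All-zero bytes represent 0.0f on IEEE 754 platforms. */
    memset(num, 0, TAPS * sizeof(float));
    memset(den, 0, TAPS * sizeof(float));
    memset(mem, 0, TAPS * sizeof(float));
    memcpy(num, src_num + 1, (size_t)ord * sizeof(float));
    memcpy(den, src_den + 1, (size_t)ord * sizeof(float));
    memcpy(mem, src_mem, (size_t)ord * sizeof(float));
}
```

Because the arrays arrive as pointers here, `sizeof(num)` would yield the pointer size, so the byte count is spelled out as `TAPS * sizeof(float)` instead.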
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, There is a big difference between SSE and SSEFP. The SSEFP means that the CPU supports the xmm registers. All Intel chips with SSE support do, however no current 32-bit AMD chips support the XMM registers. They will support the SSE instructions but not those registers. You are right about the SSE2 not being used. The AMD Opterons are the first AMD CPUs which support
2011 Nov 18
3
Permutations
Hi all, why does factorial(150) show the error "out of range" in 'gammafn'? I have to calculate the number of subsets formed from 150 samples taken 10 at a time. How is this possible? Best
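The count being asked for is the binomial coefficient "150 choose 10", which can be computed without ever forming a large factorial (in R itself, the built-in `choose(150, 10)` does this). The sketch below shows the idea in C for consistency with the other examples on this page. The function name `binom` is hypothetical, and the code has no overflow check, so it assumes the result times `k` fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* n choose k, built up one factor at a time.
 * After step i the running value equals C(n-k+i, i), so each
 * division is exact and intermediate values stay small compared
 * with n!. No overflow check: the caller must keep the result
 * times k within 64 bits. */
uint64_t binom(unsigned n, unsigned k)
{
    assert(k <= n);
    if (k > n - k)
        k = n - k; /* use symmetry: C(n, k) == C(n, n-k) */
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}
```

Calling `binom(150, 10)` stays comfortably inside 64-bit range, since the largest intermediate value is only ten times the final result.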