thr3ads.net - search: "b32"

Displaying 20 results from an estimated 90 matches for "b32".

[Fast Int64 3/4] Explicitly cast results of silk OPUS_FAST_INT64 macros back to opus_int32.

2015 Nov 16

--- silk/macros.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/silk/macros.h b/silk/macros.h index 1ba614a..e1e05b9 100644 --- a/silk/macros.h +++ b/silk/macros.h @@ -48,14 +48,14 @@ POSSIBILITY OF SUCH DAMAGE. /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */ #if OPUS_FAST_INT64 -#define silk_SMULWB(a32, b32) (((a32) * (opus_int64)((opus_int16)(b32))) >> 16) +#define silk_SMULWB(a32, b32) ((opus_int32)(((a32) * (opus_int64)((opus_int16)(b32))) >> 16)) #else #define silk_S...
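The effect of the added cast can be illustrated with a small C sketch. The macro names `SMULWB_NOCAST` and `SMULWB_CAST` are hypothetical stand-ins for the removed and added forms of `silk_SMULWB`, and the `<stdint.h>` types are assumed to correspond to `opus_int16`, `opus_int32`, and `opus_int64`. The helper functions are illustrative only and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the removed line: the expression has a 64-bit type. */
#define SMULWB_NOCAST(a32, b32) (((a32) * (int64_t)((int16_t)(b32))) >> 16)

/* Hypothetical stand-in for the added line: the result is cast back to 32 bits. */
#define SMULWB_CAST(a32, b32) ((int32_t)(((a32) * (int64_t)((int16_t)(b32))) >> 16))

/* Size in bytes of the uncast expression; sizeof does not evaluate its operand. */
static size_t nocast_width(void)
{
    return sizeof(SMULWB_NOCAST(65536, 3));
}

/* Size in bytes of the cast expression. */
static size_t cast_width(void)
{
    return sizeof(SMULWB_CAST(65536, 3));
}
```

Without the cast, the macro yields a 64-bit value wherever its type is observable, for example in `sizeof` or when passed through a variadic call; with the cast it matches the "output have to be 32bit int" comment quoted in the diff.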

[Fast Int64 1/4] Move OPUS_FAST_INT64 definition to celt/arch.h.

2015 Nov 16

...2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/celt/arch.h b/celt/arch.h index 9f74ddd..670527b 100644 --- a/celt/arch.h +++ b/celt/arch.h @@ -78,6 +78,11 @@ static OPUS_INLINE void _celt_fatal(const char *str, const char *file, int line) #define UADD32(a,b) ((a)+(b)) #define USUB32(a,b) ((a)-(b)) +/* Set this if opus_int64 is a native type of the CPU. */ +/* Assume that all LP64 architectures have fast 64-bit types; also x86_64 (which can be ILP32 for x32) + and Win64 (which is LLP64). */ +#define OPUS_FAST_INT64 (defined(__LP64__) || defined(__x86_64__) || defined(_WIN64...
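One caveat with a definition of this shape: under the C standard (C11 6.10.1), behavior is undefined when the `defined` operator is produced by macro expansion inside `#if`, although many compilers accept it. The sketch below shows an alternative form that avoids the issue. The name `FAST_INT64_SKETCH` and the helper function are hypothetical and do not come from the patch.

```c
/* Evaluate the platform test once, then define the flag as a plain 0 or 1.
 * FAST_INT64_SKETCH is a hypothetical name used only for this illustration. */
#if defined(__LP64__) || defined(__x86_64__) || defined(_WIN64)
#define FAST_INT64_SKETCH 1
#else
#define FAST_INT64_SKETCH 0
#endif

/* Later uses can test the flag directly, with no `defined` in the expansion. */
#if FAST_INT64_SKETCH
static int fast_int64_enabled(void) { return 1; }
#else
static int fast_int64_enabled(void) { return 0; }
#endif
```

Because the flag always expands to an integer constant, `#if FAST_INT64_SKETCH` is well defined on any conforming preprocessor.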

[PATCH] Create OPUS_FAST_INT64 macro, to abstract conditions where opus_int64 should be used.

2015 Aug 04

...#define opus_unlikely(x) (!!(x)) #endif +/* Set this if opus_int64 is a native type of the CPU. */ +#define OPUS_FAST_INT64 (defined(__x86_64__) || defined(__LP64__) || defined(_WIN64)) + /* This is an OPUS_INLINE header file for general platform. */ /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */ -#if defined(__x86_64__) || defined(__LP64__) || defined(_WIN64) +#if OPUS_FAST_INT64 #define silk_SMULWB(a32, b32) (((a32) * (opus_int64)((opus_int16)(b32))) >> 16) #else #define silk_SMULWB(a32, b32) ((((a32) >> 16...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...with a 640M GPU. > > > > PTX Code (for a mandelbrot calculation): > > > > // > > // Generated by LLVM NVPTX Back-End > > // > > > > .version 3.1 > > .target sm_10, texmode_independent > > .address_size 64 > > > > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X > > ( > > > > ) > > ; > > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y > > ( > > > > ) > > ; > > .func (.reg .b32 func_retval0) INT_PTX_SREG_TID_X > > ( > > > > ) > > ;...

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21

--- configure.ac | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index f52d2c2..e1a6e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes]) AC_ARG_ENABLE([intrinsics], - [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],, +

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...ndelbrot calculation): > > > > > > // > > > // Generated by LLVM NVPTX Back-End > > > // > > > > > > .version 3.1 > > > .target sm_10, texmode_independent > > > .address_size 64 > > > > > > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X > > > ( > > > > > > ) > > > ; > > > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y > > > ( > > > > > > ) > > > ; > > > .func (.reg .b32 func_retval0) INT_PTX_S...

[Bug 78161] New: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break

2014 May 01

https://bugs.freedesktop.org/show_bug.cgi?id=78161 Priority: medium Bug ID: 78161 Assignee: nouveau at lists.freedesktop.org Summary: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break Severity: normal Classification: Unclassified OS: Linux (All)

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...Code (for a mandelbrot calculation): >> > >> > // >> > // Generated by LLVM NVPTX Back-End >> > // >> > >> > .version 3.1 >> > .target sm_10, texmode_independent >> > .address_size 64 >> > >> > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X >> > ( >> > >> > ) >> > ; >> > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y >> > ( >> > >> > ) >> > ; >> > .func (.reg .b32 func_retval0) INT_PTX_SREG_TID_X >>...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...; > > >> > // > >> > // Generated by LLVM NVPTX Back-End > >> > // > >> > > >> > .version 3.1 > >> > .target sm_10, texmode_independent > >> > .address_size 64 > >> > > >> > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X > >> > ( > >> > > >> > ) > >> > ; > >> > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y > >> > ( > >> > > >> > ) > >> > ; > >> > .func...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...I try load the module using CUDA, I get an error: CUDA_ERROR_NO_BINARY_FOR_GPU. I'm running this on a 2012 MBP with a 640M GPU. PTX Code (for a mandelbrot calculation): // // Generated by LLVM NVPTX Back-End // .version 3.1 .target sm_10, texmode_independent .address_size 64 .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X ( ) ; .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y ( ) ; .func (.reg .b32 func_retval0) INT_PTX_SREG_TID_X ( ) ; .func (.reg .b32 func_retval0) INT_PTX_SREG_NTID_X ( ) ; .func (.reg .b32 func_retval0) INT_PTX_SREG_NTID_Y ( ) ; // .globl...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

...BINARY_FOR_GPU. I'm running this on a 2012 MBP > with a 640M GPU. > > PTX Code (for a mandelbrot calculation): > > // > // Generated by LLVM NVPTX Back-End > // > > .version 3.1 > .target sm_10, texmode_independent > .address_size 64 > > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_X > ( > > ) > ; > .func (.reg .b32 func_retval0) INT_PTX_SREG_CTAID_Y > ( > > ) > ; > .func (.reg .b32 func_retval0) INT_PTX_SREG_TID_X > ( > > ) > ; > .func (.reg .b32 func_retval0) INT_PTX_SREG_NTID_X > ( >...

[Patch]01-Add ARM5E macros

2013 May 17

.../macros.h +++ b/silk/macros.h @@ -32,6 +32,10 @@ POSSIBILITY OF SUCH DAMAGE. #include "config.h" #endif +#ifdef ARM5E_ASM +#include "macros_arm5e.h" +#else /* Generic macro */ + /* This is an inline header file for general platform. */ /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */ @@ -134,5 +138,7 @@ static inline opus_int32 silk_CLZ32(opus_int32 in32) (*((Matrix_base_adr) + ((row)+(M)*(column)))) #endif +#endif + #endif /* SILK_MACROS_H */ diff --git a/silk/macros_arm5e.h b/silk/macros_arm5e.h new file mode 100644 ind...

[LLVMdev] Example for usage of LLVM/Clang/libclc

2015 Feb 03

...ure if the PTX code I am generating is correct (is the one that is supposed to be generated). For example, currently, In OpenCL : get_global_id(0) translates to In LLVM : %call = tail call i32 @get_global_id(i32 0) which translates to In PTX: // .globl blur2d .func (.param .b32 func_retval0) get_global_id ( .param .b32 get_global_id_param_0 ) ; mov.u32 %r2, 0; .param .b32 param0; st.param.b32 [param0+0], %r2; .param .b32 retval0; call.uni (retval0), get_global_id, ( param0 );...

Fermi+ shader header docs

2015 May 21

On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmorell at nvidia.com> wrote: > Hi Ilia, > > On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote: >> Hi, >> >> As I'm looking to add some support to nouveau for features like atomic >> counters and images, I'm running into some confusion about what the >> first word of the shader header

[Fast Int64 4/4] Add OPUS_FAST_INT64 definition of silk_SMULWT.

2015 Nov 16

--- silk/macros.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/silk/macros.h b/silk/macros.h index e1e05b9..7cefedc 100644 --- a/silk/macros.h +++ b/silk/macros.h @@ -61,7 +61,11 @@ POSSIBILITY OF SUCH DAMAGE. #endif /* (a32 * (b32 >> 16)) >> 16 */ +#if OPUS_FAST_INT64 +#define silk_SMULWT(a32, b32) ((opus_int32)(((a32) * (opus_int64)((b32) >> 16)) >> 16)) +#else #define silk_SMULWT(a32, b32) (((a32) >> 16) * ((b32) >> 16) + ((((a32) & 0x0000FFFF) * ((b32) >>...
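As a rough check of what the 64-bit branch computes, the following C sketch compares it with a 32-bit-only decomposition. The function names are hypothetical, the `<stdint.h>` types are assumed to match the `opus_*` types, and the split form is derived here from the `(a32 * (b32 >> 16)) >> 16` comment rather than copied from the truncated fallback line. It also assumes the compiler shifts negative values arithmetically.

```c
#include <stdint.h>

/* 64-bit form, following the added macro line in the diff. */
static int32_t smulwt_fast(int32_t a32, int32_t b32)
{
    return (int32_t)((a32 * (int64_t)(b32 >> 16)) >> 16);
}

/* 32-bit-only form (an assumption for illustration): split a32 into its
 * high and low 16-bit halves so that no intermediate exceeds 32 bits. */
static int32_t smulwt_split(int32_t a32, int32_t b32)
{
    int32_t bt = b32 >> 16;                 /* top 16 bits of b32, sign-extended */
    int32_t hi = (a32 >> 16) * bt;          /* high half of a32 times bt */
    int32_t lo = ((a32 & 0x0000FFFF) * bt) >> 16; /* low half's contribution */
    return hi + lo;
}
```

Both functions return the top-word product scaled down by 16 bits; the 64-bit form does it with a single multiply, which is the point of enabling it where `opus_int64` is a native type.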

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

...200, i32 242, i32 285, i32 327} > llc -march=nvptx64 test.ll -o test.ptx > cat test.ptx // // Generated by LLVM NVPTX Back-End // .version 3.0 .target sm_10, texmode_independent .address_size 64 // .globl _Z5__anyi .visible .global .align 4 .b8 __local_depot0[8]; .func (.reg .b32 func_retval0) _Z5__anyi( .reg .b32 _Z5__anyi_param_0 ) // @_Z5__anyi { .reg .b64 %SP; .reg .b64 %SPL; .reg .pred %p<396>; .reg .s16 %rc<396>; .reg .s16 %rs<396>; .reg .s32 %r<396>; .reg .s64 %rl<...

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Does this give correct results for special floats (0, infs)? We tried to improve (for single floats) x86 rcp in llvmpipe with newton-raphson, but unfortunately not being able to give correct results for these two cases (without even more additional code) meant it got all disabled in the end (you can still see that code in the driver) since the problems are at least as bad as those due to bad

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

...st.ptx > > cat test.ptx > // > // Generated by LLVM NVPTX Back-End > // > > .version 3.0 > .target sm_10, texmode_independent > .address_size 64 > > > // .globl _Z5__anyi > .visible .global .align 4 .b8 __local_depot0[8]; > > .func (.reg .b32 func_retval0) _Z5__anyi( > .reg .b32 _Z5__anyi_param_0 > ) // @_Z5__anyi > { > .reg .b64 %SP; > .reg .b64 %SPL; > .reg .pred %p<396>; > .reg .s16 %rc<396>; > .reg .s16 %rs<396>; >...

[LLVMdev] NVPTX: __iAtomicCAS support ?

2012 May 16

...d ret void } declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32) CODEGEN ========= dmikushin at hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20 // // Generated by LLVM NVPTX Back-End // .version 3.0 .target sm_20, texmode_independent .address_size 32 .func (.param .b32 func_retval0) _Z12__iAtomicCASPiii ( .param .b32 _Z12__iAtomicCASPiii_param_0, .param .b32 _Z12__iAtomicCASPiii_param_1, .param .b32 _Z12__iAtomicCASPiii_param_2 ) ; Not Implemented UNREACHABLE executed at /tmp/rpmbuild_debug/BUILD/llvm/build/include/llvm/Target/TargetLowering.h:1249! 0 libLLV...

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

...o test.ptx > > cat test.ptx > // > // Generated by LLVM NVPTX Back-End > // > > .version 3.0 > .target sm_10, texmode_independent > .address_size 64 > > > // .globl _Z5__anyi > .visible .global .align 4 .b8 __local_depot0[8]; > > .func (.reg .b32 func_retval0) _Z5__anyi( > .reg .b32 _Z5__anyi_param_0 > ) // @_Z5__anyi > { > .reg .b64 %SP; > .reg .b64 %SPL; > .reg .pred %p<396>; > .reg .s16 %rc<396>; > .reg .s16 %rs<396>; >...
