thr3ads.net - search: "clz"

Displaying 20 results from an estimated 28 matches for "clz".

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

Hi, On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now. I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option...
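
The identity behind this patch is that counting trailing zeros equals counting leading zeros of the bit-reversed value. A minimal C sketch of that identity, assuming a 32-bit word; `rbit32`, `clz32` and `ctz_via_rbit_clz` are hypothetical helper names standing in for the RBIT and CLZ instructions and are not taken from the patch or from LLVM:

```c
#include <stdint.h>

/* Software stand-in for a bit-reverse instruction such as ARM's RBIT. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* swap bytes */
    return (x >> 16) | (x << 16);                             /* swap halfwords */
}

/* Portable count-leading-zeros; returns 32 for x == 0, as ARM's CLZ does. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros as clz(rbit(x)): two operations instead of a longer sequence. */
static unsigned ctz_via_rbit_clz(uint32_t x)
{
    return clz32(rbit32(x));
}
```

With these helpers, `ctz_via_rbit_clz(8)` yields 3, and a zero input yields 32 because the reversed value is still zero.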

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
> Hi,
>
> On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now.
>
> I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which...

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On Fri, Jan 15, 2010 at 6:03 PM, Chris Lattner <clattner at apple.com> wrote:
> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>> Hi,
>>
>> On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now.
>>
>> I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm...

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On 15 Jan 2010, at 18:03, Chris Lattner wrote:
> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>> Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of which llvm has backends for yet.
>
> When/if another target wants this, we could add a ISD::RBIT operation, it doesn't need to be added at the llvm ir level,
The XCore also has ctlz and b...

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

...11:37 AM, Richard Osborne wrote:
> On 15 Jan 2010, at 18:03, Chris Lattner wrote:
>> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>>> Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of which llvm has backends for yet.
>>
>> When/if another target wants this, we could add a ISD::RBIT operation, it doesn't need to be added at the llvm ir level,
> ...

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 18

...
>> On 15 Jan 2010, at 18:03, Chris Lattner wrote:
>>> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>>>> Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of which llvm has backends for yet.
>>>
>>> When/if another target wants this, we could add a ISD::RBIT operation, it doesn't need to be added at...

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 19

On Jan 15, 2010, at 10:03 AM, Chris Lattner wrote:
> When/if another target wants this, we could add a ISD::RBIT operation, it doesn't need to be added at the llvm ir level,
Blackfin can add with backwards carry, essentially doing (rbit (add (rbit a), (rbit b))). This is used for FFTs. I wasn't hoping to be able to pattern-match something so complicated.
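
The expression (rbit (add (rbit a), (rbit b))) can be written out directly in C. This is only a sketch of the arithmetic, assuming 32-bit operands; `rbit32` and `add_reverse_carry` are hypothetical names and do not model the Blackfin instruction's actual encoding or operand widths:

```c
#include <stdint.h>

/* Software bit reversal of a 32-bit word. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Addition whose carries travel from the most significant bit downward:
 * reverse both operands, add normally, reverse the sum back. */
static uint32_t add_reverse_carry(uint32_t a, uint32_t b)
{
    return rbit32(rbit32(a) + rbit32(b));
}
```

Repeatedly adding 0x80000000 this way steps through 0x80000000, 0x40000000, 0xC0000000, 0x20000000, and so on, which is the bit-reversed counting order that FFT implementations use for index permutation.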

Why LLVM cannot optimize this?

2016 Mar 02

Hi, Yes SCEV is pretty limited on this aspect. This kind of code can trigger LLVM to explode in time/memory: https://llvm.org/bugs/show_bug.cgi?id=18606 See also this llvm-dev thread: "SCEV implementation and limitations, do we need "pow"?": http://lists.llvm.org/pipermail/llvm-dev/2014-February/070062.html CC: Sanjoy who may have an

Git branch with compiling fixes for win32

2012 May 04

On 03/05/12 12:19, Miroslav Lichvar wrote:
> Hi Josh,
>
> nice to see you here again.
>
> On Wed, Apr 25, 2012 at 04:26:05PM -0700, Josh Coalson wrote:
>> (Jumping in again, maybe at the wrong point since this doesn't seem to involve encoding, but here goes.)
>>
>> Miroslav's patches have always been high-quality for sure. But

[RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions

2019 Aug 14

...n but is not aware of all the bits that can be optimized. I'm dealing with the fact that it is pretty hard to select some patterns of DAG nodes in order to replace them with an optimal equivalent machine instruction. Take for instance the count leading zeros operation:
uint32_t clz (uint32_t x) {
  for (int count = 0; count < 32; count++ ) {
    if ((int32_t)(x << count) < 0) return count;
  }
  return 32;
}
It needs a loop to be performed and that makes it difficult to be lowered because it goes through several ba...
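
A self-contained version of the loop-based count-leading-zeros described above, as a sketch only. It tests the top bit with a mask instead of a signed comparison, since an unsigned value never compares below zero; the name `clz_loop` is hypothetical and not taken from the RFC:

```c
#include <stdint.h>

/* Count leading zeros by shifting left until the top bit is set.
 * Returns 32 when x is zero. */
static uint32_t clz_loop(uint32_t x)
{
    for (int count = 0; count < 32; count++) {
        if ((x << count) & 0x80000000u)  /* top bit set after shifting? */
            return (uint32_t)count;
    }
    return 32;
}
```

The early return inside the loop is what splits the function across several basic blocks, which is the shape a backend would have to recognise in order to replace it with a single instruction.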

Optimisation Help

2010 Feb 12

Hi, I have been looking into optimising the CELT decoder for speed to make it acceptable for

More than 150 MB / second encoding ?

2014 Mar 16

Hello, Is there some version of FLAC that allows very, very fast encoding (i.e. able to process at least 150 MB/second of .wav input data on a standard computer: laptop, Core i5/i7, Windows 7 64-bit, 8 GB RAM)? (It's OK to have a compression ratio which is a little bit lower than traditional FLAC.) I'm looking for something which is between FLAC (very good ratio, slower than

More than 150 MB / second encoding + "nanozip"

2014 Mar 17

...can be downloaded from:
>>> http://sourceforge.net/p/cuetoolsnet/code/ci/default/tree/CUETools.Codecs.FLACCL/flac.cl?format=raw
>>>
>>> For a slightly smaller size, you can change the line "cbits = min(cbits, clz(order + 1) + 1 - shared.task.obits);" to "cbits = min(cbits, clz(order) + 1 - shared.task.obits);". I checked with the author to make sure this is correct.
>>>
>>> Also: BIG NOTE: The best results that I could get on m...

[Patch]01-Add ARM5E macros

2013 May 17

...h"
 #endif
+#ifdef ARM5E_ASM
+#include "macros_arm5e.h"
+#else /* Generic macro */
+
 /* This is an inline header file for general platform. */
 /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */
@@ -134,5 +138,7 @@ static inline opus_int32 silk_CLZ32(opus_int32 in32)
 (*((Matrix_base_adr) + ((row)+(M)*(column))))
 #endif
+#endif
+
 #endif /* SILK_MACROS_H */
diff --git a/silk/macros_arm5e.h b/silk/macros_arm5e.h
new file mode 100644
index 0000000..6f47ec4
--- /dev/null
+++ b/silk/macros_arm5e.h
@@ -0,0 +1,197 @@
+/********************...

[PATCH 1/2] Fix mistyped variable name

2013 May 25

...changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/libFLAC/include/private/bitmath.h b/src/libFLAC/include/private/bitmath.h
index 42ce639..e5c7695 100644
--- a/src/libFLAC/include/private/bitmath.h
+++ b/src/libFLAC/include/private/bitmath.h
@@ -74,7 +74,7 @@ static inline unsigned int FLAC__clz_uint32(FLAC__uint32 v)
 {
 /* Never used with input 0 */
 #if defined(__INTEL_COMPILER)
- return _bit_scan_reverse(n) ^ 31U;
+ return _bit_scan_reverse(v) ^ 31U;
 #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
 /* This will translate...

PATCH for bitmath.h: 1 typo, 1 warning

2013 Aug 16

...0
+++ b\src\libFLAC\include\private\bitmath.h 2013-08-14 10:20:51.484053700 +0400
@@ -78,12 +78,12 @@
 return _bit_scan_reverse(v) ^ 31U;
 #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
 /* This will translate either to (bsr ^ 31U), clz , ctlz, cntlz, lzcnt depending on
- * -march= setting or to a software rutine in exotic machines. */
+ * -march= setting or to a software routine in exotic machines. */
 return __builtin_clz(v);
 #elif defined(_MSC_VER) && (_MSC_VER >= 1400)
- FLAC__uint32 idx;
+ unsigned lon...
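
The `(bsr ^ 31U)` form mentioned in the comment works because a bit-scan-reverse result is the index (0 to 31) of the highest set bit, and for any value in that range XOR with 31 equals subtraction from 31. A sketch assuming a portable stand-in, `bsr32`, for the bit-scan-reverse intrinsic; both function names here are hypothetical and are not part of libFLAC:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, as a bit-scan-reverse would report it.
 * Like the libFLAC routine, this is never meant to be called with 0. */
static unsigned bsr32(uint32_t v)
{
    unsigned idx = 0;
    assert(v != 0);
    while (v >>= 1)
        idx++;
    return idx;
}

/* Leading-zero count derived from the highest-set-bit index. */
static unsigned clz_from_bsr(uint32_t v)
{
    return bsr32(v) ^ 31u;  /* for idx in 0..31, idx ^ 31 == 31 - idx */
}
```

For example, 0x00FF0000 has its highest set bit at index 23, and 23 ^ 31 is 8, which is its number of leading zeros.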

[ANNOUNCE] xf86-video-ati 7.9.0

2017 Mar 16

...Thanks to everybody who contributed to this release in any way!
Emil Velikov (1):
  autogen.sh: use quoted string variables
Jammy Zhou (1):
  Use render node for DRI3 if available
Jochen Rollwagen (3):
  fix build for xserver < 1.13
  Calculate log base 2 in radeon.h based on clz for all platforms
  Fix build for XServer 1.13
Michel Dänzer (38):
  Post-release version bump
  Use DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE/RELATIVE flags when available
  Enable glamor by default with >= R600 and Xorg >= 1.18.3
  Don't install Flush/EventCallback for GPU...

[LLVMdev] LLVM Weekly - #1, Jan 6th 2014

2014 Jan 06

...tream of errors covering the rest of the input file. It now recognises this error and presents a single error <http://llvm-reviews.chandlerc.com/rL198540>.
## Other project commits
* Building libclc with LLVM 3.5 was fixed <http://llvm-reviews.chandlerc.com/rL198167>
* In libcxx, the clz/ctz family of functions are now implemented for when building with Visual C++ on Win32 or Win64. <http://llvm-reviews.chandlerc.com/rL198481>
* The PollyCananicalizePass was introduced. This is a ModulePass that schedules the Polly canonicalization passes. <http://llvm-reviews.chandlerc.c...

Reasoning about known bits of the absolute value of a signed integer

2016 May 03

I'm trying to reason about how to find certain bit positions of the absolute value of a given integer value. Specifically, I want to know the highest possibly set bit and lowest possibly set bit of the absolute value, in order to find the range between the two. Note that I'm specifically trying to be as conservative as possible. This is what I have so far: If the sign bit of the

[PATCH 2/2] bitmath: Finish up optimizations

2012 May 09

...endswap.h"
 /* Things should be fastest when this matches the machine word size */
-/* WATCHOUT: if you change this you must also change the following #defines down to COUNT_ZERO_MSBS below to match */
+/* WATCHOUT: if you change this you must also change the following #defines down to FLAC__clz_uint32 below to match */
 /* WATCHOUT: there are a few places where the code will not work unless uint32_t is >= 32 bits wide */
 /* also, some sections currently only have fast versions for 4 or 8 bytes per word */
 #define FLAC__BYTES_PER_WORD 4 /* sizeof uint32_t */
@@ -56,27 +56,6...
