thr3ads.net - search: "branchless"

Displaying 9 results from an estimated 9 matches for "branchless".

[RFC] carry-less multiplication instruction

2020 Jul 05

...ul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23]. clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16] Carry-less multiply can also be used to implement erasure codes efficiently. [14] ==clmul lowering without hardware support== An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplica...
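The clmul properties listed in this snippet can be checked with a small portable sketch. `clmul32`, `spread_bits`, `prefix_xor` and `gray_decode` are hypothetical names, and the shift-and-XOR loop is a plain software fallback written for illustration, not the lowering the RFC goes on to describe. The Gray-code case relies on one extra observation: decoding needs the XOR of the bits at or above each position, and those suffix XORs appear in bits 31 to 62 of the same product with -1.

```c
#include <stdint.h>

/* Portable 32x32 => 64 carry-less multiply: XOR together shifted copies
   of a, one for each set bit of b. Hardware instructions such as x86
   PCLMULQDQ compute this in a single step. */
static uint64_t clmul32(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (int i = 0; i < 32; i++) {
    /* all-ones when bit i of b is set, zero otherwise (no branch) */
    uint64_t mask = -(uint64_t)((b >> i) & 1u);
    result ^= ((uint64_t)a << i) & mask;
  }
  return result;
}

/* clmul of x with itself moves bit i to bit 2i and leaves zeroes in
   between, the bit-interleaving step used for Morton codes. */
static uint64_t spread_bits(uint32_t x) { return clmul32(x, x); }

/* Low half of clmul(x, -1): bit k is the XOR of bits 0..k of x. */
static uint32_t prefix_xor(uint32_t x) {
  return (uint32_t)clmul32(x, 0xFFFFFFFFu);
}

/* Gray decoding needs bit k = XOR of bits k..31 of g; those suffix XORs
   land in bits 31..62 of the same product. */
static uint32_t gray_decode(uint32_t g) {
  return (uint32_t)(clmul32(g, 0xFFFFFFFFu) >> 31);
}
```

For example, `clmul32(3, 3)` is 5: squaring the polynomial x + 1 over GF(2) gives x^2 + 1 because the two cross terms cancel, which is why squaring only spreads the bits apart.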

[RFC] carry-less multiplication instruction

2020 Jul 09

...l of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23]. >> >> clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16] >> >> Carry-less multiply can also be used to implement erasure codes efficiently. [14] >> >> ==clmul lowering without hardware support== >> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplic...
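The quoted-string trick mentioned here can be sketched in a few lines: mark every quote character in a bitmask, then take the prefix XOR, so each bit records whether an odd number of quotes has been seen so far. This sketch assumes at most 64 input bytes and no backslash-escaped quotes, and `prefix_xor64` and `in_string_mask` are hypothetical names. The prefix XOR is computed with a shift-and-XOR cascade, which yields the same bits as the low half of a carry-less multiply by all-ones; with a hardware clmul instruction the cascade becomes one multiply.

```c
#include <stdint.h>

/* Prefix XOR by doubling: after these six steps bit k holds the XOR of
   bits 0..k of the input. */
static uint64_t prefix_xor64(uint64_t x) {
  x ^= x << 1;
  x ^= x << 2;
  x ^= x << 4;
  x ^= x << 8;
  x ^= x << 16;
  x ^= x << 32;
  return x;
}

/* Bit i is set for bytes inside a double-quoted string (opening quote
   included, closing quote excluded). Simplifications: at most 64 bytes
   are examined and escaped quotes are not handled. */
static uint64_t in_string_mask(const char *buf, int len) {
  uint64_t quotes = 0;
  for (int i = 0; i < len && i < 64; i++)
    quotes |= (uint64_t)(buf[i] == '"') << i; /* 0 or 1, no branch */
  return prefix_xor64(quotes);
}
```

For the input `a"bc"d` the quotes sit at bits 1 and 4, so the mask covers bytes 1 to 3 (0xE).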

RFC: Should SmallVectors be smaller?

2018 Jun 22

>> On Jun 21, 2018, at 18:38, Chris Lattner <clattner at nondot.org> wrote: >> >> >> >> On Jun 21, 2018, at 9:52 AM, Duncan P. N. Exon Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> I've been curious for a while whether SmallVectors have the right speed/memory tradeoff. It would be straightforward to shave off a couple of...

An alternative algorithm for `which()`

2023 Apr 05

...ernative C algorithm for `which()` which uses less memory and is often faster in many real life scenarios. I've documented it in full on the bugzilla page, with many examples: https://bugs.r-project.org/show_bug.cgi?id=18495 The short version is that the performance comes from making the loops branchless, which seems to be particularly helpful for `which()`. With `which(x)`, I'd argue that branches are often hard for the compiler to predict since in most real data there is typically no indication that if the i-th element of `x` is `TRUE`, then the i+1-th element might also be `TRUE`. I've...
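One common way to make such a loop branchless is to store the candidate index unconditionally and advance the output position by the 0/1 result of the test. The sketch below is not the algorithm attached to the bug report; `which_branchless` and `SKETCH_NA_LOGICAL` are hypothetical names, and it assumes R's internal encoding of logical vectors as `int` (TRUE is 1, FALSE is 0, NA is `INT_MIN`).

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: R stores a logical NA as INT_MIN (the integer NA value). */
#define SKETCH_NA_LOGICAL INT_MIN

/* Writes the 1-based indices of TRUE elements of x into out and returns
   how many there are. NA and FALSE are skipped, as which() does.
   out must have room for n entries. */
static size_t which_branchless(const int *x, size_t n, size_t *out) {
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    out[count] = i + 1;    /* always write the candidate index */
    count += (x[i] == 1);  /* advance only on TRUE: 0 or 1, no branch */
  }
  return count;
}
```

Because a write happens on every iteration, `out` must be sized for `n` entries even when few elements are TRUE; entries past the returned count hold leftover candidates and should be ignored.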

[RFC] carry-less multiplication instruction

2020 Jul 09

...> clmul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23]. > > clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16] > > Carry-less multiply can also be used to implement erasure codes efficiently. [14] > > ==clmul lowering without hardware support== > An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no...

RFC: Should SmallVectors be smaller?

2018 Jun 23

...ubiquitous, it could help the heap a bit. And if it doesn’t hurt runtime performance in practice, there’s no reason to fork the data structure. > > If no one has measured before I might try it some time. > > I think it's important to keep begin(), end(), and indexing operations branchless, so I'm not sure this pointer union is the best idea. I haven't profiled, but that's my intuition. If you wanted to limit all our vectors to 4 billion elements to save a pointer, I'd probably be fine with that. Good point, there are two separable changes here and only the union par...

[RFC] carry-less multiplication instruction

2020 Jul 05

...ul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23]. >> >> clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16] >> >> Carry-less multiply can also be used to implement erasure codes efficiently. [14] >> >> ==clmul lowering without hardware support== >> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplicati...

RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

2018 Mar 23

...matically across an entire program rather than through manual changes to the code. While this is likely to have a high performance cost, some applications may be in a good position to take this performance / security tradeoff. The specific technique we propose is to cause loads to be checked using branchless code to ensure that they are executing along a valid control flow path. Consider the following C-pseudo-code representing the core idea of a predicate guarding potentially invalid loads:

```
void leak(int data);
void example(int* pointer1, int* pointer2) {
  if (condition) {
    // ... lots of code...
```
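The arithmetic behind a branchless load check can be modeled in C, with the caveat that this is only a model: a C compiler is free to fold the predicate or turn it back into a branch, which is why hardening of this kind is done inside the compiler rather than by hand. In this sketch the function names are hypothetical, and the convention that the predicate state is 0 on a correctly predicted path and all-ones after a mispredicted branch is an assumption of the sketch. OR-ing the state into a pointer leaves it unchanged on a valid path and turns it into an all-ones address otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed convention: state is 0 on a correctly predicted path and
   all-ones once a branch condition disagrees with the path taken.
   The state is sticky: once all-ones it stays all-ones. */
static uintptr_t slh_update_state(uintptr_t state, bool condition_holds) {
  /* -(uintptr_t)!c is 0 when c is true and all-ones when c is false,
     so the update is arithmetic rather than a conditional jump. */
  return state | -(uintptr_t)!condition_holds;
}

/* OR the state into the address: unchanged when state is 0, an
   all-ones (non-dereferenceable) address when state is all-ones. */
static const int *slh_harden_pointer(const int *p, uintptr_t state) {
  return (const int *)((uintptr_t)p | state);
}
```

Only the valid-path result is ever dereferenced in the usage below; the all-ones address exists to make a speculative load harmless, not to be read.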

[PATCH] Eliminate the ec_int32 and ec_uint32 typedefs.

2011 Mar 03

```
...
# define EC_ILOG(_x) (ec_ilog(_x))
#endif
diff --git a/libcelt/entcode.c b/libcelt/entcode.c
index 17d08df..0626e51 100644
--- a/libcelt/entcode.c
+++ b/libcelt/entcode.c
@@ -34,7 +34,7 @@
 #if !defined(EC_CLZ)
-int ec_ilog(ec_uint32 _v){
+int ec_ilog(celt_uint32 _v){
   /*On a Pentium M, this branchless version tested as the fastest on
      1,000,000,000 random 32-bit integers, edging out a similar version with
      branches, and a 256-entry LUT version.*/
@@ -59,11 +59,11 @@ int ec_ilog(ec_uint32 _v){
 #endif
-ec_uint32 ec_tell_frac(ec_ctx *_this){
-  ec_uint32 nbits;
-  ec_uint32 r;
-  int...
```
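The comment in this patch describes a branchless integer log. A sketch in that style follows: a binary search over halves of the word in which each test result (0 or 1) is turned into a shift amount instead of steering a branch. `ilog32` is a hypothetical name and this is not the libcelt source; it returns the number of bits needed to represent the value (0 for 0), which we assume matches `ec_ilog`'s contract.

```c
#include <stdint.h>

/* Number of bits needed to represent v, or 0 when v == 0.
   Each step checks whether the upper part is non-zero, converts that
   0/1 result into a shift of 16, 8, 4 or 2, and records it in ret. */
static int ilog32(uint32_t v) {
  int ret = !!v;                    /* 1 if any bit is set */
  int m = !!(v & 0xFFFF0000u) << 4; /* 16 if the top half is non-zero */
  v >>= m;
  ret |= m;
  m = !!(v & 0xFF00u) << 3;         /* 8 */
  v >>= m;
  ret |= m;
  m = !!(v & 0xF0u) << 2;           /* 4 */
  v >>= m;
  ret |= m;
  m = !!(v & 0xCu) << 1;            /* 2 */
  v >>= m;
  ret |= m;
  ret += !!(v & 0x2u);              /* final one-bit step */
  return ret;
}
```

`ret |= m` works as addition here because the shift amounts are distinct powers of two and `ret` starts at 0 or 1. Compilers commonly emit `!!x` as a compare-and-set instruction rather than a jump, though that depends on the compiler and target.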