Displaying 9 results from an estimated 9 matches for "branchless".
2020 Jul 05
8
[RFC] carry-less multiplication instruction
...ul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23].
clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16]
Carry-less multiply can also be used to implement erasure codes efficiently. [14]
==clmul lowering without hardware support==
An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplica...
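The two identities quoted in this snippet can be checked with a portable software fallback. The sketch below is only an illustration, not the lowering proposed in the RFC; the names `clmul64_lo`, `prefix_xor`, and `spread_bits` are hypothetical, and the loop is a plain bit-by-bit implementation rather than anything a compiler would emit.

```c
#include <stdint.h>

/* Software carry-less multiply (assumed helper, not from the RFC):
   XOR together shifted copies of a, one per set bit of b.
   Returns the low 64 bits of the product. */
static uint64_t clmul64_lo(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        /* mask is all ones when bit i of b is set, otherwise zero */
        uint64_t mask = (uint64_t)0 - ((b >> i) & 1);
        r ^= (a << i) & mask;
    }
    return r;
}

/* clmul with -1: bit i of the result is the XOR of bits 0..i of x. */
static uint64_t prefix_xor(uint64_t x) {
    return clmul64_lo(x, ~(uint64_t)0);
}

/* clmul of x with itself: the cross terms cancel in pairs, so bit i
   of x lands at bit 2*i with zeroes in between. */
static uint64_t spread_bits(uint32_t x) {
    return clmul64_lo(x, x);
}
```

For instance, with quote characters at bit positions 2 and 5 (mask `0x24`), `prefix_xor` yields `0x1C`: bits 2 through 4 are set, covering the opening quote and the characters inside the string, which is the kind of mask a parser could use to track quoted regions without branching per character.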
2020 Jul 09
2
[RFC] carry-less multiplication instruction
...l of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23].
>>
>> clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16]
>>
>> Carry-less multiply can also be used to implement erasure codes efficiently. [14]
>>
>> ==clmul lowering without hardware support==
>> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplic...
2018 Jun 22
3
RFC: Should SmallVectors be smaller?
>> On Jun 21, 2018, at 18:38, Chris Lattner <clattner at nondot.org> wrote:
>>
>>
>>
>> On Jun 21, 2018, at 9:52 AM, Duncan P. N. Exon Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> I've been curious for a while whether SmallVectors have the right speed/memory tradeoff. It would be straightforward to shave off a couple of
2023 Apr 05
0
An alternative algorithm for `which()`
...ernative C algorithm for
`which()` which uses less memory and is often faster in real-life
scenarios. I've documented it in full on the Bugzilla page, with many
examples:
https://bugs.r-project.org/show_bug.cgi?id=18495
The short version is that the performance comes from making the loops
branchless, which seems to be particularly helpful for `which()`. With
`which(x)`, I'd argue that branches are often hard for the CPU to
predict, since in most real data there is typically no indication that
if the i-th element of `x` is `TRUE`, then the (i+1)-th element will
also be `TRUE`.
I've...
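The idea of a branchless scan can be sketched in a few lines of C. This is not the algorithm from the linked bug report, only a minimal illustration under simplifying assumptions: the input is a plain `int` array with no `NA` handling, indices are 0-based rather than R's 1-based, and the name `which_branchless` is hypothetical.

```c
#include <stddef.h>

/* Writes the 0-based indices of the nonzero entries of x[0..n) into out,
   which must have room for n entries, and returns how many were written.
   The loop body contains no data-dependent branch: the index is always
   stored, and the write position only advances when x[i] is true. */
static size_t which_branchless(const int *x, size_t n, size_t *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        out[count] = i;          /* unconditional store of the candidate */
        count += (x[i] != 0);    /* adds 1 for true, 0 for false */
    }
    return count;
}
```

The only remaining branch is the loop condition, which is easy to predict; the cost is an unconditional store per element, so whether this wins over a branching loop depends on the data and has to be measured.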
2020 Jul 09
2
[RFC] carry-less multiplication instruction
...> clmul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23].
>
> clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16]
>
> Carry-less multiply can also be used to implement erasure codes efficiently. [14]
>
> ==clmul lowering without hardware support==
> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no...
2018 Jun 23
2
RFC: Should SmallVectors be smaller?
...ubiquitous, it could help the heap a bit. And if it doesn’t hurt runtime performance in practice, there’s no reason to fork the data structure.
>
> If no one has measured before I might try it some time.
>
> I think it's important to keep begin(), end(), and indexing operations branchless, so I'm not sure this pointer union is the best idea. I haven't profiled, but that's my intuition. If you wanted to limit all our vectors to 4 billion elements to save a pointer, I'd probably be fine with that.
Good point, there are two separable changes here and only the union par...
2020 Jul 05
5
[RFC] carry-less multiplication instruction
...ul of a number with itself inserts zeroes between each input bit. This can be useful for generating Morton code [23].
>>
>> clmul of a number with -1 calculates the prefix XOR operation. This can be useful for decoding Gray codes. Another application of XOR prefix sums calculated with clmul is branchless tracking of quoted strings in high-performance parsers. [16]
>>
>> Carry-less multiply can also be used to implement erasure codes efficiently. [14]
>>
>> ==clmul lowering without hardware support==
>> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplicati...
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
...matically across an entire program rather than
through manual changes to the code. While this is likely to have a high
performance cost, some applications may be in a good position to take this
performance / security tradeoff.
The specific technique we propose is to cause loads to be checked using
branchless code to ensure that they are executing along a valid control flow
path. Consider the following C-pseudo-code representing the core idea of
a predicate guarding potentially invalid loads:
```
void leak(int data);
void example(int* pointer1, int* pointer2) {
  if (condition) {
    // ... lots of code...
```
2011 Mar 03
0
[PATCH] Eliminate the ec_int32 and ec_uint32 typedefs.
...# define EC_ILOG(_x) (ec_ilog(_x))
#endif
diff --git a/libcelt/entcode.c b/libcelt/entcode.c
index 17d08df..0626e51 100644
--- a/libcelt/entcode.c
+++ b/libcelt/entcode.c
@@ -34,7 +34,7 @@
#if !defined(EC_CLZ)
-int ec_ilog(ec_uint32 _v){
+int ec_ilog(celt_uint32 _v){
/*On a Pentium M, this branchless version tested as the fastest on
1,000,000,000 random 32-bit integers, edging out a similar version with
branches, and a 256-entry LUT version.*/
@@ -59,11 +59,11 @@ int ec_ilog(ec_uint32 _v){
#endif
-ec_uint32 ec_tell_frac(ec_ctx *_this){
- ec_uint32 nbits;
- ec_uint32 r;
- int...
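The comment in the hunk above mentions a branchless integer-log routine whose body is not shown in this snippet. A minimal sketch of one way to write such a routine follows; it is not copied from the patch, and the name `ilog32_branchless` is hypothetical. It returns the number of bits needed to represent the value (0 for an input of 0), using comparisons that evaluate to 0 or 1 instead of `if` statements.

```c
#include <stdint.h>

/* Returns 1 + floor(log2(v)) for v > 0, and 0 for v == 0.
   Each step tests whether the upper half of the remaining range is
   occupied, then shifts by that amount; !! turns the test into 0 or 1. */
static int ilog32_branchless(uint32_t v) {
    int ret;
    int m;
    ret = !!v;                        /* 1 if any bit is set */
    m = !!(v & 0xFFFF0000u) << 4;     /* 16 if the top 16 bits are in use */
    v >>= m;
    ret |= m;
    m = !!(v & 0xFF00u) << 3;         /* 8 if the next 8 bits are in use */
    v >>= m;
    ret |= m;
    m = !!(v & 0xF0u) << 2;           /* 4 */
    v >>= m;
    ret |= m;
    m = !!(v & 0xCu) << 1;            /* 2 */
    v >>= m;
    ret |= m;
    ret += !!(v & 0x2u);              /* final bit */
    return ret;
}
```

With this definition, an input of 255 gives 8 and an input of 256 gives 9. Whether it beats a branching or table-driven version depends on the target, as the quoted comment itself suggests for one particular CPU.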