Displaying 4 results from an estimated 4 matches for "vlmul4".
2019 Feb 01
2
[RFC] Vector Predication
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Thu, Jan 31, 2019 at 10:22 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> We're in-progress designing a RISC-V extension (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html) that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x
2019 Feb 01
3
[RFC] Vector Predication
...ach 128 bit element into 4 parts.
>
> Arithmetic/logical/shift will happen on 32 bit elements, but
> predication and loads and stores (including strided or scatter/gather)
> will operate on 128 bit elements.
>
> [I just made up "vnreg8" as an alias for the standard "vlmul4" because
> "vlmul4,vdiv4" might look confusing. Either way it means to put 0b10
> into bits [1:0] of the vtype CSR specifying that the 32 vector
> registers should be ganged into 8 groups each 4x longer than standard
> because (I'm assuming) we need more than four vec...
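
The arithmetic in the snippet above (0b10 in vtype bits [1:0] giving 8 groups of 4 registers) can be sketched in C. This is a minimal illustration, not the specification: the function names are hypothetical, and the rule LMUL = 2^field is an assumption extrapolated from the single data point the message gives (0b10 means vlmul4).

```c
#include <assert.h>
#include <stdint.h>

/* The quoted message says there are 32 vector registers. */
enum { NUM_VECTOR_REGS = 32 };

/* Assumption: the vlmul field is vtype bits [1:0] (as the message states)
   and encodes LMUL = 2^field, so 0b10 -> LMUL 4. */
unsigned lmul_from_vtype(uint32_t vtype) {
    return 1u << (vtype & 0x3u);
}

/* Number of register groups when registers are ganged LMUL at a time. */
unsigned register_groups(uint32_t vtype) {
    return NUM_VECTOR_REGS / lmul_from_vtype(vtype);
}

/* Hypothetical helper: write a 2-bit vlmul field into a vtype value,
   leaving all other bits unchanged. */
uint32_t set_vlmul(uint32_t vtype, unsigned field) {
    assert(field < 4);
    return (vtype & ~0x3u) | field;
}
```

With the field set to 0b10, `lmul_from_vtype` yields 4 and `register_groups` yields 8, matching the "8 groups each 4x longer" description in the message.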
2019 Feb 05
4
[RFC] Vector Predication
On 2/5/19 1:27 AM, Philip Reames via llvm-dev wrote:
>
> On 1/31/19 4:57 PM, Bruce Hoult wrote:
>> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Do such architectures frequently have arithmetic operations on the
>>> mask registers? (i.e. can I reasonably compute a conservative
>>> length
2019 Feb 05
3
[RFC] Vector Predication
...or (size_t i=0; i<n; ++i)
> dst[i] += a[i] * b[i];
> }
>
> If 32x32->64 multiplies are cheaper than 64x64->64 multiplies then you
> might want to compile this to:
>
> # args n in a0, dst in a1, a in a2, b in a3, AVL in t0
> foo:
> vsetvli a4, a0, vsew32,vlmul4 # vtype = 32-bit integer vectors, AVL in a4
> vlw.v v0, (a2) # Get 32b vector a into v0-v3
> vlw.v v4, (a3) # Get 32b vector b into v4-v7
> slli a5, a4, 2 # multiply AVL by element size 4 bytes
> add a2, a2, a5 # Bump pointer a
> ...
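
The truncated assembly above follows a strip-mining pattern: ask the hardware how many elements it will process this pass, process that many, bump the pointers, and repeat. A rough scalar C sketch of that control flow is given here under several assumptions: `set_vl` is a hypothetical stand-in for `vsetvli` that simply returns min(remaining, vlmax), the element types (32-bit inputs, 64-bit accumulator) are guessed from the "32x32->64" remark, and `vlmax` is passed in as a parameter rather than derived from the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vsetvli: grant at most vlmax elements.
   Real hardware may choose the length differently. */
size_t set_vl(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* dst[i] += a[i] * b[i], processed in chunks of at most vlmax elements.
   Element types are assumptions, not taken from the original message. */
void mul_add_stripmined(size_t n, int64_t *dst,
                        const int32_t *a, const int32_t *b, size_t vlmax) {
    assert(vlmax > 0);
    while (n > 0) {
        size_t vl = set_vl(n, vlmax);          /* elements this pass */
        for (size_t i = 0; i < vl; ++i) {
            /* widen before multiplying: 32x32 -> 64 */
            dst[i] += (int64_t)a[i] * (int64_t)b[i];
        }
        a += vl;                               /* bump pointer a */
        b += vl;                               /* bump pointer b */
        dst += vl;                             /* bump pointer dst */
        n -= vl;                               /* elements left */
    }
}
```

In C the pointer increments scale by element size automatically, which is the job the `slli`/`add` pair performs explicitly in the assembly.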