thr3ads.net - search: "vlmul4"

Displaying 4 results from an estimated 4 matches for "vlmul4".


2019 Feb 01

[RFC] Vector Predication

--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Thu, Jan 31, 2019 at 10:22 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> We're in-progress designing a RISC-V extension (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html) that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x


2019 Feb 01

[RFC] Vector Predication

...ach 128 bit element into 4 parts.
>
> Arithmetic/logical/shift will happen on 32 bit elements, but
> predication and loads and stores (including strided or scatter/gather)
> will operate on 128 bit elements.
>
> [I just made up "vnreg8" as an alias for the standard "vlmul4" because
> "vlmul4,vdiv4" might look confusing. Either way it means to put 0b10
> into bits [1:0] of the vtype CSR specifying that the 32 vector
> registers should be ganged into 8 groups each 4x longer than standard
> because (I'm assuming) we need more than four vec...
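The quoted message describes the vlmul field of the draft vtype CSR: writing 0b10 into bits [1:0] selects LMUL = 4, so the 32 vector registers are ganged into 8 groups. A minimal C sketch of that arithmetic follows. It assumes LMUL = 2^vlmul (0b00..0b11 map to 1, 2, 4, 8) and, as an assumption not stated in the message, that vsew sits in bits [4:2] with SEW = 8 * 2^vsew, as in the v0.7-era draft. The helper names (vtype_encode, vtype_lmul and so on) are hypothetical and not part of any toolchain API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers. Field positions follow the draft vtype layout:
   vlmul in bits [1:0] (stated in the message) and vsew in bits [4:2]
   (an assumption based on the v0.7-era draft). */

/* Packs vsew (0 = 8-bit, 1 = 16-bit, 2 = 32-bit, ...) and vlmul (0..3). */
static uint32_t vtype_encode(unsigned vsew, unsigned vlmul) {
    assert(vsew <= 7u && vlmul <= 3u);
    return ((uint32_t)vsew << 2) | (uint32_t)vlmul;
}

/* Raw vlmul field, bits [1:0]. */
static unsigned vtype_vlmul(uint32_t vtype) { return vtype & 0x3u; }

/* LMUL = 2^vlmul: 0b00 -> 1, 0b01 -> 2, 0b10 -> 4, 0b11 -> 8. */
static unsigned vtype_lmul(uint32_t vtype) { return 1u << vtype_vlmul(vtype); }

/* The 32 architectural vector registers are ganged into 32 / LMUL groups. */
static unsigned vtype_register_groups(uint32_t vtype) { return 32u / vtype_lmul(vtype); }

/* SEW = 8 * 2^vsew bits, with vsew taken from bits [4:2]. */
static unsigned vtype_sew_bits(uint32_t vtype) { return 8u << ((vtype >> 2) & 0x7u); }
```

For example, vtype_encode(2, 2) (vsew32, vlmul4) yields 10, which decodes to LMUL 4, 8 register groups and 32-bit elements.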


2019 Feb 05

[RFC] Vector Predication

On 2/5/19 1:27 AM, Philip Reames via llvm-dev wrote:
>
> On 1/31/19 4:57 PM, Bruce Hoult wrote:
>> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Do such architectures frequently have arithmetic operations on the
>>> mask registers? (i.e. can I reasonable compute a conservative
>>> length


2019 Feb 05

[RFC] Vector Predication

...or (size_t i=0; i<n; ++i)
>         dst[i] += a[i] * b[i];
> }
>
> If 32x32->64 multiplies are cheaper than 64x64->64 multiplies then you
> might want to compile this to:
>
> # args n in a0, dst in a1, a in a2, b in a3, AVL in t0
> foo:
>     vsetvli a4, a0, vsew32,vlmul4  # vtype = 32-bit integer vectors, AVL in a4
>     vlw.v v0, (a2)                 # Get 32b vector a into v0-v3
>     vlw.v v4, (a3)                 # Get 32b vector b into v4-v7
>     slli a5, a4, 2                 # multiply AVL by element size 4 bytes
>     add a2, a2, a5                 # Bump pointer a &...
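The assembly in this excerpt strip-mines the loop: each vsetvli picks a vector length no larger than the hardware maximum, the loads process that many elements, and slli a5, a4, 2 turns the element count returned in a4 into a byte offset for bumping the pointers. The following is a scalar C model of that pattern, not the compiled code itself. It assumes a hypothetical VLEN of 128 bits, so with SEW = 32 and LMUL = 4 each strip holds at most 128 * 4 / 32 = 16 elements. It also assumes a and b hold 32-bit values and dst holds 64-bit values, since the original declarations are cut off in the excerpt. model_vsetvl and foo_stripmined are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector register width; real implementations differ. */
#define VLEN_BITS 128u

/* Models vsetvli: returns vl = min(avl, VLMAX), where VLMAX = VLEN * LMUL / SEW. */
static size_t model_vsetvl(size_t avl, unsigned sew_bits, unsigned lmul) {
    assert(sew_bits >= 8u && lmul >= 1u);
    size_t vlmax = (size_t)VLEN_BITS * lmul / sew_bits;
    return avl < vlmax ? avl : vlmax;
}

/* Scalar model of the strip-mined loop dst[i] += a[i] * b[i].
   Assumes 32-bit sources and a 64-bit destination (not shown in the excerpt). */
static void foo_stripmined(size_t n, int64_t *dst,
                           const int32_t *a, const int32_t *b) {
    while (n > 0) {
        size_t vl = model_vsetvl(n, 32u, 4u);   /* vsetvli ..., vsew32,vlmul4 */
        for (size_t i = 0; i < vl; ++i)         /* one strip of vector work */
            dst[i] += (int64_t)a[i] * b[i];     /* widening 32x32->64 multiply */
        a += vl;                                /* advances vl << 2 bytes, like slli + add */
        b += vl;
        dst += vl;                              /* 64-bit elements: vl << 3 bytes */
        n -= vl;
    }
}
```

With n = 37 and the assumed VLEN, the loop runs three strips of 16, 16 and 5 elements; the final short strip is why vsetvli returns the length rather than always using the maximum.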
