search for: vload

Displaying 20 results from an estimated 24 matches for "vload".

2020 Apr 09
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
...this which uses the shuffle sequence the vectorizers generated before the reduction intrinsics existed. declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>) declare void @TrapFunc(i64) define void @parseHeaders(i64 * %ptr) { %vptr = bitcast i64 * %ptr to <2 x i64> * %vload = load <2 x i64>, <2 x i64> * %vptr, align 8 %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x i32> <i32 1, i32 undef> %c = or <2 x i64> %vload, %b %vreduce = extractelement <2 x i64> %c, i32 0 %vcheck = icmp eq i64 %vreduce, 0 br...
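The shuffle-based reduction in the snippet above can be modeled in a few lines of plain Python; the lane values here are made up purely for illustration, and the final assertion shows the sequence computes the same OR-reduction the RFC's intrinsic would return directly:

```python
from functools import reduce
from operator import or_

# Made-up lane values standing in for the <2 x i64> %vload in the snippet.
v = [0x00F0, 0x0F00]

# %b = shufflevector %vload, undef, <i32 1, i32 undef>: lane 1 moves to
# lane 0; the second lane is undef and never reaches the final result.
b = [v[1], 0]

# %c = or <2 x i64> %vload, %b, then %vreduce = extractelement %c, i32 0
c = [v[0] | b[0], v[1] | b[1]]
vreduce = c[0]

# The reduction intrinsic the RFC wants to promote computes this directly.
assert vreduce == reduce(or_, v)
print(hex(vreduce))  # 0xff0
```

For wider vectors the same shuffle-and-or step repeats log2(n) times, halving the live lanes each round, which is exactly the expansion the intrinsic lets targets choose for themselves.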
2020 Apr 09
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
...the vectorizers generated > before the reduction intrinsics existed. > > declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>) > declare void @TrapFunc(i64) > > define void @parseHeaders(i64 * %ptr) { > %vptr = bitcast i64 * %ptr to <2 x i64> * > %vload = load <2 x i64>, <2 x i64> * %vptr, align 8 > > %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x i32> <i32 1, > i32 undef> > %c = or <2 x i64> %vload, %b > %vreduce = extractelement <2 x i64> %c, i32 0 > > %vche...
2020 Jun 17
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
...ed. >> >> declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>) >> declare void @TrapFunc(i64) >> >> define void @parseHeaders(i64 * %ptr) { >>   %vptr = bitcast i64 * %ptr to <2 x i64> * >>   %vload = load <2 x i64>, <2 x i64> * %vptr, align 8 >> >>   %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x >> i32> <i32 1, i32 undef> >>   %c = or <2 x i64> %vload, %b >>   %vreduce = extrac...
2020 Sep 09
4
RFC: Promoting experimental reduction intrinsics to first class intrinsics
...insics existed. >>> >>> declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>) >>> declare void @TrapFunc(i64) >>> >>> define void @parseHeaders(i64 * %ptr) { >>> %vptr = bitcast i64 * %ptr to <2 x i64> * >>> %vload = load <2 x i64>, <2 x i64> * %vptr, align 8 >>> >>> %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x i32> <i32 >>> 1, i32 undef> >>> %c = or <2 x i64> %vload, %b >>> %vreduce = extractelement <...
2007 May 21
1
[LLVMdev] Simplifing the handling of pre-legalize vector nodes
Right now there are special SelectionDAG node kinds for operations on "abstract" vector types (VLOAD, VADD, and all the rest), and a special MVT::Vector ValueType for them. These nodes carry two additional operands, constants which specify the vector length and element type. All of this is only used before legalize; then they are replaced with regular node kinds and value types. It seems that a n...
2002 Sep 03
0
No subject
Hello, I'm doing a pca analysis and get unrotated PCA results (using "pca"). I then used "varimax" to rotate the PCs, vload <- varimax(cproj,normalize=TRUE,eps=1e-5) #cproj is the "loadings" from "pca" and calculate the score coefficients with: coef <- solve(correlation matrix of original data) %*% (vload$loadings) Then calculate the scores with: scores <- (standardised original data) %...
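The poster's two-step score formula (coef = solve(R) %*% loadings, then scores = Z %*% coef) can be sketched with a toy two-variable example; every number below is made up for illustration, and the helper functions are hypothetical stand-ins for R's matrix operators:

```python
# Toy sketch of the poster's score computation:
#   coef   = solve(R) %*% loadings   (R: correlation matrix of the data)
#   scores = Z %*% coef              (Z: standardised original data)

def mat_mul(A, B):
    """Plain nested-list matrix product, standing in for %*%."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix, standing in for solve()."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

R = [[1.0, 0.3], [0.3, 1.0]]          # correlation matrix (hypothetical)
loadings = [[0.9, 0.1], [0.2, 0.8]]   # varimax-rotated loadings (hypothetical)
Z = [[1.0, -0.5], [-1.0, 0.5]]        # standardised data, one row per case

coef = mat_mul(inv2(R), loadings)     # score coefficients
scores = mat_mul(Z, coef)             # factor scores, one row per case
print(scores)
```

One sanity check on the design: since the second row of Z is the negation of the first, the two score rows must also be negations of each other, which is a quick way to confirm the multiplication order matches the poster's formula.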
2002 Sep 03
0
RE:
...Statistica use? -----Original Message----- From: Williams, Allyson Sent: Tuesday, 3 September 2002 10:20 AM To: r-help at stat.math.ethz.ch Subject: Hello, I'm doing a pca analysis and get unrotated PCA results (using "pca"). I then used "varimax" to rotate the PCs, vload <- varimax(cproj,normalize=TRUE,eps=1e-5) #cproj is the "loadings" from "pca" and calculate the score coefficients with: coef <- solve(correlation matrix of original data) %*% (vload$loadings) Then calculate the scores with: scores <- (standardised original data) %...
2020 Apr 08
7
RFC: Promoting experimental reduction intrinsics to first class intrinsics
Hi, It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and I saw some traffic a while ago on the list for other targets too. Sander did some work last year to refine the semantics after some discussion. Are we at the point where we can drop the “experimental” from the name? IMO all targets should begin to transition
2004 Jul 21
0
[LLVMdev] GC questions.
Ok, that makes sense :). , Tobias On Wed, 21 Jul 2004, Chris Lattner wrote: > On Wed, 21 Jul 2004, Tobias Nurmiranta wrote: > > > void *llvm_gc_read(void *ObjPtr, void **FieldPtr) { > > > return *FieldPtr; > > > } > > > > Hm, but doesn't FieldPtr need to be calculated target-specific in those > > cases? > > For the field pointer, one
2004 Jul 21
2
[LLVMdev] GC questions.
On Wed, 21 Jul 2004, Tobias Nurmiranta wrote: > > void *llvm_gc_read(void *ObjPtr, void **FieldPtr) { > > return *FieldPtr; > > } > > Hm, but doesn't FieldPtr need to be calculated target-specific in those > cases? For the field pointer, one could use the getelementptr instruction: %pairty = { sbyte, sbyte, int* } %pairPtr = ... %fieldptr = getelementptr
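The field addressing getelementptr performs in Chris's reply (base address plus field offset, then a single load, as in llvm_gc_read's `return *FieldPtr`) can be mimicked outside IR. This is a small ctypes sketch, not Chris's code: the Pair layout mirrors the %pairty = { sbyte, sbyte, int* } above, and the names and values are invented:

```python
import ctypes

# Mirrors %pairty = { sbyte, sbyte, int* } from the reply above.
class Pair(ctypes.Structure):
    _fields_ = [("a", ctypes.c_byte),
                ("b", ctypes.c_byte),
                ("p", ctypes.POINTER(ctypes.c_int))]

val = ctypes.c_int(42)
pair = Pair(1, 2, ctypes.pointer(val))

# getelementptr computes the field address: struct base + field offset.
# The offset already accounts for target-specific padding/alignment,
# which is the point of Tobias's question.
field_addr = ctypes.addressof(pair) + Pair.p.offset

# llvm_gc_read's `return *FieldPtr` is one load through that address.
field_ptr = ctypes.cast(field_addr,
                        ctypes.POINTER(ctypes.POINTER(ctypes.c_int)))
print(field_ptr.contents.contents.value)  # 42
```

The offset baked into `Pair.p.offset` is what getelementptr encodes symbolically, which is why the GC read barrier itself never needs target-specific arithmetic.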
2004 Jul 22
2
[LLVMdev] GC questions.
...GCRootValues[i]->getOperand(1); < AllocaInst* a = new AllocaInst(v->getType(), 0, "stackGcRoot", cast<Instruction>(v)); < GCRootValues[i]->setOperand(1, a); < < std::vector<Instruction*> vUses; < std::vector<Instruction*> vLoads; < < for (Value::use_iterator j = v->use_begin(), e = v->use_end(); j != e; ++j) { < if (Instruction *Inst = dyn_cast<Instruction>(*j)) { < LoadInst* l = new LoadInst(a, "loadGcRoot", Inst); < vUses.push_back(Inst); vLoads.pus...
2017 Jun 25
2
AVX Scheduling and Parallelism
...ou provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as independent and pipeline their execution. Thanks, Zvi From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Saturday, June 24, 2017 05:17 To: hameeza ahmed <hahmed2305 at gmail.com>; llvm-dev at lists.llvm.org Cc: Demikhovsky, Elena <elena.demikhovsky at intel....
2017 Jun 25
0
AVX Scheduling and Parallelism
...M0 and XMM1 across loop-unroll instances does not inhibit > instruction-level parallelism. > > Modern X86 processors use register renaming that can eliminate the > dependencies in the instruction stream. In the example you provided, > the processor should be able to identify the 2-vloads + vadd + vstore > sequences as independent and pipeline their execution. > > Thanks, Zvi > > *From:*Hal Finkel [mailto:hfinkel at anl.gov] > *Sent:* Saturday, June 24, 2017 05:17 > *To:* hameeza ahmed <hahmed2305 at gmail.com>; llvm-dev at lists.llvm.org > *Cc:* Demi...
2015 Jan 05
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote: >>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing
2016 Mar 30
3
infer correct types from the pattern
I'm getting a "Could not infer all types in pattern!" error in my backend. It is happening on the following instruction: VGETITEM: (set GPR:{i32:f32}:$rD, (extractelt:{i32:f32} VR:{v4i32:v4f32}:$rA, GPR:i32:$rB)). How do I make it use appropriate types? In other words, if the result is f32 then use v4f32, and if it is i32 then use v4i32. I'm not sure even where to start; any help is appreciated.
2015 Jan 05
4
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...teswaps the entire 128-bit number). While pointer dereference does work just as well as VLD1 (and better, given this defect), it is explicitly *not supported*. The ACLE mandates that there are only certain ways to legitimately "create" a vector object - vcreate, vcombine, vreinterpret and vload. NEON intrinsic types don't exist in memory (memory is modelled as a sequence of scalars, as in the C model). For this reason, Renato, I don't think we should advise people to work around the API, as who knows what problems that will cause later. The reason above is why we map a vloadq_f32()...
2019 Apr 28
2
[GSoC] Supporting Efficiently the Shift-vector Instructions of the Connex Vector Processor
Hello, Anton, I'd like to add a small reply regarding this GSoC project that I would like to mentor and I discussed also with Andrei. A good part of our GSoC project is indeed related to this Connex back end that it's not yet part of the LLVM source repository - an important thing proposed in the project is that we plan to perform efficient realignment for this Connex vector
2011 Jul 27
9
[PATCH 0/5] Collected vdso/vsyscall fixes for 3.1
This fixes various problems that cropped up with the vdso patches. - Patch 1 fixes an information leak to userspace. - Patches 2 and 3 fix the kernel build on gold. - Patches 4 and 5 fix Xen (I hope). Konrad, could you could test these on Xen and run 'test_vsyscall test' [1]? I don't have a usable Xen setup. Also, I'd appreciate a review of patches 4 and 5 from some