thr3ads.net - search: "vld"

Displaying 20 results from an estimated 21 matches for "vld".

Tail-Loop Folding/Predication

2019 Jul 15

...il loop when the number of elements processed is not a multiple of the vector length. For this use case, the tail predicate pragma could be a good user-experience improvement, as it would for example allow this more compact form without any predicated intrinsics:

    #pragma tail_predicate
    do {
      VLD(..); // some vector load intrinsic
      VST(..); // some vector store intrinsic
      ..
    } while (N);

which can then be transformed and predication made explicit through data dependencies like so:

    do {
      mask = vctp(N); // intrinsic that generates the mask of active lanes
      VLD(.., mas...
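
The idea in this snippet, a vector loop whose final iteration masks off the unused lanes instead of falling back to a scalar tail loop, can be sketched in portable C. This is only an illustration under stated assumptions: `LANES`, `lane_mask`, and `predicated_copy` are hypothetical names, `lane_mask` merely models what a vctp-style intrinsic is described as doing, and the inner loop stands in for one masked vector load plus one masked vector store.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* assumed vector width in elements */

/* Hypothetical stand-in for a vctp-style intrinsic: bit i is set
   when lane i is active, i.e. when i < remaining. */
static unsigned lane_mask(size_t remaining) {
    unsigned mask = 0;
    for (unsigned i = 0; i < LANES; ++i)
        if (i < remaining) mask |= 1u << i;
    return mask;
}

/* Copy n elements, LANES at a time, with no separate scalar tail loop. */
static void predicated_copy(int32_t *dst, const int32_t *src, size_t n) {
    while (n > 0) {
        unsigned mask = lane_mask(n);
        /* Models one masked vector load followed by one masked store. */
        for (unsigned i = 0; i < LANES; ++i)
            if (mask & (1u << i)) dst[i] = src[i];
        size_t step = n < LANES ? n : LANES;
        dst += step;
        src += step;
        n -= step;
    }
}
```

With `n = 7` the first iteration runs with all four lanes active and the second with only three, so no element past the end of either buffer is touched.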

Plots by subject

2002 Nov 20

...the genotype df, 2 lines in the drug df (3 tables are plotted with the data from these three df) and 14 lines in the lab df (for the graph you helped me with). So the whole function for one patient is: { # Overlay plot lab <-read.xport('lab') attach(lab) celld <- as.date(CELLDATE) vld <- as.date(VIRDATE) gtd <- as.date(GTDATE) par(mar=c(2,6,1,6) + 0.1) plot(celld, T4, ylim=c(0,800), pch=4, type='o', ylab='CD4 count (cells/ul) [-x-]', xlab=??) par(new=T) plot(gtd, Y, ylim=c(0,800), type='p', pch=8, axes=F, ylab='', xlab='', cex=1.4)...

[LLVMdev] Disable vectorization for unaligned data

2013 Jul 21

...hought LLVM's vectorizer had something like that already in. On 21 July 2013 18:16, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > I will have to work on this soon as ARM also has pretty inefficient > unaligned vector loads. > NEON does support unaligned access via VLD*/VST*, what loads are you referring to? cheers, --renato

H.264 engine differences between fermi and tesla cards

2013 Dec 07

...anything else, to help translation; I don't expect it to help with this particular H264 hang. > > VP2 (same) > VP3 -> MSDEC > VP4.0 -> MSDEC2 > VP4.2 -> MSDEC3 > VP5 -> MSDEC4 > > Looking at your code, it seems that you're instantiating all 3 engines (VLD, PDEC, PPP) on the same channel. This probably isn't causing the hang, but it's bad practice in general, as it prevents the engines from running in parallel. It's also impossible to use multiple engines on the same channel like this on MSDEC4 (VP5) GPUs, so the same separate channel u...

[cfe-dev] ARM float16 intrinsic test

2019 Jul 12

...st4lane.p0i8.v4f16(i8* %4, <4 x half> %13, <4 x half> %14, <4 x half> %15, <4 x half> %16, i32 3, i32 2) declare void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8*, <4 x half>, <4 x half>, <4 x half>, <4 x half>, i32, i32) #1 $$COMP_ROOT/llc arm.ll unhandled vld/vst lane type UNREACHABLE executed at /home/nancy/rpp_llvm/llvm-project/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp:2072! May I know how to compile this .cpp correctly from FE to BE? -- Best Regards, Yu Rong Tan

H.264 engine differences between fermi and tesla cards

2013 Nov 30

On Thu, Nov 21, 2013 at 5:22 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > On Thu, Nov 21, 2013 at 5:07 PM, Benjamin Morris <bmorris at nvidia.com> wrote: >> On 11/19/2013 08:16 PM, Ilia Mirkin wrote: >>> Hello, >>> >>> I hope this is an appropriate style of request for this forum. I added >>> code to support video decoding on the tesla

[PATCH] bsp/g92: disable by default

2017 Oct 01

G92's seem to require some additional bit of initialization before the BSP engine can work. It feels like clocks are not set up for the underlying VLD engine, which means that all commands submitted to the xtensa chip end up hanging. VP seems to work fine though. This still allows people to force-enable the bsp engine if they want to play around with it, but makes it harder for the card to hang by default. Signed-off-by: Ilia Mirkin <imirkin...

[ANNOUNCE] libXvMC 1.0.12

2019 Sep 24

This release fixes the pkgconfig data to not refer to libXv, adds a pkgconfig file for libXvMCW, and prepares for a future xorgproto release. There should be no functional changes.

Adam Jackson (3):
      pkgconfig: Remove xv from xvmc.pc
      vld: Provide <X11/extensions/vldXvMC.h> ourself
      libXvMC 1.0.12

Dylan Baker (1):
      Add a pkgconfig file for libXvMCW

git tag: libXvMC-1.0.12

https://xorg.freedesktop.org/archive/individual/lib/libXvMC-1.0.12.tar.bz2
MD5: 3569ff7f3e26864d986d6a21147eaa58 libXvMC-1.0.12.tar.bz2
SHA1:...

[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)

2012 Feb 29

On 29 February 2012 11:47, James Molloy <james.molloy at arm.com> wrote: >> (But if we only care about >> running on x86-64, this won't be a problem.) > > Please, no. We have a cortex-a9 native buildbot already in lab.llvm.org and > as manufacturers emit faster ARM chips we (ARM) will want to have LLVM run > native on them. > > You've also got the OpenCL

[LLVMdev] Disable vectorization for unaligned data

2013 Jul 21

No, I am afraid not without computing alignment based on the scalar code. In order to limit vectorization to 16-byte aligned data we need to know that data is 16-byte aligned. The way we vectorize we won’t know that until after we have vectorized. As you have observed we will pass “4” to getMemoryOpCost in the loop vectorizer (as that is the only thing that can be inferred from a consecutive
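
For readers unfamiliar with the 16-byte alignment condition discussed in this message, a source-level runtime test can be sketched as below. This is an illustrative assumption only: it does not show how LLVM's loop vectorizer or `getMemoryOpCost` reasons about alignment, and `is_aligned16` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when p sits on a 16-byte boundary,
   i.e. when the low four bits of its address are all zero. */
static bool is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

A caller could use such a check to choose between an aligned fast path and a fallback, which is a different approach from the compile-time inference the thread is asking for.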

Need help for write rpm spec

2015 Mar 18

Hi, I am trying to write an RPM spec to install Tomcat on a Linux machine, but while building the RPM I got the following error:

+ /usr/lib/rpm/find-debuginfo.sh --strict-build-id /home/rpmbuild/BUILD/Install_tomcat-1.0
extracting debug info from /home/rpmbuild/BUILDROOT/Install_tomcat-1.0-1.el6.x86_64/usr/local/jdk1.7.0_13/lib/visualvm/profiler/lib/deployed/jdk16/linux-amd64/libprofilerinterface.so
***

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 10

...hope for this improving in the future, or anything I can do now to improve the generated code? >>> >>> If I had to guess, I'd say the intrinsic got in the way of recognising >>> the pattern. vmulq_f32 got correctly lowered to IR as "fmul", but >>> vld1q_f32 is still kept as an intrinsic, so register allocators and >>> schedulers get confused and, when lowering to assembly, you're left >>> with garbage around it. > > FWIW, with top of tree clang, I get the same (good) code for both of the implementations of operator*...

[LLVMdev] Disable vectorization for unaligned data

2013 Jul 21

Ok any quick workaround to limit vectorization to 16-byte aligned 128-bit data then? All the memory copying done by ExpandUnalignedStore/ExpandUnalignedLoad is just too expensive. On Sat, Jul 20, 2013 at 12:52 PM, Arnold Schwaighofer < aschwaighofer at apple.com> wrote: > > On Jul 19, 2013, at 3:14 PM, Francois Pichet <pichet2000 at gmail.com> wrote: > > > > >

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 13

> On 5 Jan 2015, at 13:08, Renato Golin <renato.golin at linaro.org> wrote: > > On 5 January 2015 at 12:13, James Molloy <james at jamesmolloy.co.uk> wrote: >> For this reason Renato I don't think we should advise people to work around >> the API, as who knows what problems that will cause later. > > I stand corrected (twice). But we changed the subject

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 08

...store on the stack? Is there any hope for this improving in the future, or anything I can do now to improve the generated code? > > If I had to guess, I'd say the intrinsic got in the way of recognising > the pattern. vmulq_f32 got correctly lowered to IR as "fmul", but > vld1q_f32 is still kept as an intrinsic, so register allocators and > schedulers get confused and, when lowering to assembly, you're left > with garbage around it. > > Creating a bug for this is probably the best thing to do, since this > is a common pattern that needs looking into...

[cfe-dev] ARM float16 intrinsic test

2019 Jul 12

....eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 28, 1 .eabi_attribute 38, 1 .eabi_attribute 18, 4 .eabi_attribute 26, 2 .eabi_attribute 14, 0 .file "arm.cpp" unhandled vld/vst lane type UNREACHABLE executed at /home/nancy/rpp_llvm/llvm-project/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp:2072! Stack dump: 0. Program arguments: /home/nancy/rpp_llvm/build-project/bin/clang-8 -cc1 -triple armv8.2a-arm-unknown-eabihf -S -disable-free -main-file-name arm.cpp -mrelocation-mo...

H.264 engine differences between fermi and tesla cards

2013 Dec 07

...l names. This is more of an FYI than anything else, to help translation; I don't expect it to help with this particular H264 hang. VP2 (same) VP3 -> MSDEC VP4.0 -> MSDEC2 VP4.2 -> MSDEC3 VP5 -> MSDEC4 Looking at your code, it seems that you're instantiating all 3 engines (VLD, PDEC, PPP) on the same channel. This probably isn't causing the hang, but it's bad practice in general, as it prevents the engines from running in parallel. It's also impossible to use multiple engines on the same channel like this on MSDEC4 (VP5) GPUs, so the same separate channel u...

[RFC] Vector Predication

2019 Feb 05

On 2/5/19 1:27 AM, Philip Reames via llvm-dev wrote: > > On 1/31/19 4:57 PM, Bruce Hoult wrote: >> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >>> Do such architectures frequently have arithmetic operations on the >>> mask registers? (i.e. can I reasonable compute a conservative >>> length

[RFC] Vector Predication

2019 Feb 05

...element size 4 bytes
> add a2, a2, a5 # Bump pointer a
> add a3, a3, a5 # Bump pointer b
> vwmul.vv v8, v0, v4 # 64b result in v8-v15
>
> vsetvli zero, a0, vsew64,vlmul8 # Operate on 64b values, discard new AVL as it's the same
> vld.v v16, (a1) # Get 64b vector dst into v16-v23
> vadd.vv v16, v16, v8 # add 64b elements in v8-v15 to v16-v23
> vsd.v v16, (a1) # Store vector of 64b
> slli a5, a4, 3 # multiply AVL by element size 8 bytes
> add a1, a1, a5 # Bump...
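
Judging from its comments, the quoted assembly appears to be a strip-mined loop that multiplies 32-bit elements into 64-bit products and adds them to a 64-bit destination. A scalar C sketch of that shape follows, as an assumption-laden illustration rather than a translation of the truncated listing: `VLMAX`, `set_vl`, and `widening_mul_acc` are hypothetical names, and `set_vl` only models the role of `vsetvli` in granting a vector length for the remaining elements.

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 8  /* assumed maximum elements per vector operation */

/* Hypothetical stand-in for vsetvli: grant at most VLMAX of the
   requested elements (the application vector length, AVL). */
static size_t set_vl(size_t avl) {
    return avl < VLMAX ? avl : VLMAX;
}

/* dst[i] += (int64_t)a[i] * b[i], processed one strip at a time. */
static void widening_mul_acc(int64_t *dst, const int32_t *a,
                             const int32_t *b, size_t n) {
    while (n > 0) {
        size_t vl = set_vl(n);
        /* One vector strip: widening multiply, then 64-bit add. */
        for (size_t i = 0; i < vl; ++i)
            dst[i] += (int64_t)a[i] * (int64_t)b[i];
        /* Bump the pointers by the number of elements processed. */
        dst += vl;
        a += vl;
        b += vl;
        n -= vl;
    }
}
```

Because the granted length shrinks on the last strip, the loop needs no separate remainder handling.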

[LLVMdev] 3.4.1 Release Plans

2014 Mar 26

Hi,

We are now about halfway between the 3.4 and 3.5 releases, and I would like to start preparing for a 3.4.1 release. Here is my proposed release schedule:

Mar 26 - April 9: Identify and backport additional bug fixes to the 3.4 branch.
April 9 - April 18: Testing Phase
April 18: 3.4.1 Release

How you can help:

- If you have any bug fixes you think should be included to 3.4.1, send me an
