thr3ads.net - search: "lane"

Displaying 20 results from an estimated 1577 matches for "lane".

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Entirely untested beyond compilation. Should check

bin/tex-miplevel-selection textureGrad Cube
bin/tex-miplevel-selection textureGrad CubeShadow...

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

On Tue, Dec 19, 2017 at 11:41 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> This is parallel to the pre-SM50 change which does this. Adjusts the
> shuffles / quadops to make the values correct relative to lane 0, and
> then splat the results to all lanes for the final move into the target
> register.
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
>
> Entirely untested beyond compilation. Should check
>
> bin/tex-miplevel-selection textureGrad Cube
> bin...

Describing subreg load for vectors without using vector_insert

2017 Sep 19

Hi, We are using a vector_insert in our target, to describe an instruction performing a lane-load of a vector register as: set $dstReg, (vector_insert $dstReg, (load $addr)), imm:$lane) However, this means that the dstReg is also marked as used in the instruction, which we do not want. We can do a direct lane-load to a part of the vector register without disturbing the rest, and hence wo...

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

...12:39 PM, Sjoerd Meijer wrote:

Hello Simon, Thanks for your replies, very useful. And yes, thanks for the example and making the target differences clear:

; Some examples:
; RISC-V V & VE(*):
; %mask = (splat i1 1)
; %evl = min(256, %n - %i)
; MVE/SVE :
; %mask = get.active.lane.mask(%i, %n)
; %evl = call @llvm.vscale()
; AVX:
; %mask = icmp (%i + (seq <8 x i32> 0,1,2,.,)), %n,
; %evl = i32 8

Unless I miss something, the AVX example is semantically the same as get.active.lane.mask: %m[i] = icmp ult (%base + i), %n with i = 8.

Correct (llvm.get.acti...
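The per-lane rule quoted in this snippet, %m[i] = icmp ult (%base + i), %n, can be illustrated with a small Python model. This is only a sketch: the function name `active_lane_mask` and the use of plain non-negative Python integers in place of fixed-width unsigned values are assumptions made for illustration, not part of the thread.

```python
def active_lane_mask(base: int, n: int, lanes: int) -> list[bool]:
    """Model of the quoted rule: lane i is active when base + i < n.

    Assumes non-negative inputs, so the plain '<' stands in for the
    unsigned compare (icmp ult) used in the thread.
    """
    return [(base + i) < n for i in range(lanes)]


# Final iteration of a loop over n = 10 elements, 8 lanes wide, starting at 8:
# only the lanes that still map to valid elements are active.
print(active_lane_mask(8, 10, 8))
```

With 8 lanes this mirrors the AVX form in the snippet, where the lane indices come from the sequence 0, 1, 2, ... added to the loop counter.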

[LLVMdev] Register coalescer and reg_sequence (virtual super-regs)

2013 Jun 19

Was it the subreg lane masks / mapping that was added to address the missed coalescing? This solution is nice, but I don't think it'll work for me. I have 8-element vector registers that can be grouped into virtual super regs for bulk save/restore, and as soon as I have more than 4 in a tuple, the unsigned int u...

[Bug 77529] New: NVS 510 DP-3 output doesn't work

2014 Apr 16

https://bugs.freedesktop.org/show_bug.cgi?id=77529
Priority: medium
Bug ID: 77529
Assignee: nouveau at lists.freedesktop.org
Summary: NVS 510 DP-3 output doesn't work
QA Contact: xorg-team at lists.x.org
Severity: normal
Classification: Unclassified
OS: All
Reporter: tex at sergio.spb.ru

[PATCH 00/10] extract dp helper functions

2012 Oct 18

...l_eq_ok helpers drm: extract helpers to compute new training values from sink request drm/nouveau: use dp link train request helper drm: extract dp link train delay functions from radeon drm/i915: use the new dp train delay helpers drm: extract dp link bw helpers drm: extract drm_dp_max_lane_count helper drivers/gpu/drm/Makefile | 2 +- drivers/gpu/drm/drm_dp_helper.c | 328 +++++++++++++++++++++++++++++++++++ drivers/gpu/drm/drm_dp_i2c_helper.c | 208 ---------------------- drivers/gpu/drm/i915/intel_dp.c | 98 ++--------- drivers/gpu/drm/nouveau/nouveau_dp...

[RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

2017 Oct 17

...n any vector with integer elements.

declare <16 x i32> @llvm.masked.sdiv.v16i32(<16 x i32> <a>, <16 x i32> <b>, <16 x i1> <mask>, <16 x i32> <passthru>)

Overview: Returns the quotient of its two operands per vector lane according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent division in the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the passthru operand.

Arguments: The first two arguments must be...
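The lane-wise behaviour described in the Overview can be sketched in Python. This is an illustrative model only: the function `masked_sdiv` is a hypothetical stand-in for the proposed intrinsic, it rounds toward zero as a signed integer division would, and it does not model fixed-width overflow.

```python
def masked_sdiv(a, b, mask, passthru):
    """Per-lane signed division; masked-off lanes copy the passthru lane."""
    result = []
    for x, y, active, fallback in zip(a, b, mask, passthru):
        if active:
            # Divide magnitudes, then restore the sign, so the quotient
            # rounds toward zero rather than toward negative infinity.
            quotient = abs(x) // abs(y)
            result.append(quotient if (x < 0) == (y < 0) else -quotient)
        else:
            # The division is never evaluated here, so a zero divisor
            # in a masked-off lane is harmless.
            result.append(fallback)
    return result


print(masked_sdiv([7, -7, 5, 9], [2, 2, 0, 3],
                  [True, True, False, True], [0, 0, -1, 0]))
```

The third lane has a zero divisor but is masked off, so its result comes from the passthru operand instead of triggering a division.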

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

On 11/6/20 8:49 AM, Roger Ferrer Ibáñez wrote: Hi Sjoerd, Trying to remember how everything fits together here, but could get.active.lane.mask not create the %mask of the VP intrinsics? Or in other words, in the vectoriser, who's producing the %mask and %evl that is consumed by the VP intrinsics? I'm not sure what would be the best way here. I think about the Loop Vectorizer. I imagine at some point we can teach LV to emit V...

[LLVMdev] Combining physical registers

2013 May 16

...n X86, AL and AH together form AX. On Hexagon, R0 and R1 are D0. > The context here is an attempt to coalesce multiple loads/stores into fewer loads/stores using larger registers. At the moment, there is no way of determining this, but I have a patch. > It is my understanding that register lane masks are not exact in a sense that they will tell me if two register lanes alias, but not necessarily if a set of masks adds up to a full register. Is this correct? That’s right. Would this TRI function solve your problem? /// The lane masks returned by getSubRegIndexLaneMask() above can onl...

[Bug 67628] New: [BISECTED] Monitor on Display port shows distortions

2013 Aug 01

https://bugs.freedesktop.org/show_bug.cgi?id=67628
Priority: medium
Bug ID: 67628
Assignee: nouveau at lists.freedesktop.org
Summary: [BISECTED] Monitor on Display port shows distortions
QA Contact: xorg-team at lists.x.org
Severity: major
Classification: Unclassified
OS: Linux (All)
Reporter:

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 09

; RISC-V V & VE(*):
; %mask = get.active.lane.mask(%i, %i)
; %evl = min(256, %n - %i)
; MVE/SVE/AVX :
; %mask = get.active.lane.mask(%i, %n)
; %evl = call @llvm.vscale()

For VE, we want to do as much predication as possible through %evl and as little as possible with %mask. This has performance implications on VE and RISC-V - VE...
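The %evl = min(256, %n - %i) line can be read as a strip-mined loop in which the explicit vector length, rather than a mask, trims the final iteration. The Python sketch below models only that arithmetic; the helper name `evl_schedule` and the choice to advance the counter by the current vector length are assumptions for illustration.

```python
def evl_schedule(n: int, max_vl: int = 256) -> list[tuple[int, int]]:
    """Return (i, evl) pairs for a loop over n elements.

    Each iteration processes evl = min(max_vl, n - i) elements, so the
    last iteration is shortened by the vector length instead of a mask.
    """
    steps = []
    i = 0
    while i < n:
        evl = min(max_vl, n - i)
        steps.append((i, evl))
        i += evl
    return steps


# 600 elements with a maximum vector length of 256.
print(evl_schedule(600))
```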

Is this undefined behavior optimization legal?

2016 Oct 03

...y set the low 8-bits, because anything else would be undefined behavior and the program would be broken. This assumption is what causes it to remove the 'and' operation. So effectively, what has happened here, is that by inserting the result of an operation with undefined behavior into one lane of a vector, we have overwritten all the other lanes of the vector. Is this optimization legal? To me it seems wrong that undefined behavior in one lane of a vector could affect another lane. However, given that LLVM IR is SSA and we are technically creating a new vector and not modifying the ol...

[LLVMdev] Combining physical registers

2013 May 16

...t of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are D0. The context here is an attempt to coalesce multiple loads/stores into fewer loads/stores using larger registers. It is my understanding that register lane masks are not exact in a sense that they will tell me if two register lanes alias, but not necessarily if a set of masks adds up to a full register. Is this correct? -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
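The aliasing question that lane masks can answer, according to this thread, amounts to a bitwise overlap test. The Python sketch below uses made-up mask values for AL, AH and AX; they are hypothetical and are not taken from any real target description.

```python
# Hypothetical lane-mask values, chosen only for illustration.
LANE_AL = 0b01
LANE_AH = 0b10
LANE_AX = LANE_AL | LANE_AH


def lanes_overlap(mask_a: int, mask_b: int) -> bool:
    """Two registers may alias when their lane masks share a bit."""
    return (mask_a & mask_b) != 0


print(lanes_overlap(LANE_AL, LANE_AH))  # disjoint lanes
print(lanes_overlap(LANE_AL, LANE_AX))  # AL is part of AX
```

The sketch deliberately stops at the overlap test. The thread states that the masks are not exact enough to decide whether a set of them adds up to a full register, so no covering check is shown.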

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...ave most of the AMD-specific low-level shuffle intrinsics implemented that you need to do this, but I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation):

; v0 is the input register
v_mov_b32 v1, v0
v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
v_nop // Add two independent instructions to a...
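The effect of the three visible row_shr instructions can be approximated in Python. This is a rough model under stated assumptions: the shift is applied to the v0 operand, a lane with no source lane contributes the operation's identity value, and the whole list is treated as a single row. The real hardware behaviour at row boundaries may differ, and the rest of the sequence, which the snippet truncates, is not modelled.

```python
from operator import add


def row_shr(values, shift, identity):
    """Lane i reads lane i - shift; lanes without a source get the identity.

    The identity fill is an assumption made for this sketch.
    """
    return [values[i - shift] if i >= shift else identity
            for i in range(len(values))]


def first_three_steps(v0, op, identity):
    """Model of v_mov_b32 followed by Instructions 1-3 from the snippet."""
    v1 = list(v0)                       # v_mov_b32 v1, v0
    for shift in (1, 2, 3):             # row_shr:1, row_shr:2, row_shr:3
        shifted = row_shr(v0, shift, identity)
        v1 = [op(s, acc) for s, acc in zip(shifted, v1)]
    return v1


# With addition, each lane ends up holding the sum of itself and up to
# three preceding lanes.
print(first_three_steps([1, 2, 3, 4, 5], add, 0))
```

After these steps each lane covers a window of at most four inputs, so further instructions are still needed to complete a full prefix scan.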

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

Hello Simon, Thanks for your replies, very useful. And yes, thanks for the example and making the target differences clear:

; Some examples:
; RISC-V V & VE(*):
; %mask = (splat i1 1)
; %evl = min(256, %n - %i)
; MVE/SVE :
; %mask = get.active.lane.mask(%i, %n)
; %evl = call @llvm.vscale()
; AVX:
; %mask = icmp (%i + (seq <8 x i32> 0,1,2,.,)), %n,
; %evl = i32 8

Unless I miss something, the AVX example is semantically the same as get.active.lane.mask: %m[i] = icmp ult (%base + i), %n with i = 8. Just saying this to s...

semPLS package will not load seems to be failing on loading package lattice

2018 Jan 28

...'lattice': .onLoad failed in loadNamespace() for 'grid', details: call: fun(libname, pkgname) error: object 'C_initGrid' not found Error: package 'lattice' could not be loaded Any advice or help on this bug would be much appreciated Best regards Michael Dr Michael Lane USQ Profile<http://staffprofile.usq.edu.au/Profile/Michael-Lane> PhD Information Systems, USQ Email: Michael.Lane at usq.edu.au<mailto:Michael.Lane at usq.edu.au> Ph 07 4631 1268 Mobile 0407 316 391 Academic Coordinator School of Management and Enterprise Member of Editorial Board Austra...

xyplot#strips like ggplot?

2009 Oct 08

Dear all, I want to split the strips in xyplot and push them into the margins ... Tried to find this in common documentation (such as Deepayan's book) on lattice ... but so far without success ... Here is the situation: xyplot(Speed~Count|Lane*Day,...) where Speed and Count are numeric, Lane and Day are factors. By default, this makes a double strip on top of each graph. I can change this to make a strip on the left or change the strip layout. What I want to do, is to write the strips in the margins of the layout. That is, to put the...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...el shuffle intrinsics implemented that you need to do this, but
>> I can think of a few concerns/questions. First of all, to implement
>> the prefix scan, we'll need to do a code sequence that looks like
>> this, modified from
>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>> v_foo_f32 v1, v0, v1 row_shr:3/...

Addressing TableGen's error "Ran out of lanemask bits" in order to use more than 32 subregisters per register

2016 Sep 18

Hello. I've managed to patch the various files from the back end related to lanemask - now I have 1024-bit long lanemask. But now I get the following error when giving make llc: <<error:unhandled vector type width in intrinsic!>> This error comes from this file https://github.com/llvm-mirror/llvm/blob/master/utils/TableGen/IntrinsicEmitter.cpp, comes...
