thr3ads.net - search: "lanes"

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

2

This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Entirely untested beyond compilation. Should check bin/tex-miplevel-selection textureGrad Cube bin/tex-miplevel-selection textureGrad CubeShadow bin/tex-miplevel-selection textureGrad Cube...

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

0

On Tue, Dec 19, 2017 at 11:41 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > This is parallel to the pre-SM50 change which does this. Adjusts the > shuffles / quadops to make the values correct relative to lane 0, and > then splat the results to all lanes for the final move into the target > register. > > Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> > --- > > Entirely untested beyond compilation. Should check > > bin/tex-miplevel-selection textureGrad Cube > bin/tex-miplevel-selection textureGrad CubeShadow ...

Describing subreg load for vectors without using vector_insert

2017 Sep 19

1

Hi, We are using a vector_insert in our target, to describe an instruction performing a lane-load of a vector register as: set $dstReg, (vector_insert $dstReg, (load $addr)), imm:$lane) However, this means that the dstReg is also marked as used in the instruction, which we do not want. We can do a direct lane-load to a part of the vector register without disturbing the rest, and hence would

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

2

...s their targets. We can then concentrate on optimizing VP intrinsic code and all targets benefit. - Simon *: VE's packed mode (512 x 32bit elements) is a use case for a non-trivial setting of %mask and %evl at the same time (%evl for packs of two 32bit elements (ie %evl must be even for 32bit lanes), %mask for masking out inside packages). Thoughts? Kind regards, -- Roger Ferrer Ibáñez

[LLVMdev] Register coalescer and reg_sequence (virtual super-regs)

2013 Jun 19

1

...s nice, but I don't think it'll work for me. I have 8-element vector registers that can be grouped into virtual super regs for bulk save/restore, and as soon as I have more than 4 in a tuple, the unsigned int used to hold the lane masks overflows and switches over to the "bit 31 set == lanes unresolvable" mode, and coalescing fails. What about moving the lane masks to a BitVector, that wouldn't need to be constrained artificially? Too much of a performance impact going that way? I'd be open to any thoughts/suggestions. I studied the ARM s_sub/d_sub/q_sub structure but th...

[Bug 77529] New: NVS 510 DP-3 output doesn't work

2014 Apr 16

16

https://bugs.freedesktop.org/show_bug.cgi?id=77529 Priority: medium Bug ID: 77529 Assignee: nouveau at lists.freedesktop.org Summary: NVS 510 DP-3 output doesn't work QA Contact: xorg-team at lists.x.org Severity: normal Classification: Unclassified OS: All Reporter: tex at sergio.spb.ru

[PATCH 00/10] extract dp helper functions

2012 Oct 18

13

Hi all, I've frustrated myself the last few days yelling at our link training code. Comparing the i915 code to radeon and nouveau I've noticed the lack of a nice set of dp helper functions. So I've started to extract a few. There's lots more that we can do I think (link configuration selection, the i2c over aux retry stuff which diverges already between i915 and radeon, maybe

[RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

2017 Oct 17

3

...<16 x i32> <b>, <16 x i1> <mask>, <16 x i32> <passthru>) Overview: Returns the quotient of its two operands per vector lane according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent division in the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the passthru operand. Arguments: The first two arguments must be vectors of integer values. Both arguments must have identical types. The third operand, mask, is a vector of boolean values with th...
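The per-lane semantics described in this snippet can be sketched in plain Python. This is only an illustration of the stated behaviour (active lanes divide, masked-off lanes take the passthru value); the function name `masked_udiv` and the sample values are hypothetical and are not taken from the RFC. Python's `//` floors, so the sketch matches truncating integer division only for non-negative operands.

```python
def masked_udiv(a, b, mask, passthru):
    """Illustrative per-lane masked division for non-negative integers."""
    if not (len(a) == len(b) == len(mask) == len(passthru)):
        raise ValueError("all operands must have the same number of lanes")
    result = []
    for x, y, active, fallback in zip(a, b, mask, passthru):
        # The division is evaluated only for active lanes, so a zero
        # divisor in a masked-off lane is never touched.
        result.append(x // y if active else fallback)
    return result


# Hypothetical 4-lane example: lanes 1 and 3 hold zero divisors and are masked off.
a = [10, 20, 30, 40]
b = [2, 0, 5, 0]
mask = [True, False, True, False]
passthru = [-1, -1, -1, -1]
quotients = masked_udiv(a, b, mask, passthru)
```

Here the masked-off lanes carry a zero divisor, which is the case the mask is meant to guard against.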

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

4

...s their targets. We can then concentrate on optimizing VP intrinsic code and all targets benefit. - Simon *: VE's packed mode (512 x 32bit elements) is a use case for a non-trivial setting of %mask and %evl at the same time (%evl for packs of two 32bit elements (ie %evl must be even for 32bit lanes), %mask for masking out inside packages). Thoughts? Kind regards, -- Roger Ferrer Ibáñez

[LLVMdev] Combining physical registers

2013 May 16

1

...context here is an attempt to coalesce multiple loads/stores into fewer loads/stores using larger registers. At the moment, there is no way of determining this, but I have a patch. > It is my understanding that register lane masks are not exact in a sense that they will tell me if two register lanes alias, but not necessarily if a set of masks adds up to a full register. Is this correct? That’s right. Would this TRI function solve your problem? /// The lane masks returned by getSubRegIndexLaneMask() above can only be /// used to determine if sub-registers overlap - they can't be us...

[Bug 67628] New: [BISECTED] Monitor on Display port shows distortions

2013 Aug 01

32

https://bugs.freedesktop.org/show_bug.cgi?id=67628 Priority: medium Bug ID: 67628 Assignee: nouveau at lists.freedesktop.org Summary: [BISECTED] Monitor on Display port shows distortions QA Contact: xorg-team at lists.x.org Severity: major Classification: Unclassified OS: Linux (All) Reporter:

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 09

0

...s their targets. We can then concentrate on optimizing VP intrinsic code and all targets benefit. - Simon *: VE's packed mode (512 x 32bit elements) is a use case for a non-trivial setting of %mask and %evl at the same time (%evl for packs of two 32bit elements (ie %evl must be even for 32bit lanes), %mask for masking out inside packages). Thoughts? Kind regards, -- Roger Ferrer Ibáñez

Is this undefined behavior optimization legal?

2016 Oct 03

5

...undefined behavior and the program would be broken. This assumption is what causes it to remove the 'and' operation. So effectively, what has happened here, is that by inserting the result of an operation with undefined behavior into one lane of a vector, we have overwritten all the other lanes of the vector. Is this optimization legal? To me it seems wrong that undefined behavior in one lane of a vector could affect another lane. However, given that LLVM IR is SSA and we are technically creating a new vector and not modifying the old one, then maybe it's OK. I'm just not sure...

[LLVMdev] Combining physical registers

2013 May 16

2

...le, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are D0. The context here is an attempt to coalesce multiple loads/stores into fewer loads/stores using larger registers. It is my understanding that register lane masks are not exact in a sense that they will tell me if two register lanes alias, but not necessarily if a set of masks adds up to a full register. Is this correct? -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

4

...ean that we'd have a separate intrinsic for every operation we care about, but I can't think of a better way to express it. Is there a better way that doesn't involve creating an intrinsic for each operation? Next, there's the fact that this code sequence only works when the active lanes are densely-packed, but we have to make this work even when control flow is non-uniform. Essentially, we need to "skip over" the inactive lanes by setting them to the identity, and then we need to enable them in the exec mask when doing the reduction to make sure they pass along the corre...
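The idea in this snippet (fill inactive lanes with the operation's identity, then run the reduction as if every lane were enabled) can be sketched as follows. This is a minimal model under stated assumptions, not the AMDGPU code sequence discussed in the thread: `wave_reduce`, the list-based lanes, and the tree-shaped combine are all hypothetical stand-ins, and the operation is assumed to be associative.

```python
import operator


def wave_reduce(values, exec_mask, op, identity):
    """Reduce the active lanes of a simulated wave (illustrative only)."""
    if len(values) != len(exec_mask):
        raise ValueError("values and exec_mask must have the same length")
    if not values:
        return identity
    # Step 1: overwrite inactive lanes with the identity so they cannot
    # change the result.
    lanes = [v if active else identity for v, active in zip(values, exec_mask)]
    # Step 2: combine all lanes pairwise, treating every lane as enabled.
    step = 1
    while step < len(lanes):
        for i in range(0, len(lanes), 2 * step):
            if i + step < len(lanes):
                lanes[i] = op(lanes[i], lanes[i + step])
        step *= 2
    return lanes[0]


# Hypothetical 8-lane wave with a sparse (non-dense) set of active lanes.
values = [1, 2, 3, 4, 5, 6, 7, 8]
exec_mask = [True, False, True, True, False, False, True, False]
total = wave_reduce(values, exec_mask, operator.add, 0)
largest = wave_reduce(values, exec_mask, max, -(2 ** 31))
```

Because the inactive lanes hold the identity, the reduction does not need to know which lanes were active, which is why the sequence no longer depends on the active lanes being densely packed.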

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 06

0

...s their targets. We can then concentrate on optimizing VP intrinsic code and all targets benefit. - Simon *: VE's packed mode (512 x 32bit elements) is a use case for a non-trivial setting of %mask and %evl at the same time (%evl for packs of two 32bit elements (ie %evl must be even for 32bit lanes), %mask for masking out inside packages). Thoughts? Kind regards, -- Roger Ferrer Ibáñez

semPLS package will not load seems to be failing on loading package lattice

2018 Jan 28

1

Hi R Help Team I recently updated my R installation to R 3.4.3 and updated to later version of R Studio and I found that the package semPLS will not load even though installed and it seems to be failing on loading package lattice Getting the following error message: library(semPLS) Loading required package: lattice Error: package or namespace load failed for 'lattice': .onLoad failed in

xyplot#strips like ggplot?

2009 Oct 08

1

Dear all, I want to split the strips in xyplot and push them into the margins ... Tried to find this in common documentation (such as Deepayan's book) on lattice ... but so far without success ... Here is the situation: xyplot(Speed~Count|Lane*Day,...) where Speed and Count are numeric, Lane and Day are factors. By default, this makes a double strip on top of each graph. I can change

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

2

...for every operation we care about, but I can't >> think of a better way to express it. Is there a better way that >> doesn't involve creating an intrinsic for each operation? >> >> Next, there's the fact that this code sequence only works when the >> active lanes are densely-packed, but we have to make this work even >> when control flow is non-uniform. Essentially, we need to "skip over" >> the inactive lanes by setting them to the identity, and then we need >> to enable them in the exec mask when doing the reduction to make sur...

Addressing TableGen's error "Ran out of lanemask bits" in order to use more than 32 subregisters per register

2016 Sep 18

4

...pe int1024_t from the boost > library, header cpp_int.hpp) for LaneMask and change accordingly the > methods handling the type. > > > > Is there any limitation I am not aware of (maybe in LLVM's > register allocator) that would prevent me from using more than 32 > lanes/subregisters? > > There is no known limitation. I chose uint32_t out of concern for > compile time. Going up to uint64_t should be no problem, I'd be more > concerned about bigger types; hopefully all code properly uses the > LaneBitmask type instead of plain unsigned, you may ne...
