thr3ads.net - similar to: "Status of llvm.experimental.vector.reduce.* intrinsics"

Displaying 20 results from an estimated 4000 matches similar to: "Status of llvm.experimental.vector.reduce.* intrinsics"

Status of llvm.experimental.vector.reduce.* intrinsics

2017 Aug 03

Status of llvm.experimental.vector.reduce.* intrinsics

Hi Amara, thank you for the clarification. I tested the intrinsics x86_64 and it seemed to work pretty well. Looking forward to try this intrinsics with the AArch64 backend. Maybe I find the time to look into codegen to get this intrinsics out of experimental stage. They seem pretty useful. Cheers, Michael -----Original Message----- From: Amara Emerson [amara.emerson at gmail.com] Received:

Status of llvm.experimental.vector.reduce.* intrinsics

2017 Aug 04

Status of llvm.experimental.vector.reduce.* intrinsics

I assume smaller types like <4 x i1> are getting zero extended to e.g., i8? Am 04.08.2017 um 15:58 schrieb Amara Emerson: > Actually for mask vectors of i1 values, you don't need to use reductions > at all(although for SVE this is what we'll do). You can instead bitcast > the vector value to an i8/i16/whatever and then compare against zero. > > Amara > > On

Status of llvm.experimental.vector.reduce.* intrinsics

2017 Aug 04

Status of llvm.experimental.vector.reduce.* intrinsics

I am currently working on a transformation pass that transforms masked.load and masked.store intrinsics to (hopefully) increase performance on targets where masked.load and masked.store are not legal. To check if the loads and stores are necessary at all I take the mask for the masked operations and want to reduce them to a single value. vector.reduce.or seemed very handy to do the job. I

[Hexagon] Type Legalization

2017 Sep 22

[Hexagon] Type Legalization

Is VT a legal type on Hexagon? It looks like Hexagon may be setting SHL as Custom for every defined vector type. Try adding TLI.isTypeLegal(VT) too. ~Craig On Thu, Sep 21, 2017 at 10:06 PM, Haidl, Michael < michael.haidl at uni-muenster.de> wrote: > Hi Sanjay, > > thanks for this information. I did get a little bit further with the > patch. However, Hexagon gives me headaches.

How to add optimizations to InstCombine correctly?

2017 Sep 14

How to add optimizations to InstCombine correctly?

Hi Craig, thanks for digging into this. So InstCombine is the wrong place for fixing PR34474. Can you give me a hint where such an optimization should go into CodeGen? I am not really familiar with stuff that happens after the MidLevel. Cheers, Michael Am 13.09.2017 um 19:21 schrieb Craig Topper: > And that is less instructions. So from InstCombine's perspective the > multiply is

[Hexagon] Type Legalization

2017 Sep 22

[Hexagon] Type Legalization

Hi Craig, protecting the transformation with: if (TLI.isTypeLegal(VT) && TLI.isOperationLegal(ISD::SUB, VT) && TLI.isOperationLegal(ISD::ADD, VT) && TLI.isOperationLegal(ISD::SHL, VT) && TLI.isOperationLegal(ISD::SRA, VT)) { shows the same result. Michael On 22.09.2017 07:19, Craig Topper wrote: > Is VT a legal type on Hexagon?

How to add optimizations to InstCombine correctly?

2017 Sep 16

How to add optimizations to InstCombine correctly?

This conversation has (partially) moved on to D37896 now, but if possible I was hoping that we could perform this in DAGCombiner and remove the various target specific combines that we still have. At least ARM/AARCH64 and X86 have cases that can hopefully be generalised and removed, but there will probably be a few legality/perf issues that will occur. Simon. > On 14 Sep 2017, at 06:23,

[AMDGPU] Strange results with different address spaces

2017 Dec 06

[AMDGPU] Strange results with different address spaces

> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote: > > The IR goes through a backend agnostic preparation phase that brings it into SSA from and changes the AS from 0 to 1. This sounds possibly problematic to me. The IR should be created with the correct address space to begin with. Changing this in the middle sounds suspect. > After this

How to add optimizations to InstCombine correctly?

2017 Sep 19

How to add optimizations to InstCombine correctly?

Hi Sanjay, thanks for enlighten me on terms of tests. I assume I have to run the test-suite benchmarks to check for regressions? Is there a guide to get the metrics from the benchmarks? Cheers, Michael BTW the beginner tag for bugs was really a good idea to get started with contributing to llvm. On Tue, Sep 19, 2017 at 3:58 PM +0200, "Sanjay Patel" <spatel at

How to add optimizations to InstCombine correctly?

2017 Sep 19

How to add optimizations to InstCombine correctly?

For the tests that are changing, you should see if those changes are improvements, regressions, or neutral. This is unfortunately not always obvious for x86 asm, so feel free to just post those diffs in an updated version of the patch at D37896. If the test files have auto-generated assertions (look for this string on the first line of the test file: "NOTE: Assertions have been autogenerated

RFC phantom memory intrinsic

2017 Sep 13

RFC phantom memory intrinsic

Hi Michael, >I have a case where InstCombine removes a store and your approach would be >valuable for me if the entire access to an aggregate could be restored. Yes, no problem and we could add the aggregate pointer to this new intrinsic and in my particular case I should ignore it, but I am looking now at "speculation_marker" metadata and I am still not sure how to implement it

How to add optimizations to InstCombine correctly?

2017 Sep 19

How to add optimizations to InstCombine correctly?

I am currently improving the D37896 to include the suggestions from Chad. However, running the lit checks for the x86 backend I observe some changes in the generated MC, e.g.: llvm/test/CodeGen/X86/lea-3.ll:13:10: error: expected string not found in input ; CHECK: leal ([[A0]],[[A0]],2), %eax ^ <stdin>:10:2: note: scanning from here orq %rdi, %rax ^ <stdin>:10:2:

Updating LLVM Tests for Patch

2017 Sep 20

Updating LLVM Tests for Patch

There are multiple problems/questions here: 1. Make sure you've updated trunk to the latest rev before running update_llc_test_checks.py on lea-3.ll. Ie, I would only expect the output you're seeing if you're running the script on a version of that test file before r313631. After that commit, each RUN has its own check prefix, so there should be no conflict opportunity. 2. I

[Hexagon] Type Legalization

2017 Sep 22

[Hexagon] Type Legalization

Hi Sanjay, thanks for this information. I did get a little bit further with the patch. However, Hexagon gives me headaches. I tried to limit the scope of the patch to the BeforeLegalizeTypes phase and Hexagon still reaches the unreachable. Hexagon tries to split or widen a vector type for a node with custom lowering where the unreachable arises from inside TargetLowering::ReplaceNodeResults

RFC phantom memory intrinsic

2017 Sep 13

RFC phantom memory intrinsic

Hi Michael, >Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. Yes, correct, this a little bit changed example is not working. #include <x86intrin.h> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { __m256d foo = (__m256d){ ptr[i], ptr[i+1],

RFC phantom memory intrinsic

2017 Sep 26

RFC phantom memory intrinsic

Hi Hal, >Are you primarily concerned with being able to widen loads later in the pipeline? Could we attached metadata to the remaining loads indicating that it would be legal to widen them? no, I don't have any concerns about intrinsic way of implementation, and intrinsic way looks safer for me since we somehow detach our information about memory from that actual load instruction. I updated

[AMDGPU] Strange results with different address spaces

2017 Dec 05

[AMDGPU] Strange results with different address spaces

> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > >> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi dev list, >> >> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some

RFC phantom memory intrinsic

2017 Sep 26

RFC phantom memory intrinsic

On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote: > Hi Michael, >> I have a case where InstCombine removes a store and your approach would be >> valuable for me if the entire access to an aggregate could be restored. > Yes, no problem and we could add the aggregate pointer to this new > intrinsic and in my particular case I should ignore it, but I am > looking

RFC phantom memory intrinsic

2017 Sep 26

RFC phantom memory intrinsic

On 09/26/2017 08:31 AM, Dinar Temirbulatov wrote: > Hi Hal, >> Are you primarily concerned with being able to widen loads later in the pipeline? Could we attached metadata to the remaining loads indicating that it would be legal to widen them? > no, I don't have any concerns about intrinsic way of implementation, > and intrinsic way looks safer for me since we somehow detach our

RFC: Promoting experimental reduction intrinsics to first class intrinsics

2020 Apr 08

RFC: Promoting experimental reduction intrinsics to first class intrinsics

Hi, It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and I’ve seen some traffic a while ago on list for other targets too. Sander did some work last year to refine the semantics after some discussion. Are we at the point where we can drop the “experimental” from the name? IMO all target should begin to transition

similar to: Status of llvm.experimental.vector.reduce.* intrinsics