Displaying 20 results from an estimated 4000 matches similar to: "Status of llvm.experimental.vector.reduce.* intrinsics"
2017 Aug 03
2
Status of llvm.experimental.vector.reduce.* intrinsics
Hi Amara,
thank you for the clarification. I tested the intrinsics on x86_64 and they
seemed to work pretty well. I am looking forward to trying these intrinsics
with the AArch64 backend. Maybe I will find the time to look into codegen to
get these intrinsics out of the experimental stage. They seem pretty useful.
Cheers,
Michael
-----Original Message-----
From: Amara Emerson [amara.emerson at gmail.com]
Received:
2017 Aug 04
3
Status of llvm.experimental.vector.reduce.* intrinsics
I assume smaller types like <4 x i1> are getting zero extended to e.g., i8?
Am 04.08.2017 um 15:58 schrieb Amara Emerson:
> Actually for mask vectors of i1 values, you don't need to use reductions
> at all (although for SVE this is what we'll do). You can instead bitcast
> the vector value to an i8/i16/whatever and then compare against zero.
>
> Amara
>
> On
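
The bitcast-and-compare idea from the quoted reply can be modelled in plain C++. This is only a sketch outside of LLVM: `Mask8`, `packMask`, `anyLaneReduceOr`, `anyLaneBitcast` and `formulationsAgree` are hypothetical names, and mapping lane i to bit i is an assumption made for illustration. It suggests that comparing the packed integer against zero gives the same answer as an OR-reduction over the lanes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for an <8 x i1> mask: one bool per lane.
using Mask8 = std::array<bool, 8>;

// Models an OR-reduction over the lanes of the mask.
bool anyLaneReduceOr(const Mask8 &m) {
  bool acc = false;
  for (bool lane : m)
    acc = acc || lane;
  return acc;
}

// Models "bitcast <8 x i1> to i8".
// Assumption for this sketch: lane i becomes bit i of the integer.
std::uint8_t packMask(const Mask8 &m) {
  std::uint8_t bits = 0;
  for (unsigned i = 0; i < m.size(); ++i)
    if (m[i])
      bits |= static_cast<std::uint8_t>(1u << i);
  return bits;
}

// Models the compare against zero on the packed integer.
bool anyLaneBitcast(const Mask8 &m) { return packMask(m) != 0; }

// Exhaustively checks that both formulations agree for all 256 masks.
bool formulationsAgree() {
  for (unsigned v = 0; v < 256; ++v) {
    Mask8 m{};
    for (unsigned i = 0; i < 8; ++i)
      m[i] = ((v >> i) & 1u) != 0;
    if (anyLaneReduceOr(m) != anyLaneBitcast(m))
      return false;
  }
  return true;
}
```

Calling `formulationsAgree()` walks every possible 8-lane mask, so the equivalence is checked rather than assumed for this model.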
2017 Aug 04
2
Status of llvm.experimental.vector.reduce.* intrinsics
I am currently working on a transformation pass that transforms
masked.load and masked.store intrinsics to (hopefully) increase
performance on targets where masked.load and masked.store are not legal.
To check if the loads and stores are necessary at all I take the mask
for the masked operations and want to reduce them to a single value.
vector.reduce.or seemed very handy to do the job.
I
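
The mask check described in this message can be sketched in plain C++ as a scalarized masked load that first reduces the mask to a single value. This is an illustration under stated assumptions, not the pass itself: `Lanes`, `Mask`, `Vec`, `anyActive` and `maskedLoad` are hypothetical names, and the 4-lane `int` vector is chosen arbitrarily.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane model; these names are illustrative, not LLVM APIs.
constexpr std::size_t Lanes = 4;
using Mask = std::array<bool, Lanes>;
using Vec = std::array<int, Lanes>;

// Models reducing the mask to a single value with an OR-reduction.
bool anyActive(const Mask &mask) {
  bool any = false;
  for (bool lane : mask)
    any = any || lane;
  return any;
}

// Models a scalarized masked load: active lanes read from memory,
// inactive lanes keep the pass-through value.
Vec maskedLoad(const int *ptr, const Mask &mask, const Vec &passthru) {
  // If no lane is active, the load is unnecessary and memory is not touched.
  if (!anyActive(mask))
    return passthru;
  Vec result = passthru;
  for (std::size_t i = 0; i < Lanes; ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}
```

With an all-false mask the function returns the pass-through vector without dereferencing `ptr`, which is the case the single reduced value is meant to detect.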
2017 Sep 22
2
[Hexagon] Type Legalization
Is VT a legal type on Hexagon? It looks like Hexagon may be setting SHL as
Custom for every defined vector type. Try adding TLI.isTypeLegal(VT) too.
~Craig
On Thu, Sep 21, 2017 at 10:06 PM, Haidl, Michael <
michael.haidl at uni-muenster.de> wrote:
> Hi Sanjay,
>
> thanks for this information. I did get a little bit further with the
> patch. However, Hexagon gives me headaches.
2017 Sep 14
3
How to add optimizations to InstCombine correctly?
Hi Craig,
thanks for digging into this. So InstCombine is the wrong place for
fixing PR34474. Can you give me a hint where such an optimization should
go into CodeGen? I am not really familiar with stuff that happens after
the MidLevel.
Cheers,
Michael
Am 13.09.2017 um 19:21 schrieb Craig Topper:
> And that is fewer instructions. So from InstCombine's perspective the
> multiply is
2017 Sep 22
0
[Hexagon] Type Legalization
Hi Craig,
protecting the transformation with:
if (TLI.isTypeLegal(VT)
    && TLI.isOperationLegal(ISD::SUB, VT)
    && TLI.isOperationLegal(ISD::ADD, VT)
    && TLI.isOperationLegal(ISD::SHL, VT)
    && TLI.isOperationLegal(ISD::SRA, VT)) {
shows the same result.
Michael
On 22.09.2017 07:19, Craig Topper wrote:
> Is VT a legal type on Hexagon?
2017 Sep 16
2
How to add optimizations to InstCombine correctly?
This conversation has (partially) moved on to D37896 now, but if possible I was hoping that we could perform this in DAGCombiner and remove the various target specific combines that we still have.
At least ARM/AARCH64 and X86 have cases that can hopefully be generalised and removed, but there will probably be a few legality/perf issues that will occur.
Simon.
> On 14 Sep 2017, at 06:23,
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
>
> The IR goes through a backend-agnostic preparation phase that brings it into SSA form and changes the AS from 0 to 1.
This sounds possibly problematic to me. The IR should be created with the correct address space to begin with. Changing this in the middle sounds suspect.
> After this
2017 Sep 19
0
How to add optimizations to InstCombine correctly?
Hi Sanjay,
thanks for enlightening me about the tests. I assume I have to run the test-suite benchmarks to check for regressions? Is there a guide to get the metrics from the benchmarks?
Cheers,
Michael
BTW, the beginner tag for bugs was a really good idea for getting started with contributing to LLVM.
On Tue, Sep 19, 2017 at 3:58 PM +0200, "Sanjay Patel" <spatel at
2017 Sep 19
5
How to add optimizations to InstCombine correctly?
For the tests that are changing, you should see if those changes are
improvements, regressions, or neutral. This is unfortunately not always
obvious for x86 asm, so feel free to just post those diffs in an updated
version of the patch at D37896.
If the test files have auto-generated assertions (look for this string on
the first line of the test file: "NOTE: Assertions have been autogenerated
2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
>I have a case where InstCombine removes a store and your approach would be
>valuable for me if the entire access to an aggregate could be restored.
Yes, no problem and we could add the aggregate pointer to this new
intrinsic and in my particular case I should ignore it, but I am
looking now at "speculation_marker" metadata and I am still not sure
how to implement it
2017 Sep 19
0
How to add optimizations to InstCombine correctly?
I am currently improving D37896 to include the suggestions from
Chad. However, when running the lit checks for the x86 backend, I observe
some changes in the generated MC, e.g.:
llvm/test/CodeGen/X86/lea-3.ll:13:10: error: expected string not found in input
; CHECK: leal ([[A0]],[[A0]],2), %eax
^
<stdin>:10:2: note: scanning from here
orq %rdi, %rax
^
<stdin>:10:2:
2017 Sep 20
3
Updating LLVM Tests for Patch
There are multiple problems/questions here:
1. Make sure you've updated trunk to the latest rev before running
update_llc_test_checks.py on lea-3.ll. I.e., I would only expect the output
you're seeing if you're running the script on a version of that test file
before r313631. After that commit, each RUN has its own check prefix, so
there should be no conflict opportunity.
2. I
2017 Sep 22
0
[Hexagon] Type Legalization
Hi Sanjay,
thanks for this information. I did get a little bit further with the
patch. However, Hexagon gives me headaches.
I tried to limit the scope of the patch to the BeforeLegalizeTypes phase
and Hexagon still reaches the unreachable. Hexagon tries to split or
widen a vector type for a node with custom lowering where the
unreachable arises from inside TargetLowering::ReplaceNodeResults
2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
>Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
Yes, correct; this slightly modified example does not work.
#include <x86intrin.h>
__m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
__m256d foo = (__m256d){ ptr[i], ptr[i+1],
2017 Sep 26
2
RFC phantom memory intrinsic
Hi Hal,
>Are you primarily concerned with being able to widen loads later in the pipeline? Could we attach metadata to the remaining loads indicating that it would be legal to widen them?
no, I don't have any concerns about the intrinsic-based implementation,
and the intrinsic approach looks safer to me since we somehow detach our
information about memory from the actual load instruction. I updated
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
>
>
>> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi dev list,
>>
>> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some
2017 Sep 26
0
RFC phantom memory intrinsic
On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote:
> Hi Michael,
>> I have a case where InstCombine removes a store and your approach would be
>> valuable for me if the entire access to an aggregate could be restored.
> Yes, no problem and we could add the aggregate pointer to this new
> intrinsic and in my particular case I should ignore it, but I am
> looking
2017 Sep 26
0
RFC phantom memory intrinsic
On 09/26/2017 08:31 AM, Dinar Temirbulatov wrote:
> Hi Hal,
>> Are you primarily concerned with being able to widen loads later in the pipeline? Could we attach metadata to the remaining loads indicating that it would be legal to widen them?
> no, I don't have any concerns about the intrinsic-based implementation,
> and the intrinsic approach looks safer to me since we somehow detach our
2020 Apr 08
7
RFC: Promoting experimental reduction intrinsics to first class intrinsics
Hi,
It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and I saw some traffic on the list a while ago for other targets too. Sander did some work last year to refine the semantics after some discussion.
Are we at the point where we can drop the “experimental” from the name? IMO all targets should begin to transition