thr3ads.net - similar to: "Extending SLP Vectorizer to deal with aggregates?"

[LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?

2014 Apr 17

2

[LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?

While playing with SLPVectorizer, I notice that it will happily vectorize cases involving extractelement/insertelement, but won't vectorize isomorphic cases involving extractvalue/insertvalue (such as the attached example). Is that something that could be straightforward to add to SLPVectorizer, or are there some hard issue? In particular, the transformation would seem to require casts of

[LLVMdev] Improving SLPVectorizer for Julia

2014 Mar 17

2

[LLVMdev] Improving SLPVectorizer for Julia

I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particular from those familiar with the internals of SLPVectorizer. The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

2

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

0

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

1

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is

[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?

2014 Dec 10

2

[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?

Thanks! That’s probably close enough for practical purposes. I looked at the overrides on various targets, and they all return true if the FMA hardware exists. - Arch From: Jingyue Wu [mailto:jingyue at google.com] Sent: Wednesday, December 10, 2014 2:56 PM To: Robison, Arch Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Best way for JIT to query whether llvm.fma.* is fast? Does

[LLVMdev] Floating-point range checks

2015 Jan 13

2

[LLVMdev] Floating-point range checks

After writing a simple FPRange, I've hit a stumbling block. I don't know what LLVM code should be extended to use it. I was initially thinking of extending LazyValueInfo, but it appears to be used for passes that don’t address the case that I need for Julia. I’m now wondering if I’m better off extending SimplifyFCmpInst to handle the few cases in question instead of trying to be more

[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?

2014 Dec 10

2

[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?

For the Julia language JIT, we'd like be able to tell whether the llvm.fma.* intrinsic has hardware support. What's the best way to query LLVM (JIT) for this information? The information would be used in situations where the user wants to use different algorithms depending on whether FMA hardware is present or not. - Arch D. Robison -------------- next part -------------- An HTML

[LLVMdev] Floating-point range checks

2015 Jan 08

3

[LLVMdev] Floating-point range checks

Thanks for the pointers. Looks like LazyValueInfo has the sort of infrastructure I had in mind. LVILatticeVal could be extended to floating point. (The comment "this can be made a lot more rich in the future" is an invitation :-). I'm thinking a simple lattice would address most cases of interest for floating-point checks. The lattice points for floating-point could be all

[LLVMdev] Floating-point range checks

2015 Jan 07

2

[LLVMdev] Floating-point range checks

The Julia language implements sqrt(x) with conditional branch taken if x<0. Alas this prevents vectorization of loops with sqrt. Often the argument can be proven to be non-negative. E.g., sqrt(x*x+y*y). Is there an existing LLVM pass or analysis that does floating-point range propagation to eliminate such unnecessary checks? Arch D. Robison Intel Corporation -------------- next part

[LLVMdev] Modifications to SLP

2015 Jul 07

2

[LLVMdev] Modifications to SLP

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

2014 Aug 28

2

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

It's a problem in the OpenMP specification. The authors (including some from Intel) intended that the OpenMP simd construct assert no lexically backward dependences exist, but as you say, it's not obvious from the spec. One of our OpenMP community members is going to bring up the ambiguity with the OpenMP committee. - Arch -----Original Message----- From: Humphreys, Jonathan

[LLVMdev] Floating-point range checks

2015 Jan 08

2

[LLVMdev] Floating-point range checks

Yes, the modeling of floating-point is trickier. The wrap-around trick used by ConstantRange seems less applicable, and there are the unordered NaNs. Though in all cases, the key abstraction is a lattice of values, so an instance of FPRange should be thought of as a point on a lattice, not an interval. The lattice needs to be complicated enough the cover the cases of interest, but not so

[LLVMdev] Floating-point range checks

2015 Jan 08

2

[LLVMdev] Floating-point range checks

> Checks against 1.0 are also common. Why not just add a FP range class, like our constant range, and go from there? That's certainly another way to go. My worry is that a more complicated lattice gets us deeper into rounding-mode issues and considerably more work for smaller gain. I like the idea of creating an FPRange class. We could start with a simple one and extend it as experience

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

2014 Aug 28

2

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

> Sorry for coming to the discussion so late. I have a couple of questions/comments: Actually, you're note is timely, because I'm planning to send out a patch (as soon as I update LangRef and rerun tests) that supports safelen but *not* lexical dependences. I don't need the safelen for Julia, but having done the work and seeing that OpenMP needs it, feel that I should finish the

[LLVMdev] Floating-point range checks

2015 Jan 08

2

[LLVMdev] Floating-point range checks

With floating point, won't you need to model a partial order for the end points? I thought there were pairs of floating point values which are incomparable. Not sure if partial order is the right abstraction, but I'm pretty sure a total order isn't. This may make implementing the range comparisons (which are themselves partially ordered) a bit tricky... Philip (Who knows just

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

2014 Aug 20

4

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Johannes Doerfert" <doerfert at cs.uni-saarland.de> > Cc: llvmdev at cs.uiuc.edu, "Arch Robison" <arch.robison at intel.com> > Sent: Wednesday, August 20, 2014 11:29:16 AM > Subject: Re: [LLVMdev] Proposal for

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

2014 Aug 12

7

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

Julia and OpenMP 4.0 have features where the user can bless a loop as having no memory dependences that prevent vectorization, thus enabling vectorization of loops not amenable to compile-time or run-time dependence analysis. LLVM currently has no metadata to express such, as explained further below. I'd like to propose new metadata that enables front-ends to tell the vectorizer that

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

3

[LLVMdev] How to broaden the SLP vectorizer's search

The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Maybe an analysis pass' scope that can be widen? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

2014 Sep 28

2

[LLVMdev] Proposal for ""llvm.mem.vectorize.safelen"

More precisely, for a simd loop, if the safelen(VL) clause is specified, there should have no loop-carried lexical backward data dependency within the specified safe vector length VL. We will make this clear in the OpenMP 4.1 spec. Xinmin Tian (Intel) -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel Sent:

similar to: Extending SLP Vectorizer to deal with aggregates?