thr3ads.net - similar to: "[LLVMdev] How to write output of a backend to a memory buffer instead of a into a file?"

raw_pwrite_stream to string or stdout?

2016 Feb 22

TargetMachine::CGFT_AssemblyFile is exactly what I am trying to write out. Frank On 02/22/2016 11:06 AM, Rafael Espíndola wrote: > On 19 February 2016 at 16:16, Frank Winter via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> TargetMachine::addPassesToEmitFile(..) >> requires as its 2nd argument an raw_pwrite_stream. >> >> Is it possible to create such a

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

On 15 November 2013 20:05, Frank Winter <fwinter at jlab.org> wrote: > Good catch! That was the problem in my case too. I totally > overlooked the alignment requirement for AVX. I wonder if the validation mechanism shouldn't have caught it earlier... Do you guys run validate on the modules before JIT-ing? --renato

[LLVMdev] loop vectorizer

2013 Nov 06

Good that you bring this up. I still have no solution to this vectorization problem. However, I can rewrite the code and insert a second loop which eliminates the 'urem' and 'div' instructions in the index calculations. In this case, the inner loop's trip count would be equal to the SIMD length and the loop vectorizer ignores the loop. Unrolling the loop and SLP is not an
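The rewrite mentioned in this snippet can be sketched as follows. This is only an illustration under stated assumptions: the index expression `i%4 + 8*(i/4)` is borrowed from the `bar` example in the related "loop vectorizer misses opportunity" thread, the function names `flat_indices` and `nested_indices` are hypothetical, and the element count is assumed to be a multiple of 4.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flat form: a single loop whose index needs a remainder and a division.
std::vector<std::uint64_t> flat_indices(std::uint64_t n) {
    assert(n % 4 == 0);  // assumption of this sketch
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = 0; i < n; ++i)
        out.push_back(i % 4 + 8 * (i / 4));
    return out;
}

// Nested form: the outer loop counts blocks, the inner loop has a fixed
// trip count of 4, so no remainder or division appears in the index.
std::vector<std::uint64_t> nested_indices(std::uint64_t n) {
    assert(n % 4 == 0);  // assumption of this sketch
    std::vector<std::uint64_t> out;
    for (std::uint64_t block = 0; block < n / 4; ++block)
        for (std::uint64_t lane = 0; lane < 4; ++lane)
            out.push_back(8 * block + lane);
    return out;
}
```

Both functions produce the same index sequence, since `i = 4*block + lane` gives `i%4 == lane` and `i/4 == block`. As the snippet notes, the inner trip count then equals the SIMD length, which is the case the loop vectorizer reportedly ignores.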

[LLVMdev] loop vectorizer

2013 Oct 31

On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote: > const std::uint64_t ir0 = (i+0)%4; // not working > I thought this would be the case when I saw the original expression. Maybe we need to teach modulo arithmetic to SCEV? --renato

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

On 1 July 2015 at 21:22, Frank Winter <fwinter at jlab.org> wrote: > there were two follow-up emails. I only got one... weird... > The issue is solved. The SLP vectorizer has > a magic number built into the code which determines the max. vector length > to search for. That was set to 128 bits. Increasing it to 256 bits solved > the issue. That looks like a simple fix. Is

[LLVMdev] loop vectorizer misses opportunity, exploit

2013 Oct 31

Hi Frank, This loop should be vectorized by the SLP-vectorizer. It has several scalars (C[0], C[1] … ) that can be merged into a vector. The SLP vectorizer can’t figure out that the stores are consecutive because SCEV can’t analyze the OR in the index calculation:

    %2 = and i64 %i.04, 3
    %3 = lshr i64 %i.04, 2
    %4 = shl i64 %3, 3
    %5 = or i64 %4, %2
    %11 = getelementptr inbounds float*
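To see why an OR shows up in the index calculation at all, the following sketch compares the two forms. It assumes the source-level index is `i%4 + 8*(i/4)`, as in the `bar` example from the same thread; the function names are hypothetical. Because the shifted value has its low three bits cleared, OR-ing in the low two bits of `i` gives the same result as adding them.

```cpp
#include <cassert>
#include <cstdint>

// Index as written in the C++ source (assumed): i%4 + 8*(i/4).
constexpr std::uint64_t index_add(std::uint64_t i) {
    return i % 4 + 8 * (i / 4);
}

// Index as it appears in the quoted IR.
constexpr std::uint64_t index_or(std::uint64_t i) {
    const std::uint64_t low  = i & 3;          // %2 = and i64 %i.04, 3
    const std::uint64_t high = (i >> 2) << 3;  // %3 = lshr ..., %4 = shl ...
    return high | low;                         // %5 = or i64 %4, %2
}

// Checks that both forms agree for every i below 'limit'.
inline void check_equivalence(std::uint64_t limit) {
    for (std::uint64_t i = 0; i < limit; ++i)
        assert(index_add(i) == index_or(i));
}
```

Calling `check_equivalence(4096)` exercises the first 4096 indices. The two forms are arithmetically identical, so the difficulty described in the snippet lies in the analysis recognizing the OR as an addition, not in the values themselves.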

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

Hmm.. I don't quite understand. How can a module validator catch this, when it's the pointers, i.e. the payload, you pass as function arguments that need to be aligned.. ?! Frank On 15/11/13 15:16, Renato Golin wrote: > On 15 November 2013 20:05, Frank Winter <fwinter at jlab.org > <mailto:fwinter at jlab.org>> wrote: > > Good catch! That was the problem in my
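The point raised here, that alignment is a property of the pointers passed at run time and not of the module, can be illustrated with a small check on the caller's side. This is a sketch under assumptions: the 32-byte figure is the alignment required by aligned 256-bit AVX loads and stores, and `is_aligned` and `AlignedBlock` are hypothetical names that do not come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when 'p' lies on an 'alignment'-byte boundary.
// 'alignment' must be a non-zero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// A payload type whose storage starts on a 32-byte boundary,
// suitable for aligned 256-bit accesses (assumed requirement).
struct alignas(32) AlignedBlock {
    float data[8];
};
```

A caller could run `is_aligned(ptr, 32)` on each buffer before handing it to JIT-compiled code. A pointer offset by a single `float` from an aligned base fails the check, which is the kind of mismatch a module-level validator cannot see.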

[LLVMdev] loop vectorizer

2013 Nov 06

Sent from my iPhone > On Nov 5, 2013, at 7:39 PM, Frank Winter <fwinter at jlab.org> wrote: > > Good that you bring this up. I still have no solution to this vectorization problem. > > However, I can rewrite the code and insert a second loop which eliminates the 'urem' and 'div' instructions in the index calculations. In this case, the inner loop's trip

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 12

On 12 November 2013 15:14, Frank Winter <fwinter at jlab.org> wrote: > I am asking because the option 'force-vector-width' is too restrictive. > I would like to leave open the possibility to use vector width 2. I was about to say that, and you saved us both one cycle. ;) What you could do is to force an architecture that doesn't have AVX, only SSE. I'm not sure how

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 12

On 12 November 2013 15:53, Frank Winter <fwinter at jlab.org> wrote: > .. forcing the vector size to 4 does not prevent using AVX. > Sure. That's more for tests than anything else. So, there are ways of disabling stuff in Clang, for instance "-mattr=-avx" or "-target-feature -avx", but I'm not sure how you're doing it in the JIT. I'm also not sure

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

It's not reproducible with 'opt'. I call the SLP pass from my application and only then the wrong IR gets generated. On the attached module I call via the function pass manager:

1) TargetLibraryInfo with the target triple
2) Set the data layout
3) Basic Alias Analysis
4) SLP vectorizer

This produces the wrong IR. On the other hand running the attached module through 'opt

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

Frank, It sounds like the SLP vectorizer thinks that it is more profitable to use 128bit wide operations (because 256bit operations are double pumped on Sandybridge). Did you see a different result on Haswell? Thanks, Nadav > On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote: > > I realized that the function parameters had no alignment attributes on them.

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 08

Hi Frank, Thanks for working on this. Please look at vectorizeStoreChains. In this function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps. I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

> On Aug 7, 2014, at 2:57 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > > Your .ll file does not have a data layout. Opt will not initialize the DataLayoutPass. The SLP vectorizer will not vectorize because there is no DataLayoutPass. > > debug-cmake/bin/opt -default-data-layout="e-m:e-i64:64-f80:128-n8:16:32:64-S128" -basicaa -slp-vectorizer -S

[LLVMdev] addPassesToEmitFile and Intel syntax

2015 Jul 27

Hi, I am using TargetMachine::addPassesToEmitFile to write out x86-64 assembly. However, the generated assembly uses the AT&T syntax. Is there a way to switch to the Intel syntax? Thanks, Frank

[LLVMdev] loop vectorizer misses opportunity, exploit

2013 Oct 31

A quite small but yet complete example function which all vectorization passes fail to optimize:

    #include <cstdint>
    #include <iostream>

    void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
    {
        for ( std::uint64_t i = start ; i < end ; i += 4 ) {
            {
                const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

I am trying my luck on this global reduction kernel:

    float foo( int start , int end , float * A ) {
        float sum[4] = {0.,0.,0.,0.};
        for (int i = start ; i < end ; ++i ) {
            for (int q = 0 ; q < 4 ; ++q )
                sum[q] += A[i*4+q];
        }
        return sum[0]+sum[1]+sum[2]+sum[3];
    }

    LV: Checking a loop in "foo"
    LV: Found a loop: for.cond1
    LV: Found an induction variable.
    LV: We

[LLVMdev] loop vectorizer

2013 Nov 06

On 06/11/13 08:54, Arnold wrote: > > > Sent from my iPhone > > On Nov 5, 2013, at 7:39 PM, Frank Winter <fwinter at jlab.org > <mailto:fwinter at jlab.org>> wrote: > >> Good that you bring this up. I still have no solution to this >> vectorization problem. >> >> However, I can rewrite the code and insert a second loop which >>

[LLVMdev] loop vectorizer

2013 Oct 31

On Oct 30, 2013, at 6:10 PM, Frank Winter <fwinter at jlab.org> wrote: > the only option I see is to unroll the loop by hand. Since the array access is consecutive over 4 loop iterations I gave it a try and unrolled the loop by a factor of 4. Which gives the following array accesses: > > loop iter 0: > index_0 = 0 index_1 = 4 > index_0 = 1 index_1 = 5 > index_0 = 2

target triple in 3.8

2016 Feb 19

I added your suggestion and am using this now

    llvm::legacy::FunctionPassManager *functionPassManager = new llvm::legacy::FunctionPassManager(Mod);
    llvm::PassRegistry &registry = *llvm::PassRegistry::getPassRegistry();
    initializeScalarOpts(registry);
    functionPassManager->add( new llvm::TargetLibraryInfoWrapperPass(llvm::TargetLibraryInfoImpl(targetMachine->getTargetTriple())) );