similar to: [LLVMdev] Failure to optimize vector select


2013 Aug 20
0
[LLVMdev] Failure to optimize vector select
Have you tried running SLP vectorizer pass (-vectorize-slp)? Eugene On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > Hi, > > I've found a case I would expect would optimize easily, but it doesn't. A > simple implementation of vector select: > > float4 simple_select(float4 a, float4 b, int4 c) > { > float4 result; >
2013 Aug 20
3
[LLVMdev] Failure to optimize vector select
On Aug 19, 2013, at 18:47 , Eugene Toder <eltoder at gmail.com> wrote: > Have you tried running SLP vectorizer pass (-vectorize-slp)? Yes. That was the first thing I tried, and it didn't do anything. I was looking at the vectorizer, but then I saw some things that made me wonder if it was even supposed to do this
2013 Aug 20
0
[LLVMdev] Failure to optimize vector select
Hi Matt, This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it. Thanks, Nadav On Aug 20, 2013, at 1:14 PM, Matt Arsenault <arsenm2 at
2013 Aug 20
0
[LLVMdev] Failure to optimize vector select
Can you send the IR of the function? On Aug 20, 2013, at 8:36 AM, Matt Arsenault <arsenm2 at gmail.com> wrote: > > On Aug 19, 2013, at 18:47 , Eugene Toder <eltoder at gmail.com> wrote: > >> Have you tried running SLP vectorizer pass (-vectorize-slp)? > Yes. That was the first thing I tried, and it didn't do anything. I was looking at the vectorizer, but then
2013 Aug 20
1
[LLVMdev] Failure to optimize vector select
On Aug 20, 2013, at 14:49 , Nadav Rotem <nrotem at apple.com> wrote: > Hi Matt, > > This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able
2013 Aug 20
3
[LLVMdev] Failure to optimize vector select
On Aug 20, 2013, at 10:22 , Nadav Rotem <nrotem at apple.com> wrote: > Can you send the IR of the function ? Attached is the -O0 and -O3 IR. [Attachment scrubbed: vselect_optimized.ll, application/octet-stream, 1545 bytes]
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as typedef float float4 __attribute__((ext_vector_type(4))); typedef unsigned char uchar4 __attribute__((ext_vector_type(4))); float4 to_float4(uchar4 in) { float4 out = {in.x, in.y, in.z, in.w}; return out; } Running
2019 Oct 17
2
Static assert fails when compiler for i386
Hi Devs, Consider below testcase. $cat test.cpp #include <vector> #include<type_traits> typedef int _int4 __attribute__((vector_size(16))); typedef union{ int data[4]; struct {int x, y, z, w;}; _int4 vec; } int4; typedef int4 int3; int main() { static_assert(std::alignment_of<int4>::value <= alignof(max_align_t), "over aligned!"); } $clang++ -m32 error:
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
Hi, The current definition of shuffle vector is <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x i32> <mask> ; yields <n x <ty>> The first two operands of a 'shufflevector' instruction are vectors with types that match each other and types that match the result of the instruction. The third
2011 Nov 02
5
[LLVMdev] About JIT by LLVM 2.9 or later
Hello guys, Thanks for your help while you are busy. I am working on an open source project. It supports a shader language and I want a JIT feature, so LLVM is used. But now I find that the ABI and calling convention are not compatible with MSVC. For example, I have the following code: struct float4 { float x, y, z, w; }; struct float4x4 { float4 x, y, z, w; }; float4 fetch_vs( float4x4* mat
2009 Oct 05
5
[LLVMdev] Functions: sret and readnone
Hi all, I'm currently building a DSL for a computer graphics project that is not unlike NVIDIA's Cg. I have an intrinsic with the following signature float4 sample(texture tex, float2 coords); that is translated to this LLVM IR code: declare void @"sample"(%float4* noalias nocapture sret, %texture, %float2) nounwind readnone The type float4 is basically an array of four
2009 Dec 03
4
[LLVMdev] Win64 Calling Convention problem
Hi! I have discovered a problem with LLVM's interpretation of the Win64 calling convention w.r.t. passing of aggregates as arguments. The following code is part of my host application that is compiled with Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of four floats as its first and only argument, which is - in accordance with the specs of the Win64 calling convention -
2003 Apr 02
1
RODBC sqlSave problem.
Dear list, Being new to the postgres database, ODBC, and the RODBC interface, I am somewhat confused by some of the problems I am experiencing trying to connect R to the database. What I am trying is basically the example part of the help file for the sqlSave function: > library(RODBC) > odbcConnect("theodor") -> channel > data(USArrests) > sqlSave(channel,
2009 Nov 05
0
[LLVMdev] Functions: sret and readnone
It's been a while and I finally had the time to look into this. What I did was to build a custom AliasAnalysis pass, as Chris suggested, that returns AliasAnalysis::Mod for values passed to the sample function in the sret spot, and NoModRef for all other values. I'm also returning AliasAnalysis::AccessesArguments in the pass' getModRefBehavior methods. However, I haven't been
2008 Oct 14
4
[LLVMdev] Making GEP into vector illegal?
In Joe programmer language (i.e. C ;) ), are we basically talking about disallowing: float4 a; float* ptr_z = &a.z; ? Won't programmers just resort to: float4 a; float* ptr_z = (float*)(&a) + 3; ? On Oct 14, 2008, at 3:55 PM, Mon Ping Wang wrote: > Hi, > > Something like a sequential type makes sense especially in light of > what Duncan is pointing out. I agree
2017 Jan 20
1
How to handle INT8 data
Right, they are identifiers. Storing them as String has drawbacks: - huge to store in memory - slow to process - huge to index (by eg data.table columns indexes) Why not store them as numeric? Thanks, On 20 Jan 2017 at 18:16, William Dunlap wrote: > If these are identifiers, store them as strings. If not, what sort of > calculations do you plan on doing with them? > Bill
2017 Jan 20
9
How to handle INT8 data
Hello R users, I have to deal with int8 data in R. AFAIK R only handles int4 with the `as.integer` function [1]. I wonder: 1. what is the better approach to handle int8? `as.character`? `as.numeric`? 2. is there any plan to handle int8 in the future? As you might know, int4 is too small to deal with the earth's population right now. Thanks for your ideas, int8 eg: human_id
2008 Oct 14
0
[LLVMdev] Making GEP into vector illegal?
On Tue, Oct 14, 2008 at 1:34 PM, Daniel M Gessel <gessel at apple.com> wrote: > In Joe programmer language (i.e. C ;) ), are we basically talking > about disallowing: > > float4 a; > float* ptr_z = &a.z; > > ? That's my reading as well; the argument for not allowing it is just to make optimization easier. We don't allow addressing individual bits either,
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
Hi Mon Ping, Generalizing shufflevector would be great. I have an additional suggestion below. On 29-Sep-08, at 11:11 PM, Mon Ping Wang wrote: > I am proposing to extend the shuffle vector definition to be > <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> > <mask> ; yields <m x <ty>> > > The
2008 Jun 27
0
[LLVMdev] Vector instructions
On Jun 27, 2008, at 8:02 AM, Stefanus Du Toit wrote: >>>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>, <d x >>>> i32> >>>> <mask> ; yields <d x <ty>> >>> >>> With the requirement that the entries in the (still constant) mask >>> are >>> within