thr3ads.net - similar to: "[LLVMdev] Generalizing shuffle vector"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Generalizing shuffle vector"

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

On Mon, Sep 29, 2008 at 8:11 PM, Mon Ping Wang <wangmp at apple.com> wrote: > The problem with generating insert and extracts is that we can generate poor > code > %tmp16 = extractelement <4 x float> %f4b, i32 0 > %f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0 > %tmp18 = extractelement <4 x float> %f4b, i32 1 > %f8c

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

Hi Mon Ping, Generalizing shufflevector would be great. I have an additional suggestion below. On 29-Sep-08, at 11:11 PM, Mon Ping Wang wrote: > I am proposing to extend the shuffle vector definition to be > <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> > <mask> ; yields <m x <ty>> > > The

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

I agree further generalization seems like a very good idea. But I'd like to see what Mon Ping proposed implemented first so we have a better idea of the implementation cost. Thanks, Evan On Sep 30, 2008, at 6:44 AM, Stefanus Du Toit wrote: > Hi Mon Ping, > > Generalizing shufflevector would be great. I have an additional > suggestion below. > > On 29-Sep-08, at 11:11

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

Hi, I agree that the more general shufflevector is more useful. I narrowed the original proposal a little bit because of the concern for the implementation cost. However, the slightly narrowed definition will probably require falling back to generating inserts and extracts for complex masks, so it is possible that there will be no extra cost in supporting the more general definition.

2013 Feb 02

vectorisation

Hi I'm trying to set up a simulation problem without resorting to (m)any loops. I want to set entries in a data frame of zeros ('starts' in the code below) to 1 at certain points and the points have been randomly generated and stored in a separate data.frame ('sl'), which has the same number of columns. An example of the procedure is as follows: ml <-

2008 Jun 27

[LLVMdev] Vector instructions

Hi Dan, Thanks for your comments. I've responded inline below. On 26-Jun-08, at 6:49 PM, Dan Gohman wrote: > On Jun 26, 2008, at 1:56 PM, Stefanus Du Toit wrote: >> >> === >> 1. Shufflevector only accepts vectors of the same type >> >> I would propose to change the syntax from: >> >>> <result> = shufflevector <n x <ty>>

2008 Jun 27

[LLVMdev] Vector instructions

On Jun 27, 2008, at 8:02 AM, Stefanus Du Toit wrote: >>>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>, <d x >>>> i32> >>>> <mask> ; yields <d x <ty>> >>> >>> With the requirement that the entries in the (still constant) mask >>> are >>> within

2003 Jun 26

HELP

I am including my config.log in the hopes that someone can tell me what is going on with installing dovecot on Mac OS X 10.2.6. Sincerely, Roger Cates, CCNA Vice President & Chief Technical Officer Xpower Internet, LLC Xpowerhosting.com | Xpoweronline.com P 888.245.7501 | F 270.338.4602 Internet to the power of X. begin 666 dovecot_config.log

2005 Apr 27

RE: [R] when can we expect Prof Tierney's compiled R?

Luke, Thank you for sharing the benchmark results. The improvement is very substantial, I am looking forward to the release of the byte compiler! The arithmetic shows that x[i]<- is still the bottleneck. I suspect that this is due to a very involved dispatching/search for the appropriate function on the C level. There might be significant gain if loops somehow cached the result of the initial

2005 Apr 22

RE: [R] when can we expect Prof Tierney's compiled R?

If we are on the subject of byte compilation, let me bring up a couple of examples which have been puzzling me for some time. I'd like to know a) whether compilation is likely to improve the performance of this type of computation, and b) at least roughly understand the reasons for the observed numbers, specifically why x[i]<- assignment is so much slower than x[i] extraction. The loops

2010 Nov 21

VGA passthrough - GA-890FXA with ASUS EAH5750 video

Hello, I would like to report to the group that I have been successful with VGA passthrough using the following configuration: Gigabyte GA-890FXA-UD5; ASUS EAH5750 1GB GDDR5 video; Fedora 13; Xen 4.0.1; Windows 7 64-bit HVM guest. Xen was installed per Pasi's Fedora 13 Xen 4.0 Tutorial, found here: http://wiki.xen.org/xenwiki/Fedora13Xen4Tutorial The tutorial was very helpful and many

2006 May 06

IPsec with racoon2

Hi, I'm trying to get IPsec running between 2 FreeBSD (VMware) boxes, using racoon2. spmd and iked start up okay, but I get an error when I try a ping across the tunnel. /var/log/messages shows: May 5 13:52:36 biosa-vm4 iked: [INTERNAL_ERR]: if_spmd.c:726: SLID failed: 550 Operation failed May 5 13:52:36 biosa-vm4 iked: [INTERNAL_ERR]: isakmp.c:647:isakmp_initiate_cont(): 0:172.20.36.55[0]

2019 Nov 26

LangRef semantics for shufflevector with undef mask is incorrect

Hi, This is a follow up on a discussion around shufflevector with undef mask in https://reviews.llvm.org/D70641 and https://bugs.llvm.org/show_bug.cgi?id=43958. The current semantics of shufflevector in http://llvm.org/docs/LangRef.html#shufflevector-instruction states: "If the shuffle mask is undef, the result vector is undef. If any element of the mask operand is undef, that element

2019 Nov 27

LangRef semantics for shufflevector with undef mask is incorrect

Ok, makes sense. My suggestion is that we patch the IR Verifier to ensure that the mask is indeed a vector of constants and/or undefs. Right now it only runs the standard checks for instructions. We will also run Alive2 on the test suite to make sure undef is never replaced in practice. Thanks, Nuno -----Original Message----- From: Eli Friedman <efriedma at quicinc.com> Sent: 27 de

2019 Dec 09

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,

2017 Mar 30

InstructionSimplify: adding a hook for shufflevector instructions

As Sanjay noted in D31426<https://reviews.llvm.org/D31426#712701>, InstructionSimplify is missing the following simplification: This function: define <4 x i32> @splat_operand(<4 x i32> %x) { %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef, <4 x i32>

2014 Sep 26

[LLVMdev] Canonicalizing vector masking.

Hi, I received an internal test case from a game team (it wasn't about this in particular), and I was wondering if there was maybe an opportunity to canonicalize a particular code pattern: %inputi = bitcast <4 x float> %input to <4 x i32> %row0i = and <4 x i32> %inputi, <i32 -1, i32 0, i32 0, i32 0> %row0 = bitcast <4 x i32> %row0i to <4 x float>

2017 Mar 30

InstructionSimplify: adding a hook for shufflevector instructions

Thanks, Sanjay, that makes sense. The opportunity for improving instcombining splat sounds promising. Another question about shuffle simplification. This is a testcase from test/Transforms/InstCombine/vec_shuffle.ll: define <4 x i32> @test10(<4 x i32> %tmp5) nounwind { %tmp6 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32

2017 Jan 21

IR canonicalization: shufflevector or vector trunc?

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote: > Hi Sanjay, > > > > I agree we should also discuss **if** this canonicalization is beneficial. > > For starters, do we have a concrete case where we would benefit from > canonicalizing shuffles <-> truncates in LLVM IR? > > IMO, we should not count benefits for codegen

2017 Jan 17

IR canonicalization: shufflevector or vector trunc?

We use InstCombiner::ShouldChangeType() to prevent transforms to illegal integer types, but I'm not sure how that would apply to vector types. I.e., let's say v256 is a legal type in your example. DataLayout doesn't appear to specify what configurations of a 256-bit vector are legal, so I don't think we can currently use that to say v2i128 should be treated differently than v16i16.
