Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo"

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

2015 May 04

Thanks, Nadav, for the info. It answers my question :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than on AVX1. One question though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (which is
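The lookup order described in this snippet (a target-specific table first, then the base implementation) can be sketched as follows. This is only an illustration in Python, not LLVM code: `AVX2_COST_TABLE`, `lookup_cost`, the `v8i32` key and the `base_cost` callback are hypothetical names, and the only value taken from the message is the suggested cost of 1 for integer ADD/SUB/MUL.

```python
# Hypothetical stand-in for a target-specific cost table. The keys and the
# type name are illustrative and are not copied from X86TargetTransformInfo.
AVX2_COST_TABLE = {
    ("add", "v8i32"): 1,
    ("sub", "v8i32"): 1,
    ("mul", "v8i32"): 1,
}


def lookup_cost(table, opcode, vector_type, base_cost):
    """Return the table entry if one exists, else defer to the base model."""
    key = (opcode, vector_type)
    if key in table:
        return table[key]
    # No explicit entry: fall back to the caller-supplied base cost function.
    return base_cost(opcode, vector_type)
```

With an explicit entry the fallback is never consulted; with an empty table every query goes to `base_cost`, which is the situation the message describes for the missing entries.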

[LLVMdev] Phabricator update

2014 Dec 11

Hi Manuel, Thanks for the help. Still persists for me too. Instead of waiting indefinitely, now I get this error: Unhandled Exception ("AphrontDeadlockQueryException") #1205: Lock wait timeout exceeded; try restarting transaction On Thu, Dec 11, 2014 at 11:26 AM, suyog sarda <sardask01 at gmail.com> wrote: > The problem still persist :( > > On 12/11/14, Manuel Klimek

[LLVMdev] Phabricator update

2014 Dec 11

Another php type problem; can you please try again. Thanks! On Thu Dec 11 2014 at 1:37:32 PM Bruno Cardoso Lopes < bruno.cardoso at gmail.com> wrote: > I'm facing the same problem. > > On Thu, Dec 11, 2014 at 10:16 AM, suyog sarda <sardask01 at gmail.com> wrote: > > Hi, > > I am facing problem while submitting patch on phab. All things go smooth > - >

[LLVMdev] Phabricator update

2014 Dec 11

Hi, I am facing a problem while submitting a patch on phab. Everything goes smoothly: create diff, create revision, specify title and comments. However, when I try to submit the diff by clicking the "save" button, it takes a long time and eventually times out, failing to submit the patch. Any help on this? On Thursday, December 11, 2014, Manuel Klimek <klimek at google.com> wrote: >

[LLVMdev] Efficient Pattern matching in Instruction Combine

2014 Aug 07

Hi all, Duncan, Rafael, David, Nick. This is regarding pattern matching in the InstructionCombine pass. We use the 'match' functions many times, but they don't do the pattern matching efficiently. E.g., let's take the patterns:
(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
Both patterns above are the same, since ^ is commutative in Op0. But,
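The claim that both left-hand sides reduce to (A ^ B) | C can be checked exhaustively on small integers. The sketch below is plain Python and is unrelated to LLVM's matcher; the function names are made up for illustration.

```python
from itertools import product


def lhs_first(a, b, c):
    # (A ^ B) | ((B ^ C) ^ A)
    return (a ^ b) | ((b ^ c) ^ a)


def lhs_second(a, b, c):
    # (B ^ A) | ((B ^ C) ^ A), the same pattern with Op0 commuted
    return (b ^ a) | ((b ^ c) ^ a)


def rhs(a, b, c):
    # (A ^ B) | C
    return (a ^ b) | c


def patterns_equivalent(bits=3):
    """Check both rewrites for every triple of `bits`-wide values."""
    values = range(1 << bits)
    return all(
        lhs_first(a, b, c) == rhs(a, b, c) == lhs_second(a, b, c)
        for a, b, c in product(values, repeat=3)
    )
```

The reason it holds bit by bit: with X = A ^ B, the left side is X | (X ^ C), which is 1 wherever X is 1 and equals C wherever X is 0, exactly like X | C.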

[LLVMdev] [cfe-dev] Phabricator update

2014 Dec 11

On Wed, Dec 10, 2014 at 2:38 PM, Jonathan Roelofs <jonathan at codesourcery.com > wrote: > I think the send-email part of phab has yet to come back up. > Yes, restarting it would be very helpful. > > > Cheers, > > Jon > > > On 12/10/14 1:59 PM, Manuel Klimek wrote: > >> Phab is back up - it's still a little slow (the mysql database we use is

[LLVMdev] Phabricator update

2014 Dec 10

Phab is back up - it's still a little slow (the mysql database we use is doing some cleanups). On Wed Dec 10 2014 at 5:07:07 PM suyog sarda <sardask01 at gmail.com> wrote: > And i was thinking something wrong with my proxy configuration :P > > On Wed, Dec 10, 2014 at 6:47 PM, Manuel Klimek <klimek at google.com> wrote: > >> Heya, >> >> if you wonder

[LLVMdev] MMX/SSE subtarget feature in IR

2015 Apr 10

Your clang invocation below works for me, and generates a target triple of i386 in the LLVM IR. In the function-specific options it then generates the following:
; Function Attrs: nounwind
define float @foo() #0 {
entry:
  ret float 1.000000e+00
}
attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"= "true"

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

For example, I have the following IR code:
for.cond.preheader: ; preds = %if.end18
  %mul = mul i32 %12, %3
  %cmp21128 = icmp sgt i32 %mul, 0
  br i1 %cmp21128, label %for.body.preheader, label %return
for.body.preheader: ; preds = %for.cond.preheader
  %19 = mul i32 %12, %3
  %20 = add i32 %19, -1
  %21 = zext i32 %20 to i64
  %22 =

[LLVMdev] [cfe-dev] Phabricator update

2014 Dec 11

On Thu, Dec 11, 2014 at 1:29 AM, Manuel Klimek <klimek at google.com> wrote: > On Thu Dec 11 2014 at 2:16:00 AM Alexey Samsonov <vonosmas at gmail.com> > wrote: > >> On Wed, Dec 10, 2014 at 2:38 PM, Jonathan Roelofs < >> jonathan at codesourcery.com> wrote: >> >>> I think the send-email part of phab has yet to come back up. >>>

[LLVMdev] Efficient Pattern matching in Instruction Combine

2014 Aug 08

Hi Duncan, David, Sean. Thanks for your replies.
> It'd be interesting if you could find a design that also treated these
> the same:
>
> (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C
> (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
> (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C
>
> I.e., `^` is also associative.
I agree with Duncan on including associative operations too.
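One way to make the three quoted variants look identical is to flatten each nested xor into a sorted list of its leaves, so that operand order and grouping no longer matter. The sketch below assumes a toy expression encoding (a leaf is a string, an xor node is the tuple `("xor", left, right)`); it is not how InstCombine represents values, and it does not cancel repeated operands such as A ^ A.

```python
def xor_leaves(expr):
    """Return the sorted leaves of a nested xor tree.

    A leaf is a string; an inner node is the tuple ("xor", left, right).
    Sorting removes the effect of commutativity, flattening removes the
    effect of associativity.
    """
    if isinstance(expr, str):
        return [expr]
    op, left, right = expr
    if op != "xor":
        raise ValueError("only xor nodes are handled in this sketch")
    return sorted(xor_leaves(left) + xor_leaves(right))


# The second operand of `|` in each of the three quoted patterns.
VARIANTS = [
    ("xor", ("xor", "A", "B"), "C"),  # (A ^ B) ^ C
    ("xor", ("xor", "B", "C"), "A"),  # (B ^ C) ^ A
    ("xor", ("xor", "C", "A"), "B"),  # (C ^ A) ^ B
]
```

All three entries of `VARIANTS` flatten to the same leaf list, so a single comparison against the canonical form covers every ordering.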

[LLVMdev] Efficient Pattern matching in Instruction Combine

2014 Aug 13

Thanks, Sean, for the reference. I will go through it and see if I can implement it for generic boolean expression minimization. Regards, Suyog On Wed, Aug 13, 2014 at 2:30 AM, Sean Silva <chisophugis at gmail.com> wrote: > Re-adding the mailing list (remember to hit "reply all") > > > On Tue, Aug 12, 2014 at 9:36 AM, suyog sarda <sardask01 at gmail.com> wrote:

[LLVMdev] Phabricator update

2014 Dec 10

Heya, if you're wondering why Phabricator is down: it's an upgrade that is running a database update that takes a while (probably 3-5 more hours). I'll update this thread once it's finished and phab is up again. Cheers, /Manuel

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

[LLVMdev] [cfe-dev] Phabricator update

2014 Dec 11

Heya, I'll look into it first thing tomorrow - probably a problem with the encoding settings. On Thu Dec 11 2014 at 9:17:40 PM Robinson, Paul < Paul_Robinson at playstation.sony.com> wrote: > What I'm seeing is that Phabricator emails double-space *everything* > (not just the diffs). > > --paulr > > > > *From:* cfe-dev-bounces at cs.uiuc.edu

[LLVMdev] MMX/SSE subtarget feature in IR

2015 Apr 09

Thanks Kevin for the reply. I got the point now :) On 10 Apr 2015 00:18, "Smith, Kevin B" <kevin.b.smith at intel.com> wrote: > For x86_64 ABI, a minimum feature set of SSE2 is required. > > > > Kevin > > > > *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On > Behalf Of *suyog sarda > *Sent:* Thursday, April 09,

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

Hi Arnold, Thanks for your reply. I tried the test case you suggested:
void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}
so that it has a 'store' in its IR.
IR before vectorization:
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple =

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

Hi Tim,
> > cortex-a15 vfpv4 : vmla instruction emitted (which is a NEON instruction)
>
> I get a VFP vmla here rather than a NEON one (clang -target
> armv7-linux-gnueabihf -mcpu=cortex-a15): "vmla.f32 s0, s1, s2". Are
> you seeing something different?
>
As per Renato's comment above, vmla is a NEON instruction while vmfa is a VFP instruction. Correct

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

Hi, Is LLVM able to generate code for the following? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems to generate really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi

how to force llvm generate gather intrinsic

2016 Feb 25

Yes, masked load/store/gather/scatter are completed. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 01:20 To: Demikhovsky, Elena <elena.demikhovsky at intel.com> Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to