similar to: [RFC] llvm-mca: a static performance analysis tool

Displaying 20 results from an estimated 10000 matches similar to: "[RFC] llvm-mca: a static performance analysis tool"

2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
> On Mar 1, 2018, at 9:22 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > > Hi all, > > At Sony we developed an LLVM based performance analysis tool named llvm-mca. We > currently use it internally to statically measure the performance of code, and > to help triage potential problems with target scheduling models. We decided to > post this RFC because
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
Reading through this last night got be thinking about how to model control flow.  Given most of my source code tends to be very branchy, be limited to straight line code is quite restrictive.  The main thing is that it requires a lot of hand simplification which can be rather error prone at times. It occurs to me that we could remove the restriction around branches without necessarily fully
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
Hi Andrea, Thanks for this great RFC ! I've put some high-level comments here, and I'll give more focused comments in the review on Phabricator. On Thu, Mar 1, 2018 at 6:22 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > Hi all, > > At Sony we developed an LLVM based performance analysis tool named > llvm-mca. We > currently use it internally to
2018 Mar 02
5
[RFC] llvm-mca: a static performance analysis tool
Hi Andrew, Thanks for the feedback! On Fri, Mar 2, 2018 at 1:16 AM, Andrew Trick <atrick at apple.com> wrote: > > On Mar 1, 2018, at 9:22 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: > > Hi all, > > At Sony we developed an LLVM based performance analysis tool named > llvm-mca. We > currently use it internally to statically measure the
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
Hi, While investigating a performance issue with an internal codebase I came across what looks to be poor register allocation. I have constructed a small(ish) reproducible which demonstrates the issue (see test.ll attached). I have spent some time going through the register allocator to understand what is happening. I have also experimented with some small changes to try and improve the
2010 Jun 11
0
[LLVMdev] thinking about timing-test-driven scheduler
On Wed, 2010-06-09 at 17:30 +0200, orthochronous wrote: > Hi, > > I've been thinking about how to implement a framework for attempting > instruction scheduling of small blocks of code by using (GA/simulated > annealing/etc) controlled timing-test-evaluations of various > orderings. This sounds interesting. > (I'm particularly interested small-ish numerical inner
2010 Jun 09
2
[LLVMdev] thinking about timing-test-driven scheduler
Hi, I've been thinking about how to implement a framework for attempting instruction scheduling of small blocks of code by using (GA/simulated annealing/etc) controlled timing-test-evaluations of various orderings. (I'm particularly interested small-ish numerical inner loop code in low-power CPUs like Atom and various ARMs where there CPU doesn't have the ability to
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19, -1 %21 = zext i32 %20 to i64 %22 =
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions(including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based from the target triple?) then I don't think
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I may get one or two in the next month, but not more than that. Focused on the pass manager for now. If none get there first, I'll eventually circle back though, so they won't rot forever. On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote: > I filed a couple more, in case they're actually different issues: > -
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi, Is LLVM be able to generate code for the following code? %mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type. I am running it on a Haswell processor with LLVM-3.4.2. It seems that it will generates really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi -------------- next part -------------- An HTML attachment was scrubbed... URL:
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet. We looked at this feature and got into conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost which depends on number of shuffles. Number of shuffles depends on permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer