similar to: PATCH: match calls and returns

Displaying 20 results from an estimated 900 matches similar to: "PATCH: match calls and returns"

2014 Jan 14
1
PATCH for lpc_asm.nasm
1) Two comments ";ASSERT(lp_quantization <= 31)" in the new functions ..._wide_asm_ia32() -- just to mention this constraint. (max. possible value of lp_quantization is 15, so it's not a problem) 2) "mov cl, ..." was replaced with "mov ecx, ..." (again Agner Fog, optimizing_assembly.pdf) summary: write to a partial register may result in false dependencies
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > [You can find an easier to read and more complete version of this RFC > here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.] > > Knowing instruction scheduling properties (latency, uops) is the basis > for all scheduling work done by LLVM. > > >
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
Sounds like a very useful tool.  Thank you for contributing. Taking a step back and looking at the big picture, combining this with the recently contributed llvm-mca dramatically improves our scheduling and performance analysis story.  Being able to take a snippet of code on a particular machine, measure latency/throughput/ports for each instruction (this tool), and then analyze the entire
2018 Mar 15
3
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > > [You can find an easier to read and more complete version of this RFC here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> > .] > > Knowing
2018 Mar 15
5
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
[You can find an easier to read and more complete version of this RFC here <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> .] Knowing instruction scheduling properties (latency, uops) is the basis for all scheduling work done by LLVM. Unfortunately, vendors usually release only partial (and sometimes incorrect) information. Updating the
2012 Jul 27
0
[LLVMdev] X86 FMA4
On Fri, Jul 27, 2012 at 2:37 PM, Michael Gottesman <mgottesman at apple.com> wrote: ... > I have actually timed said instructions in the past and reproduced Agner > Fog's results. I just prefer to speak by referring to facts that can not be > misconstrued as hearsay = ). That would be great. Also, can you point me to the Agner Fog table that you are referring to? Thanks.
2012 Jul 27
3
[LLVMdev] X86 FMA4
> It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. You are misunderstanding [no worries, happens to everyone = )]. The timings I listed were for
2014 Dec 22
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Herbie Robinson > Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences > > On 12/21/14 4:27 AM, Kuperstein, Michael M wrote: > > Which performance guidelines are you referring to? > Table C-21 in "Intel(r) 64 and IA-32 Architectures
2018 Aug 14
4
Why did Intel change his static branch prediction mechanism during these years?
( I don't know if it's allowed to ask such question, if not, please remind me. ) I know Intel implemented several static branch prediction mechanisms these years: * 80486 age: Always-not-take * Pentium4 age: Backwards Taken/Forwards Not-Taken * PM, Core2: Didn't use static prediction, randomly depending on what happens to be in corresponding BTB entry , according to agner's
2004 Sep 10
3
patch
So here is quick patch solving the problem, now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz -------------- next part -------------- --- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001 +++ lpc_asm.nasm Sat Nov 17 21:09:46 2001 @@ -59,10 +59,10 @@ ; ALIGN 16 cident FLAC__lpc_compute_autocorrelation_asm_ia32 - ;[esp + 24] == autoc[] - ;[esp + 20] == lag - ;[esp + 16] ==
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On 03/15/2018 10:49 AM, Clement Courbet wrote: > > > On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > > On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: >> [You can find an easier to read and more complete version of this >> RFC here >>
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM: > On 8/14/19 9:42 PM, Stefan Kanthak wrote: >> Hi, >> >> both >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S >> and >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
2019 May 13
3
How shall I evaluate the latency of each instruction in LLVM IR?
Inspired by https://www.agner.org/optimize/instruction_tables.pdf, which gives us the latency and reciprocal throughput of each instruction in the different architecture of X86, Is there anybody taking the effort to do a similar job for LLVM IR? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL:
2016 Jan 21
2
Adding support for self-modifying branches to LLVM?
On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote: > > AFAIK, the cost of a well-predicted, not-taken branch is the same as a > nop on every x86 made in the last many years. > See http://www.agner.org/optimize/instruction_tables.pdf > <http://www.agner.org/optimize/instruction_tables.pdf> > Generally speaking a correctly-predicted not-taken branch is basically >
2014 Jan 03
2
PATCH: asm versions for two _wide() functions
As I wrote earlier, GCC generates slow ia32 code for FLAC__lpc_compute_residual_from_qlp_coefficients_wide() and FLAC__lpc_restore_signal_wide(). So 24-bit encoding/decoding is slower for GCC compile than for MSVS or ICC compile. I took FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32 and FLAC__lpc_restore_signal_asm_ia32 asm functions and wrote their _wide versions. -------------- next
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load mu op, 3 latency, with a reciprocal throughput of 0.5. vmovaps - store takes 1 store mu op, 1 load mu op for address calculation, 3 latency, with a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a
2012 Nov 07
1
[LLVMdev] AVX broadcast Vs. vector constant pool load
Hey guys, I'm currently investigating broadcasts from the constant pool on Sandy Bridge. I see this comment in llvm/lib/Target/X86/X86ISelLowering.cpp: // Handle the broadcasting a single constant scalar from the constant pool // into a vector. On Sandybridge it is still better to load a constant vector // from the constant pool and not to broadcast it from a scalar. Would anyone
2017 Nov 03
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello all, >> >> >> >> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 >> command line flags supported by latest GCC to
2012 Sep 29
7
[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler
Hi, We are currently working on revising a journal article that describes our work on pre-allocation scheduling using LLVM and have some questions about LLVM's pre-allocation scheduler. The answers to these question will help us better document and analyze the results of our benchmark tests that compare our algorithm with LLVM's pre-allocation scheduling algorithm. First, here is a