similar to: Adding support for self-modifying branches to LLVM?

Displaying 20 results from an estimated 20000 matches similar to: "Adding support for self-modifying branches to LLVM?"

2016 Jan 21
2
Adding support for self-modifying branches to LLVM?
On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote: > AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf > Generally speaking a correctly-predicted not-taken branch is basically
2016 Jan 20
2
Adding support for self-modifying branches to LLVM?
Thanks for the information. This has been very useful! Patch points indeed *almost* do what I need. I will try to build a similar solution. > Self-modifying code for truly zero-overhead (when not enabled) instrumentation is a real thing (look at e.g. DTrace pid provider) but unless the number of instrumentation points is very large (100's of thousands? millions?) or not known
2016 Jan 21
3
Adding support for self-modifying branches to LLVM?
On 01/21/2016 01:51 PM, Sean Silva wrote: > On Thu, Jan 21, 2016 at 1:33 PM, Philip Reames <listmail at philipreames.com> wrote: > > On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote: >> AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86
2016 Feb 09
2
Adding support for self-modifying branches to LLVM?
Hi, I'm coming back to this old thread with data about the performance of NOPs. Recall that I was considering transforming NOP instructions into branches and back, in order to dynamically enable code. One use case for this was enabling/disabling individual sanitizer checks (ASan, UBSan) on demand. I wrote a pass which takes an ASan-instrumented program, and replaces each ASan check with
2019 May 13
3
How shall I evaluate the latency of each instruction in LLVM IR?
Inspired by https://www.agner.org/optimize/instruction_tables.pdf, which gives the latency and reciprocal throughput of each instruction for the different x86 microarchitectures: is anybody making the effort to do a similar job for LLVM IR? Thanks!
2014 Dec 22
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Herbie Robinson > Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences > > On 12/21/14 4:27 AM, Kuperstein, Michael M wrote: > > Which performance guidelines are you referring to? > Table C-21 in "Intel(r) 64 and IA-32 Architectures
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM: > On 8/14/19 9:42 PM, Stefan Kanthak wrote: >> Hi, >> >> both >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S >> and >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
2016 Jan 22
2
Adding support for self-modifying branches to LLVM?
On Thu, Jan 21, 2016 at 2:52 PM, Jonas Wagner <jonas.wagner at epfl.ch> wrote: > Hello, > > There is some data on this, e.g, in “High System-Code Security with Low > Overhead” <http://dslab.epfl.ch/proj/asap/#publications>. In this work we > found that, for ASan as well as other instrumentation tools, most overhead > comes from the checks. Especially for
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Hi, both https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S and https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S use the following code sequences for shift counts greater than 31: 1: 1: xorl %edx,%edx shrl %cl,%edx shl %cl,%eax xorl %eax,%eax
2018 Dec 21
2
[OpenMP][AArch64][GlobalISel] AArch64 OMPT tests failing
Curious. I removed -fno-experimental-isel and all of the tests *except* control_tool.c passed. I would have expected all of them to pass if blockaddress works. I'll try to look at some asm and see what's going on. -David Jonas Hahnfeld <hahnjo at hahnjo.de> writes: > Hi David, > > I was the one who originally added the flag to fix failures
2016 Jul 04
4
[XRay] RFC: LLVM-side Changes for nop-sleds
Hi llvm-dev (cc google-xray), As a follow-up to the first XRay RFC [0] introducing the technology, I've been able to recently implement a functional prototype of the major parts of the XRay functionality [1]. This RFC is limited to exploring potential alternatives to the current LLVM-side changes, with the interest of getting clear guidance for landing the changes first in LLVM. Background /
2018 Aug 14
4
Why did Intel change its static branch prediction mechanism during these years?
(I don't know whether such a question is allowed here; if not, please let me know.) I know Intel has implemented several static branch prediction mechanisms over the years: * 80486 era: always not-taken * Pentium 4 era: backwards taken / forwards not-taken * PM, Core2: no static prediction; the outcome depends on whatever happens to be in the corresponding BTB entry, according to agner's
2008 May 21
9
Slow pkginstalls due to long door_calls to nscd
Hi all, I am installing a zone onto two different V445s running S10U4 and the zones are taking hours to install (about 1000 packages), that is, the problem is identical on both systems. A bit of trussing and dtracing has shown that the pkginstalls being run by the zoneadm install are making door_call calls to nscd that are taking very long, so far observed to be 5 to 40 seconds, but always in
2011 Nov 18
1
[LLVMdev] Greedy regalloc
Hi, I get strange code when using regalloc=greedy. A value spill is redundant and cleared, as another spill of same value is inserted. The former spill is however not NOP:ed, but KILL:ed, thus the operands get a kill status. The code becomes: %vreg301<def> = mv32Imm 200000000, pred:0, pred:%noreg, %CCReg<imp-def,dead>, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg301
2017 Apr 04
3
[inline-asm][asm-goto] Supporting "asm goto" in inline assembly
The asm goto feature was introduced to GCC in order to optimize the support for tracepoints in the Linux kernel (it can be used for other things that do nop patching). GCC documentation describes their motivating example here: https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html #define TRACE1(NUM) \ do { \
2012 Jul 27
0
[LLVMdev] X86 FMA4
On Fri, Jul 27, 2012 at 2:37 PM, Michael Gottesman <mgottesman at apple.com> wrote: ... > I have actually timed said instructions in the past and reproduced Agner > Fog's results. I just prefer to speak by referring to facts that can not be > misconstrued as hearsay = ). That would be great. Also, can you point me to the Agner Fog table that you are referring to? Thanks.
2018 Mar 15
5
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
[You can find an easier to read and more complete version of this RFC here <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> .] Knowing instruction scheduling properties (latency, uops) is the basis for all scheduling work done by LLVM. Unfortunately, vendors usually release only partial (and sometimes incorrect) information. Updating the
2008 Mar 11
6
Bad instruction on x86_64 build on OS X with DTRACE_PROBE
I was looking at mod_trace (http://prefetch.net/projects/apache_modtrace/index.html ) and playing with getting it to compile on OS X. When building for x86_64 with -arch x86_64 we get bad instructions generated: gcc -o foo -arch x86_64 foo.c /var/folders/rV/rV1x2DafFr0R6tGG+1bbk++++TM/-Tmp-//ccnykQ1o.s:11:bad register name `%%esi)'' Using gcc -S I can definitely see we are not
2020 Jul 24
7
Zero length function pointer equality
LLVM can produce zero length functions from cases like this (when optimizations are enabled): void f1() { __builtin_unreachable(); } int f2() { /* missing return statement */ } This code is valid, so long as the functions are never called. I believe C++ requires that all functions have a distinct address (ie: &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2) gets
2014 Dec 21
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
Which performance guidelines are you referring to? I'm not that familiar with decade-old CPUs, but to the best of my knowledge, this is not true on current hardware. There is one specific circumstance where PUSHes should be avoided - for Atom/Silvermont processors, the memory form of PUSH is inefficient, so the register-freeing optimization below may not be profitable (see 14.3.3.6 and