similar to: Issue with inline assembly, function inlining, and position independent code

Displaying 20 results from an estimated 7000 matches similar to: "Issue with inline assembly, function inlining, and position independent code"

2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott, Sorry for the late reply, I was out of office during the holidays. 1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed. 2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 as the time I was to report this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is a "8 queens puzzle" solver written as an educational example. As
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this 8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a bit hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benechmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
2018 Feb 06
3
What does a dead register mean?
Hi, My understanding of a "dead" register is a def that is never used. However, when I dump the MI after reg alloc on a simple program I see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
hi, i found this code in X86DisassemblerDecoder.h #define EA_BASES_32BIT \ ENTRY(EAX) \ ENTRY(ECX) \ ENTRY(EDX) \ ENTRY(EBX) \ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \
2016 Jan 02
13
[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a
https://bugs.freedesktop.org/show_bug.cgi?id=93557 Bug ID: 93557 Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a Product: xorg Version: unspecified Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: blocker
2019 Dec 11
2
IR inline assembly: the x86 Intel "offset" operator
Interesting - the patch doesn't address this yet. It looks like we have a difference (maybe bug?) in how we handle Intel vs. AT&T inline assembly: https://godbolt.org/z/GQw9ED Suppose we're expanding an operand with an 'i' constraint, where the operand is given as, e.g. (i32* @Bar). If the inline assembly is in Intel dialect, this expands as "Bar" in AT&T syntax
2018 Feb 06
0
What does a dead register mean?
You are right about your interpretation of "dead". The case here is that RSP is a reserved register and so its liveness isn't really tracked. The "implicit-def dead" is an idiom used to mean that the register (reserved or not) is clobbered. The other implicit uses/defs can come from instruction definitions to indicate that this instruction uses and/or modifies a given
2019 Apr 04
3
question about --emit-relocs with lld
Hi, While doing Linux kernel builds linked with lld, I've tracked down a difference that breaks relocation of the kernel image (e.g. under KASLR[1]). Some relocations are changed to ABS (weirdly, all are in .rodata section). Note the difference below in the resulting linked output. .L__const._start.instance becomes *ABS* only under lld: $ cat minimal.c struct minimal { void *pointer;
2019 Dec 09
4
IR inline assembly: the x86 Intel "offset" operator
Hi all, I'm trying to land (a rebased version of) http://llvm.org/D37461 - which should make it possible to handle x86 Intel assembly like mov eax, offset Foo::ptr + 1 (Currently, omitting the +1 works... but offset doesn't work in compound expressions.) I'm having trouble figuring out what inline assembly I can emit into the LLVM IR that will work properly. So far, the closest
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development From: Scott
2015 Mar 30
0
[Bug 82714] [G84] nouveau fails to properly initialize GPU
https://bugs.freedesktop.org/show_bug.cgi?id=82714 --- Comment #10 from Bruno <bonbons at sysophe.eu> --- (In reply to Bruno from comment #9) > Created attachment 114735 [details] > 4.0-rc6 dmesg of nouveau loading (debug, runpm=0) The first BUG happens in evo_wait() at line 420 of nv50_display.c Seems like dmac->ptr[put] is bad. 413: evo_wait(void *evoc, int nr) 414: { 415:
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community, To improve Interprocedural Register Allocation (IPRA) we are trying to force caller saved registers for local functions (which has likage type local). To achive it I have modified TargetFrameLowering::determineCalleeSaves() to return early for function which satisfies if (F->hasLocalLinkage() && !F->hasAddressTaken()) and also reflecting the fact that for local
2016 Sep 27
2
[lld][ELF] Addends adjustment for relocatable object
Hi, Now in case of relocatable object generation LLD merges and copies REL/RELA sections as is and does not touch any addends. But it is incorrect. If we have a relocation which targets a section, we have to adjust the relocation's addend to take in account that the section might be merged with other ones. Here is the reproduction script: % cat t1.s .data .long 0 .text bar: movl $1,
2011 Mar 19
2
[LLVMdev] Apparent optimizer bug on X86_64
Compiling a simple automaton created by GNU bison with -O1 or -O2 resulted in the following machine code: 1300 /*-----------------------------. 1301 | yyreduce -- Do a reduction. | 1302 `-----------------------------*/ 1303 yyreduce: 1304 /* yyn is the number of a rule to reduce with. */ 1305 yylen = yyr2[yyn]; 0x0000000000400c14 <rpcalc_parse+628>: mov
2020 Nov 02
2
[llvm-mc] FreeBSD kernel module performance impact when upgrading clang
Hi, I'm in the process of migrating from clang5 to clang10. Unfortunately clang10 introduced a negative performance impact. The cause is an increase of PLT entries from this patch (first released in clang7): https://bugs.llvm.org/show_bug.cgi?id=36370 https://reviews.llvm.org/D43383 If I revert that clang patch locally, the additional PLT entries and the performance impact disappear. This
2016 Jun 27
3
Finding caller-saved registers at a function call site
Hi Sanjoy, I'm having trouble finding caller-saved registers using the RegMask operand you've mentioned. As an example, I've got a C function that looks like this: double recurse(int depth, double val) { if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val; else return outer_func(val); } As a quick refresher, all "xmm" registers are considered