thr3ads.net - search: "movs"

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

2

...shl rax, 20h mov rsi, offset __mh_execute_header add rsi, rax sar rsi, 20h ; size_t mov edi, 4 ; size_t call _calloc lea edx, [r15-1] movsxd r8, edx mov ecx, r15d add ecx, 0FFFFFFFEh js loc_100000DFA test r15d, r15d mov r11d, [rax+r8*4] jle loc_100000EAE mov ecx, r15d add ecx,...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...i, offset __mh_execute_header >> add rsi, rax >> sar rsi, 20h ; size_t >> mov edi, 4 ; size_t >> call _calloc >> lea edx, [r15-1] >> movsxd r8, edx >> mov ecx, r15d >> add ecx, 0FFFFFFFEh >> js loc_100000DFA >> test r15d, r15d >> mov r11d, [rax+r8*4] >> jle loc_100000EAE >>...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

...add rsi, rax >>>> sar rsi, 20h ; size_t >>>> mov edi, 4 ; size_t >>>> call _calloc >>>> lea edx, [r15-1] >>>> movsxd r8, edx >>>> mov ecx, r15d >>>> add ecx, 0FFFFFFFEh >>>> js loc_100000DFA >>>> test r15d, r15d >>>> mov r11d, [rax+r8*4] >>>>...

[RFC][PATCH] Gfxboot COMBOOT module

2008 Nov 22

5

...ngth] +; read file +; si - file handle +; es:bx - buffer +; cx - number of blocks to read + +read: + push eax + mov ax,7 + mov bx,trackbuf + mov cx,[BufSafe] + int 22h + + push edi + push ecx + push si + push es + + mov si,trackbuf + push edi + call gfx_l2so + pop di + pop es + + rep movsb ; move ds:si -> es:di, length ecx + pop es + pop si + pop ecx + pop edi + + pop eax + add edi, ecx + sub eax, ecx + jnz read + +bootlogo_read_done: + call find_file + or eax,eax + jnz found_bootlogo + stc + ret + +found_bootlogo: + push edi + push eax + add eax,edi + push dword...

[PATCH] Gfxboot COMBOOT module

2009 Apr 05

3

...em] + +; read file +; si - file handle +; es:bx - buffer +; cx - number of blocks to read + +read: + push eax + mov ax,7 + mov bx,trackbuf + mov cx,[BufSafe] + int 22h + + push edi + push ecx + push si + push es + + mov si,trackbuf + push edi + call gfx_l2so + pop di + pop es + + rep movsb ; move ds:si -> es:di, length ecx + pop es + pop si + pop ecx + pop edi + + pop eax + + ; si == 0: EOF + or si,si + jz gfx_read_done + add edi,ecx + sub eax,ecx + ja read + jmp gfx_file_too_big +gfx_read_done: + sub eax,ecx + mov edx,[file_length] + sub edx,eax + ; edx = real fi...

[LLVMdev] FP Intrinsics

2005 Mar 11

0

Update: I have been working on this all day, and I finally got it working, more or less, with the pattern instruction selector... However, the generated code is not very good, and I haven't implemented the expansion to calls for targets that do not support these FP instructions. As an example, in the following function the sub, abs, and compare compile to 13 instructions! Also it has changed the

x86_64 build break in rombios

2007 Jan 29

8

I am getting the following build break on changeset 13662. I am compiling on x86_64 SLES10 with gcc 4.1.0. Is there a fix for this? Thanks, Aravindh Puthiyaparambil Xen Development Team Unisys, Tredyffrin PA make[1]: Entering directory `/root/xen/xen-unstable.hg/tools/firmware' make[2]: Entering directory `/root/xen/xen-unstable.hg/tools/firmware/rombios' gcc -o biossums

[PATCH] Optimized assembler version of md5_process() for x86-64

2020 May 22

2

This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original

[LLVMdev] Possible missed optimization on function calling?

2010 Sep 21

1

Hello, I noticed that the following code could be improved a little further. If the optimization is too tricky for the compiler, or it's done this way by design, forgive me, but in any case I just wanted to point it out. Consider the following C code: extern int mcos(int a); extern int msin(int a); extern int mdiv(int a, int b); int foo(int a, int b) { int a4 =

[LLVMdev] Area for improvement

2005 Feb 22

0

On Mon, 21 Feb 2005, Jeff Cohen wrote: > I noticed that fourinarow is one of the programs in which LLVM is much slower > than GCC, so I decided to take a look and see why that is so. The program > has many loops that look like this: > > #define ROWS 6 > #define COLS 7 > > void init_board(char b[COLS][ROWS+1]) > { > int i,j; > > for

[LLVMdev] Issue with X86FrameLowering __chkstk on Windows 8 64-bit / Visual Studio 2012

2013 Aug 19

3

Hi, I'm using LLVM to convert expressions to native assembly, the problem is when LLVM compiles this code: define void @fn_0000000000000000(i8*, i8*, i8*) { bb: %res = alloca i32 %3 = load i32* %res %4 = bitcast i8* %0 to i32* %5 = load i32* %4 %6 = bitcast i8* %0 to i32* %7 = load i32* %6 %8 = xor i32 %5, %7 store volatile i32 %8, i32* %res %9 = load i32* %res %10 = icmp

[PATCH 1/1] COMBOOT API: Add calls for directory functions; Implement for FAT; Try 2

2009 Feb 08

1

From: Gene Cumm <gene.cumm at gmail.com> COMBOOT API: Add calls for directory functions; Implement most only for FAT (SYSLINUX). Uses INT 22h AX= 001Fh, 0020h, 0021h and 0022h to prepare for the COM32 C functions getcwd(), opendir(), readdir(), and closedir(), respectively. INT22h, AX=001Fh will return a valid value for all variants. INT22h, AX= 0020h, 0021h, and 0022h are only

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

2018 Nov 30

2

"Friedman, Eli" <efriedma at codeaurora.org> wrote: > On 11/30/2018 8:31 AM, Stefan Kanthak via llvm-dev wrote: >> Hi @ll, >> >> compiler-rt implements (for example) the MSVC (really Windows) >> specific routines compiler-rt/lib/builtins/i386/chkstk.S and >> compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() >> See

[LLVMdev] How could I get memory address for each assemble instruction?

2004 Sep 13

2

Hi all, I am trying to disassemble *.bc to assembly code using the llvm-dis command, but what I get looks like the following. How can I get assembly code like objdump's, that is, with the memory address of each instruction? Thanks Qiuyu llvm-dis: .text .align 16 .globl adpcm_coder .type adpcm_coder, @function adpcm_coder: .LBBadpcm_coder_0: # entry sub %ESP, 116 mov DWORD PTR [%ESP + 12],

[LLD] Linking static library does not resolve symbols as gold/ld

2017 Mar 15

2

Compilers don't know about functions that are not defined in the same compilation unit, so they leave call instruction operands as zero (because they can't compute any absolute or relative address of the destinations), and let linkers fix the address by binary patching. So, what you are seeing is likely a bug in LLD where it fails to fix the address for some reason. Can you dump that

[LLVMdev] Area for improvement

2005 Feb 22

2

Sorry, I thought I was running selection dag isel but I screwed up when trying out the really big array. You're right, it does clean it up except for the multiplication. So LoopStrengthReduce is not ready for prime time and doesn't actually get used? I might consider whipping it into shape. Does it still have to handle getelementptr in its full generality? Chris Lattner wrote:

[LLVMdev] About JIT by LLVM 2.9 or later

2011 Nov 02

5

...[ebp+8] } 002C13F0 pop edi 002C13F1 pop esi 002C13F2 pop ebx 002C13F3 mov esp,ebp 002C13F5 pop ebp 002C13F6 ret *Callee( 'fetch' LLVM ):* 010B0010 mov eax,dword ptr [esp+4] 010B0014 mov ecx,dword ptr [esp+8] 010B0018 movss xmm0,dword ptr [ecx+1Ch] 010B001D movss dword ptr [eax+0Ch],xmm0 010B0022 movss xmm0,dword ptr [ecx+18h] 010B0027 movss dword ptr [eax+8],xmm0 010B002C movss xmm0,dword ptr [ecx+10h] 010B0031 movss xmm1,dword ptr [ecx+14h] 010B0036 movss dword ptr [e...

[PATCH 0/4] ia64/xen: paravirtualization of hand written assembly code

2008 Feb 25

6

Hi. The patch I sent before was so large that it was dropped from the mailing list. I'm sending it again at a smaller size. This patch set is the Xen paravirtualization of hand-written assembly code. I expect that much cleanup is necessary before merging. We really need feedback before starting the actual cleanup, as Eddie already said. Eddie discussed how to clean up and suggested

[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code

2008 Feb 26

8

Hi. I rewrote the patch according to the comments. I adopted in-place code generation because it looks like the quickest way. The point Eddie wanted to discuss is how to generate code and its ABI, i.e. in-place generation vs. direct jump vs. indirect function call. An indirect function call doesn't make sense because ivt.S is compiled multiple times. And it is up to pv instances to choose in-place
