thr3ads.net - search: "r9d"

2015 Sep 01

2

[RFC] New pass: LoopExitValues

...unsigned int Val) { for (int Outer = 0; Outer < Size; ++Outer) for (int Inner = 0; Inner < Size; ++Inner) Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val; } With LoopExitValues ------------------------------- matrix_mul: testl %edi, %edi je .LBB0_5 xorl %r9d, %r9d xorl %r8d, %r8d .LBB0_2: xorl %r11d, %r11d .LBB0_3: movl %r9d, %r10d movl (%rdx,%r10,4), %eax imull %ecx, %eax movl %eax, (%rsi,%r10,4) incl %r11d incl %r9d cmpl %r11d, %edi jne .LBB0_3 incl %r8d cmpl %edi, %r8d jne .LBB0_2 .LBB0_5: retq...

[RFC] New pass: LoopExitValues

2015 Aug 31

2

[RFC] New pass: LoopExitValues

Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the

Register Dataflow Analysis on X86

2019 Nov 08

2

Register Dataflow Analysis on X86

Do you know whether it has been fixed on the 8.0.1 release? Scott On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote: The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86. --

Register Dataflow Analysis on X86

2019 Dec 23

2

Register Dataflow Analysis on X86

Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com AI tools development From: Scott Douglas Constable <sdconsta at syr.edu> Se...

[LLVMdev] Question about ExpandPostRAPseudos.cpp

2012 Jul 26

1

[LLVMdev] Question about ExpandPostRAPseudos.cpp

...er *** - function: autogen_SD24657 - basic block: BB 0x2662d60 (BB#0) - instruction: %XMM0<def> = MOV64toPQIrr %RAX<kill> - operand 1: %RAX<kill> LLVM ERROR: Found 1 machine code errors. This happens because, on entry to the pass, we have %RAX<def> = SUBREG_TO_REG 0, %R9D, 4 %XMM0<def> = MOV64toPQIrr %RAX<kill> The pass converts (around about line 132 in ExpandPostRAPseudos.cpp) the SUBREG_TO_REG pseudo op to %EAX<def> = MOV32rr %R9D Because of "-mcpu-atom", post RA scheduling is enabled, so is post RA li...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...CODE XREF: _main+107 j mov ebx, 1 jmp loc_100000D30 ; --------------------------------------------------------------------------- loc_100000DFA: ; CODE XREF: _main+5E j mov ecx, [rax+r8*4] lea r9d, [rcx+1] mov [rax+r8*4], r9d cmp ecx, r8d jge loc_100000F0E lea r12, [rax+4] xor r14d, r14d db 2Eh nop word ptr [rax+rax+00000000h] loc_100000E20:...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...1 >> jmp loc_100000D30 >> ; --------------------------------------------------------------------------- >> >> loc_100000DFA: ; CODE XREF: _main+5E j >> mov ecx, [rax+r8*4] >> lea r9d, [rcx+1] >> mov [rax+r8*4], r9d >> cmp ecx, r8d >> jge loc_100000F0E >> lea r12, [rax+4] >> xor r14d, r14d >> db 2Eh >>...

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

2

[LLVMdev] trunk's optimizer generates slower code than 3.5

...00000D30 >>>> ; --------------------------------------------------------------------------- >>>> >>>> loc_100000DFA: ; CODE XREF: _main+5E j >>>> mov ecx, [rax+r8*4] >>>> lea r9d, [rcx+1] >>>> mov [rax+r8*4], r9d >>>> cmp ecx, r8d >>>> jge loc_100000F0E >>>> lea r12, [rax+4] >>>> xor r14d, r14d >>>>...

Register Dataflow Analysis on X86

2020 Jan 10

2

Register Dataflow Analysis on X86

...<RSP>!(d1555):, u1568<SSP>!(d1556):] s1569: MOV32r0 [d1570<R10D>(~d1554",,u3776"):d1565, d1571<EFLAGS>!(d1565,d1574,):] s1572: MOV32r0 [d1573<R8D>(~d1554",,u3775"):d1570, d1574<EFLAGS>!(d1571,d1577,):] s1575: MOV32r0 [d1576<R9D>(~d1554",,u3774"):d1573, d1577<EFLAGS>!(d1574,,u3773"):] ---> s1578: MOV64rm [d1579<R11>(~d1554",,u3226"):d1576] b1580: --- %bb.37 --- preds(3): %bb.36, %bb.49, %bb.64 succs(1): %bb.38 p3209: phi [+d3210<RBP>(,d1731,u3212):, u3211<RB...

[LLVMdev] Splice and undefined physical reg

2015 Jul 28

1

[LLVMdev] Splice and undefined physical reg

...*** Bad machine code: Using an undefined physical register *** - function: foo - basic block: BB#126 (null) (0x6127658) - instruction: CALL64r %vreg41, <regmask>, %RSP<imp-use>, %EDI<imp-use>, %RSI<imp-use>, %RDX<imp-use>, %RCX<imp-use>, %R8<imp-use>, %R9D<imp-use>, %AL<imp-use>, %RSP<imp-def>, %EAX<imp-def>; GR64:%vreg41 How can I get rid of these errors? Thank you very much, -- Jon

RFC: Insertion of nops for performance stability

2016 Nov 17

4

RFC: Insertion of nops for performance stability

...7: 01 c8 addl %ecx, %eax 9: 44 39 c0 cmpl %r8d, %eax c: 75 0f jne 15 <foo+0x1D> e: ff 05 00 00 00 00 incl (%rip) 14: ff 05 00 00 00 00 incl (%rip) 1a: 31 c0 xorl %eax, %eax 1c: c3 retq 1d: 44 39 c9 cmpl %r9d, %ecx 20: 74 ec je -20 <foo+0xE> 22: 48 8b 44 24 30 movq 48(%rsp), %rax 27: 2b 08 subl (%rax), %ecx 29: 39 d1 cmpl %edx, %ecx 2b: 7f e1 jg -31 <foo+0xE> 2d: 31 c0 xorl %eax, %eax 2f: c3 retq Note: the first...

[LLVMdev] bug in X86 disasm code?

2013 Sep 12

1

[LLVMdev] bug in X86 disasm code?

...de in X86DisassemblerDecoder.h #define EA_BASES_32BIT \ ENTRY(EAX) \ ENTRY(ECX) \ ENTRY(EDX) \ ENTRY(EBX) \ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \ ENTRY(R12D) \ ENTRY(R13D) \ ENTRY(R14D) \ ENTRY(R15D) The ENTRY(sib) looks suspicious. That should be ENTRY(ESP), no? Thanks. J

[LLVMdev] splice and undefined physical reg

2015 Jul 28

1

[LLVMdev] splice and undefined physical reg

...code: Using an undefined physical register *** - function: parse_and_dump_tv_tag - basic block: BB#126 (null) (0x6127658) - instruction: CALL64r %vreg41, <regmask>, %RSP<imp-use>, %EDI<imp-use>, %RSI<imp-use>, %RDX<imp-use>, %RCX<imp-use>, %R8<imp-use>, %R9D<imp-use>, %AL<imp-use>, %RSP<imp-def>, %EAX<imp-def>; GR64:%vreg41 How can I get rid of these errors? Thank you very much, -- Jon

[PATCH] MSVC2015U2 workaround, version 2

2016 May 02

3

[PATCH] MSVC2015U2 workaround, version 2

...esidual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum); into this: movq QWORD PTR [rsi], xmm2 while it should be movd eax, xmm2 mov QWORD PTR [rsi], rax With this patch, MSVC emits movq QWORD PTR [rsi], xmm2 mov DWORD PTR [rsi+4], r9d so the price of this workaround is 1 extra write instruction per partition. -------------- next part -------------- A non-text attachment was scrubbed... Name: msvc_bug_v2.patch Type: application/octet-stream Size: 2700 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/flac-dev/at...

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

2016 Jan 02

13

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

https://bugs.freedesktop.org/show_bug.cgi?id=93557 Bug ID: 93557 Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a Product: xorg Version: unspecified Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: blocker

windows ABI problem with i128?

2018 Apr 26

2

windows ABI problem with i128?

...%xmm0,-0x30(%rbp) 89: 66 0f d6 4d d8 movq %xmm1,-0x28(%rbp) 8e: 0f 10 45 d0 movups -0x30(%rbp),%xmm0 92: 0f 10 4d e0 movups -0x20(%rbp),%xmm1 96: 66 0f 74 c1 pcmpeqb %xmm1,%xmm0 9a: 66 44 0f d7 c8 pmovmskb %xmm0,%r9d 9f: 41 81 e9 ff ff 00 00 sub $0xffff,%r9d a6: 44 89 4d ac mov %r9d,-0x54(%rbp) aa: 74 06 je b2 <_start+0xa2> ac: eb 00 jmp ae <_start+0x9e> ae: eb 00 jmp b0 <_start+0xa0...

[LLVMdev] Passing structures by value on Windows

2010 Jun 03

4

[LLVMdev] Passing structures by value on Windows

...Some research showed that { i16 i16 } addition also fails on x86, so I guess the problem is in passing structures by value. On x64, VC++ passes two { i32 i32 } structs in RCX and RDX respectively and reads the result from RAX, but it seems LLVM reads parameters from ECX, EDX (first vector) and R8D, R9D (second vector). Currently, I can't figure out how to dump IR, but there is a link with disassembly shown by Visual Studio for generated functions: comparing { i32 i32 } add on 32-bit and 64-bit (first one works): http://pastebin.com/ijjCNWKJ Best regards, Milovanov Victor.

[Bug 63263] New: X server crash in nouveau_xv.c:NVPutImage (NVCopyNV12ColorPlanes)

2013 Apr 08

6

[Bug 63263] New: X server crash in nouveau_xv.c:NVPutImage (NVCopyNV12ColorPlanes)

...887>: mov %rdx,%r11 0x000000000000a62a <+4890>: mov %eax,%r10d 0x000000000000a62d <+4893>: nopl (%rax) 0x000000000000a630 <+4896>: lea (%rsi,%r13,1),%rdi 0x000000000000a634 <+4900>: xor %edx,%edx 0x000000000000a636 <+4902>: test %r9d,%r9d 0x000000000000a639 <+4905>: jle 0xaa2a <NVPutImage+5914> 0x000000000000a63f <+4911>: nop 0x000000000000a640 <+4912>: movzbl (%rdi,%rdx,2),%eax --> 0x000000000000a644 <+4916>: movzbl 0x1(%rsi,%rdx,2),%ecx 0x000000000000a649 <+4921>: s...

windows ABI problem with i128?

2018 Apr 26

0

windows ABI problem with i128?

...89: 66 0f d6 4d d8 movq %xmm1,-0x28(%rbp) > 8e: 0f 10 45 d0 movups -0x30(%rbp),%xmm0 > 92: 0f 10 4d e0 movups -0x20(%rbp),%xmm1 > 96: 66 0f 74 c1 pcmpeqb %xmm1,%xmm0 > 9a: 66 44 0f d7 c8 pmovmskb %xmm0,%r9d > 9f: 41 81 e9 ff ff 00 00 sub $0xffff,%r9d > a6: 44 89 4d ac mov %r9d,-0x54(%rbp) > aa: 74 06 je b2 <_start+0xa2> > ac: eb 00 jmp ae <_start+0x9e> > ae: eb 00...

[LLVMdev] better code for IV

2014 Feb 19

2

[LLVMdev] better code for IV

...br label %2 ; <label>:2 ; preds = %L_exit ret void } Asm code: ArrayAdd1: # @ArrayAdd1 .cfi_startproc # BB#0: # %Entry xorl %r9d, %r9d movabsq $4294967296, %r8 # imm = 0x100000000 .align 16, 0x90 .LBB0_1: # %L_entry # =>This Inner Loop Header: Depth=1 movq %r9, %rax sarq $32, %rax...
