thr3ads.net - search: "xmm8"

Displaying 20 results from an estimated 36 matches for "xmm8".

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

...ack dependencies. It is as if all p1 = .. expressions are collected at once followed by all p2 = .. expressions and so forth. p1 = p1 * a p1 = p1 * a . . p2 = p2 * b p2 = p2 * b . . p3 = p3 * c p3 = p3 * c . . An actual excerpt of the generated x86 assembly follows: mulss %xmm8, %xmm10 mulss %xmm8, %xmm10 . . repeated 512 times . mulss %xmm7, %xmm9 mulss %xmm7, %xmm9 . . repeated 512 times . mulss %xmm6, %xmm3 mulss %xmm6, %xmm3 . . repeated 512 times . Since p1, p2, p3, and p4 are all independent, this r...

[LLVMdev] Greedy register allocation

2011 May 03

...ad header1: ; large blocks with lots of floating point ops header2: ; small loop using %vreg1 jnz header2 ... jnz header1 The def of %vreg1 has been hoisted by LICM so it is live across a block with lots of floating point code. The allocator uses the low xmm registers for the large block, and %xmm8 is left for %vreg1 which has a low spill weight. This significantly improves code size, but the small loop suffers. A low xmm register could be used for %vreg1, but would need to be rematerialized. The allocator won't go that far just to use cheaper registers. In this case it might have helpe...

[LLVMdev] Greedy register allocation

2011 May 03

...ting point ops > header2: > ; small loop using %vreg1 > jnz header2 > ... > jnz header1 > > The def of %vreg1 has been hoisted by LICM so it is live across a > block with lots of floating point code. The allocator uses the low xmm > registers for the large block, and %xmm8 is left for %vreg1 which has > a low spill weight. This significantly improves code size, but the > small loop suffers. Why does %xmm8 have a low spill weight? It's used in an inner loop. > In this case it might have helped to split the live range and > rematerialize, but usually...

[LLVMdev] Poor register allocation (constants causing spilling)

2015 Jul 14

...splitting rematerializable live-ranges lead to significantly better register allocation, and an overall performance improvement of 3%. *** The Problem Compile the attached testcase as follows: llc -mcpu=btver2 test.ll Examining the assembly in test.s we can see a constant is being loaded into %xmm8 (second instruction in foo). Tracing the constant we can see the following: foo: ... vmovaps .LCPI0_0(%rip), %xmm8 # xmm8 = [6.366197e-01,6.366197e-01,...] ... vmulps %xmm8, %xmm0, %xmm1 # first use of constant vmovaps %xmm8, %xmm9 # move constant into ano...

[LLVMdev] Greedy register allocation

2011 May 03

...; small loop using %vreg1 >> jnz header2 >> ... >> jnz header1 >> > >> The def of %vreg1 has been hoisted by LICM so it is live across a >> block with lots of floating point code. The allocator uses the low xmm >> registers for the large block, and %xmm8 is left for %vreg1 which has >> a low spill weight. This significantly improves code size, but the >> small loop suffers. > > Why does %xmm8 have a low spill weight? It's used in an inner loop. Because it is rematerializable and live across a big block where it isn't us...

error:Ran out of lanemask bits to represent subregisterr

2017 Jul 19

...have made changes in 3 files: LaneBitmask.h, codegenregisters.cpp and miparser.cpp. Files are attached here. Now I am getting the following errors, which means the registerinfo.inc file is not generated successfully. /PIM/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:733:24: error: no member named 'XMM8' in namespace 'llvm::X86' if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) || fatal error: too many errors emitted, stopping now [-ferror-limit=] 20 errors generated. When I comment out the line to construct 65536 bit register in registerinfo.td, it runs fine. What t...

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

...punpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0] vpcmpgtq %xmm3, %xmm4, %xmm3 vptest %xmm3, %xmm3 je .LBB10_66 # BB#5: # %for.body.preheader vpaddq %xmm15, %xmm2, %xmm3 vpand %xmm15, %xmm3, %xmm3 vpaddq .LCPI10_1(%rip), %xmm3, %xmm8 vpand .LCPI10_5(%rip), %xmm8, %xmm5 vpxor %xmm4, %xmm4, %xmm4 vpcmpeqq %xmm4, %xmm5, %xmm6 vptest %xmm6, %xmm6 jne .LBB10_9 It turned out that the vector one is way more complicated than the scalar one. I was expecting that it would be not so tedious. On Fri, Jun 26,...

error:Ran out of lanemask bits to represent subregisterr

2017 Jul 20

...attached here. >> >> Now I am getting the following errors, which means the registerinfo.inc >> file is not generated successfully. >> >> /PIM/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:733:24: error: >> no member named 'XMM8' in namespace 'llvm::X86' >> if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) || >> >> >> fatal error: too many errors emitted, stopping now >> [-ferror-limit=] >> 20 errors generated. >> >>...

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

..... > > Look in X86InstrControl.td. The call instructions are all prefixed > by: > > let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, > FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, > XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, > XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], > > This is the fixed list of call-clobbered registers. It should really > be controlled by the calling convention of the called function > instead. > > The WINCALL* instructions only exist because of this. Ahh I see...

[LLVMdev] Greedy register allocation

2011 May 03

...tivational case for live range >> splitting. > > Well, not really. Note that there are plenty of registers available > and no spilling is necessary. Oh, I misunderstood then. Thanks for clarifying. > It's just that an REX prefix is required on some instructions when > %xmm8 is used. Is it worth it to undo LICM just for that? In this > case, probably. In general, no. Ah, so you're saying the regression is due to the inner loop icache footprint increasing. Ok, that makes total sense to me. I agree this is a difficult thing to get right in a general sort of way...
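
The code-size cost mentioned in this snippet can be illustrated with a minimal sketch. It assumes the legacy (non-VEX) SSE register-to-register form of `mulss` (opcode F3 0F 59 /r); the helper name `encode_mulss` is hypothetical and not part of LLVM. It shows that using %xmm8 or higher adds one REX byte to the instruction.

```python
# Hypothetical helper, for illustration only: encode the register-to-register
# form of the legacy SSE instruction `mulss` (F3 0F 59 /r).

def encode_mulss(dst: int, src: int) -> bytes:
    """Encode `mulss %xmm<src>, %xmm<dst>` (AT&T operand order)."""
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("legacy SSE encoding only reaches xmm0..xmm15")
    rex_r = dst >> 3  # extends ModRM.reg (destination register)
    rex_b = src >> 3  # extends ModRM.rm (source register)
    out = bytearray([0xF3])  # mandatory prefix, placed before any REX byte
    if rex_r or rex_b:
        # REX byte layout is 0100WRXB; only R and B are needed here.
        out.append(0x40 | (rex_r << 2) | rex_b)
    out += bytes([0x0F, 0x59])  # opcode bytes
    out.append(0xC0 | ((dst & 7) << 3) | (src & 7))  # ModRM, mod=11 (reg-reg)
    return bytes(out)


low = encode_mulss(dst=3, src=6)    # mulss %xmm6, %xmm3
high = encode_mulss(dst=10, src=8)  # mulss %xmm8, %xmm10
print(low.hex(), len(low))
print(high.hex(), len(high))
```

The second encoding is one byte longer than the first, which is the per-instruction growth that enlarges the inner loop's instruction-cache footprint.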

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...in X86InstrControl.td. The call instructions are all prefixed >> by: >> >> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >> >> This is the fixed list of call-clobbered registers. It should really >> be controlled by the calling convention of the called function >> instead. >> >> The WINCALL* instructions only exis...

error:Ran out of lanemask bits to represent subregisterr

2017 Jul 19

What about the static asserts protecting a Log call and another in the parser? On Wed, Jul 19, 2017 at 2:26 PM Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: > On 7/19/2017 4:18 PM, Craig Topper wrote: > > LaneMask isn't as self contained as it should be. 64 bits is enough > > here. The problem is accidental leaking of the current size. > > > > For

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...td. The call instructions are all prefixed >>> by: >>> >>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, >>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], >>> >>> This is the fixed list of call-clobbered registers. It should really >>> be controlled by the calling convention of the called function >>> instead. >>> >>> The WINCA...

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

Hi, Is LLVM able to generate code for the following code? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of <2 x i32> type. I am running it on a Haswell processor with LLVM-3.4.2. It seems that it generates really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

...trol.td. The call instructions are all prefixed by: let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS], This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead. The WINCALL* instructions only exist because of this. One problem is that calling conventions are...
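
As a rough illustration of what a fixed call-clobbered list implies, the sketch below copies the register names from the quoted `Defs` list and checks which values live across a call would have to be saved and restored. The function name `must_save_around_call` and the idea of passing the live set in as a plain Python set are assumptions made for this example; this is not how LLVM's allocator is implemented.

```python
# Register names copied from the `Defs` list quoted in the snippet above.
CALL_DEFS = (
    {"RAX", "RCX", "RDX", "RSI", "RDI", "R8", "R9", "R10", "R11"}
    | {f"FP{i}" for i in range(7)}    # FP0..FP6
    | {"ST0", "ST1"}
    | {f"MM{i}" for i in range(8)}    # MM0..MM7
    | {f"XMM{i}" for i in range(16)}  # XMM0..XMM15
    | {"EFLAGS"}
)


def must_save_around_call(live_across_call: set[str]) -> list[str]:
    """Return the live registers that the call is modeled as overwriting.

    Hypothetical helper: any register in this list must be spilled before
    the call and reloaded afterwards, regardless of what the callee's
    calling convention actually preserves.
    """
    return sorted(reg for reg in live_across_call if reg in CALL_DEFS)


# RBX is not in the list, so only RAX and XMM8 need saving here.
print(must_save_around_call({"RBX", "XMM8", "RAX"}))
```

Because every XMM register is in the list, any vector value live across a call is treated as clobbered, which matches the spilling behavior discussed in this thread.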

[LLVMdev] JIT on Windows x64

2009 Jun 30

...mpting to use the hack/patch proposed in this bug http://llvm.org/bugs/show_bug.cgi?id=3739. I checked out the revision the patch was created for (66183) and applied it, but the generated assembly seems to fail whenever it reaches a movaps instruction, e.g. movaps xmmword ptr [rsp+20h],xmm8 movaps xmmword ptr [rsp+30h],xmm7 movaps xmmword ptr [rsp+40h],xmm6 Would this have something to do with the stack alignment? I am wondering if anybody else has had any success using that patch to get Windows x64 JIT to work correctly. Or if my problem may be unrelated. Any sugge...

[LLVMdev] Greedy register allocation

2011 May 04

On May 3, 2011, at 4:08 PM, David A. Greene wrote: >> >> It's just that an REX prefix is required on some instructions when >> %xmm8 is used. Is it worth it to undo LICM just for that? In this >> case, probably. In general, no. > > Ah, so you're saying the regression is due to the inner loop icache > footprint increasing. Ok, that makes total sense to me. I agree this > is a difficult thing to get right...

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

Thanks for giving it a look! On 19.10.2010 23:21, Jakob Stoklund Olesen wrote: > On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote: > >> So I saw that the code is doing lots of register >> spilling/reloading. Now I understand that due to calling >> conventions, there's not really a way to avoid this - I tried using >> coldcc but apparently the backend

[LLVMdev] Codegen/Register allocation question.

2008 Sep 03

...>, %MM5<imp-def,dead>, %MM6<imp-def,dead>, %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, %XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<imp-def,dead>, %XMM8<imp-def,dead>, %XMM9<imp-def,dead>, %XMM10<imp-def,dead>, %XMM11<imp-def,dead>, %XMM12<imp-def,dead>, %XMM13<imp-def,dead>, %XMM14<imp-def,dead>, %XMM15<imp-def,dead>, %EFLAGS<imp-def,dead>, %EAX<imp-def>, %ECX<imp-def,dead>, %EDI<...

[LLVMdev] Codegen/Register allocation question.

2008 Sep 04

...,dead>, %MM6<imp-def,dead>, > %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>, > %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>, > %XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<imp-def,dead>, > %XMM8<imp-def,dead>, %XMM9<imp-def,dead>, %XMM10<imp-def,dead>, > %XMM11<imp-def,dead>, %XMM12<imp-def,dead>, %XMM13<imp-def,dead>, > %XMM14<imp-def,dead>, %XMM15<imp-def,dead>, %EFLAGS<imp-def,dead>, > %EAX<imp-def>, %ECX<imp-def,de...