Displaying 20 results from an estimated 36 matches for "xmm8".
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...ack dependencies. It is as if
all p1 = .. expressions are collected at once followed by all p2 = .. expressions and so
forth.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all independent, this r...
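The ordering problem described in this excerpt can be illustrated with a small sketch. It models only the order of operations, not LLVM's scheduler: the helper names `grouped`, `interleaved`, and `min_dependent_distance` are hypothetical and chosen for illustration. It shows that emitting each chain in full places dependent multiplies back to back, while a round-robin order separates them.

```python
from itertools import zip_longest

def grouped(chains):
    """Emit each dependency chain in full before starting the next one."""
    return [op for ops in chains for op in ops]

def interleaved(chains):
    """Round-robin across chains so that neighbouring ops are independent."""
    rounds = zip_longest(*chains)
    return [op for ops in rounds for op in ops if op is not None]

def min_dependent_distance(schedule):
    """Smallest gap between two consecutive ops that write the same variable."""
    last_seen = {}
    best = None
    for i, op in enumerate(schedule):
        dest = op.split("=")[0].strip()
        if dest in last_seen:
            gap = i - last_seen[dest]
            best = gap if best is None else min(best, gap)
        last_seen[dest] = i
    return best

# Three independent chains, three multiplies each (the excerpt uses 512).
chains = [[f"{p} = {p} * {m}"] * 3 for p, m in (("p1", "a"), ("p2", "b"), ("p3", "c"))]

print(min_dependent_distance(grouped(chains)))      # dependent ops are adjacent
print(min_dependent_distance(interleaved(chains)))  # dependent ops are spread out
```

With a larger distance between dependent operations, a pipelined multiplier has independent work available while an earlier result is still in flight.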
2011 May 03
2
[LLVMdev] Greedy register allocation
...ad
header1:
; large blocks with lots of floating point ops
header2:
; small loop using %vreg1
jnz header2
...
jnz header1
The def of %vreg1 has been hoisted by LICM so it is live across a block with lots of floating point code. The allocator uses the low xmm registers for the large block, and %xmm8 is left for %vreg1 which has a low spill weight. This significantly improves code size, but the small loop suffers.
A low xmm register could be used for %vreg1, but would need to be rematerialized. The allocator won't go that far just to use cheaper registers.
In this case it might have helpe...
2011 May 03
0
[LLVMdev] Greedy register allocation
...ting point ops
> header2:
> ; small loop using %vreg1
> jnz header2
> ...
> jnz header1
>
> The def of %vreg1 has been hoisted by LICM so it is live across a
> block with lots of floating point code. The allocator uses the low xmm
> registers for the large block, and %xmm8 is left for %vreg1 which has
> a low spill weight. This significantly improves code size, but the
> small loop suffers.
Why does %xmm8 have a low spill weight? It's used in an inner loop.
> In this case it might have helped to split the live range and
> rematerialize, but usually...
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
...splitting rematerializable live-ranges lead to significantly
better register allocation, and an overall performance improvement of
3%.
*** The Problem
Compile the attached testcase as follows:
llc -mcpu=btver2 test.ll
Examining the assembly in test.s we can see a constant is being loaded
into %xmm8 (second instruction in foo). Tracing the constant we can
see the following:
foo:
...
vmovaps .LCPI0_0(%rip), %xmm8 # xmm8 = [6.366197e-01,6.366197e-01,...]
...
vmulps %xmm8, %xmm0, %xmm1 # first use of constant
vmovaps %xmm8, %xmm9 # move constant into ano...
2011 May 03
2
[LLVMdev] Greedy register allocation
...; small loop using %vreg1
>> jnz header2
>> ...
>> jnz header1
>>
>
>> The def of %vreg1 has been hoisted by LICM so it is live across a
>> block with lots of floating point code. The allocator uses the low xmm
>> registers for the large block, and %xmm8 is left for %vreg1 which has
>> a low spill weight. This significantly improves code size, but the
>> small loop suffers.
>
> Why does %xmm8 have a low spill weight? It's used in an inner loop.
Because it is rematerializable and live across a big block where it isn't us...
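The reply above can be made concrete with a toy model. This is a simplified sketch and not LLVM's implementation: the function `spill_weight`, the halving factor for rematerializable values, and all numbers are assumptions chosen for illustration. The idea is that use frequency is divided by the length of the live range, so a value used in a hot loop can still end up with a low weight if it stays live across a large block.

```python
def spill_weight(use_freqs, live_range_size, rematerializable=False):
    """Toy spill weight: total use frequency normalised by live-range size.

    use_freqs        -- estimated execution frequency of each use or def
    live_range_size  -- number of instruction slots the value is live across
    rematerializable -- assumed here to halve the weight, since recomputing
                        the value is cheaper than a store and reload
    """
    weight = sum(use_freqs) / live_range_size
    if rematerializable:
        weight *= 0.5
    return weight

# Hypothetical %vreg1: two uses in an inner loop, but live across a big block.
vreg1 = spill_weight([100.0, 100.0], live_range_size=2000, rematerializable=True)

# Hypothetical short-lived temporary inside the large floating-point block.
temp = spill_weight([10.0, 10.0], live_range_size=4)

print(vreg1 < temp)  # the loop-used value still ranks below the temporary
```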
2017 Jul 19
5
error:Ran out of lanemask bits to represent subregisterr
...have made changes in 3 files:
LaneBitmask.h, codegenregisters.cpp and miparser.cpp. The files are attached
here.
Now I am getting the following errors, which means the registerinfo.inc file is not
generated successfully.
/PIM/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:733:24: error:
no member named 'XMM8' in namespace 'llvm::X86'
if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) ||
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
When I comment out the line that constructs the 65536-bit register in
registerinfo.td, it runs fine.
What t...
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...punpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
vpcmpgtq %xmm3, %xmm4, %xmm3
vptest %xmm3, %xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm15, %xmm2, %xmm3
vpand %xmm15, %xmm3, %xmm3
vpaddq .LCPI10_1(%rip), %xmm3, %xmm8
vpand .LCPI10_5(%rip), %xmm8, %xmm5
vpxor %xmm4, %xmm4, %xmm4
vpcmpeqq %xmm4, %xmm5, %xmm6
vptest %xmm6, %xmm6
jne .LBB10_9
It turned out that the vector one is way more complicated than the scalar
one. I was expecting that it would be not so tedious.
On Fri, Jun 26,...
2017 Jul 20
2
error:Ran out of lanemask bits to represent subregisterr
...attached here.
>>
>> Now I am getting the following errors, which means the registerinfo.inc
>> file is not generated successfully.
>>
>> /PIM/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:733:24: error:
>> no member named 'XMM8' in namespace 'llvm::X86'
>> if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) ||
>>
>>
>> fatal error: too many errors emitted, stopping now
>> [-ferror-limit=]
>> 20 errors generated.
>>
>>...
2010 Oct 20
2
[LLVMdev] llvm register reload/spilling around calls
.....
>
> Look in X86InstrControl.td. The call instructions are all prefixed
> by:
>
> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>
> This is the fixed list of call-clobbered registers. It should really
> be controlled by the calling convention of the called function
> instead.
>
> The WINCALL* instructions only exist because of this.
Ahh I see...
2011 May 03
0
[LLVMdev] Greedy register allocation
...tivational case for live range
>> splitting.
>
> Well, not really. Note that there are plenty of registers available
> and no spilling is necessary.
Oh, I misunderstood then. Thanks for clarifying.
> It's just that an REX prefix is required on some instructions when
> %xmm8 is used. Is it worth it to undo LICM just for that? In this
> case, probably. In general, no.
Ah, so you're saying the regression is due to the inner loop icache
footprint increasing. Ok, that makes total sense to me. I agree this
is a difficult thing to get right in a general sort of way...
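The code-size point in this exchange can be checked with a minimal encoder sketch. It covers only the register-to-register form of the legacy SSE `mulss` instruction (opcode `F3 0F 59 /r`); the function name `encode_mulss` is hypothetical. Operands follow the AT&T order used elsewhere on this page, so `src` is the first operand and `dst` the second.

```python
def encode_mulss(src: int, dst: int) -> bytes:
    """Encode `mulss %xmm<src>, %xmm<dst>` (register form, legacy SSE)."""
    if not (0 <= src <= 15 and 0 <= dst <= 15):
        raise ValueError("only xmm0-xmm15 are handled in this sketch")
    # REX.R (bit 2) extends ModRM.reg (dst); REX.B (bit 0) extends ModRM.rm (src).
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)
    out = bytearray([0xF3])           # mandatory prefix for MULSS
    if rex != 0x40:                   # REX is emitted only when a high register is used
        out.append(rex)
    out += bytes([0x0F, 0x59, modrm])
    return bytes(out)

print(encode_mulss(6, 3).hex())   # low registers only
print(encode_mulss(8, 10).hex())  # xmm8 and xmm10 both need REX bits
```

In this encoding, any instruction that names xmm8 through xmm15 is one byte longer than the same instruction on xmm0 through xmm7, which is the size cost the thread refers to.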
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...in X86InstrControl.td. The call instructions are all prefixed
>> by:
>>
>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>
>> This is the fixed list of call-clobbered registers. It should really
>> be controlled by the calling convention of the called function
>> instead.
>>
>> The WINCALL* instructions only exis...
2017 Jul 19
2
error:Ran out of lanemask bits to represent subregisterr
What about the static asserts protecting a Log call and another in the
parser?
On Wed, Jul 19, 2017 at 2:26 PM Krzysztof Parzyszek <kparzysz at codeaurora.org>
wrote:
> On 7/19/2017 4:18 PM, Craig Topper wrote:
> > LaneMask isn't as self contained as it should be. 64 bits is enough
> > here. The problem is accidental leaking of the current size.
> >
> > For
2010 Oct 20
1
[LLVMdev] llvm register reload/spilling around calls
...td. The call instructions are all prefixed
>>> by:
>>>
>>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>>
>>> This is the fixed list of call-clobbered registers. It should really
>>> be controlled by the calling convention of the called function
>>> instead.
>>>
>>> The WINCA...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
generates really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...trol.td. The call instructions are all prefixed by:
let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead.
The WINCALL* instructions only exist because of this.
One problem is that calling conventions are...
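The suggestion that the clobber list should depend on the calling convention can be sketched as follows. This is an illustration, not LLVM code: the helper `must_save_around_call` and the set names are hypothetical. The two clobber sets reflect the System V x86-64 ABI, where no XMM register is callee-saved, and the Windows x64 convention, where XMM6 through XMM15 are callee-saved.

```python
XMM = [f"XMM{i}" for i in range(16)]

# System V x86-64: every XMM register may be clobbered by a call.
SYSV_CLOBBERED_XMM = set(XMM)
# Windows x64: only XMM0-XMM5 are volatile; XMM6-XMM15 are preserved by the callee.
WIN64_CLOBBERED_XMM = set(XMM[:6])

def must_save_around_call(live_across_call, clobbered):
    """Registers the caller has to spill and reload around a call."""
    needed = set(live_across_call) & clobbered
    return sorted(needed, key=lambda reg: int(reg[3:]))

live = ["XMM8", "XMM3", "XMM6"]
print(must_save_around_call(live, SYSV_CLOBBERED_XMM))
print(must_save_around_call(live, WIN64_CLOBBERED_XMM))
```

A single fixed list, as in the quoted `Defs`, forces the larger set on every call regardless of the callee's convention.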
2009 Jun 30
2
[LLVMdev] JIT on Windows x64
...mpting to use
the hack/patch proposed in this bug
http://llvm.org/bugs/show_bug.cgi?id=3739.
I checked out the revision the patch was created for (66183) and applied
it but the assembler generated seems to fail whenever it reaches a
movaps instruction.
e.g.
movaps xmmword ptr [rsp+20h],xmm8
movaps xmmword ptr [rsp+30h],xmm7
movaps xmmword ptr [rsp+40h],xmm6
Would this have something to do with the stack alignment?
I am wondering if anybody else has had any success using that patch to
get Windows x64 JIT to work correctly. Or if my problem may be unrelated.
Any sugge...
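On the alignment question raised above: `movaps` requires its memory operand to be 16-byte aligned, so whether these stores are legal depends on the value of `rsp` at that point. The check below is a minimal sketch; the function name and the sample stack pointer values are hypothetical, and the excerpt does not establish that misalignment is the actual cause of the failure.

```python
def movaps_operand_aligned(rsp: int, disp: int) -> bool:
    """True if [rsp + disp] is a valid (16-byte aligned) movaps memory operand."""
    return (rsp + disp) % 16 == 0

# Displacements taken from the three stores in the excerpt.
displacements = [0x20, 0x30, 0x40]

# With a 16-byte aligned stack pointer every slot is aligned.
print([movaps_operand_aligned(0x7FF000, d) for d in displacements])
# With rsp off by 8 every slot is misaligned and movaps would fault.
print([movaps_operand_aligned(0x7FF008, d) for d in displacements])
```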
2011 May 04
4
[LLVMdev] Greedy register allocation
On May 3, 2011, at 4:08 PM, David A. Greene wrote:
>>
>> It's just that an REX prefix is required on some instructions when
>> %xmm8 is used. Is it worth it to undo LICM just for that? In this
>> case, probably. In general, no.
>
> Ah, so you're saying the regression is due to the inner loop icache
> footprint increasing. Ok, that makes total sense to me. I agree this
> is a difficult thing to get right...
2010 Oct 20
3
[LLVMdev] llvm register reload/spilling around calls
Thanks for giving it a look!
On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>
>> So I saw that the code is doing lots of register
>> spilling/reloading. Now I understand that due to calling
>> conventions, there's not really a way to avoid this - I tried using
>> coldcc but apparently the backend
2008 Sep 03
2
[LLVMdev] Codegen/Register allocation question.
...>, %MM5<imp-def,dead>, %MM6<imp-def,dead>,
%MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>,
%XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>,
%XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<imp-def,dead>,
%XMM8<imp-def,dead>, %XMM9<imp-def,dead>, %XMM10<imp-def,dead>,
%XMM11<imp-def,dead>, %XMM12<imp-def,dead>, %XMM13<imp-def,dead>,
%XMM14<imp-def,dead>, %XMM15<imp-def,dead>, %EFLAGS<imp-def,dead>,
%EAX<imp-def>, %ECX<imp-def,dead>, %EDI<...
2008 Sep 04
0
[LLVMdev] Codegen/Register allocation question.
...,dead>, %MM6<imp-def,dead>,
> %MM7<imp-def,dead>, %XMM0<imp-def,dead>, %XMM1<imp-def,dead>,
> %XMM2<imp-def,dead>, %XMM3<imp-def,dead>, %XMM4<imp-def,dead>,
> %XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<imp-def,dead>,
> %XMM8<imp-def,dead>, %XMM9<imp-def,dead>, %XMM10<imp-def,dead>,
> %XMM11<imp-def,dead>, %XMM12<imp-def,dead>, %XMM13<imp-def,dead>,
> %XMM14<imp-def,dead>, %XMM15<imp-def,dead>, %EFLAGS<imp-def,dead>,
> %EAX<imp-def>, %ECX<imp-def,de...