search for: dldreg

2011 Dec 08

[LLVMdev] Register allocation in two passes

...n there are actually none. What is happening is that although execution reaches the line spiller().spill(LRE); inside RAGreedy::selectOrSplit() the insertion of the spill is avoided because the register gets rematted. This is the debug output I'm getting to show what I mean: Inline spilling DLDREGS:%vreg25,1.436782e-03 = [344r,640r:0) 0 at 344r From original %vreg8,1.838235e-03 = [224r,640r:0) 0 at 224r Value %vreg25:0 at 344r may remat from %vreg25<def> = LDIWRdK 2; DLDREGS:%vreg25 remat: 632r %vreg28<def> = LDIWRdK 2; DLDREGS:%vreg28 640e %R15R14<...

2011 Mar 26

[LLVMdev] Possible missed optimization?

...%R25R24<kill>; PTRREGS:%vreg8 updated: 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5 Joined. Result = %R25R24,inf = [0L,96d:0) 0 at 0L-phidef 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5 Not coalescable. 64L %vreg6<def> = COPY %vreg4<kill>; DLDREGS:%vreg6,%vreg4 Considering merging %vreg4 with %vreg6 to DLDREGS RHS = %vreg4 = [48d,64d:0) 0 at 48d LHS = %vreg6 = [64d,80d:1)[80d,112d:0) 0 at 80d 1 at 64d updated: 48L %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"int") DLDRE...

2011 Mar 26

[LLVMdev] Possible missed optimization?

Hello Jakob, thanks for the reply. The three regclasses involved here are all subsets of one another and aren't disjoint. These are the basic descriptions of the regclasses involved to show what I mean: DREGS: R31R30, R29R28 down to R1R0 (16 regs) DLDREGS: R31R30, R29R28 down to R17R16 (8 regs) PTRREGS: R31R30, R29R28, R27R26 (3 regs) All classes intersect each other, and each intersection yields the smaller class: DREGSxDLDREGS=DLDREGS / DLDREGSxPTRREGS=PTRREGS, etc. That's why I think the coalescer should work since the regclasses overlap...
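
The nesting described in this message can be checked with a small set model. The following is only an illustrative sketch, not LLVM code: it assumes a hypothetical encoding in which bit k of a 16-bit set stands for the register pair R(2k+1):R(2k), and the names RegSet, makeRange, intersect and isSubset are invented for the example. It says nothing about how the coalescer itself decides whether two classes are compatible.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model: bit k stands for the register pair R(2k+1):R(2k),
// so bit 0 is R1R0 and bit 15 is R31R30.
using RegSet = std::bitset<16>;

// Build the set of pairs whose index lies in [first, last].
inline RegSet makeRange(std::size_t first, std::size_t last) {
  RegSet s;
  for (std::size_t k = first; k <= last; ++k)
    s.set(k);
  return s;
}

// Class contents as listed in the message.
inline RegSet dregs()   { return makeRange(0, 15); }   // R1R0   .. R31R30 (16 regs)
inline RegSet dldregs() { return makeRange(8, 15); }   // R17R16 .. R31R30 (8 regs)
inline RegSet ptrregs() { return makeRange(13, 15); }  // R27R26 .. R31R30 (3 regs)

// Registers common to two classes: bitwise AND of the two sets.
inline RegSet intersect(const RegSet &a, const RegSet &b) { return a & b; }

// a is a subset of b when a contains no register outside b.
inline bool isSubset(const RegSet &a, const RegSet &b) {
  return (a & ~b).none();
}

// Checks the claim that each intersection yields the smaller class.
inline void checkNesting() {
  assert(intersect(dregs(), dldregs()) == dldregs());
  assert(intersect(dldregs(), ptrregs()) == ptrregs());
  assert(isSubset(ptrregs(), dldregs()) && isSubset(dldregs(), dregs()));
}
```

Calling checkNesting() passes under this model, which matches the statement that DREGSxDLDREGS=DLDREGS and DLDREGSxPTRREGS=PTRREGS.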

2011 Mar 26

[LLVMdev] Possible missed optimization?

...ja Ferrer wrote: > Hello Jakob, thanks for the reply. The three regclasses involved here are all subsets of one another and aren't disjoint. These are the basic descriptions of the regclasses involved to show what I mean: > > DREGS: R31R30, R29R28 down to R1R0 (16 regs) > DLDREGS: R31R30, R29R28 down to R17R16 (8 regs) > PTRREGS: R31R30, R29R28, R27R26 (3 regs) > > All classes intersect each other, and each intersection yields the smaller class: DREGSxDLDREGS=DLDREGS / DLDREGSxPTRREGS=PTRREGS, etc. That's why I think the coalescer should work since the regc...

2011 Nov 30

[LLVMdev] Register allocation in two passes

On Nov 30, 2011, at 12:17 PM, Borja Ferrer wrote: > Thanks for all the hints Jakob, I've added the following piece of code after the spill code handling inside selectOrSplit() (ignoring some control logic): > > for (LiveIntervals::const_iterator I = LIS->begin(), E = LIS->end(); I != E; > ++I) > { > unsigned VirtReg = I->first; > if

2011 Dec 08

[LLVMdev] Register allocation in two passes

...ually none. > What is happening is that although execution reaches the line spiller().spill(LRE); inside RAGreedy::selectOrSplit() the insertion of the spill is avoided because the register gets rematted. This is the debug output I'm getting to show what I mean: > > Inline spilling DLDREGS:%vreg25,1.436782e-03 = [344r,640r:0) 0 at 344r > From original %vreg8,1.838235e-03 = [224r,640r:0) 0 at 224r > Value %vreg25:0 at 344r may remat from %vreg25<def> = LDIWRdK 2; DLDREGS:%vreg25 > remat: 632r %vreg28<def> = LDIWRdK 2; DLDREGS:%vreg28 > 64...

2011 Nov 30

[LLVMdev] Register allocation in two passes

Thanks for all the hints Jakob, I've added the following piece of code after the spill code handling inside selectOrSplit() (ignoring some control logic): for (LiveIntervals::const_iterator I = LIS->begin(), E = LIS->end(); I != E; ++I) { unsigned VirtReg = I->first; if ((TargetRegisterInfo::isVirtualRegister(VirtReg)) && (VRM->getPhys(VirtReg)

2011 Mar 28

[LLVMdev] Possible missed optimization?

...EGS:%vreg8 > updated: 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5 > Joined. Result = %R25R24,inf = [0L,96d:0) 0 at 0L-phidef > 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5 > Not coalescable. > 64L %vreg6<def> = COPY %vreg4<kill>; DLDREGS:%vreg6,%vreg4 > Considering merging %vreg4 with %vreg6 to DLDREGS > RHS = %vreg4 = [48d,64d:0) 0 at 48d > LHS = %vreg6 = [64d,80d:1)[80d,112d:0) 0 at 80d 1 at 64d > updated: 48L %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"...

2011 Mar 25

[LLVMdev] Possible missed optimization?

...// int here is 16 bits { *a &= 0xFF; } This is the code before regalloc: Live Ins: %R25R24 %vreg0<def> = COPY %R25R24; DREGS:%vreg0 %vreg2<def> = COPY %vreg0; PTRREGS:%vreg2 DREGS:%vreg0 %vreg1<def> = LDWRd %vreg2; mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg1 PTRREGS:%vreg2 %vreg3<def> = ANDIWRdK %vreg1, 255; DLDREGS:%vreg3,%vreg1 %vreg5<def> = COPY %vreg0; PTRREGS:%vreg5 DREGS:%vreg0 STWRr %vreg5, %vreg3<kill>; mem:ST2[%a](align=1)(tbaa=!"int") PTRREGS:%vreg5 DLDREGS:%vreg3 RET From above, the 3r...

2011 Apr 26

[LLVMdev] Symbol folding with MC

Hello Jim, thanks for the reply. For normal additions with immediates I've done the same as ARM does, basically transforming add(x, imm) nodes to sub(x, -imm) with a pattern in the .td file like this: def : Pat<(add DLDREGS:$src1, imm:$src2), (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>; Now, the typical pattern concerning additions with global addresses looks like this (taken from x86): def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr:$src2)), (ADD32ri GR32:$src1, tglo...
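
The add(x, imm) to sub(x, -imm) rewrite mentioned in this message relies on 16-bit two's-complement wraparound. The sketch below is a stand-alone arithmetic check, not the actual definition of imm16_neg_XFORM (which is not shown in the snippet); negImm16, add16, sub16 and verifyRewrite are hypothetical names, and the assumption is simply that the transform negates the immediate modulo 2^16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an immediate-negating transform:
// two's-complement negation of a 16-bit immediate (modulo 2^16).
inline std::uint16_t negImm16(std::uint16_t imm) {
  return static_cast<std::uint16_t>(0u - imm);
}

// 16-bit addition with wraparound, as a 16-bit register pair would compute it.
inline std::uint16_t add16(std::uint16_t x, std::uint16_t imm) {
  return static_cast<std::uint16_t>(x + imm);
}

// 16-bit subtraction with wraparound.
inline std::uint16_t sub16(std::uint16_t x, std::uint16_t imm) {
  return static_cast<std::uint16_t>(x - imm);
}

// Checks x + imm == x - (-imm) for every 16-bit immediate and a given x.
inline void verifyRewrite(std::uint16_t x) {
  for (std::uint32_t i = 0; i <= 0xFFFFu; ++i) {
    const std::uint16_t imm = static_cast<std::uint16_t>(i);
    assert(add16(x, imm) == sub16(x, negImm16(imm)));
  }
}
```

The identity holds for every immediate, including 0x8000, whose 16-bit negation is itself.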

2011 Apr 26

[LLVMdev] Symbol folding with MC

On Apr 26, 2011, at 1:27 PM, Borja Ferrer wrote: > Hello Jim thanks for the reply, > > For normal additions with immediates I've done the same as ARM does, basically transforming add(x, imm) nodes to sub(x, -imm) with a pattern in the .td file like this: > def : Pat<(add DLDREGS:$src1, imm:$src2), > (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>; > Cool. That's exactly the sort of thing I was referring to. > Now, the typical pattern concerning additions with global addresses looks like this: (taken from x86) > def : Pat<(add...

2011 Apr 27

[LLVMdev] Symbol folding with MC

...M, Borja Ferrer wrote: > > > Hello Jim thanks for the reply, > > > > For normal additions with immediates I've done the same as ARM does, > basically transforming add(x, imm) nodes to sub(x, -imm) with a pattern in > the .td file like this: > > def : Pat<(add DLDREGS:$src1, imm:$src2), > > (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>; > > > > Cool. That's exactly the sort of thing I was referring to. > > > > Now, the typical pattern concerning additions with global addresses looks > like this: (ta...

2011 Mar 26

[LLVMdev] Possible missed optimization?

On Mar 24, 2011, at 5:42 PM, Borja Ferrer wrote: > The last copy instruction should be removed as pointed out above, but since R27R26 is killed in the load instruction it has to be emitted. About the insane amount of regclasses there, the load/store and the andi instructions take subsets of regs from the main register class, they can't work with all registers, that's why STW and LDW need

2011 Apr 26

[LLVMdev] Symbol folding with MC

Hello, On Apr 26, 2011, at 6:30 AM, Borja Ferrer wrote: > Hello, I have some questions regarding folding operations with symbols during the instruction print stage with MC. At the moment I'm working with global symbols but I guess that other symbol types should be equivalent. > > My first question is how can I negate the address of a symbol? > > Consider this piece of code: