It's hard to say. I remember -pre-RA-sched=none (when it used to
exist) did a depth-first traversal on the DAG and translated the
nodes in that order. It's not particularly good at anything.
I assume your target of choice is x86. In that case, yes, the default
is burr. On modern x86 CPUs, it's far more important to avoid
register spills / restores; scheduling for latency before register
allocation hasn't proven to be a win. Benchmarking x86 is very, very
tricky. In many cases, obviously better code ended up being slower.
Hidden hazards like loop alignment, instructions crossing the
instruction dispatch buffer, etc. are very hard to model. If the
scheduler ended up reducing the number of instructions (and loads and
stores), then it's doing its job. Those counts are probably a more
important metric than the actual runtime.
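One rough way to check that (a sketch of my own, not a standard tool:
it treats any non-blank line of an AT&T-syntax .s file that isn't a
label, an assembler directive, or a comment as an instruction, which
is a simplification):

```python
def count_insns(asm_text):
    """Rough static instruction count for AT&T-syntax assembly text.

    Skips blank lines, '.'-prefixed assembler directives, labels
    (lines ending in ':'), and '#' comments. Simplification: assumes
    AT&T syntax and ignores macros and data definitions.
    """
    n = 0
    for line in asm_text.splitlines():
        s = line.split('#')[0].strip()  # drop trailing '#' comments
        if not s or s.startswith('.') or s.endswith(':'):
            continue
        n += 1
    return n


# Example on a tiny hand-written fragment:
asm = """\
main:                                   # @main
\tpushl\t%ebp
\tmovl\t%esp, %ebp
\txorl\t%eax, %eax
\tpopl\t%ebp
\tret
.Lfunc_end0:
"""
print(count_insns(asm))  # 5
```

Running llc twice on the same .bc with different -pre-RA-sched values
and diffing these counts gives a quick sanity check independent of
timing noise.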
Also, not all x86 CPUs perform the same. Are you seeing these
results on the current generation of x86 CPUs? Are you using the
latest LLVM release (which I guess you are not, since
-pre-RA-sched=none is gone)?
Evan
On Feb 29, 2008, at 10:52 PM, Fernando Magno Quintao Pereira wrote:
>
> Hi, guys,
>
> I am comparing the performance of the default scheduler (seems to
> be the one that minimizes register pressure) with no scheduler
> (-pre-RA-sched=none), and I got these numbers. The ratio is
> low_reg_pressure/none, that is, the lower the number, the better the
> performance with low register pressure:
>
> CFP2000/177.mesa/177.mesa 1.00
> CFP2000/179.art/179.art 0.98
> CFP2000/183.equake/183.equake 1.00
> CFP2000/188.ammp/188.ammp 0.98
> CINT2000/164.gzip/164.gzip 0.97
> CINT2000/175.vpr/175.vpr 0.97
> CINT2000/176.gcc/176.gcc n/a // crashed!
> CINT2000/181.mcf/181.mcf 1.02
> CINT2000/186.crafty/186.crafty 1.00
> CINT2000/197.parser/197.parser 1.01
> CINT2000/252.eon/252.eon n/a // never runs
> CINT2000/253.perlbmk/253.perlbmk 1.05
> CINT2000/254.gap/254.gap 0.97
> CINT2000/255.vortex/255.vortex 1.00
> CINT2000/256.bzip2/256.bzip2 0.98
> CINT2000/300.twolf/300.twolf 0.92
>
> In three cases, I got a ratio above 1. [Must mean: scheduling had a
> negative impact on performance.] I just ran it once, but I was
> wondering if this could make sense, or if I am setting up the tests
> wrongly. I am running the nightly test Makefile on a 32-bit x86
> Linux machine.
>
> best,
>
> Fernando
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev