thr3ads.net - llvm dev - [LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3 [Sep 2013]

If this information is useful, please help other people find it:
Share via:

Ghassan Shobaki

2013-Sep-17 18:04 UTC

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

Hi Andy,

We have done some experimental evaluation of the different schedulers in LLVM
3.3 (source, BURR, ILP, fast, MI). The evaluation was done
on x86-64 using SPEC CPU2006. We have measured both the amount of spill code as
well as the execution time as detailed below.

Here are our main findings:

1. The SD schedulers significantly impact the spill counts and the execution
times for many benchmarks, but the machine instruction (MI) scheduler in 3.3
has very limited impact on both spill counts and execution times. Is this
because most of you work on MI did not make it into the 3.3 release? We
don't have a strong motivation to test the trunk at this point (we'll
wait for 3.4), because we are working on a
publication and prefer to base that on an official release. However, if you tell
me that you expect things to be significantly different in the trunk, we'll
try
to find the time to give that a shot (unfortunately, we only have one test
machine, and SPEC tests take a lot of time as detailed below).


2. The BURR scheduler gives the minimum amount of spill code
and the best overall execution time (SPEC geo-mean).

3. The source scheduler is the second best scheduler in terms of spill code and
execution time, and its performance is very close to that of BURR in both
metrics. This result is surprising for me, because, as far as I understand,
this scheduler is a conservative scheduler that tries to preserve the original
program order, isn't it? Does this result surprise you?

4. The ILP scheduler has the worst execution times on FP2006 and the second
worst spill counts, although it is the default on x86-64. Is this surprising?
BTW, Dragon Egg sets the scheduler to source. On Line 368 in Backend.cpp, we
find:
if (!flag_schedule_insns)
    Args.push_back("--pre-RA-sched=source");   

Here are the details of our results:

Spill Counts
---------------
CPU2006 has a total of 47448 functions, out of which 10363
functions (22%) have spills. If we break this down by FP and INT, we’ll see
that 42% of the functions in FP2006 have spills, while 10% of the functions in
INT2006 have spills. The amount of spill code was measured by printing the
number of ranges spilled by the default (greedy) register allocator (printing
the variable NumSpilledRanges in InlineSpiller.cpp). This is not a perfectly
accurate
metric, but, given the large sample size (> 10K functions), the total number
of spills across such a statistically significant sample is believed to give a
very strong indication about each scheduler's performance at reducing
register
pressure. The differences in the table below are calculated relative to the
source scheduler. 
 
Heuristic Total Source      
  Spills Spills Spill Difference % Spill Difference 
Source 294471 294471 0 0.00% 
ILP 298222 294471 3751 1.27% 
BURR 287932 294471 -6539 -2.22% 
Fast 312787 294471 18316 6.22% 
source + MI 294979 294471 508 0.17% 
ILP + MI 296681 294471 2210 0.75% 
BURR + MI 289328 294471 -5143 -1.75% 
Fast + MI 302131 294471 7660 2.60% 

So, the best register pressure reduction scheduler is BURR. Note that enabling
the MI scheduler makes things better when the SD scheduler is relatively weak
at reducing register pressure (fast or ILP), while it makes things worse when
the SD scheduler is relatively good at reducing register pressure (BURR or
source).


Execution Times
---------------------
Execution times were measured by running the benchmarks on
an x86-64 machine with 5 or 9 iterations per benchmark as indicated below (in
most cases, no significant difference was observed between 9 iterations (which
take about two days) and 5 iterations (which take about one day)). The
differences in the table below are calculated relative to the source scheduler.
The %Diff Max (Min) is the maximum (minimum) percentage difference on a single
benchmark between each scheduler and the source scheduler. These numbers show
the differences on individual FP benchmarks can be quite significant.


Heuristic FP %Diff FP %Diff  FP %Diff INT %Diff INT %Diff INT %Diff iterations 
  Geo-mean Max Min Geo-mean Max Min   
source 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%   
ILP -2.02% 2.30% -22.04% 0.42% 3.61% -2.16% 9 iterations 
BURR 0.70% 8.62% -5.56% 0.66% 3.09% -1.40% 9 iteratation 
fast -1.34% 9.48% -6.72% 0.12% 3.09% -2.34% 5 iterations 
source + MI 0.21% 3.42% -1.26% -0.01% 0.83% -0.94% 5 iterations 
 
Most of the above performance differences have been correlated with significant
changes in spill counts in hot functions. Note that the ILP scheduler causes a
degradation of 22% on
one benchmark (lbm) relative to the source scheduler. We have verified that this
happens because of poor scheduling that increases the register pressure and thus
leads to generating excessive spills in this benchmark’s
hottest loop. We should
probably report this as a performance bug if ILP stays the default scheduler on
x86-64.

Regards

Ghassan Shobaki
Assistant Professor 
Department of Computer Science 
PrincessSumayaUniversity for Technology 
Amman, Jordan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130917/a5c1400e/attachment.html>

Benjamin Kramer

2013-Sep-19 13:53 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

On 17.09.2013, at 20:04, Ghassan Shobaki <ghassan_shobaki at yahoo.com>
wrote:
> Hi Andy,
> 
> We have done some experimental evaluation of the different schedulers in
LLVM 3.3 (source, BURR, ILP, fast, MI). The evaluation was done on x86-64 using
SPEC CPU2006. We have measured both the amount of spill code as well as the
execution time as detailed below.
> 
> Here are our main findings:
> 
> 1. The SD schedulers significantly impact the spill counts and the
execution times for many benchmarks, but the machine instruction (MI) scheduler
in 3.3 has very limited impact on both spill counts and execution times. Is this
because most of you work on MI did not make it into the 3.3 release? We
don't have a strong motivation to test the trunk at this point (we'll
wait for 3.4), because we are working on a publication and prefer to base that
on an official release. However, if you tell me that you expect things to be
significantly different in the trunk, we'll try to find the time to give
that a shot (unfortunately, we only have one test machine, and SPEC tests take a
lot of time as detailed below).
> 
> 2. The BURR scheduler gives the minimum amount of spill code and the best
overall execution time (SPEC geo-mean).
> 
> 3. The source scheduler is the second best scheduler in terms of spill code
and execution time, and its performance is very close to that of BURR in both
metrics. This result is surprising for me, because, as far as I understand, this
scheduler is a conservative scheduler that tries to preserve the original
program order, isn't it? Does this result surprise you?
> 
> 4. The ILP scheduler has the worst execution times on FP2006 and the second
worst spill counts, although it is the default on x86-64. Is this surprising?
BTW, Dragon Egg sets the scheduler to source. On Line 368 in Backend.cpp, we
find:
> if (!flag_schedule_insns)
>     Args.push_back("--pre-RA-sched=source");   
> 
> Here are the details of our results:
> 
> Spill Counts
> ---------------
> CPU2006 has a total of 47448 functions, out of which 10363 functions (22%)
have spills. If we break this down by FP and INT, we’ll see that 42% of the
functions in FP2006 have spills, while 10% of the functions in INT2006 have
spills. The amount of spill code was measured by printing the number of ranges
spilled by the default (greedy) register allocator (printing the variable
NumSpilledRanges in InlineSpiller.cpp). This is not a perfectly accurate metric,
but, given the large sample size (> 10K functions), the total number of
spills across such a statistically significant sample is believed to give a very
strong indication about each scheduler's performance at reducing register
pressure. The differences in the table below are calculated relative to the
source scheduler.
>  
> Heuristic
> Total
> Source
>  
>  
>  
> Spills
> Spills
> Spill Difference
> % Spill Difference
> Source
> 294471
> 294471
> 0
> 0.00%
> ILP
> 298222
> 294471
> 3751
> 1.27%
> BURR
> 287932
> 294471
> -6539
> -2.22%
> Fast
> 312787
> 294471
> 18316
> 6.22%
> source + MI
> 294979
> 294471
> 508
> 0.17%
> ILP + MI
> 296681
> 294471
> 2210
> 0.75%
> BURR + MI
> 289328
> 294471
> -5143
> -1.75%
> Fast + MI
> 302131
> 294471
> 7660
> 2.60%
> 
> So, the best register pressure reduction scheduler is BURR. Note that
enabling the MI scheduler makes things better when the SD scheduler is
relatively weak at reducing register pressure (fast or ILP), while it makes
things worse when the SD scheduler is relatively good at reducing register
pressure (BURR or source).
> 
> Execution Times
> ---------------------
> Execution times were measured by running the benchmarks on an x86-64
machine with 5 or 9 iterations per benchmark as indicated below (in most cases,
no significant difference was observed between 9 iterations (which take about
two days) and 5 iterations (which take about one day)). The differences in the
table below are calculated relative to the source scheduler. The %Diff Max (Min)
is the maximum (minimum) percentage difference on a single benchmark between
each scheduler and the source scheduler. These numbers show the differences on
individual FP benchmarks can be quite significant.
> 
> Heuristic
> FP %Diff
> FP %Diff
>  FP %Diff
> INT %Diff
> INT %Diff
> INT %Diff
> iterations
>  
> Geo-mean
> Max
> Min
> Geo-mean
> Max
> Min
>  
> source
> 0.00%
> 0.00%
> 0.00%
> 0.00%
> 0.00%
> 0.00%
>  
> ILP
> -2.02%
> 2.30%
> -22.04%
> 0.42%
> 3.61%
> -2.16%
> 9 iterations
> BURR
> 0.70%
> 8.62%
> -5.56%
> 0.66%
> 3.09%
> -1.40%
> 9 iteratation
> fast
> -1.34%
> 9.48%
> -6.72%
> 0.12%
> 3.09%
> -2.34%
> 5 iterations
> source + MI
> 0.21%
> 3.42%
> -1.26%
> -0.01%
> 0.83%
> -0.94%
> 5 iterations
>  
> Most of the above performance differences have been correlated with
significant changes in spill counts in hot functions. Note that the ILP
scheduler causes a degradation of 22% on one benchmark (lbm) relative to the
source scheduler. We have verified that this happens because of poor scheduling
that increases the register pressure and thus leads to generating excessive
spills in this benchmark’s hottest loop. We should probably report this as a
performance bug if ILP stays the default scheduler on x86-64.
I find the results surprising, too. What CPU did you perform your tests on,
scheduler performance can vary a lot depending on the microarchitecture of your
chip.

- Ben

Ghassan Shobaki

2013-Sep-19 14:18 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

Our test machine has two Intel
Xeon E5540 processors running at 2.53 GHz with 24 GB of memory. Each CPU has 8
threads (16 threads in total). All our tests, however, were single threaded.
Which result is particularly surprising for you? The low impact of the MI
scheduler, the relatively good performance of the source scheduler or the
relatively poor performance of the ILP scheduler?

Thanks
-Ghassan  


________________________________
 From: Benjamin Kramer <benny.kra at gmail.com>
To: Ghassan Shobaki <ghassan_shobaki at yahoo.com> 
Cc: Andrew Trick <atrick at apple.com>; "llvmdev at cs.uiuc.edu"
<llvmdev at cs.uiuc.edu>
Sent: Thursday, September 19, 2013 4:53 PM
Subject: Re: [LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3
 


On 17.09.2013, at 20:04, Ghassan Shobaki <ghassan_shobaki at yahoo.com>
wrote:
> Hi Andy,
> 
> We have done some experimental evaluation of the different schedulers in
LLVM 3.3 (source, BURR, ILP, fast, MI). The evaluation was done on x86-64 using
SPEC CPU2006. We have measured both the amount of spill code as well as the
execution time as detailed below.
> 
> Here are our main findings:
> 
> 1. The SD schedulers significantly impact the spill counts and the
execution times for many benchmarks, but the machine instruction (MI) scheduler
in 3.3 has very limited impact on both spill counts and execution times. Is this
because most of you work on MI did not make it into the 3.3 release? We
don't have a strong motivation to test the trunk at this point (we'll
wait for 3.4), because we are working on a publication and prefer to base that
on an official release. However, if you tell me that you expect things to be
significantly different in the trunk, we'll try to find the time to give
that a shot (unfortunately, we only have one test machine, and SPEC tests take a
lot of time as detailed below).
> 
> 2. The BURR scheduler gives the minimum amount of spill code and the best
overall execution time (SPEC geo-mean).
> 
> 3. The source scheduler is the second best scheduler in terms of spill code
and execution time, and its performance is very close to that of BURR in both
metrics. This result is surprising for me, because, as far as I understand, this
scheduler is a conservative scheduler that tries to preserve the original
program order, isn't it? Does this result surprise you?
> 
> 4. The ILP scheduler has the worst execution times on FP2006 and the second
worst spill counts, although it is the default on x86-64. Is this surprising?
BTW, Dragon Egg sets the scheduler to source. On Line 368 in Backend.cpp, we
find:
> if (!flag_schedule_insns)
>     Args.push_back("--pre-RA-sched=source");  
> 
> Here are the details of our results:
> 
> Spill Counts
> ---------------
> CPU2006 has a total of 47448 functions, out of which 10363 functions (22%)
have spills. If we break this down by FP and INT, we’ll see that 42% of the
functions in FP2006 have spills, while 10% of the functions in INT2006 have
spills. The amount of spill code was measured by printing the number of ranges
spilled by the default (greedy) register allocator (printing the variable
NumSpilledRanges in InlineSpiller.cpp). This is not a perfectly accurate metric,
but, given the large sample size (> 10K functions), the total number of
spills across such a statistically significant sample is believed to give a very
strong indication about each scheduler's performance at reducing register
pressure. The differences in the table below are calculated relative to the
source scheduler.
>  
> Heuristic
> Total
> Source
>  
>  
>  
> Spills
> Spills
> Spill Difference
> % Spill Difference
> Source
> 294471
> 294471
> 0
> 0.00%
> ILP
> 298222
> 294471
> 3751
> 1.27%
> BURR
> 287932
> 294471
> -6539
> -2.22%
> Fast
> 312787
> 294471
> 18316
> 6.22%
> source + MI
> 294979
> 294471
> 508
> 0.17%
> ILP + MI
> 296681
> 294471
> 2210
> 0.75%
> BURR + MI
> 289328
> 294471
> -5143
> -1.75%
> Fast + MI
> 302131
> 294471
> 7660
> 2.60%
> 
> So, the best register pressure reduction scheduler is BURR. Note that
enabling the MI scheduler makes things better when the SD scheduler is
relatively weak at reducing register pressure (fast or ILP), while it makes
things worse when the SD scheduler is relatively good at reducing register
pressure (BURR or source).
> 
> Execution Times
> ---------------------
> Execution times were measured by running the benchmarks on an x86-64
machine with 5 or 9 iterations per benchmark as indicated below (in most cases,
no significant difference was observed between 9 iterations (which take about
two days) and 5 iterations (which take about one day)). The differences in the
table below are calculated relative to the source scheduler. The %Diff Max (Min)
is the maximum (minimum) percentage difference on a single benchmark between
each scheduler and the source scheduler. These numbers show the differences on
individual FP benchmarks can be quite significant.
> 
> Heuristic
> FP %Diff
> FP %Diff
>  FP %Diff
> INT %Diff
> INT %Diff
> INT %Diff
> iterations
>  
> Geo-mean
> Max
> Min
> Geo-mean
> Max
> Min
>  
> source
> 0.00%
> 0.00%
> 0.00%
> 0.00%
> 0.00%
> 0.00%
>  
> ILP
> -2.02%
> 2.30%
> -22.04%
> 0.42%
> 3.61%
> -2.16%
> 9 iterations
> BURR
> 0.70%
> 8.62%
> -5.56%
> 0.66%
> 3.09%
> -1.40%
> 9 iteratation
> fast
> -1.34%
> 9.48%
> -6.72%
> 0.12%
> 3.09%
> -2.34%
> 5 iterations
> source + MI
> 0.21%
> 3.42%
> -1.26%
> -0.01%
> 0.83%
> -0.94%
> 5 iterations
>  
> Most of the above performance differences have been correlated with
significant changes in spill counts in hot functions. Note that the ILP
scheduler causes a degradation of 22% on one benchmark (lbm) relative to the
source scheduler. We have verified that this happens because of poor scheduling
that increases the register pressure and thus leads to generating excessive
spills in this benchmark’s hottest loop. We should probably report this as a
performance bug if ILP stays the default scheduler on x86-64.
I find the results surprising, too. What CPU did you perform your tests on,
scheduler performance can vary a lot depending on the microarchitecture of your
chip.

- Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130919/3b7373ae/attachment.html>

Renato Golin

2013-Sep-19 14:30 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

On 17 September 2013 19:04, Ghassan Shobaki <ghassan_shobaki at
yahoo.com>wrote:
> We have done some experimental evaluation of the different schedulers in
> LLVM 3.3 (source, BURR, ILP, fast, MI). The evaluation was done on x86-64
> using SPEC CPU2006. We have measured both the amount of spill code as well
> as the execution time as detailed below.
>
Hi Ghassan,

This is an amazing piece of work, thanks for doing this. We need more
benchmarks like yours, and more often, too.


3. The source scheduler is the second best scheduler in terms of spill
code> and execution time, and its performance is very close to that of BURR in
> both metrics. This result is surprising for me, because, as far as I
> understand, this scheduler is a conservative scheduler that tries to
> preserve the original program order, isn't it? Does this result
surprise
> you?
>
Well, SPEC is an old benchmark, when code was written to accommodate the
hardware requirements, so preserving the code order might not be that big
of a deal on SPEC, as it is on other types of code. So far, I haven't found
SPEC being too good to judge overall compilers' performance, but specific
micro-optimized features.

Besides, hardware and software are designed nowadays based on some version
of Dhrystone, EEMBC, SPEC or CoreMark, so it's not impossible to see 50%
increase in performance with little changes in either.


4. The ILP scheduler has the worst execution times on FP2006 and the
second> worst spill counts, although it is the default on x86-64. Is this
> surprising? BTW, Dragon Egg sets the scheduler to source. On Line 368 in
> Backend.cpp, we find:
> if (!flag_schedule_insns)
>     Args.push_back("--pre-RA-sched=source");
>
This looks like someone ran a similar test and did the sensible thing. How
that reflects with Clang, or how important it is to be the default, I don't
know. This is the same discussion as the optimization levels, and what
passes should be included in what. It also depends on which scheduler will
evolve faster or further in time, and what kind of code you're compiling...


This is not a perfectly accurate metric, but, given the large sample
size> (> 10K functions), the total number of spills across such a
statistically
> significant sample is believed to give a very strong indication about each
> scheduler's performance at reducing register pressure.
>
I agree this is a good enough metric, but I'd be cautious in stating that
there is a "very strong indication about each scheduler's
performance".
SPEC is, after all, a special case in compiler/hardware world, and anything
you see here might not happen anywhere else.

Real world, modern code, (such as LAMP stack, browsers, office suites, etc)
are written expecting the compiler to do magic, while old-school benchmarks
weren't, and they were optimized for decades by both compiler and hardware
engineers.



> The %Diff Max (Min) is the maximum (minimum) percentage difference on a
> single benchmark between each scheduler and the source scheduler. These
> numbers show the differences on individual FP benchmarks can be quite
> significant.
>
I'm surprised that you didn't run "source" 5/9 times, too. Did
you get the
exact performance numbers multiple times? Would be good to have a more
realistic geo-mean for source as well, so we could estimate how much the
other geo-means vary in comparison to source's.


Most of the above performance differences have been correlated
with> significant changes in spill counts in hot functions.
>
Which is a beautiful correlation between spill-rate and performance,
showing that your metrics are at least reasonably accurate, for all
purposes.


We should probably report this as a performance bug if ILP stays
the> default scheduler on x86-64.
>
You should, regardless of what's the default choice.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130919/2317ac20/attachment.html>

Ghassan Shobaki

2013-Sep-19 16:25 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

Hi Renato,

Please see my answers below.

Thanks
-Ghassan

________________________________
 From: Renato Golin <renato.golin at linaro.org>
To: Ghassan Shobaki <ghassan_shobaki at yahoo.com> 
Cc: Andrew Trick <atrick at apple.com>; "llvmdev at cs.uiuc.edu"
<llvmdev at cs.uiuc.edu>
Sent: Thursday, September 19, 2013 5:30 PM
Subject: Re: [LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

On 17 September 2013 19:04, Ghassan Shobaki <ghassan_shobaki at yahoo.com>
wrote:

We have done some experimental evaluation of the different schedulers in LLVM
3.3 (source, BURR, ILP, fast, MI). The evaluation was done
on x86-64 using SPEC CPU2006. We have measured both the amount of spill code as
well as the execution time as detailed below.>
Hi Ghassan,

This is an amazing piece of work, thanks for doing this. We need more benchmarks
like yours, and more often, too.

3. The source scheduler is the second best scheduler in terms of spill code and
execution time, and its performance is very close to that of BURR in both
metrics. This result is surprising for me, because, as far as I understand,
this scheduler is a conservative scheduler that tries to preserve the original
program order, isn't it? Does this result surprise you?

Well, SPEC is an old benchmark, when code was written to accommodate the
hardware requirements, so preserving the code order might not be that big of a
deal on SPEC, as it is on other types of code. So far, I haven't found SPEC
being too good to judge overall compilers' performance, but specific
micro-optimized features.

Besides, hardware and software are designed nowadays based on some version of
Dhrystone, EEMBC, SPEC or CoreMark, so it's not impossible to see 50%
increase in performance with little changes in either.

Ghassan: You have made me so curious to try other benchmarks in our future work.
Most academic publications on CPU performance though use SPEC. You can even find
some recent publications that are still using SPEC CPU2000! When I was at AMD in
2009, performance optimization and benchmarking was all about SPEC CPU2006. Have
things changed so much in the past 4 years? And the more important question is:
what specific features do these non-SPEC benchmarks have that are likely to
affect the scheduler's register pressure reduction behavior?

4. The ILP scheduler has the worst execution times on FP2006 and the second
worst spill counts, although it is the default on x86-64. Is this surprising?
BTW, Dragon Egg sets the scheduler to source. On Line 368 in Backend.cpp, we
find:>
>if (!flag_schedule_insns)
>    Args.push_back("--pre-RA-sched=source");
This looks like someone ran a similar test and did the sensible thing. How that
reflects with Clang, or how important it is to be the default, I don't know.
This is the same discussion as the optimization levels, and what passes should
be included in what. It also depends on which scheduler will evolve faster or
further in time, and what kind of code you're compiling...

This is not a perfectly accurate
metric, but, given the large sample size (> 10K functions), the total number
of spills across such a statistically significant sample is believed to give a
very strong indication about each scheduler's performance at reducing
register
pressure.

I agree this is a good enough metric, but I'd be cautious in stating that
there is a "very strong indication about each scheduler's
performance". SPEC is, after all, a special case in compiler/hardware
world, and anything you see here might not happen anywhere else. 

Real world, modern code, (such as LAMP stack, browsers, office suites, etc) are
written expecting the compiler to do magic, while old-school benchmarks
weren't, and they were optimized for decades by both compiler and hardware
engineers.

Ghassan: Can you please give more specific features in these modern benchmarks
that affect spill code reduction? Note that our study included over ten thousand
functions with spills. Such a large sample is expected to cover many different
kinds of behavior, and that's why I am calling it a "statistically
significant" sample. 

The %Diff Max (Min) is the maximum (minimum) percentage difference on a single
benchmark between each scheduler and the source scheduler. These numbers show
the differences on individual FP benchmarks can be quite significant.

I'm surprised that you didn't run "source" 5/9 times, too. Did
you get the exact performance numbers multiple times? Would be good to have a
more realistic geo-mean for source as well, so we could estimate how much the
other geo-means vary in comparison to source's.

Ghassan: Sorry if I did not include a clear enough description of the numbers
meanings. Let me explain that more precisely:
First of all, the "source" scheduler was indeed run for 9 iterations
(which took about 2 days), and that was our baseline. All the numbers in the
execution-time table are percentage differences relative to "source".
Of course, there were random variations in the numbers, but we did the standard
SPEC practice of taking the median. For most benchmarks, the random variation
was not significant. There was one particular benchmark though (libquantum), on
which we thought that the random variation is too large to make a meaningful
comparison, and therefore we decided to exclude that.

The "% Diff Max" and "% Diff Min" numbers reported in our
table are NOT random variations on an individual benchmark. Rather, the "%
Diff Max" for a given heuristic is the percentage difference
(in median scores) between this heuristic and source heuristic for the benchmark
on which this heuristic gave its the biggest *gain* relative to source.
Similarly, the "% Diff Min" for a given heuristic is the percentage
difference
(in median scores) between this heuristic and source heuristic for the benchmark
on which this heuristic gave its biggest *degradation* relative to source. So,
they are for two different benchmarks. The point in giving these numbers is to
show that, although the geometric-mean differences may look small, the
differences on individual benchmarks were quite significant. I can provide more
detailed numbers for all benchmarks if people are interested. I can post those
on our web site or any benchmarking page that LLVM may have.       

Most of the above performance differences have been correlated with significant
changes in spill counts in hot functions.

Which is a beautiful correlation between spill-rate and performance, showing
that your metrics are at least reasonably accurate, for all purposes.

We should
probably report this as a performance bug if ILP stays the default scheduler on
x86-64.

You should, regardless of what's the default choice.

cheers,

--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130919/c36541d0/attachment.html>

Andrew Trick

2013-Sep-24 06:10 UTC

head link

[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)

On Sep 17, 2013, at 11:04 AM, Ghassan Shobaki <ghassan_shobaki at
yahoo.com> wrote:
> 1. The SD schedulers significantly impact the spill counts and the
execution times for many benchmarks, but the machine instruction (MI) scheduler
in 3.3 has very limited impact on both spill counts and execution times. Is this
because most of you work on MI did not make it into the 3.3 release?
Ghassan, and anyone else interested in the scheduler:

This is a good time for me to give a thourough update of the MI scheduler.
Hopefully that will answer many of your questions.

Some important things changed between the time I introduced the MI scheduler a
year ago, and the release of 3.3. The biggest change was loop vectorization,
which reduces register pressure and somewhat preschedules loops. Since 3.3 was
released, the generic MI scheduler's heuristics were reevaluated in
preparation for making it the default for targets without a custom scheduling
strategy--more on that later. The source order scheduler was also fixed so that
it actually preserves IR order, which is at least closer to source order.

For many benchmarks we've looked at, source order scheduling approaches the
lower bound on register pressure--heuristics can only hurt--making it difficult
to distinguish between a lucky scheduler and a good scheduler.

It's not surprising that SelectionDAG scheduling with BURR reduces spill
code on average. It is fully register pressure aggressive. It gives highest
priority to Sethi-Ullman number, which is typically nonsense, but does prevent
some of the worst register pressure situations. It then does an expensive check
to determine the shortest live range. This is also inaccurate, but on average
reduces pressure.

The reason we switched from BURR to ILP a couple years ago was that although
BURR is very aggressive, it is not very smart. Giving highest priority to
inaccurate heuristics means generating pathologically bad schedules for some
class of code. Regardless of how the programmer wrote the code, or what earlier
passes have done, it will reschedule everything, fully serializing dependence
chains. At that time, we noticed horrible performance on some crypto benchmarks.
We decided to pay a small price in spill code for avoiding worst-case
performance. We also realized after performance anlaysis, that incrementally
tuning these heuristics to avoid test-suite regressions was not leading toward
an overall better scheduler for real programs. We decided that, since some
targets need an MI-level scheduler anyway, we should redirect efforts into that
project.

The high-level design goal of MI scheduler is to allow subtargets to plug in
custom scheduling strategies, while providing a "safe" generic
scheduler. The generic scheduler is safe in that it preserves instruction order
until it detects a performance problem according to the subtarget's machine
model. This is a nice feature. It means that the scheduler should not often
introduce a performance problem that did not already exist, and it makes the
scheduled code much easier to understand and debug. So the close correlation
between source order and MI scheduler is natural. In fact, you'll find that,
when scheduling for SandyBridge, the scheduler seldom perturbs the instruction
sequence. This is a fundamental departure from the conventional approach of
scheduling for out-of-order processors as if they execute in-order.

This does raise a difficult challenge of how the scheduler can know when the
out-of-order processor is likely to stall. The new machine model has enough
information to roughly estimate stalls if a long enough execution trace can be
fed through it. However, for very heavily out-of-order processors (Nehalem+) it
is extremely rare for acyclic code to saturate any resources. As a cheap,
partial solution, the MI scheduler now computes the cyclic critical path,
allowing it to estimate.

One major advantage of the MI scheduler is that it models register
pressure with almost perfect precision. This is great for analyzing register
pressure, but by itself isn't a solution, and greedy heuristics are often
unable to solve the problem without backtracking. The difficulty hasn't been
thinking of new heuristics and solving individual cases. Rather, finding a
strong justification to add cost and complexity to the scheduler.

A month ago, Arnold Schwaighofer and I investigated this issue. We didn't do
this because spilling was a serious performance problem, but because the
performance of the scheduler is annoyingly random when governed by greedy
heuristics. If the scheduler always did the right thing, that would simplify
performance tracking. We were able to solve each individual case with some
combination of heuristics. The most efficient approach I've found so far
involves partitioning the DAG into subtrees (see computeDFSResult--I think the
implementation of subtree is still somewhat flawed though). We've tried
biased scheduling by subtree, computing Sethi-Ullman numbers according to the
subtree partition, and tracking live-ins that are reachable from dag nodes,
among other things.

Ultimately, we decided not to enable any of these techniques in the generic
scheduler--targets are still free to do what they like. The problem is that
there are always cases in which these cheap heuristics do the wrong thing. So,
while we could engineer good results for SPEC, we would not be solving the
underlying problem of unstable scheduling heuristics. Given the primary goals of
reducing compile time and maintaining instruction order unless performance is at
stake, the bar for adding heuristics is high. Complicating the heuristics now
also means making them harder to understand and improve in the future.

I would like to see a general solution to scheduling for register pressure. I
had plenty of ideas for more ad-hoc heuristcs within the bounds of list
scheduling, but given that we haven't dominstrated the value of simple
heuristics, I don't want to pursue anything more complicated. I think better
solutions will have to transcend list scheduling. I do like to the idea of
constraining the DAG prior to scheduling [Touati, "Register Saturation in
Superscalar and VLIW Codes", CC 2001], because that entirely separates the
problem from list scheduler heuristics. However, I won't be able to justify
adding more complexity, beyond list scheduling heuristics, to the LLVM codebase
to solve this problem. Work in this area would need to be done as side project.
I don't expect to do any more work on it.

In my next message I'll explain the near-term plans for the scheduler.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130923/f8811d2d/attachment.html>

Andrew Trick

2013-Sep-24 06:11 UTC

head link

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

In my last message, I explained the goals of the generic MI scheduler and
current status. This week, I'll see if we can enable MI scheduling by
default for x86. I'm not sure which flags you're using to test it now.
But by making it default and enabling the corresponding coalescer changes, we
can be confident that benchmarking efforts are improving on the same baseline.
At that point, I expect bugs to be filed for specific instances of badly
scheduled code. Getting a fix committed may not be easy, because we have to show
that new heuristics aren't likely to pessimize other code. But at least
I'll be able to provide an explanation for why MI isn't currently
handling it.

There are other reasons that MI sched should be enabled now on x86 anyway:

(1) The Selection DAG scheduler will be disabled as soon as I can implement a
complete replacement. That should eliminate about 10% of codegen (llc) compile
time. The Selection DAG scheduler has also long suffered from unnacceptable
worst-cast compile time behavior and unresolved defects. We been chipping away
at the problems, but some remain: PR15941, PR16365. This is a fundamentally bad
place to perform scheduling.

(2) The postRA scheduler will also be eliminated. That will eliminate another
10% of compile time for targets that currently enable it. It also eliminates a
maintenance problem because its dependence on kill flags and implicit operands
is frightening--these can easily be valid for some targets but not others.

(3) Non-x86 targets have been using MI sched for the past year to achieve
important performance and compile time benefits. For quality and maintenance
reasons, we should use the same scheduling infrastructure for mainstream
targets.

The basic theme here is that we want a single scheduling infrastructure that is
efficient enough to enable by default--even if it is typically
performance-neutral, can leverage verification across many targets, and can be
safely customized by plugging in heuristics.

-Andy

Andrew Trick

2013-Sep-24 06:22 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

On Sep 17, 2013, at 11:04 AM, Ghassan Shobaki <ghassan_shobaki at
yahoo.com> wrote:> 1. The SD schedulers significantly impact the spill counts and the
execution times for many benchmarks, but the machine instruction (MI) scheduler
in 3.3 has very limited impact on both spill counts and execution times. Is this
because most of you work on MI did not make it into the 3.3 release? We
don't have a strong motivation to test the trunk at this point (we'll
wait for 3.4), because we are working on a publication and prefer to base that
on an official release. However, if you tell me that you expect things to be
significantly different in the trunk, we'll try to find the time to give
that a shot (unfortunately, we only have one test machine, and SPEC tests take a
lot of time as detailed below).
...>  
> Most of the above performance differences have been correlated with
significant changes in spill counts in hot functions. Note that the ILP
scheduler causes a degradation of 22% on one benchmark (lbm) relative to the
source scheduler. We have verified that this happens because of poor scheduling
that increases the register pressure and thus leads to generating excessive
spills in this benchmark’s hottest loop. We should probably report this as a
performance bug if ILP stays the default scheduler on x86-64.
The life of ILP as default scheduler is numbered in days.

Regarding your publication. It's fine to use BURR with 3.3 and call it the
best LLVM scheduler for your set of benchmarks. Even though the default MI
scheduler on trunk has improved, I doubt it will beat BURR at average register
pressure reduction (as I mentioned in a previous email, we decided not to enable
global register pressure heuristics). I did notice a 10% improvement on 470.lbm
at some point after 3.3, but you would need to verify.

In 3.4, the SD scheduler may still be available, but won't be maintained. So
I recommend using MI scheduler for baselines.

Note that you can easily plug in alternate scheduling heuristics using the MI
scheduler. It wouldn't be hard to implement a BURR strategy for the new
scheduler.

For the purpose of evaluating register reduction, I agree that your spill metric
is a reasonable indicator. It is true that we will readily trade a single spill
inside a loop for multiple spills outside. But it seems good enough for
evaluating different versions of the scheduler before gathering your final
numbers for publication. That gets you results in minutes instead of days.

I’m not really sure why spills have such an impact on your SPEC2006 performance,
whether it’s critical path load latency, load bandwidth (SandyBridge has doubled
this), decode bandwidth, or even loop stream detector capacity.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130923/97259286/attachment.html>

Chandler Carruth

2013-Sep-24 06:36 UTC

head link

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

On Tue, Sep 24, 2013 at 1:11 AM, Andrew Trick <atrick at apple.com> wrote:
> This week, I'll see if we can enable MI scheduling by default for x86.
I'm
> not sure which flags you're using to test it now. But by making it
default
> and enabling the corresponding coalescer changes, we can be confident that
> benchmarking efforts are improving on the same baseline.

While I'm generally really excited by this, I would ask for a bit more
staging of this change.

Specifically, I would really like for a single, clear switch to enable
exactly what you want benchmark data on *before* it becomes the default,
and to give various folks time to run benchmarks and report serious
regressions.

I don't want our ability to ship LLVM from top-of-tree to be seriously
impaired by this, and enabling a feature that can have dramatic performance
impact without a giving folks a really simple way to try it out and a
period of time to run benchmarks and collect data seems to do that. =/

Once it is the default, it would be really good to leave in the single,
simple switch for a period of time for folks to disable it if need be.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130924/326463c8/attachment.html>

Ghassan Shobaki

2013-Sep-25 23:31 UTC

head link

[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)

Andy,

Thank you for the explanation! Using a statistical approach, we have also come
to the conclusion that it is extremely hard to find one good register pressure
heuristic that works well in all cases or even a large enough percentage of the
cases. In our statistical study, we applied about 20 different heuristics to the
7216 functions with spills in FP2006. The BURR heuristic gave the best overall
result (total sum of spills), but it was the best heuristic on only 64% of the
functions. This means that on more than one third of the functions, BURR
resulted in extra spills relative to the best heuristic for each function.

So, I agree that a greedy heuristic cannot give a general solution to this
problem. We have been exploring combinatorial approaches to the problem, but the
problem with our current algorithm is that the scheduling cost function, which
is the peak excess register pressure (PERP), does not always correlate well with
the amount of spill code. That's because in a sufficiently large basic
block, you may minimize the peak register pressure in the block but still have
unnecessarily high register pressure at non-peak points in the block (see
Section 5 in our recent paper: Preallocation Instruction Scheduling with
Register Pressure Minimization using a Combinatorial Approach, ACM TACO, V10,
Issue 3, 2013, with doi10.1145/2512432). We are currently exploring alternate
cost functions. However, we do not expect any combinatorial algorithm that will
come out of this work to be fast enough to become the default scheduling
algorithm in a production compiler in the
 present or even in the near future. It may be a good reference for evaluating
heuristics though.

For the time being, and until a good general solution to the problem is
discovered, I think it makes sense for an open-source compiler to only support,
by default, a basic scheduler that uses the minimum amount of compile time. Then
people who are interested in optimizing the performance of a specific
application or the performance of their hardware on a specific benchmark suite
can write more complex heuristics if necessary and tune them for their target
programs. It'd be nice though to have multiple heuristics optionally
available in addition to the default heuristic.

Regards
Ghassan



________________________________
 From: Andrew Trick <atrick at apple.com>
To: Ghassan Shobaki <ghassan_shobaki at yahoo.com> 
Cc: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu> 
Sent: Tuesday, September 24, 2013 9:10 AM
Subject: Re: MI Scheduler Update (was Experimental Evaluation of the Schedulers
in LLVM 3.3)
 




On Sep 17, 2013, at 11:04 AM, Ghassan Shobaki <ghassan_shobaki at
yahoo.com> wrote:

1. The SD schedulers significantly impact the spill counts and the execution
times for many benchmarks, but the machine instruction (MI) scheduler in 3.3 has
very limited impact on both spill counts and execution times. Is this because
most of you work on MI did not make it into the 3.3 release?

Ghassan, and anyone else interested in the scheduler:

This is a good time for me to give a thourough update of the MI scheduler.
Hopefully that will answer many of your questions.

Some important things changed between the time I introduced the MI scheduler a
year ago, and the release of 3.3. The biggest change was loop vectorization,
which reduces register pressure and somewhat preschedules loops. Since 3.3 was
released, the generic MI scheduler's heuristics were reevaluated in
preparation for making it the default for targets without a custom scheduling
strategy--more on that later. The source order scheduler was also fixed so that
it actually preserves IR order, which is at least closer to source order.

For many benchmarks we've looked at, source order scheduling approaches the
lower bound on register pressure--heuristics can only hurt--making it difficult
to distinguish between a lucky scheduler and a good scheduler.

It's not surprising that SelectionDAG scheduling with BURR reduces spill
code on average. It is fully register pressure aggressive. It gives highest
priority to Sethi-Ullman number, which is typically nonsense, but does prevent
some of the worst register pressure situations. It then does an expensive check
to determine the shortest live range. This is also inaccurate, but on average
reduces pressure.

The reason we switched from BURR to ILP a couple years ago was that although
BURR is very aggressive, it is not very smart. Giving highest priority to
inaccurate heuristics means generating pathologically bad schedules for some
class of code. Regardless of how the programmer wrote the code, or what earlier
passes have done, it will reschedule everything, fully serializing dependence
chains. At that time, we noticed horrible performance on some crypto benchmarks.
We decided to pay a small price in spill code for avoiding worst-case
performance. We also realized after performance anlaysis, that incrementally
tuning these heuristics to avoid test-suite regressions was not leading toward
an overall better scheduler for real programs. We decided that, since some
targets need an MI-level scheduler anyway, we should redirect efforts into that
project.

The high-level design goal of MI scheduler is to allow subtargets to plug in
custom scheduling strategies, while providing a "safe" generic
scheduler. The generic scheduler is safe in that it preserves instruction order
until it detects a performance problem according to the subtarget's machine
model. This is a nice feature. It means that the scheduler should not often
introduce a performance problem that did not already exist, and it makes the
scheduled code much easier to understand and debug. So the close correlation
between source order and MI scheduler is natural. In fact, you'll find that,
when scheduling for SandyBridge, the scheduler seldom perturbs the instruction
sequence. This is a fundamental departure from the conventional approach of
scheduling for out-of-order processors as if they execute in-order.

This does raise a difficult challenge of how the scheduler can know when the
out-of-order processor is likely to stall. The new machine model has enough
information to roughly estimate stalls if a long enough execution trace can be
fed through it. However, for very heavily out-of-order processors (Nehalem+) it
is extremely rare for acyclic code to saturate any resources. As a cheap,
partial solution, the MI scheduler now computes the cyclic critical path,
allowing it to estimate.

One major advantage of the MI scheduler is that it models register
pressure with almost perfect precision. This is great for analyzing register
pressure, but by itself isn't a solution, and greedy heuristics are often
unable to solve the problem without backtracking. The difficulty hasn't been
thinking of new heuristics and solving individual cases. Rather, finding a
strong justification to add cost and complexity to the scheduler.

A month ago, Arnold Schwaighofer and I investigated this issue. We didn't do
this because spilling was a serious performance problem, but because the
performance of the scheduler is annoyingly random when governed by greedy
heuristics. If the scheduler always did the right thing, that would simplify
performance tracking. We were able to solve each individual case with some
combination of heuristics. The most efficient approach I've found so far
involves partitioning the DAG into subtrees (see computeDFSResult--I think the
implementation of subtree is still somewhat flawed though). We've tried
biased scheduling by subtree, computing Sethi-Ullman numbers according to the
subtree partition, and tracking live-ins that are reachable from dag nodes,
among other things.

Ultimately, we decided not to enable any of these techniques in the generic
scheduler--targets are still free to do what they like. The problem is that
there are always cases in which these cheap heuristics do the wrong thing. So,
while we could engineer good results for SPEC, we would not be solving the
underlying problem of unstable scheduling heuristics. Given the primary goals of
reducing compile time and maintaining instruction order unless performance is at
stake, the bar for adding heuristics is high. Complicating the heuristics now
also means making them harder to understand and improve in the future.

I would like to see a general solution to scheduling for register pressure. I
had plenty of ideas for more ad-hoc heuristcs within the bounds of list
scheduling, but given that we haven't dominstrated the value of simple
heuristics, I don't want to pursue anything more complicated. I think better
solutions will have to transcend list scheduling. I do like to the idea of
constraining the DAG prior to scheduling [Touati, "Register Saturation in
Superscalar and VLIW Codes", CC 2001], because that entirely separates the
problem from list scheduler heuristics. However, I won't be able to justify
adding more complexity, beyond list scheduling heuristics, to the LLVM codebase
to solve this problem. Work in this area would need to be done as side project.
I don't expect to do any more work on it.

In my next message I'll explain the near-term plans for the scheduler.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130925/4e8970d7/attachment.html>

Stefan Hepp

2013-Sep-26 13:33 UTC

head link

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

Hi,

Thanks for your explanations!

How is the big picture for supporting in-order VLIW architectures and 
the like though?

I am asking because I am currently implementing instruction scheduling 
in our own backend for our custom Patmos processor, for which I need to 
support both branch delay slots and bundles, some restrictions regarding 
bundles.
For the moment, I am quite happy with a simple bottom-up basic-block 
scheduler. I tried to use a combination of the DFAPacketizer and a 
simple delay-slot-filler pass first, but the results are quite bad, both 
in terms of performance and in terms of maintainability/code quality.

I found that currently the PostRA scheduler is nearly similar to the MI 
Scheduler, except that it uses the Anti-Dep-Breaker instead of live 
register tracking and that it is not customizable, while the MI 
scheduler cannot be run post-RA due to the dependency on the live 
variable analysis which requires SSA code.

We would like to be able to schedule spill code and prologue/epilogue 
code (and if-converted code, which is currently required to be post-RA, 
I think?). Hence, I basically created a new post-RA scheduler similar to 
MI scheduler, which does bundling and handles delay slots and NOOP 
insertion. The downside is that there is a lot of code duplication, 
since the MI scheduler usually uses ScheduleDAGMI and not the more 
generic ScheduleDAGInstr at the interfaces.


So here are my questions:

- Are there any plans for a (more generic) post-RA scheduler 
replacement, or the possibility to run the MI scheduler post-RA (i.e., 
without live variable analysis depenency), or is simply creating a 
completely separate pass based on ScheduleDAGInstr the 'official' way to
handle hardware with no hazard detection?

- Is the MI scheduler supposed to create bundles (there is no support 
for this now as far as I can see, and some passes might need to break up 
some bundles later on) or should this only be done post-RA? Is the 
register allocator (supposed to be) able to handle bundles, or should 
the MI scheduler just order the instructions in the right sequence 
without actually creating bundles (which might cause some live ranges to 
seem to overlap when in fact they don't)?

- Will there be any generic pass or framework for filling delay slots 
and inserting NOOPs on hazards, similar to post-RA-sched, or has this 
always to be a custom scheduler/delay-slot-filler/...?

- And slightly unrelated: At which point should/must bundles be finalized?


Kind regards,
  Stefan

On 2013-09-24 08:11, Andrew Trick wrote:> In my last message, I explained the goals of the generic MI scheduler and
current status. This week, I'll see if we can enable MI scheduling by
default for x86. I'm not sure which flags you're using to test it now.
But by making it default and enabling the corresponding coalescer changes, we
can be confident that benchmarking efforts are improving on the same baseline.
At that point, I expect bugs to be filed for specific instances of badly
scheduled code. Getting a fix committed may not be easy, because we have to show
that new heuristics aren't likely to pessimize other code. But at least
I'll be able to provide an explanation for why MI isn't currently
handling it.
>
> There are other reasons that MI sched should be enabled now on x86 anyway:
>
> (1) The Selection DAG scheduler will be disabled as soon as I can implement
a complete replacement. That should eliminate about 10% of codegen (llc) compile
time. The Selection DAG scheduler has also long suffered from unnacceptable
worst-cast compile time behavior and unresolved defects. We been chipping away
at the problems, but some remain: PR15941, PR16365. This is a fundamentally bad
place to perform scheduling.
>
> (2) The postRA scheduler will also be eliminated. That will eliminate
another 10% of compile time for targets that currently enable it. It also
eliminates a maintenance problem because its dependence on kill flags and
implicit operands is frightening--these can easily be valid for some targets but
not others.
>
> (3) Non-x86 targets have been using MI sched for the past year to achieve
important performance and compile time benefits. For quality and maintenance
reasons, we should use the same scheduling infrastructure for mainstream
targets.
>
> The basic theme here is that we want a single scheduling infrastructure
that is efficient enough to enable by default--even if it is typically
performance-neutral, can leverage verification across many targets, and can be
safely customized by plugging in heuristics.
>
> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Svyatoslav Kuzmich

2013-Sep-28 11:04 UTC

head link

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

Hi Andy,

Are there plans to change the default scheduler for ARM targets in 3.4?

-Slava



--
View this message in context:
http://llvm.1065342.n5.nabble.com/MI-Scheduler-vs-SD-Scheduler-tp58975p61607.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Sep 2013 - [LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)

[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

[LLVMdev] Experimental Evaluation of the Schedulers in LLVM 3.3

Seemingly Similar Threads