dnetc is an open-source program from http://www.distributed.net/.  It
tries a brute-force approach to cracking RC4 puzzles and also computes
optimal Golomb rulers.  It starts up one process per CPU and runs at
nice 20 and is, for all intents and purposes, 100% compute bound.

Here is what happens on my system, running 9.0-PRERELEASE, with and
without dnetc running, with SCHED_ULE and SCHED_4BSD, when I run the
command:

time make buildkernel KERNCONF=WONDERLAND

(I get similar results on 8.x as well.)

SCHED_4BSD, dnetc not running:
1329.715u 123.739s 24:47.95 97.6%       6310+1987k 11233+11098io 419pf+0w

SCHED_4BSD, dnetc running:
1329.364u 115.158s 26:14.83 91.7%       6325+1987k 10912+11060io 393pf+0w

SCHED_ULE, dnetc not running:
1357.457u 121.526s 25:20.64 97.2%       6326+1990k 11234+11149io 419pf+0w

SCHED_ULE, dnetc running:
Still going after seven and a half hours of clock time, up to
compiling netgraph/bluetooth.  (Completed in another five minutes
after stopping dnetc so I could write this message in a reasonable
amount of time.)

Not everybody runs this sort of program, but there are plenty of
similar projects out there, and people who try to participate in
them will be mightily displeased with their FreeBSD systems when
they do.  Is there some case where SCHED_ULE exhibits significantly
better performance than SCHED_4BSD?  If not, I think SCHED_4BSD
should remain the default GENERIC configuration until this is fixed.

-- George Mitchell
09.12.2011 13:03, George Mitchell wrote:
> dnetc is an open-source program from http://www.distributed.net/.  It
> tries a brute-force approach to cracking RC4 puzzles and also computes
> optimal Golomb rulers.  It starts up one process per CPU and runs at
> nice 20 and is, for all intents and purposes, 100% compute bound.

nice 20 doesn't mean it should give time to just any other program.
Have you tried setting dnetc_idprio?

> Not everybody runs this sort of program, but there are plenty of
> similar projects out there, and people who try to participate in
> them will be mightily displeased with their FreeBSD systems when
> they do.  Is there some case where SCHED_ULE exhibits significantly
> better performance than SCHED_4BSD?  If not, I think SCHED_4BSD
> should remain the default GENERIC configuration until this is fixed.

Not fully right: boinc defaults to run at idprio 31, so this isn't an
issue there.  And yes, there are cases where SCHED_ULE shows much
better performance than SCHED_4BSD.  You have incidentally found a
rare misbehavior of SCHED_ULE, and I think it will be addressed.

-- 
Sphinx of black quartz judge my vow.
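[For reference, what the dnetc_idprio suggestion amounts to -- a sketch
of /etc/rc.conf, assuming the dnetc port's rc script honors the idprio
knob named above:

    dnetc_enable="YES"
    # 31 is the weakest claim on the CPU: run only when it is otherwise idle
    dnetc_idprio="31"

]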
2011/12/9 George Mitchell <george+freebsd@m5p.com>:
> dnetc is an open-source program from http://www.distributed.net/.  It
> tries a brute-force approach to cracking RC4 puzzles and also computes
> optimal Golomb rulers.  It starts up one process per CPU and runs at
> nice 20 and is, for all intents and purposes, 100% compute bound.
>
> Here is what happens on my system, running 9.0-PRERELEASE, with and
> without dnetc running, with SCHED_ULE and SCHED_4BSD, when I run the
> command:
>
> time make buildkernel KERNCONF=WONDERLAND
>
> (I get similar results on 8.x as well.)
>
> SCHED_4BSD, dnetc not running:
> 1329.715u 123.739s 24:47.95 97.6%       6310+1987k 11233+11098io 419pf+0w
>
> SCHED_4BSD, dnetc running:
> 1329.364u 115.158s 26:14.83 91.7%       6325+1987k 10912+11060io 393pf+0w
>
> SCHED_ULE, dnetc not running:
> 1357.457u 121.526s 25:20.64 97.2%       6326+1990k 11234+11149io 419pf+0w
>
> SCHED_ULE, dnetc running:
> Still going after seven and a half hours of clock time, up to
> compiling netgraph/bluetooth.  (Completed in another five minutes
> after stopping dnetc so I could write this message in a reasonable
> amount of time.)
>
> Not everybody runs this sort of program, but there are plenty of
> similar projects out there, and people who try to participate in
> them will be mightily displeased with their FreeBSD systems when
> they do.  Is there some case where SCHED_ULE exhibits significantly
> better performance than SCHED_4BSD?  If not, I think SCHED_4BSD
> should remain the default GENERIC configuration until this is fixed.

Hi George,
are you interested in exploring the SCHED_ULE and dnetc case further?
More precisely, I'd be interested in KTR traces.  To be even more
precise:

With a completely stable GENERIC configuration (or otherwise please
post your kernel config) please add the following:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

While you are in the middle of the slow-down (so once it is well
established) please do:

# sysctl debug.ktr.cpumask=""

In the end go with:

# ktrdump -ctf > ktr-ule-problem.out

and send the file to this mailing list.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
On Fri, Dec 9, 2011 at 6:03 AM, George Mitchell <george+freebsd@m5p.com> wrote:
> dnetc is an open-source program from http://www.distributed.net/.  It
> tries a brute-force approach to cracking RC4 puzzles and also computes
> optimal Golomb rulers.  It starts up one process per CPU and runs at
> nice 20 and is, for all intents and purposes, 100% compute bound.

Try idprio as well (at the moment it requires root to use, though).
nice only means "play nice".  idprio means "only run when nothing else
wants to run".

-- 
Eitan Adler
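[For reference, idprio(1) from the base system can also be applied by
hand -- a minimal sketch, assuming dnetc is on root's PATH:

    # idle priorities range from 0 (strongest) to 31 (weakest); an
    # idprio process runs only when no timesharing thread is runnable
    idprio 31 dnetc

]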
2011/12/10 Eitan Adler <lists@eitanadler.com>:
> On Fri, Dec 9, 2011 at 8:15 PM, George Mitchell <george@m5p.com> wrote:
>> Hope the attached helps.                        -- George Mitchell
>
> You attached dmesg, not a patch.

This is what is needed for a schedgraph analysis, along with KTR
points collection.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
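[For reference, the schedgraph workflow being referred to -- a sketch,
assuming KTR was compiled into the kernel as described in the earlier
message:

    # dump the scheduler trace from the running kernel
    ktrdump -ctf > ktr-ule-problem.out

    # visualize it with the tool shipped in the source tree
    python /usr/src/tools/sched/schedgraph.py ktr-ule-problem.out

]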
> Not fully right: boinc defaults to run at idprio 31, so this isn't an
> issue there.  And yes, there are cases where SCHED_ULE shows much
> better performance than SCHED_4BSD. [...]

Do we have any proof at hand for such cases where SCHED_ULE performs
much better than SCHED_4BSD?  Whenever the subject comes up, it is
mentioned that SCHED_ULE has better performance on boxes with ncpu > 2.
But in the end I see contradictory statements here: people complain
about poor performance (especially in scientific environments), while
others counter that this is not the case.

Within our department, we developed a highly scalable code for
planetary science purposes on imagery.  It utilizes GPUs via OpenCL if
present; otherwise it grabs as many cores as it can.  By the end of
this year I'll get a new desktop box based on Intel's new Sandy
Bridge-E architecture with plenty of memory.  If the colleague who
developed the code is willing to perform some benchmarks on the same
hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
recent Suse.  For FreeBSD I also intend to look at performance with
both of the available schedulers.

O.
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko <fidaj@ukr.net> wrote:
> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jilles@stack.nl> wrote:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>> > If the algorithm ULE does not contain problems - it means the
>> > problem has Core2Duo, or in a piece of code that uses the ULE
>> > scheduler. I already wrote in a mailing list that specifically in
>> > my case (Core2Duo) partially helps the following patch:
>> > --- sched_ule.c.orig        2011-11-24 18:11:48.000000000 +0200
>> > +++ sched_ule.c     2011-12-10 22:47:08.000000000 +0200
>> > @@ -794,7 +794,8 @@
>> >      * 1.5 * balance_interval.
>> >      */
>> >     balance_ticks = max(balance_interval / 2, 1);
>> > -   balance_ticks += random() % balance_interval;
>> > +// balance_ticks += random() % balance_interval;
>> > +   balance_ticks += ((int)random()) % balance_interval;
>> >     if (smp_started == 0 || rebalance == 0)
>> >             return;
>> >     tdq = TDQ_SELF();
>>
>> This avoids a 64-bit division on 64-bit platforms but seems to have no
>> effect otherwise. Because this function is not called very often, the
>> change seems unlikely to help.
>
> Yes, this section does not apply to this problem :)
> I just posted the latest patch, which I am using now...
>
>> > @@ -2118,13 +2119,21 @@
>> >     struct td_sched *ts;
>> >
>> >     THREAD_LOCK_ASSERT(td, MA_OWNED);
>> > +   if (td->td_pri_class & PRI_FIFO_BIT)
>> > +           return;
>> > +   ts = td->td_sched;
>> > +   /*
>> > +    * We used up one time slice.
>> > +    */
>> > +   if (--ts->ts_slice > 0)
>> > +           return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did it to delay, for as long as possible, the execution of the
> code in this section:
> ...
> #ifdef SMP
>        /*
>         * We run the long term load balancer infrequently on the first cpu.
>         */
>        if (balance_tdq == tdq) {
>                if (balance_ticks && --balance_ticks == 0)
>                        sched_balance();
>        }
> #endif
> ...
>
>> >     tdq = TDQ_SELF();
>> >  #ifdef SMP
>> >     /*
>> >      * We run the long term load balancer infrequently on the
>> > first cpu. */
>> > -   if (balance_tdq == tdq) {
>> > -           if (balance_ticks && --balance_ticks == 0)
>> > +   if (balance_ticks && --balance_ticks == 0) {
>> > +           if (balance_tdq == tdq)
>> >                     sched_balance();
>> >     }
>> >  #endif
>>
>> The main effect of this appears to be to disable the long term load
>> balancer completely after some time. At some point, a CPU other than
>> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
>> sched_balance() will never be called again.
>
> That is, for the same reason as above in the text...
>
>> It also introduces a hypothetical race condition because the access to
>> balance_ticks is no longer restricted to one CPU under a spinlock.
>>
>> If the long term load balancer may be causing trouble, try setting
>> kern.sched.balance_interval to a higher value with unpatched code.
>
> I checked that in the first place - but it did not help fix the
> situation...
>
> It gives the impression that rebalancing malfunctions...
> It seems that a thread is handed back to the same core that is
> already loaded, and so on.
> Perhaps this is a consequence of an incorrect definition of the CPU
> topology?
>
>> > @@ -2144,9 +2153,6 @@
>> >             if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>> >                     tdq->tdq_ridx = tdq->tdq_idx;
>> >     }
>> > -   ts = td->td_sched;
>> > -   if (td->td_pri_class & PRI_FIFO_BIT)
>> > -           return;
>> >     if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>> >             /*
>> >              * We used a tick; charge it to the thread so
>> > @@ -2157,11 +2163,6 @@
>> >             sched_priority(td);
>> >     }
>> >     /*
>> > -    * We used up one time slice.
>> > -    */
>> > -   if (--ts->ts_slice > 0)
>> > -           return;
>> > -   /*
>> >      * We're out of time, force a requeue at userret().
>> >      */
>> >     ts->ts_slice = sched_slice;
>>
>> > and refusal to use options FULL_PREEMPTION
>> > But no one has replied to my letter saying whether my patch helps
>> > or not in the case of Core2Duo...
>> > There is a suspicion that the problems stem from the sections of
>> > code associated with the SMP...
>> > Maybe I'm doing something wrong, but I want to help in solving this
>> > problem ...

Has anyone experiencing problems tried to set sysctl
kern.sched.steal_thresh=1 ?

I don't remember what our specific problem at $WORK was, perhaps it
was just interrupt threads not getting serviced fast enough, but we've
hard-coded this to 1 and removed the code that sets it in
sched_initticks().  The same effect should be had by setting the
sysctl after a box is up.

Thanks,
matthew
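[For reference, trying matthew's suggestion requires nothing beyond
sysctl(8):

    # show the current value (ULE derives it from the CPU count at boot)
    sysctl kern.sched.steal_thresh

    # let an idle CPU steal a thread from a neighbor's run queue even
    # when only one thread is queued there
    sysctl kern.sched.steal_thresh=1

Adding the same line to /etc/sysctl.conf makes it persistent across
reboots.]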
Michael Larabel, 2011-Dec-15 14:28 UTC
Subject: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On 12/15/2011 08:26 AM, Sergey Matveychuk wrote:
> 15.12.2011 17:36, Michael Larabel wrote:
>> On 12/15/2011 07:25 AM, Stefan Esser wrote:
>>> Am 15.12.2011 11:10, schrieb Michael Larabel:
>>>> No, the same hardware was used for each OS.
>>>>
>>>> In terms of the software, the stock software stack for each OS was
>>>> used.
>>> Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
>>> journaling enabled) should be an obvious choice since it is more
>>> similar in concept to ext4 and since that is what most FreeBSD users
>>> will use with FreeBSD?
>>
>> I was running some ZFS vs. UFS tests as well and this happened to have
>> ZFS on when I was running some other tests.
>
> Can we look at the tests?
> My opinion is ZFS without tuning is much slower than UFS2.

http://www.phoronix.com/scan.php?page=news_item&px=MTAyNjg
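[For reference, the kind of minimal ZFS tuning alluded to above -- a
sketch of /boot/loader.conf settings commonly adjusted at the time; the
values are illustrative, not recommendations:

    # cap the ARC so it does not crowd out the benchmark's working set
    vfs.zfs.arc_max="4G"

    # file-level prefetch often hurts more than it helps on
    # small-memory machines
    vfs.zfs.prefetch_disable="1"

]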
On Thu, Dec 15, 2011 at 10:32 AM, Steven Hartland
<killing@multiplay.co.uk> wrote:
> Lars Engels wrote:
>> 9.0 ships with gcc and clang which both need to be compiled, 8.2 only
>> has gcc.
>
> Ahh, any reason we need both, and is it possible to disable clang?

man src.conf

Add WITHOUT_CLANG=yes to /etc/src.conf.

-- 
Eitan Adler
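[For reference, a minimal sketch -- this assumes /etc/src.conf does not
already exist; the file is read by make(1) during buildworld and
buildkernel:

    # /etc/src.conf -- skip building clang; gcc remains the base compiler
    WITHOUT_CLANG=yes

Note this only affects source builds; it does not remove an already
installed clang.]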
2011/12/9 George Mitchell <george+freebsd@m5p.com>:
> dnetc is an open-source program from http://www.distributed.net/.  It
> tries a brute-force approach to cracking RC4 puzzles and also computes
> optimal Golomb rulers.  It starts up one process per CPU and runs at
> nice 20 and is, for all intents and purposes, 100% compute bound.

[Posting on the first message of the thread]

I basically went through all the e-mails just sent, identified four
real reports we could work on, and summarized them in the attached
Excel file.

I'd like George, Steve, Doug, Andrey and Mike to review the data there
and add more, if they want, or make more important clarifications, in
particular about the presence (or absence) of Xorg in their workload.
I've read a couple of messages in the thread pointing the finger at
Xorg as excessively CPU-intensive, and I think they are right; we
might try to find a solution for that at some point, but it is really
a very edge case.

George's and Steve's cases, instead, look very different from this,
and I want to analyze them in detail.  George already provided
schedgraph traces; for the others, if they cannot provide traces
directly, I'd really appreciate at least a detailed description of the
workload so that I get a chance to reproduce it.

If someone else thinks he has a specific problem that is not
characterized by one of the cases above, please let me know and I will
put it in the chart.

Thanks for the hard work you guys put into pointing out ULE's
problems; I think we will get to the bottom of this if we keep sharing
thoughts and reports.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
On Mon Dec 19 11, Nathan Whitehorn wrote:
> On 12/18/11 04:34, Adrian Chadd wrote:
>> The trouble is that there's lots of anecdotal evidence, but noone's
>> really gone digging deep into _their_ example of why it's broken. The
>> developers who know this stuff don't see anything wrong. That hints to
>> me it may be something a little more creepy - as an example, the
>> interplay between netisr/swi/taskqueue/callbacks and such. It may be
>> that something is being starved that isn't obviously obvious. It's
>> just a stab in the dark, but it sounds somewhat plausible based on
>> what I've seen ULE do in my network throughput hacking.
>>
>> I applaud reppie for trying to make it as easy as possible for people
>> to use KTR to provide scheduler traces for him to go digging with, so
>> please, if you have these issues and you can absolutely reproduce
>> them, please follow his instructions and work with him to get him
>> what he needs.
>
> The thing I've seen is that ULE is substantially more enthusiastic
> about migrating processes between cores than 4BSD. Often, this is a
> good thing, but it can increase the rate of cache misses, hurting
> performance for cache-bound processes (I see this particularly in
> HPC-type scientific workloads). It might be interesting to add some
> kind of tunable here.

Does r228718 have any impact regarding this behaviour?

cheers.
alex

> Another more interesting and slightly longer-term possibility, if
> someone wants a project, would be to integrate scheduling decisions
> with hwpmc counters, to accumulate statistics on cache hits at each
> context switch and preferentially keep processes with a high
> hits/misses ratio on the same thread/cache domain relative to
> processes with a low one.
> -Nathan
>
> P.S. The other thing that could be very interesting from a research
> and scheduling standpoint would be to integrate heterogeneous SMP
> support into the operating system, with a FreeBSD-4 "Application
> Processor" syscall model. We seem to be going down the road where
> GPGPU computing has MMUs, timer interrupts, IPIs, etc. (the next AMD
> Fusions, IBM Cell). This is something that no operating system
> currently supports well, and would be a place for BSD to shine. If
> anyone has a free graduate student...
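[For reference, one stopgap for cache-bound HPC jobs under either
scheduler is explicit pinning with cpuset(1) from the base system -- a
sketch, where ./solver stands in for the compute binary:

    # start the job restricted to CPUs 0-3, so the scheduler cannot
    # migrate it off that set and its cache working set stays warm
    cpuset -l 0-3 ./solver

    # or pin an already-running process (pid 12345 is illustrative)
    cpuset -l 4 -p 12345

]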