Hi,
I have tested the following with FreeBSD 5.3-STABLE.
On several different PCs I have used
make -j$n buildworld
with $n ranging from 1 to 9.
Although people suggest "-j4" as optimal in the general
case, I have come to a very different conclusion:
1) single CPU with enough RAM (2 GHz, 512 MB)
there's no significant speed up in the range
"-j1" to "-j9".
So "-j1" is as good as "-j9".
2) single CPU with little RAM (333 MHz, 64 MB)
speed slows down rapidly from "-j1" to "-j9",
because of intensive swapping.
So "-j1" performs best in this case.
3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
speed up by almost two from "-j1" to "-j2",
but after that no noticeable speed up anymore.
So "-j2" is as good as "-j9".
----------------------------------------
With these simple tests, I come to the conclusion that
"make -j$n buildworld" is best with n = number of CPUs.
Does that make sense?
Rob.
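
A sweep like the one described above could be scripted roughly as follows. This is only a sketch: the helper names (buildworld_command, time_command, sweep) are made up for illustration, /usr/src as the source directory is an assumption, and nothing here cleans /usr/obj between runs, which the original tests may or may not have done.

```python
import subprocess
import time


def buildworld_command(jobs):
    """Return the argv for one run, e.g. ['make', '-j2', 'buildworld']."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", f"-j{jobs}", "buildworld"]


def time_command(argv, cwd=None):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(argv, cwd=cwd, check=True)
    return time.monotonic() - start


def sweep(job_counts, cwd="/usr/src"):
    """Time one buildworld per job count; returns {jobs: seconds}.

    The default cwd is an assumption about where the source tree lives.
    """
    return {n: time_command(buildworld_command(n), cwd=cwd) for n in job_counts}
```

Calling sweep(range(1, 10)) would run nine full builds back to back, so it is something to leave running overnight.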
> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

Yes, I believe this makes sense. The recommendations made in the handbook (n >= 4) date back to the time when IO was the bottleneck in the compilation process, i.e. one needed to run multiple build processes in parallel just to keep the CPU busy. Today this is no longer true, so it's optimal to use n = number of CPUs. Not sure about Intel CPUs with HT enabled though...

regards,
Derkjan
Rob wrote:
> I have tested following with FreeBSD 5.3-Stable.
>
> On several different PCs I have used
> make -j$n buildworld
> with $n ranging from 1 to 9.
>
> Although people suggest "-j4" as optimal in general
> case, I have come to a very different conclusion:
>
> 1) single CPU with enough RAM (2 GHz, 512 MB)
> there's no significant speed up in the range
> "-j1" to "-j9".
> So "-j1" is as good as "-j9".
>
> 2) single CPU with little RAM (333 MHz, 64 MB)
> speed slows down rapidly from "-j1" to "-j9",
> because of intensive swapping.
> So "-j1" performs best in this case.
>
> 3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
> speed up by almost two from "-j1" to "-j2",
> but after that no noticeable speed up anymore.
> So "-j2" is as good as "-j9".
>
> ----------------------------------------
>
> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

I believe the current recommendation for SMP machines is to use "-j(n+1)", so I've been using "-j3" on my dual CPU machines. This ensures there is always a task waiting to be executed when one of the CPUs completes a job.

Note that a parallel build (using "-j") can cause problems when compiling some programs (in fact, it didn't work for the kernel last time I checked).

For my UP machines I don't use "-j" at all. If you read the make(1) man page, you'll note that omitting "-j" implies "-B". As you already found, with a single CPU there is no speed gained by using "-j" and without it I get the assurance of running in compatibility mode.

Jon
Did you try any machines that used Hyperthreading? I'd be interested to see how those machines fare based on the number of logical and real CPUs.

> Although people suggest "-j4" as optimal in general
> case, I have come to a very different conclusion:
>
> 1) single CPU with enough RAM (2 GHz, 512 MB)
> there's no significant speed up in the range
> "-j1" to "-j9".
> So "-j1" is as good as "-j9".

If you went to all that trouble, you might as well post the numbers :-)

> 2) single CPU with little RAM (333 MHz, 64 MB)
> speed slows down rapidly from "-j1" to "-j9",
> because of intensive swapping.
> So "-j1" performs best in this case.

This is expected. A note should probably be added to the handbook giving rough approximations of how much memory per simultaneous process is necessary for optimal performance. I'd guess 48MB * p + c, where c = the machine's memory load while idle and p = the number of compile processes (most don't take nearly that much memory, but c++ can gobble it).

> 3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
> speed up by almost two from "-j1" to "-j2",
> but after that no noticeable speed up anymore.
> So "-j2" is as good as "-j9".

Again, you went to the trouble, post the numbers?

> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

Sort of. It depends on more than just the number of CPUs. IO speed is also very important. If you're using NFS over non-gigabit ethernet or to a slow NFS server, it's worth ratcheting the number of threads up. The same would go for old slow disks, or if you have /usr/src union-mounted from a cdrom drive, etc. Also disk layout: having /usr/src on a different drive from /usr/obj can speed up the IO-bound portions of the process a great deal by eliminating contention.
If you do less waiting for IO, adding more threads has a less pronounced or even negative effect due to CPU contention instead of the positive "work while the other thread waits on IO" effect. This is the basic underlying principle, which the handbook doesn't really point out.

Seems to me the pluses and minuses of increasing n are:
+ More chances to do work when other processes are waiting on IO.
- CPU contention resulting in context switches and other wasted cycles due to extra scheduling overhead (probably negligible, maybe significant with high HZ in kernel config).
- Memory contention (aka usage).

It might be worth decreasing the number recommended somewhat, but I think j = ncpu is too small for a general recommendation, because unless you are memory tight there is very little harm in increasing the number. I'd suspect j = 2 * ncpu or even j = ncpu + 1 are better rules of thumb.

A better formula would take average IO thruput and latency rates from bonnie++, amount of available memory, and the number and speed of CPUs. A perl script that measures these numbers and determines the optimal setting is left as an exercise to the reader. Extra credit - code it in C and get it integrated in -CURRENT so that "make buildworld" automagically calls "make -j=$n real_buildworld" with the optimal value of n :-)

My results, for what it's worth:

Specs:
Athlon XP 2500+, 512M of 333MHz DDR ram.
/usr/obj is a gvinum raid0 (striped) volume of two SATA disks.
/usr/src is on a gvinum raid1 (mirrored) volume of two PATA disks.
options HZ=1000 in the kernel config, pretty vanilla besides that.

in make.conf:
CFLAGS=-O2 -pipe -march=athlon-xp
CXXFLAGS empty due to a bug with memoization last time I tried a compile...

make -j1 buildworld:
real 64m54.298s
user 52m56.915s
sys 9m13.041s

make -j2 buildworld:
real 67m55.816s
user 56m20.778s
sys 10m20.247s

make -j3 buildworld:
real 70m53.936s
user 59m2.447s
sys 10m43.325s

make -j4 buildworld:
real 72m25.904s
user 60m19.098s
sys 10m59.492s

--
Brian Szymanski
ski@indymedia.org
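
To make the memory guess in the reply above (48MB * p + c) concrete, here is a small sketch that caps the rules of thumb mentioned in this thread (ncpu, ncpu + 1, 2 * ncpu) by how many compile processes would fit in RAM. The function names are invented for illustration, and the 48 MB per process figure is only the guess quoted above, not a measured value.

```python
def estimated_memory_mb(jobs, idle_mb, per_job_mb=48):
    """Rough memory need for `jobs` compile processes: per_job_mb * p + c."""
    return per_job_mb * jobs + idle_mb


def max_jobs_for_memory(total_mb, idle_mb, per_job_mb=48):
    """Largest job count whose estimate still fits in RAM (never below 1)."""
    return max(1, (total_mb - idle_mb) // per_job_mb)


def candidate_jobs(ncpu, total_mb, idle_mb, per_job_mb=48):
    """Apply the thread's rules of thumb, each capped by the memory estimate."""
    cap = max_jobs_for_memory(total_mb, idle_mb, per_job_mb)
    rules = {"ncpu": ncpu, "ncpu+1": ncpu + 1, "2*ncpu": 2 * ncpu}
    return {name: min(n, cap) for name, n in rules.items()}
```

With a hypothetical idle load of 32 MB, a 64 MB machine is capped at a single job under this estimate, which matches the swapping seen on the small-memory box, while a dual-CPU machine with 1 GB is not constrained by memory at all.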
Nick Barnes
2004-Nov-25 02:18 UTC
port make index (was: Re: make -j$n buildworld : use of -j investigated)
On Thu, 25 Nov 2004 16:19:02 +0900, Rob <spamrefuse@yahoo.com> wrote:
> > time(minutes) * speed(MHz) * nproc / 1000 MHz

Looking at your examples, it seems you divide by 1e5, not by 1000. In other words, buildworld is CPU bound and takes about 6e12 clock cycles. Use -j<nproc>.

Nick B
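
The arithmetic behind that remark can be spelled out as follows. This is a sketch with invented function names; the 100-minute, 1000 MHz machine used to illustrate it is hypothetical and not one of the machines in this thread.

```python
def normalized_build_cost(minutes, mhz, nproc, divisor=1e5):
    """The quoted figure of merit: time(minutes) * speed(MHz) * nproc / divisor.

    With divisor=1e5 a value near 1.0 means the build cost is 'typical'.
    """
    return minutes * mhz * nproc / divisor


def total_clock_cycles(minutes, mhz, nproc):
    """Convert the same inputs to clock cycles summed over all CPUs."""
    seconds = minutes * 60
    hertz = mhz * 1e6
    return seconds * hertz * nproc
```

A normalized value of 1.0 corresponds to 1e5 minute-MHz, which is 1e5 * 60 * 1e6 = 6e12 cycles, so the two statements in the reply are the same observation in different units.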
I read this thread with interest and saw the question of how the system
will behave with hyperthreading. Why not benchmark my own system?
Here are the results. The interpretation is left to the experts.
IMHO HT is not as useless as expected. :-)
I did not switch off SMP with sysctl, but used an extra UP Kernel to
allow some optimizations during compile. But I don't know if there
are any..
Hardware is
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2798.66-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf33 Stepping = 3
Hyperthreading: 2 logical CPUs
real memory = 1072889856 (1023 MB)
avail memory = 1040453632 (992 MB)
Two 117246MB <Maxtor 6Y120M0/YAR51HW0> [238216/16/63] at ata2-master
SATA150, one is on /usr/src, the other on /usr/obj.
SMP Kernel 5.3-STABLE, nearly GENERIC, option SMP, some drivers removed
=========
Fri Nov 26 13:58:54 CET 2004
Start make -j 5 -DNOGAMES buildworld
32m51.01s real 50m51.61s user 11m33.17s sys
47540 maximum resident set size
2834 average shared memory size
1441 average unshared data size
128 average unshared stack size
21126270 page reclaims
531 page faults
0 swaps
17576 block input operations
2747 block output operations
0 messages sent
0 messages received
25289 signals received
426800 voluntary context switches
519922 involuntary context switches
Fri Nov 26 14:31:45 CET 2004
END make -j 5 -DNOGAMES buildworld
Fri Nov 26 14:31:45 CET 2004
Start make -j 4 -DNOGAMES buildworld
32m36.07s real 50m59.94s user 11m23.83s sys
47540 maximum resident set size
2843 average shared memory size
1444 average unshared data size
128 average unshared stack size
20968831 page reclaims
471 page faults
0 swaps
1572 block input operations
2625 block output operations
0 messages sent
0 messages received
24577 signals received
399521 voluntary context switches
499416 involuntary context switches
Fri Nov 26 15:04:21 CET 2004
END make -j 4 -DNOGAMES buildworld
Fri Nov 26 15:04:21 CET 2004
Start make -j 3 -DNOGAMES buildworld
32m30.77s real 50m48.61s user 11m23.91s sys
47540 maximum resident set size
2839 average shared memory size
1443 average unshared data size
128 average unshared stack size
20968366 page reclaims
408 page faults
0 swaps
1500 block input operations
2638 block output operations
0 messages sent
0 messages received
24902 signals received
406593 voluntary context switches
494799 involuntary context switches
Fri Nov 26 15:36:52 CET 2004
END make -j 3 -DNOGAMES buildworld
Fri Nov 26 15:36:52 CET 2004
Start make -j 2 -DNOGAMES buildworld
32m54.63s real 50m7.62s user 11m6.64s sys
47540 maximum resident set size
2846 average shared memory size
1449 average unshared data size
128 average unshared stack size
20968367 page reclaims
408 page faults
0 swaps
1500 block input operations
2610 block output operations
0 messages sent
0 messages received
25218 signals received
415829 voluntary context switches
484130 involuntary context switches
Fri Nov 26 16:09:46 CET 2004
END make -j 2 -DNOGAMES buildworld
Fri Nov 26 16:09:46 CET 2004
Start make -j 1 -DNOGAMES buildworld
39m19.52s real 31m57.60s user 8m27.33s sys
47540 maximum resident set size
2724 average shared memory size
1411 average unshared data size
127 average unshared stack size
20969173 page reclaims
408 page faults
0 swaps
1500 block input operations
2620 block output operations
0 messages sent
0 messages received
25283 signals received
411973 voluntary context switches
279205 involuntary context switches
Fri Nov 26 16:49:06 CET 2004
END make -j 1 -DNOGAMES buildworld
UP Kernel, the same kernel without option SMP
========
Fri Nov 26 17:30:46 CET 2004
Start make -j 3 -DNOGAMES buildworld
38m17.37s real 31m13.04s user 5m47.43s sys
47428 maximum resident set size
2865 average shared memory size
1503 average unshared data size
128 average unshared stack size
20973951 page reclaims
1656 page faults
0 swaps
27380 block input operations
2653 block output operations
0 messages sent
0 messages received
24813 signals received
422752 voluntary context switches
563619 involuntary context switches
Fri Nov 26 18:09:04 CET 2004
END make -j 3 -DNOGAMES buildworld
Fri Nov 26 18:09:04 CET 2004
Start make -j 2 -DNOGAMES buildworld
38m31.50s real 31m9.44s user 5m43.27s sys
47428 maximum resident set size
2867 average shared memory size
1497 average unshared data size
128 average unshared stack size
20973698 page reclaims
408 page faults
0 swaps
1963 block input operations
2593 block output operations
0 messages sent
0 messages received
25191 signals received
403269 voluntary context switches
582855 involuntary context switches
Fri Nov 26 18:47:35 CET 2004
END make -j 2 -DNOGAMES buildworld
Fri Nov 26 18:47:35 CET 2004
Start make -j 1 -DNOGAMES buildworld
37m13.98s real 30m50.79s user 5m36.54s sys
47428 maximum resident set size
2869 average shared memory size
1498 average unshared data size
128 average unshared stack size
20974104 page reclaims
408 page faults
0 swaps
1894 block input operations
2546 block output operations
0 messages sent
0 messages received
25283 signals received
412027 voluntary context switches
640783 involuntary context switches
Fri Nov 26 19:24:49 CET 2004
END make -j 1 -DNOGAMES buildworld
Regards,
Frank
--
Frank Behrens, Osterwieck, Germany
e-mail: <frank@pinky.sax.de>
PGP-key 0x5B7C47ED on public servers available.
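
For anyone who wants to compare the runs above without doing the arithmetic by hand, here is a small sketch that parses the "real / user / sys" summary lines and computes the wall-clock speedup and the CPU utilisation. The function names are invented for illustration; only the first line of each time report is handled.

```python
import re

# Matches values such as "32m30.77s" or "45.2s".
_TIME_RE = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(text):
    """Convert a '32m30.77s' style value to seconds."""
    match = _TIME_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def parse_time_line(line):
    """Parse '32m30.77s real 50m48.61s user 11m23.91s sys' into a dict."""
    tokens = line.split()
    # Tokens alternate between a value and its label.
    return {label: to_seconds(value)
            for value, label in zip(tokens[0::2], tokens[1::2])}


def speedup(baseline, other):
    """How many times faster `other` finished than `baseline` (wall clock)."""
    return baseline["real"] / other["real"]


def cpu_utilisation(times):
    """(user + sys) / real; values above 1.0 mean more than one CPU was busy."""
    return (times["user"] + times["sys"]) / times["real"]


up_j1 = parse_time_line("37m13.98s real 30m50.79s user 5m36.54s sys")
smp_j3 = parse_time_line("32m30.77s real 50m48.61s user 11m23.91s sys")
print(f"speedup: {speedup(up_j1, smp_j3):.3f}")
print(f"SMP utilisation: {cpu_utilisation(smp_j3):.2f}")
```

Comparing the fastest UP run (-j 1) with the fastest SMP run (-j 3) this way gives a wall-clock factor of roughly 1.15, even though the user time reported under the SMP kernel is much higher.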
At 2:08 PM +0900 11/23/04, Rob wrote:
>Hi,
>
>I have tested following with FreeBSD 5.3-Stable.
>
>On several different PCs I have used
> make -j$n buildworld
>with $n ranging from 1 to 9.
>
>Although people suggest "-j4" as optimal in general
>case, I have come to a very different conclusion...

So, I finally got around to doing some timings on my newest PC. It is an AMD Athlon(tm) XP 3000+ (2166.43-MHz 686-class CPU) with 1-gig of memory, and fast SATA disks. It was certainly different than what I saw on my previous single-CPU systems. Roughly:

   Real      User      Sys     Max-LA
   2670.86   2071.66   543.49    1.25   buildworld
   2751.60   2085.95   603.69    1.35   -j1 buildworld
   2825.87   2137.19   637.15    5.58   -j2 buildworld
   2887.03   2158.60   648.37   11.85   -j3 buildworld
   2856.75   2156.48   647.43   19.06   -j4 buildworld
   2851.71   2154.39   647.19   25.43   -j5 buildworld
   2850.92   2155.40   646.19   31.59   -j6 buildworld
   2850.07   2153.77   648.41   36.19   -j7 buildworld
   2852.64   2155.74   647.82   47.00   -j8 buildworld
   2851.66   2153.43   650.23   53.51   -j9 buildworld

(I've actually done multiple runs of each, but they all show about the same numbers). I had a separate session doing 'uptime's every 30 seconds, and the "Max-LA" column is the maximum load-average that was seen by that separate session.

Apparently the faster disks made a much bigger difference than I had expected to see.

--
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu
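
The table above can be summarised by expressing each run's wall-clock time as a percentage over the plain "buildworld" run. A minimal sketch, with the real-time column copied from the table (assumed to be seconds) and function names that are invented for illustration:

```python
# (label, real seconds) copied from the table above.
RUNS = [
    ("buildworld", 2670.86),
    ("-j1 buildworld", 2751.60),
    ("-j2 buildworld", 2825.87),
    ("-j3 buildworld", 2887.03),
    ("-j4 buildworld", 2856.75),
    ("-j5 buildworld", 2851.71),
    ("-j6 buildworld", 2850.92),
    ("-j7 buildworld", 2850.07),
    ("-j8 buildworld", 2852.64),
    ("-j9 buildworld", 2851.66),
]


def overhead_percent(real, baseline):
    """Extra wall-clock time relative to the baseline, in percent."""
    return (real - baseline) / baseline * 100.0


def slowest(runs):
    """Label of the run with the largest wall-clock time."""
    return max(runs, key=lambda run: run[1])[0]


baseline_real = RUNS[0][1]
for label, real in RUNS:
    print(f"{label:16s} {overhead_percent(real, baseline_real):5.1f}% over plain")
```

On this single-CPU machine every -j value comes out slower than the plain build, with the overhead levelling off after -j3 while the load average keeps climbing.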