Hi,

I have tested the following with FreeBSD 5.3-Stable.

On several different PCs I have used
    make -j$n buildworld
with $n ranging from 1 to 9.

Although people suggest "-j4" as optimal in the general
case, I have come to a very different conclusion:

1) single CPU with enough RAM (2 GHz, 512 MB)
   there's no significant speed up in the range
   "-j1" to "-j9".
   So "-j1" is as good as "-j9".

2) single CPU with little RAM (333 MHz, 64 MB)
   speed slows down rapidly from "-j1" to "-j9",
   because of intensive swapping.
   So "-j1" performs best in this case.

3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
   speed up by almost two from "-j1" to "-j2",
   but after that no noticeable speed up anymore.
   So "-j2" is as good as "-j9".

----------------------------------------

With these simple tests, I come to the conclusion that
"make -j$n buildworld" is best with n = number of CPUs.
Does that make sense?

Rob.
> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

Yes, I believe this makes sense. The recommendations made in the handbook (n >= 4) date back to the time when IO was the bottleneck in the compilation process, i.e. one needed to run multiple build processes in parallel just to keep the CPU busy. Today this is no longer true, so it's optimal to use n = number of CPUs. Not sure about Intel CPUs with HT enabled though...

regards,
Derkjan
Rob wrote:
> I have tested following with FreeBSD 5.3-Stable.
>
> On several different PCs I have used
>     make -j$n buildworld
> with $n ranging from 1 to 9.
>
> Although people suggest "-j4" as optimal in general
> case, I have come to a very different conclusion:
>
> 1) single CPU with enough RAM (2 GHz, 512 MB)
>    there's no significant speed up in the range
>    "-j1" to "-j9".
>    So "-j1" is as good as "-j9".
>
> 2) single CPU with little RAM (333 MHz, 64 MB)
>    speed slows down rapidly from "-j1" to "-j9",
>    because of intensive swapping.
>    So "-j1" performs best in this case.
>
> 3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
>    speed up by almost two from "-j1" to "-j2",
>    but after that no noticeable speed up anymore.
>    So "-j2" is as good as "-j9".
>
> ----------------------------------------
>
> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

I believe the current recommendation for SMP machines is to use "-j(n+1)", so I've been using "-j3" on my dual CPU machines. This ensures there is always a task waiting to be executed when one of the CPUs completes a job.

Note that a parallel build (using "-j") can cause problems when compiling some programs (in fact, it didn't work for the kernel last time I checked).

For my UP machines I don't use "-j" at all. If you read the make(1) man page, you'll note that omitting "-j" implies "-B". As you already found, with a single CPU there is no speed gained by using "-j", and without it I get the assurance of running in compatibility mode.

Jon
Did you try any machines that used Hyperthreading? I'd be interested to see how those machines fare based on the number of logical and real CPUs.

> Although people suggest "-j4" as optimal in general
> case, I have come to a very different conclusion:
>
> 1) single CPU with enough RAM (2 GHz, 512 MB)
>    there's no significant speed up in the range
>    "-j1" to "-j9".
>    So "-j1" is as good as "-j9".

If you went to all that trouble, you might as well post the numbers :-)

> 2) single CPU with little RAM (333 MHz, 64 MB)
>    speed slows down rapidly from "-j1" to "-j9",
>    because of intensive swapping.
>    So "-j1" performs best in this case.

This is expected. A note should probably be added to the handbook giving rough approximations of how much memory per simultaneous process is necessary for optimal performance. I'd guess 48MB * p + c, where c = the machine's memory load while idle and p = the number of compile processes (most don't take nearly that much memory, but C++ can gobble it).

> 3) dual CPU with enough RAM (2 x 800 MHz, 1GB)
>    speed up by almost two from "-j1" to "-j2",
>    but after that no noticeable speed up anymore.
>    So "-j2" is as good as "-j9".

Again, you went to the trouble, post the numbers?

> With these simple tests, I come to the conclusion that
> "make -j$n buildworld" is best with n = number of CPUs.
> Does that make sense?

Sort of. It depends on more than just the number of CPUs. IO speed is also very important. If you're using NFS over non-gigabit ethernet or to a slow NFS server, it's worth ratcheting the number of threads up. The same would go for old slow disks, or if you have /usr/src union-mounted from a cdrom drive, etc. Also disk layout: having /usr/src on a different drive from /usr/obj can speed up the IO-bound portions of the process a great deal by eliminating contention.
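The memory guess above (48MB * p + c) can be turned into a quick check of how many compile processes fit in RAM. This is only a sketch of that rough formula: the function names are made up for illustration, and the idle-load values (c) in the examples are placeholders, not measurements from any machine in this thread.

```python
def estimated_build_memory_mb(p, idle_mb, per_process_mb=48):
    """Rough estimate from the post: per_process_mb * p + c (c = idle_mb)."""
    return per_process_mb * p + idle_mb


def max_jobs_for_memory(total_mb, idle_mb, per_process_mb=48):
    """Largest p whose estimate still fits in total_mb (never below 1)."""
    return max(1, (total_mb - idle_mb) // per_process_mb)


# The idle_mb values are assumed placeholders, not measured figures.
print(estimated_build_memory_mb(p=4, idle_mb=64))     # 256
print(max_jobs_for_memory(total_mb=512, idle_mb=64))  # 9
print(max_jobs_for_memory(total_mb=64, idle_mb=32))   # 1
```

With these assumed idle loads, the 64 MB machine has room for only one compile process under this estimate, which would be consistent with the swapping reported for it.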
If you do less waiting for IO, adding more threads has a less pronounced or even negative effect due to CPU contention instead of the positive "work while the other thread waits on IO" effect. This is the basic underlying principle, which the handbook doesn't really point out.

Seems to me the pluses and minuses of increasing n are:
+ More chances to do work when other processes are waiting on IO.
- CPU contention resulting in context switches and other wasted cycles due to extra scheduling overhead (probably negligible, maybe significant with high HZ in kernel config).
- Memory contention (aka usage).

It might be worth decreasing the number recommended somewhat, but I think j = ncpu is too small for a general recommendation, because unless you are memory tight there is very little harm in increasing the number. I'd suspect j = 2 * ncpu or even j = ncpu + 1 are better rules of thumb.

A better formula would take average IO throughput and latency rates from bonnie++, amount of available memory, and the number and speed of CPUs. A perl script that measures these numbers and determines the optimal setting is left as an exercise to the reader. Extra credit - code it in C and get it integrated in -CURRENT so that "make buildworld" automagically calls "make -j$n real_buildworld" with the optimal value of n :-)

My results, for what it's worth:

Specs:
Athlon XP 2500+, 512M of 333MHz DDR ram.
/usr/obj is a gvinum raid0 (striped) volume of two SATA disks.
/usr/src is on a gvinum raid1 (mirrored) volume of two PATA disks.
options HZ=1000 in the kernel config, pretty vanilla besides that..
in make.conf:
CFLAGS=-O2 -pipe -march=athlon-xp
CXXFLAGS empty due to a bug with memoization last time I tried a compile...
make -j1 buildworld:
real    64m54.298s
user    52m56.915s
sys     9m13.041s

make -j2 buildworld:
real    67m55.816s
user    56m20.778s
sys     10m20.247s

make -j3 buildworld:
real    70m53.936s
user    59m2.447s
sys     10m43.325s

make -j4 buildworld:
real    72m25.904s
user    60m19.098s
sys     10m59.492s

--
Brian Szymanski
ski@indymedia.org
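To make the trend in Brian's timings easier to read, the "real" times can be converted to seconds and compared against the -j1 run. The following is a small sketch; the helper name to_seconds is invented here, and the only data used are the four "real" values posted above.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '64m54.298s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times copied from the results above, keyed by the -j value.
real_times = {1: "64m54.298s", 2: "67m55.816s", 3: "70m53.936s", 4: "72m25.904s"}

baseline = to_seconds(real_times[1])
for jobs, stamp in sorted(real_times.items()):
    seconds = to_seconds(stamp)
    slowdown = seconds / baseline - 1.0
    print(f"-j{jobs}: {seconds:8.1f} s  ({slowdown:+.1%} vs -j1)")
```

On this single-CPU machine every extra job adds wall-clock time, and -j4 comes out a bit under 12% slower than -j1.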
Nick Barnes
2004-Nov-25 02:18 UTC
port make index (was: Re: make -j$n buildworld : use of -j investigated)
On Thu, 25 Nov 2004 16:19:02 +0900, Rob <spamrefuse@yahoo.com> wrote:
> > time(minutes) * speed(MHz) * nproc / 1000 MHz

Looking at your examples, it seems you divide by 1e5, not by 1000. In other words, buildworld is CPU bound and takes about 6e12 clock cycles. Use -j<nproc>.

Nick B
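The step from the normalized figure to "about 6e12 clock cycles" is a unit conversion: minutes times MHz is 60 seconds times 1e6 cycles per second, so a normalized value of 1 (after dividing by 1e5) corresponds to 1e5 * 6e7 = 6e12 cycles. A minimal sketch of that arithmetic follows; the function names are invented, and the example machine is hypothetical, chosen only so the metric comes out as exactly 1.0. It is not one of Rob's measurements.

```python
SECONDS_PER_MINUTE = 60
HZ_PER_MHZ = 1_000_000


def normalized_metric(minutes, mhz, nproc, divisor=100_000):
    """The quoted figure: time(minutes) * speed(MHz) * nproc / divisor."""
    return minutes * mhz * nproc / divisor


def total_cycles(minutes, mhz, nproc):
    """Clock cycles available across all CPUs during the build."""
    return minutes * SECONDS_PER_MINUTE * mhz * HZ_PER_MHZ * nproc


# Hypothetical machine (not a measurement): 50 minutes on one 2000 MHz CPU.
minutes, mhz, nproc = 50, 2000, 1
print(normalized_metric(minutes, mhz, nproc))  # 1.0
print(total_cycles(minutes, mhz, nproc))       # 6000000000000
```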
I read this thread with interest and saw the question of how the system will behave with hyperthreading. Should I not benchmark my system? Here you have the results. The interpretation is left to the experts. IMHO HT is not as useless as expected. :-)

I did not switch off SMP with sysctl, but used an extra UP kernel to allow some optimizations during compile. But I don't know if there are any..

Hardware is
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2798.66-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf33  Stepping = 3
  Hyperthreading: 2 logical CPUs
real memory  = 1072889856 (1023 MB)
avail memory = 1040453632 (992 MB)
Two 117246MB <Maxtor 6Y120M0/YAR51HW0> [238216/16/63] at ata2-master SATA150, one is on /usr/src, the other on /usr/obj.

SMP Kernel 5.3-STABLE, nearly GENERIC, option SMP, some drivers removed
=========
Fri Nov 26 13:58:54 CET 2004 Start make -j 5 -DNOGAMES buildworld
32m51.01s real 50m51.61s user 11m33.17s sys
     47540  maximum resident set size
      2834  average shared memory size
      1441  average unshared data size
       128  average unshared stack size
  21126270  page reclaims
       531  page faults
         0  swaps
     17576  block input operations
      2747  block output operations
         0  messages sent
         0  messages received
     25289  signals received
    426800  voluntary context switches
    519922  involuntary context switches
Fri Nov 26 14:31:45 CET 2004 END make -j 5 -DNOGAMES buildworld

Fri Nov 26 14:31:45 CET 2004 Start make -j 4 -DNOGAMES buildworld
32m36.07s real 50m59.94s user 11m23.83s sys
     47540  maximum resident set size
      2843  average shared memory size
      1444  average unshared data size
       128  average unshared stack size
  20968831  page reclaims
       471  page faults
         0  swaps
      1572  block input operations
      2625  block output operations
         0  messages sent
         0  messages received
     24577  signals received
    399521  voluntary context switches
    499416  involuntary context switches
Fri Nov 26 15:04:21 CET 2004 END make -j 4 -DNOGAMES buildworld

Fri Nov 26 15:04:21 CET 2004 Start make -j 3 -DNOGAMES buildworld
32m30.77s real 50m48.61s user 11m23.91s sys
     47540  maximum resident set size
      2839  average shared memory size
      1443  average unshared data size
       128  average unshared stack size
  20968366  page reclaims
       408  page faults
         0  swaps
      1500  block input operations
      2638  block output operations
         0  messages sent
         0  messages received
     24902  signals received
    406593  voluntary context switches
    494799  involuntary context switches
Fri Nov 26 15:36:52 CET 2004 END make -j 3 -DNOGAMES buildworld

Fri Nov 26 15:36:52 CET 2004 Start make -j 2 -DNOGAMES buildworld
32m54.63s real 50m7.62s user 11m6.64s sys
     47540  maximum resident set size
      2846  average shared memory size
      1449  average unshared data size
       128  average unshared stack size
  20968367  page reclaims
       408  page faults
         0  swaps
      1500  block input operations
      2610  block output operations
         0  messages sent
         0  messages received
     25218  signals received
    415829  voluntary context switches
    484130  involuntary context switches
Fri Nov 26 16:09:46 CET 2004 END make -j 2 -DNOGAMES buildworld

Fri Nov 26 16:09:46 CET 2004 Start make -j 1 -DNOGAMES buildworld
39m19.52s real 31m57.60s user 8m27.33s sys
     47540  maximum resident set size
      2724  average shared memory size
      1411  average unshared data size
       127  average unshared stack size
  20969173  page reclaims
       408  page faults
         0  swaps
      1500  block input operations
      2620  block output operations
         0  messages sent
         0  messages received
     25283  signals received
    411973  voluntary context switches
    279205  involuntary context switches
Fri Nov 26 16:49:06 CET 2004 END make -j 1 -DNOGAMES buildworld

UP Kernel, the same kernel without option SMP
========
Fri Nov 26 17:30:46 CET 2004 Start make -j 3 -DNOGAMES buildworld
38m17.37s real 31m13.04s user 5m47.43s sys
     47428  maximum resident set size
      2865  average shared memory size
      1503  average unshared data size
       128  average unshared stack size
  20973951  page reclaims
      1656  page faults
         0  swaps
     27380  block input operations
      2653  block output operations
         0  messages sent
         0  messages received
     24813  signals received
    422752  voluntary context switches
    563619  involuntary context switches
Fri Nov 26 18:09:04 CET 2004 END make -j 3 -DNOGAMES buildworld

Fri Nov 26 18:09:04 CET 2004 Start make -j 2 -DNOGAMES buildworld
38m31.50s real 31m9.44s user 5m43.27s sys
     47428  maximum resident set size
      2867  average shared memory size
      1497  average unshared data size
       128  average unshared stack size
  20973698  page reclaims
       408  page faults
         0  swaps
      1963  block input operations
      2593  block output operations
         0  messages sent
         0  messages received
     25191  signals received
    403269  voluntary context switches
    582855  involuntary context switches
Fri Nov 26 18:47:35 CET 2004 END make -j 2 -DNOGAMES buildworld

Fri Nov 26 18:47:35 CET 2004 Start make -j 1 -DNOGAMES buildworld
37m13.98s real 30m50.79s user 5m36.54s sys
     47428  maximum resident set size
      2869  average shared memory size
      1498  average unshared data size
       128  average unshared stack size
  20974104  page reclaims
       408  page faults
         0  swaps
      1894  block input operations
      2546  block output operations
         0  messages sent
         0  messages received
     25283  signals received
    412027  voluntary context switches
    640783  involuntary context switches
Fri Nov 26 19:24:49 CET 2004 END make -j 1 -DNOGAMES buildworld

Regards,
   Frank
--
Frank Behrens, Osterwieck, Germany
e-mail: <frank@pinky.sax.de>
PGP-key 0x5B7C47ED on public servers available.
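One way to read Frank's numbers is to compare the fastest SMP (hyperthreading) run against the fastest UP run. The sketch below does only that, using the "real" times from the runs above; the helper and variable names are made up for illustration, and a single run per setting is of course a thin basis for conclusions.

```python
def to_seconds(stamp):
    """Convert a stamp such as '32m30.77s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the runs above, keyed by the -j value.
smp_real = {5: "32m51.01s", 4: "32m36.07s", 3: "32m30.77s",
            2: "32m54.63s", 1: "39m19.52s"}
up_real = {3: "38m17.37s", 2: "38m31.50s", 1: "37m13.98s"}

best_smp_jobs = min(smp_real, key=lambda j: to_seconds(smp_real[j]))
best_up_jobs = min(up_real, key=lambda j: to_seconds(up_real[j]))
best_smp = to_seconds(smp_real[best_smp_jobs])
best_up = to_seconds(up_real[best_up_jobs])
saving = 1 - best_smp / best_up

print(f"best SMP run: -j{best_smp_jobs}, {best_smp:.2f} s")
print(f"best UP run:  -j{best_up_jobs}, {best_up:.2f} s")
print(f"wall-clock saving with HT: {saving:.1%}")
```

In this data set the best SMP run is -j3 and the best UP run is -j1, with the HT configuration finishing a little under 13% sooner. The SMP runs from -j2 to -j5 differ by less than half a minute, so the choice among them matters far less than having at least two jobs.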
At 2:08 PM +0900 11/23/04, Rob wrote:
>Hi,
>
>I have tested following with FreeBSD 5.3-Stable.
>
>On several different PCs I have used
>    make -j$n buildworld
>with $n ranging from 1 to 9.
>
>Although people suggest "-j4" as optimal in general
>case, I have come to a very different conclusion...

So, I finally got around to doing some timings on my newest PC. It is an AMD Athlon(tm) XP 3000+ (2166.43-MHz 686-class CPU) with 1-gig of memory, and fast SATA disks. It was certainly different than what I saw on my previous single-CPU systems. Roughly:

    Real      User      Sys    Max-LA
  2670.86   2071.66   543.49     1.25   buildworld
  2751.60   2085.95   603.69     1.35   -j1 buildworld
  2825.87   2137.19   637.15     5.58   -j2 buildworld
  2887.03   2158.60   648.37    11.85   -j3 buildworld
  2856.75   2156.48   647.43    19.06   -j4 buildworld
  2851.71   2154.39   647.19    25.43   -j5 buildworld
  2850.92   2155.40   646.19    31.59   -j6 buildworld
  2850.07   2153.77   648.41    36.19   -j7 buildworld
  2852.64   2155.74   647.82    47.00   -j8 buildworld
  2851.66   2153.43   650.23    53.51   -j9 buildworld

(I've actually done multiple runs of each, but they all show about the same numbers).

I had a separate session doing 'uptime's every 30 seconds, and the "Max-LA" column is the maximum load-average that was seen by that separate session.

Apparently the faster disks made a much bigger difference than I had expected to see.

--
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu
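A quick way to see why extra jobs do not help on this machine is to compute how much of the wall-clock time the CPU was already busy, i.e. (User + Sys) / Real for each row of the table. This is a sketch using only the posted figures; the function name is invented, and the ratio is unit-free, so it does not depend on what unit the columns are in.

```python
# (label, real, user, sys) copied from the table above.
runs = [
    ("buildworld", 2670.86, 2071.66, 543.49),
    ("-j1", 2751.60, 2085.95, 603.69),
    ("-j2", 2825.87, 2137.19, 637.15),
    ("-j3", 2887.03, 2158.60, 648.37),
    ("-j4", 2856.75, 2156.48, 647.43),
    ("-j5", 2851.71, 2154.39, 647.19),
    ("-j6", 2850.92, 2155.40, 646.19),
    ("-j7", 2850.07, 2153.77, 648.41),
    ("-j8", 2852.64, 2155.74, 647.82),
    ("-j9", 2851.66, 2153.43, 650.23),
]


def cpu_utilization(real, user, sys_time):
    """Fraction of wall-clock time spent on the CPU (user + system)."""
    return (user + sys_time) / real


utilizations = {label: cpu_utilization(real, user, sys_time)
                for label, real, user, sys_time in runs}

for label, value in utilizations.items():
    print(f"{label:>10}: {value:.1%}")
```

Every row lands between 97% and 99%, so even the plain buildworld leaves almost no idle CPU time for additional jobs to fill, which fits the earlier remark in the thread that the build is CPU bound.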