Mark Linimon
2018-Feb-14 11:10 UTC
package building performance (was: Re: FreeBSD on AMD Epyc boards)
On Wed, Feb 14, 2018 at 09:15:53AM +0100, Kurt Jaeger wrote:
> On the plus side: 16+16 cores, on the minus: a low CPU clock of 2.2 GHz.
> Would a box like this be better for a package build host instead of 4+4
> cores with 3.x GHz?

In my experience, "it depends".

I think that above a certain number of cores, I/O will dominate. I _think_;
I have never done any metrics on any of this.

The dominant term of the equation is, as you might guess, RAM. Previous
experience suggests that you need at least 2GB per builder. By default, the
number of builders is set equal to the number of cores, and with less than
2GB per builder you're going to be unhappy.

(For modern systems, where large amounts of RAM are standard, this is
probably no longer a concern.)

Put it this way: with 4 cores, 16GB of RAM, and netbooting (7GB of which was
devoted to md(4)), I was having lots of problems on powerpc64. The same
machine with 64GB gives me no problems.

My guess is that after RAM, the order of importance is I/O, then the number
of cores, then clock speed. But I'm just speculating.

mcl
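For reference, the one-builder-per-core default can be capped in
poudriere.conf if RAM is the limiting factor. A minimal sketch, assuming the
stock /usr/local/etc/poudriere.conf location and the ~2GB-per-builder rule
of thumb above:

  # /usr/local/etc/poudriere.conf
  # Cap the number of parallel builders instead of the one-per-core
  # default, e.g. roughly 6 builders for a 16GB machine.
  PARALLEL_JOBS=6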
Don Lewis
2018-Feb-17 09:09 UTC
package building performance (was: Re: FreeBSD on AMD Epyc boards)
On 14 Feb, Mark Linimon wrote:
> On Wed, Feb 14, 2018 at 09:15:53AM +0100, Kurt Jaeger wrote:
>> On the plus side: 16+16 cores, on the minus: a low CPU clock of 2.2 GHz.
>> Would a box like this be better for a package build host instead of 4+4
>> cores with 3.x GHz?
>
> In my experience, "it depends".
>
> I think that above a certain number of cores, I/O will dominate. I _think_;
> I have never done any metrics on any of this.
>
> The dominant term of the equation is, as you might guess, RAM. Previous
> experience suggests that you need at least 2GB per builder. By default, the
> number of builders is set equal to the number of cores, and with less than
> 2GB per builder you're going to be unhappy.
>
> (For modern systems, where large amounts of RAM are standard, this is
> probably no longer a concern.)
>
> Put it this way: with 4 cores, 16GB of RAM, and netbooting (7GB of which was
> devoted to md(4)), I was having lots of problems on powerpc64. The same
> machine with 64GB gives me no problems.
>
> My guess is that after RAM, the order of importance is I/O, then the number
> of cores, then clock speed. But I'm just speculating.

I've been configuring 4 GB per builder, so on my 8-core 16-thread Ryzen
machine that means 64 GB of RAM. I also set USE_TMPFS to "wrkdir data
localbase" in poudriere.conf, so I'm leaning pretty heavily on RAM. I do
figure that zfs clone is more efficient than tmpfs for the builder jails
themselves.

With this configuration, building my default set of ports is pretty much
CPU-bound. When it starts building the larger ports that need a lot of space
for WRKDIR, like openoffice-4, openoffice-devel, libreoffice, chromium,
etc., the machine does end up using a lot of swap space, but it is mostly
dead data from the wrkdirs, so generally there isn't a lot of paging
activity. I also set ALLOW_MAKE_JOBS=yes to load up the CPUs a bit more,
though I got the best results with MAKE_JOBS_NUMBER=7 when building my
default port set on this machine. The hard drive is a fairly old WD Green
that I removed from one of my other machines, and it is plenty fast enough
to keep CPU idle % at or near zero most of the time during the build run.

I just tried out "poudriere bulk -a" on this machine to build ports for
11.1-RELEASE amd64 and got these results:

  [111amd64-default] [2018-02-14_23h40m24s] [committing:]
  Queued: 29787  Built: 29277  Failed: 59  Skipped: 112  Ignored: 339
  Tobuild: 0  Time: 47:39:48

I did notice some periods of high idle CPU during this run, but a lot of
that was due to a bunch of the builders being in the fetch state at the same
time; without that, the runtime would have been lower. On the other hand,
some ports failed due to a gmake issue, and others looked like they failed
because of problems with ALLOW_MAKE_JOBS=yes; the runtime would have been
higher without those failures.

As far as Epyc goes, I think the larger core count would win. A lot depends
on how effective the cache is for this workload, so it would be interesting
to plot poudriere run time vs. clock speed. If cache misses dominate
execution time, then lowering the clock speed would not hurt that much.

Something important to keep in mind with Threadripper and Epyc is NUMA. For
best results, all of the memory channels should be used and the work should
be distributed so that the processes on each core primarily access RAM local
to that CPU die. If this isn't the case, then the Infinity Fabric that
connects the CPU dies will be the bottleneck. The lower core clock speed on
Epyc lessens that penalty, but it is still something to be avoided if
possible.
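Roughly, the settings described above map onto poudriere's configuration
like this. This is a sketch rather than the actual files: the paths assume
poudriere's default locations, and the jail name 111amd64 is only inferred
from the build log above.

  # /usr/local/etc/poudriere.conf
  USE_TMPFS="wrkdir data localbase"   # keep build scratch space in RAM
  ALLOW_MAKE_JOBS=yes                 # allow parallel make jobs per port

  # /usr/local/etc/poudriere.d/make.conf (ports make.conf, not poudriere.conf)
  MAKE_JOBS_NUMBER=7                  # best result on this 8c/16t Ryzen

  # full run of the tree for the 11.1-RELEASE amd64 jail:
  poudriere bulk -j 111amd64 -a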
Something else to consider is price/performance. If you want to build packages for four OS/arch combinations, then doing it in parallel on four Ryzen machines is likely to be both cheaper and faster than doing the same builds sequentially on an Epyc machine with 4x the core count and RAM. It is unfortunate that there don't seem to be any server-grade Ryzen motherboards. They all seem to be gamer boards with a lot of unnecessary bling.