Felix Palmen
2021-Apr-15 16:29 UTC
Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
After more experimentation, I finally found what's causing these problems for me on 13:

* Felix Palmen <felix at palmen-it.de> [20210412 11:44]:
> * Poudriere running on idprio 22 with 8 parallel build jobs

Running poudriere with normal priority works perfectly fine. Now, I've had poudriere running on idprio because there are several other services on that machine that shouldn't be slowed down by a heavy build, and I still want to use all the available CPU resources for building.

Right now, I'm running a test with idprio 0 instead, which still seems to have the desired effect, and so far I haven't had any of these stalls. If this persists, the problem is solved for me!

I'd still be curious about what might be the cause, and what this state "zfs tear" actually means. But that's more of an "academic interest" now.

-- 
Dipl.-Inform. Felix Palmen <felix at palmen-it.de>
Dewayne Geraghty
2021-Apr-15 20:26 UTC
Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
On 16/04/2021 2:29 am, Felix Palmen wrote:
> After more experimentation, I finally found what's causing these
> problems for me on 13:
>
> * Felix Palmen <felix at palmen-it.de> [20210412 11:44]:
>> * Poudriere running on idprio 22 with 8 parallel build jobs
>
> Running poudriere with normal priority works perfectly fine. Now, I've
> had poudriere running on idprio because there are several other services
> on that machine that shouldn't be slowed down by running a heavy build
> and I still want to use all the CPU resources available for building.
>
> Right now, I'm running a test with idprio 0 instead, which still seems
> to have the desired effect, and so far, I didn't have any of these
> stalls. If this persists, the problem is solved for me!
>
> I'd still be curious about what might be the cause, and, what this state
> "zfs tear" actually means. But that's kind of an "academic interest"
> now.
>

Most likely your other processes are pre-empting your build, which is what you want :).

Use /usr/bin/top to see the priority of the processes (i.e. the PRI column). Using idprio 22 means (on my 12.2-STABLE) a PRI of 146. If your kern.sched.preempt_thresh is at the default (80), then processes with a PRI below 80 will preempt (for I/O). Even with idprio 0, the PRI is 124, so I suspect that was more a matter of timing (i.e. good luck).

You could increase your preemption threshold for the duration of the build to include your nice value. But... (not really a good idea).

Better to run your build using nice (PRI of 76), which should avoid the stalls, but it will also have some influence on your more important services.

Re zfs - sorry, I'm peculiar and don't use it ;)
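The priority arithmetic in the reply can be checked with a small model. This is a hypothetical Python sketch, not code from the thread. It assumes the FreeBSD 12.x/13.x constants from sys/priority.h and sys/rtprio.h (PRI_MIN_IDLE = 224, PZERO = 100, RTP_PRIO_MAX = 31), assumes that top(1) shows the kernel priority minus PZERO, and uses a simplified reading of the ULE scheduler's preemption test. Note that should_preempt() compares raw kernel priorities, so under these assumptions a top PRI value needs 100 added before it is compared with kern.sched.preempt_thresh.

```python
"""Hypothetical sketch relating idprio levels, top(1)'s PRI column and
kern.sched.preempt_thresh on FreeBSD 12.x/13.x.

Assumptions: the constants mirror sys/priority.h and sys/rtprio.h, top(1)
prints the kernel priority minus PZERO, and should_preempt() is a
simplified reading of the ULE scheduler's preemption test, not kernel code.
Lower numbers always mean higher priority.
"""

PRI_MIN_KERN = 80           # assumed default of kern.sched.preempt_thresh
PZERO = PRI_MIN_KERN + 20   # 100; assumed offset subtracted by top(1)
PRI_MIN_IDLE = 224          # assumed first kernel priority of the idle class
RTP_PRIO_MAX = 31           # highest level idprio(1) accepts


def idprio_kernel_priority(level: int) -> int:
    """Kernel priority of a thread started with `idprio <level>`."""
    if not 0 <= level <= RTP_PRIO_MAX:
        raise ValueError(f"idprio level must be 0..{RTP_PRIO_MAX}, got {level}")
    return PRI_MIN_IDLE + level


def top_pri(kernel_priority: int) -> int:
    """Value shown in top(1)'s PRI column for a given kernel priority."""
    return kernel_priority - PZERO


def should_preempt(new_pri: int, cur_pri: int,
                   preempt_thresh: int = PRI_MIN_KERN) -> bool:
    """Would a thread waking at new_pri preempt one running at cur_pri?"""
    if new_pri >= cur_pri:
        return False        # the waking thread is not more important
    if cur_pri >= PRI_MIN_IDLE:
        return True         # any better thread preempts an idle-class thread
    if preempt_thresh == 0:
        return False        # preemption disabled
    return new_pri <= preempt_thresh


if __name__ == "__main__":
    for level in (0, 22):
        print(f"idprio {level}: PRI {top_pri(idprio_kernel_priority(level))}")
    # idprio 0: PRI 124
    # idprio 22: PRI 146
```

Under these assumptions the script reproduces the PRI values quoted in the reply (124 for idprio 0, 146 for idprio 22), and should_preempt() returns True whenever any more important thread wakes up while an idle-class build is running, which fits the suggestion that other processes are pre-empting the build.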