Peter Zijlstra
2022-Sep-26 12:13 UTC
[PATCH v3 6/6] freezer,sched: Rewrite core freezer logic
On Mon, Sep 26, 2022 at 12:55:21PM +0200, Christian Borntraeger wrote:
>
>
> On 26.09.22 at 10:06, Christian Borntraeger wrote:
> >
> >
> > On 23.09.22 at 09:53, Christian Borntraeger wrote:
> > > On 23.09.22 at 09:21, Christian Borntraeger wrote:
> > > > Peter,
> > > >
> > > > as a heads-up. This commit (bisected and verified) triggers a
> > > > regression in our KVM on s390x CI. The symptom is that a specific
> > > > testcase (start a guest with next kernel and a poky ramdisk,
> > > > then ssh via vsock into the guest and run the reboot command) now
> > > > takes much longer (300 instead of 20 seconds). From a first look
> > > > it seems that the sshd takes very long to end during shutdown
> > > > but I have not looked into that yet.
> > > > Any quick idea?
> > > >
> > > > Christian
> > >
> > > the sshd seems to hang in virtio-serial (not vsock).
> >
> > FWIW, sshd does not seem to hang, instead it seems to busy loop in
> > wait_port_writable calling into the scheduler over and over again.
>
> -#define TASK_FREEZABLE                 0x00002000
> +#define TASK_FREEZABLE                 0x00000000
>
> "Fixes" the issue. Just have to find out which of users is responsible.

Since it's not the wait_port_writable() one -- we already tested that by
virtue of 's/wait_event_freezable/wait_event/' there, it must be on the
producing side of that port. But I'm having a wee bit of trouble
following that code.

Is there a task stuck in FROZEN state? -- then again, I thought you said
there was no actual suspend involved, so that should not be it either.

I'm curious though -- how far does it get into the scheduler? It should
call schedule() with __state == TASK_INTERRUPTIBLE|TASK_FREEZABLE, which
is quite sufficient to get it off the runqueue; who then puts it back?
Or is it bailing early in the wait_event loop?
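
For readers following the state-bit argument above, here is a minimal userspace sketch of it. This is not kernel code. TASK_FREEZABLE uses the value quoted in the thread and TASK_INTERRUPTIBLE uses the usual kernel value. The helpers wait_state() and leaves_runqueue() are hypothetical names invented for illustration, and the dequeue rule is a deliberately simplified model of what schedule() does with a non-running task.

```c
#include <stdbool.h>

/* TASK_INTERRUPTIBLE: usual kernel value. TASK_FREEZABLE: value quoted
 * in the thread before the "define it to 0" experiment. */
#define TASK_RUNNING        0x00000000u
#define TASK_INTERRUPTIBLE  0x00000001u
#define TASK_FREEZABLE      0x00002000u

/* Hypothetical helper: the state a freezable interruptible wait would
 * set. Passing 0 as freezable_bit models the experiment from the thread. */
unsigned int wait_state(unsigned int freezable_bit)
{
	return TASK_INTERRUPTIBLE | freezable_bit;
}

/* Hypothetical helper, simplified assumption: a task that calls
 * schedule() with a non-zero state is taken off the runqueue, unless it
 * sleeps interruptibly and a signal is already pending. */
bool leaves_runqueue(unsigned int state, bool signal_pending)
{
	if (state == TASK_RUNNING)
		return false;	/* still runnable, stays queued */
	if ((state & TASK_INTERRUPTIBLE) && signal_pending)
		return false;	/* pending signal cancels the sleep */
	return true;
}
```

In this model wait_state(TASK_FREEZABLE) is 0x2001 and wait_state(0) is 0x0001, and both leave the runqueue when no signal is pending. So the extra bit alone does not explain a busy loop, which is why the open questions are who wakes the task again and whether the wait_event loop bails out early.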