Stefano Garzarella
2023-May-30 16:17 UTC
[syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
>> worker thread fields to new struct"). Maybe those commits just
>> highlighted the issue and it was already existing.
>
>See my mail about the crash. Agree with your analysis about worker->vtsk
>not being set yet. It's a bug from my commit where I should have not set
>it so early or I should be checking for
>
>if (dev->worker && worker->vtsk)
>
>instead of
>
>if (dev->worker)

Yes, though, in my opinion the problem may persist depending on how the
instructions are reordered.

Should we protect dev->worker with RCU to be safe?

>
>One question about the behavior before my commit though and what we want in
>the end going forward. Before that patch we would just drop work if
>vhost_work_queue was called before VHOST_SET_OWNER. Was that correct/expected?

I think so, since we ask the guest to call VHOST_SET_OWNER before any
other command.

>
>The call to vhost_work_queue in vhost_vsock_start was only seeing the
>works queued after VHOST_SET_OWNER. Did you want works queued before that?
>

Yes, for example if an application in the host has tried to connect and
is waiting for a timeout, we already have work queued up to flush as
soon as we start the device. (See commit 0b841030625c ("vhost: vsock:
kick send_pkt worker once device is started")).

Thanks,
Stefano
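
For readers following the thread, the behaviour described above (work dropped before VHOST_SET_OWNER, packets kept pending and flushed once the device starts) can be modelled with a small userspace toy. This is only a sketch under assumptions: every name here (toy_dev, toy_work_queue, toy_send_pkt, toy_set_owner, toy_start) is hypothetical and none of it is the actual vhost code.

```c
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not the kernel's. */
struct toy_dev {
    bool has_worker;   /* set by the VHOST_SET_OWNER stand-in */
    bool started;      /* set when the device is started */
    int  pending_pkts; /* packets waiting on the send list */
    int  sent_pkts;    /* packets the worker has processed */
};

/*
 * Stand-in for queueing the send work. With no worker the request is
 * dropped; with a worker, the work runs but only drains the list once
 * the device has been started.
 */
static bool toy_work_queue(struct toy_dev *d)
{
    if (!d->has_worker)
        return false;               /* dropped: no owner yet */
    if (d->started) {
        d->sent_pkts += d->pending_pkts;
        d->pending_pkts = 0;
    }
    return true;
}

/* A host application tries to connect: the packet stays on the list. */
static void toy_send_pkt(struct toy_dev *d)
{
    d->pending_pkts++;
    toy_work_queue(d);
}

/* Stand-in for VHOST_SET_OWNER: from now on a worker exists. */
static void toy_set_owner(struct toy_dev *d)
{
    d->has_worker = true;
}

/* Device start: kick the work once so earlier packets are flushed. */
static void toy_start(struct toy_dev *d)
{
    d->started = true;
    toy_work_queue(d);
}
```

In this model a packet sent before toy_set_owner() is not lost, because it stays in pending_pkts; only the request to run the work is dropped, and the single kick in toy_start() flushes everything that accumulated.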
michael.christie at oracle.com
2023-May-30 16:30 UTC
[syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 11:17 AM, Stefano Garzarella wrote:
> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
>>> worker thread fields to new struct"). Maybe those commits just
>>> highlighted the issue and it was already existing.
>>
>> See my mail about the crash. Agree with your analysis about worker->vtsk
>> not being set yet. It's a bug from my commit where I should have not set
>> it so early or I should be checking for
>>
>> if (dev->worker && worker->vtsk)
>>
>> instead of
>>
>> if (dev->worker)
>
> Yes, though, in my opinion the problem may persist depending on how the
> instructions are reordered.

Ah ok.

>
> Should we protect dev->worker with RCU to be safe?

For the multiple-worker patchsets, Jason had asked me about supporting
the case where we don't have a worker while we are swapping workers
around. To do that I had added RCU around dev->worker. I removed it in
later patchsets because I didn't think anyone would use it.

RCU would work for your case and for what Jason had requested.
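
To make the ordering concern concrete, here is a minimal userspace sketch of the publish/read pattern that RCU protection of dev->worker would give. It is not the kernel patch: it assumes C11 atomics as stand-ins, with a release store playing the role of rcu_assign_pointer() and an acquire load playing the role of rcu_dereference(). All toy_* names are hypothetical. In-kernel code would additionally need rcu_read_lock()/rcu_read_unlock() around the reader and synchronize_rcu() before freeing an old worker, which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vhost task, worker and device. */
struct toy_task   { int queued; };
struct toy_worker { struct toy_task *vtsk; };
struct toy_dev    { _Atomic(struct toy_worker *) worker; };

static void toy_dev_init(struct toy_dev *d)
{
    atomic_init(&d->worker, (struct toy_worker *)NULL);
}

/*
 * Writer: finish initialising the worker (vtsk) first, then publish the
 * pointer with release ordering, so a reader that sees the pointer also
 * sees vtsk. Stand-in for rcu_assign_pointer().
 */
static void toy_publish_worker(struct toy_dev *d, struct toy_worker *w,
                               struct toy_task *t)
{
    w->vtsk = t;
    atomic_store_explicit(&d->worker, w, memory_order_release);
}

/*
 * Reader: load the pointer once with acquire ordering and use only that
 * local copy. Stand-in for rcu_dereference() inside a read-side section.
 */
static bool toy_work_queue(struct toy_dev *d)
{
    struct toy_worker *w =
        atomic_load_explicit(&d->worker, memory_order_acquire);

    if (!w)
        return false;   /* no worker yet (or being swapped): drop work */
    w->vtsk->queued++;  /* safe: vtsk was set before publication */
    return true;
}

/* Writer: unpublish the worker, e.g. while swapping workers around. */
static struct toy_worker *toy_unpublish_worker(struct toy_dev *d)
{
    return atomic_exchange_explicit(&d->worker, (struct toy_worker *)NULL,
                                    memory_order_acq_rel);
}
```

With this shape the reader no longer needs the extra worker->vtsk check: a non-NULL pointer implies a fully initialised worker, and a NULL pointer covers both the "before VHOST_SET_OWNER" case and the "no worker while swapping" case.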