michael.christie at oracle.com
2023-May-30 16:30 UTC
[syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 11:17 AM, Stefano Garzarella wrote:> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: >> On 5/30/23 11:00 AM, Stefano Garzarella wrote: >>> I think it is partially related to commit 6e890c5d5021 ("vhost: use >>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move >>> worker thread fields to new struct"). Maybe that commits just >>> highlighted the issue and it was already existing. >> >> See my mail about the crash. Agree with your analysis about worker->vtsk >> not being set yet. It's a bug from my commit where I should have not set >> it so early or I should be checking for >> >> if (dev->worker && worker->vtsk) >> >> instead of >> >> if (dev->worker) > > Yes, though, in my opinion the problem may persist depending on how the > instructions are reordered.Ah ok.> > Should we protect dev->worker() with an RCU to be safe?For those multiple worker patchsets Jason had asked me about supporting where we don't have a worker while we are swapping workers around. To do that I had added rcu around the dev->worker. I removed it in later patchsets because I didn't think anyone would use it. rcu would work for your case and for what Jason had requested.
Stefano Garzarella
2023-May-31 07:27 UTC
[syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 6:30?PM <michael.christie at oracle.com> wrote:> > On 5/30/23 11:17 AM, Stefano Garzarella wrote: > > On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: > >> On 5/30/23 11:00 AM, Stefano Garzarella wrote: > >>> I think it is partially related to commit 6e890c5d5021 ("vhost: use > >>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move > >>> worker thread fields to new struct"). Maybe that commits just > >>> highlighted the issue and it was already existing. > >> > >> See my mail about the crash. Agree with your analysis about worker->vtsk > >> not being set yet. It's a bug from my commit where I should have not set > >> it so early or I should be checking for > >> > >> if (dev->worker && worker->vtsk) > >> > >> instead of > >> > >> if (dev->worker) > > > > Yes, though, in my opinion the problem may persist depending on how the > > instructions are reordered. > > Ah ok. > > > > > Should we protect dev->worker() with an RCU to be safe? > > For those multiple worker patchsets Jason had asked me about supporting > where we don't have a worker while we are swapping workers around. To do > that I had added rcu around the dev->worker. I removed it in later patchsets > because I didn't think anyone would use it. > > rcu would work for your case and for what Jason had requested.Yeah, so you already have some patches? Do you want to send it to solve this problem? Thanks, Stefano