Mike Christie
2023-Jun-06 17:19 UTC
[PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
On 6/6/23 4:49 AM, Stefano Garzarella wrote:> On Mon, Jun 05, 2023 at 01:57:30PM -0500, Mike Christie wrote: >> If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we >> can race where: >> 1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue >> 2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create. >> 3. vhost_worker_create will set the dev->worker pointer before setting >> the worker->vtsk pointer. >> 4. thread0's vhost_work_queue will see the dev->worker pointer is >> set and try to call vhost_task_wake using not yet set worker->vtsk >> pointer. >> 5. We then crash since vtsk is NULL. >> >> Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker >> threads"), we only had the worker pointer so we could just check it to >> see if VHOST_SET_OWNER has been done. After that commit we have the >> vhost_worker and vhost_task pointers, so we can now hit the bug above. >> >> This patch embeds the vhost_worker in the vhost_dev, so we can just >> check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done >> like before. >> >> Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") > > We should add: > > Reported-by: syzbot+d0d442c22fa8db45ff0e at syzkaller.appspotmail.comOk. Will do.>> -??? } >> +??? vtsk = vhost_task_create(vhost_worker, &dev->worker, name); >> +??? if (!vtsk) >> +??????? return -ENOMEM; >> >> -??? worker->vtsk = vtsk; >> +??? dev->worker.kcov_handle = kcov_common_handle(); >> +??? dev->worker.vtsk = vtsk; > > vhost_work_queue() is called by vhost_transport_send_pkt() without > holding vhost_dev.mutex (like vhost_poll_queue() in several places). > > If vhost_work_queue() finds dev->worker.vtsk not NULL, how can we > be sure that for example `work_list` has been initialized? > > Maybe I'm overthinking since we didn't have this problem before or the > race is really short that it never happened.Yeah, I dropped the READ/WRITE_ONCE use because I didn't think we needed it for the vhost_vsock_start case, and for the case you mentioned above, I wondered the same thing as you but was not sure so I added the same behavior as before. When I read memory-barriers.txt, it sounds like we've been getting lucky. I'll add back the READ/WRITE_ONCE for vtsk access since that's what we are keying off as signaling that the worker is ready to be used. I didn't see any type of perf hit when using it, and from the memory-barriers.txt doc it sounds like that's what we should be doing.
Michael S. Tsirkin
2023-Jun-06 19:22 UTC
[PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
On Tue, Jun 06, 2023 at 12:19:10PM -0500, Mike Christie wrote:> On 6/6/23 4:49 AM, Stefano Garzarella wrote: > > On Mon, Jun 05, 2023 at 01:57:30PM -0500, Mike Christie wrote: > >> If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we > >> can race where: > >> 1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue > >> 2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create. > >> 3. vhost_worker_create will set the dev->worker pointer before setting > >> the worker->vtsk pointer. > >> 4. thread0's vhost_work_queue will see the dev->worker pointer is > >> set and try to call vhost_task_wake using not yet set worker->vtsk > >> pointer. > >> 5. We then crash since vtsk is NULL. > >> > >> Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker > >> threads"), we only had the worker pointer so we could just check it to > >> see if VHOST_SET_OWNER has been done. After that commit we have the > >> vhost_worker and vhost_task pointers, so we can now hit the bug above. > >> > >> This patch embeds the vhost_worker in the vhost_dev, so we can just > >> check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done > >> like before. > >> > >> Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") > > > > We should add: > > > > Reported-by: syzbot+d0d442c22fa8db45ff0e at syzkaller.appspotmail.com > > > Ok. Will do. > > > >> -??? } > >> +??? vtsk = vhost_task_create(vhost_worker, &dev->worker, name); > >> +??? if (!vtsk) > >> +??????? return -ENOMEM; > >> > >> -??? worker->vtsk = vtsk; > >> +??? dev->worker.kcov_handle = kcov_common_handle(); > >> +??? dev->worker.vtsk = vtsk; > > > > vhost_work_queue() is called by vhost_transport_send_pkt() without > > holding vhost_dev.mutex (like vhost_poll_queue() in several places). > > > > If vhost_work_queue() finds dev->worker.vtsk not NULL, how can we > > be sure that for example `work_list` has been initialized? > > > > Maybe I'm overthinking since we didn't have this problem before or the > > race is really short that it never happened. > > Yeah, I dropped the READ/WRITE_ONCE use because I didn't think we needed > it for the vhost_vsock_start case, and for the case you mentioned above, I > wondered the same thing as you but was not sure so I added the same > behavior as before. When I read memory-barriers.txt, it sounds like we've > been getting lucky.Yea READ/WRITE_ONCE is one of these things. Once you start adding them it's hard to stop, you begin to think it's needed everywhere. To actually know you need a language lawyer (READ/WRITE_ONCE is a compiler thing not a CPU thing).> I'll add back the READ/WRITE_ONCE for vtsk access since that's what we are > keying off as signaling that the worker is ready to be used. I didn't see > any type of perf hit when using it, and from the memory-barriers.txt doc > it sounds like that's what we should be doing.
Stefano Garzarella
2023-Jun-07 08:48 UTC
[PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
On Tue, Jun 06, 2023 at 12:19:10PM -0500, Mike Christie wrote:>On 6/6/23 4:49 AM, Stefano Garzarella wrote: >> On Mon, Jun 05, 2023 at 01:57:30PM -0500, Mike Christie wrote: >>> If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we >>> can race where: >>> 1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue >>> 2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create. >>> 3. vhost_worker_create will set the dev->worker pointer before setting >>> the worker->vtsk pointer. >>> 4. thread0's vhost_work_queue will see the dev->worker pointer is >>> set and try to call vhost_task_wake using not yet set worker->vtsk >>> pointer. >>> 5. We then crash since vtsk is NULL. >>> >>> Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker >>> threads"), we only had the worker pointer so we could just check it to >>> see if VHOST_SET_OWNER has been done. After that commit we have the >>> vhost_worker and vhost_task pointers, so we can now hit the bug above. >>> >>> This patch embeds the vhost_worker in the vhost_dev, so we can just >>> check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done >>> like before. >>> >>> Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") >> >> We should add: >> >> Reported-by: syzbot+d0d442c22fa8db45ff0e at syzkaller.appspotmail.com > > >Ok. Will do. > > >>> -??? } >>> +??? vtsk = vhost_task_create(vhost_worker, &dev->worker, name); >>> +??? if (!vtsk) >>> +??????? return -ENOMEM; >>> >>> -??? worker->vtsk = vtsk; >>> +??? dev->worker.kcov_handle = kcov_common_handle(); >>> +??? dev->worker.vtsk = vtsk; >> >> vhost_work_queue() is called by vhost_transport_send_pkt() without >> holding vhost_dev.mutex (like vhost_poll_queue() in several places). >> >> If vhost_work_queue() finds dev->worker.vtsk not NULL, how can we >> be sure that for example `work_list` has been initialized? >> >> Maybe I'm overthinking since we didn't have this problem before or the >> race is really short that it never happened. > >Yeah, I dropped the READ/WRITE_ONCE use because I didn't think we needed >it for the vhost_vsock_start case, and for the case you mentioned above, I >wondered the same thing as you but was not sure so I added the same >behavior as before. When I read memory-barriers.txt, it sounds like we've >been getting lucky.Yep, it happened to me too before I highlighted it :-)> >I'll add back the READ/WRITE_ONCE for vtsk access since that's what we are >keying off as signaling that the worker is ready to be used. I didn't see >any type of perf hit when using it, and from the memory-barriers.txt doc >it sounds like that's what we should be doing. >LGTM, I just wonder if RCU on dev->worker can save us from future problems, but maybe better to do it in the next release. Thanks, Stefano
Apparently Analagous Threads
- [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
- [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
- [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
- [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
- [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression