Halil Pasic
2017-Jan-26 17:39 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
Hi! Recently I have been investigating some strange migration problems on s390x. It turned out under certain circumstances vhost_net corrupts avail.idx by using wrong endianness. I managed to track the problem down (I'm pretty sure). It boils down to the following. When stopping vhost userspace (QEMU) calls vhost_net_set_backend with the fd argument set to -1, this leads to is_le being reset in vhost_vq_init_access. On a BE system resetting to legacy means resetting to BE. Usually this is not a problem, but in the case when oldubufs is not zero (which is not likely if no network stress applied) it is a problem. That code path needs to write avail.idx, and ends up using wrong endianness when doing that (but only on a BE system). That is the story in prose, now let's see the corresponding code annotated with some comments. from drivers/vhost/net.c: static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) { /* [..] some not too interesting stuff */ sock = get_socket(fd); /* fd == -1 --> sock == NULL */ if (IS_ERR(sock)) { r = PTR_ERR(sock); goto err_vq; } /* start polling new socket */ oldsock = vq->private_data; if (sock != oldsock) { ubufs = vhost_net_ubuf_alloc(vq, sock && vhost_sock_zcopy(sock)); if (IS_ERR(ubufs)) { r = PTR_ERR(ubufs); goto err_ubufs; } vhost_net_disable_vq(n, vq); ==> vq->private_data = sock; /* now vq->private_data is NULL */ ==> r = vhost_vq_init_access(vq); if (r) goto err_used; /* vq endianness has been reset to BE on s390 */ r = vhost_net_enable_vq(n, vq); if (r) goto err_used; ==> oldubufs = nvq->ubufs; /* here oldubufs might become != 0 */ nvq->ubufs = ubufs; n->tx_packets = 0; n->tx_zcopy_err = 0; n->tx_flush = false; } mutex_unlock(&vq->mutex); if (oldubufs) { vhost_net_ubuf_put_wait_and_free(oldubufs); mutex_lock(&vq->mutex); ==> vhost_zerocopy_signal_used(n, vq); /* tries to update virtqueue structures; endianness is BE on s390 * now (but should be LE for virtio-1) */ mutex_unlock(&vq->mutex); } /*[..] rest of the function */ } from drivers/vhost/vhost.c: int vhost_vq_init_access(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; bool is_le = vq->is_le; if (!vq->private_data) { ==> vhost_reset_is_le(vq); /* resets to native endianness and returns */ return 0; } ==> vhost_init_is_le(vq); /* here we init is_le */ r = vhost_update_used_flags(vq); if (r) goto err; vq->signalled_used_valid = false; if (!vq->iotlb && !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) { r = -EFAULT; goto err; } r = vhost_get_user(vq, last_used_idx, &vq->used->idx); if (r) { vq_err(vq, "Can't access used idx at %p\n", &vq->used->idx); goto err; } vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); return 0; err: vq->is_le = is_le; return r; } AFAIU this can be fixed very simply by omitting the reset. Below the patch. I'm not sure though, the endianness handling ain't simple in vhost. Am I going in the right direction? We have a test (on s390x only) running for a couple of hours now and so far so good but it's a bit early to announce it is tested for s390x. If the patch is reasonable, I'm intend to do a version with proper attribution and stuff. By the way the reset was first introduced by https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons for the reset were discussed -- but no enlightenment for me). Finally I would like to credit Dave Gilbert for hinting that the unreasonable avail.idx may be due to an endianness problem and Michael A. Tebolt for reporting the bug and testing. -------------------------8<-------------->From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001From: Halil Pasic <pasic at linux.vnet.ibm.com> Date: Thu, 26 Jan 2017 00:06:15 +0100 Subject: [PATCH] vhost: remove useless/dangerous reset of is_le The reset of is_le does no good, but it contributes its fair share to a bug in vhost_net, which occurs if we have some oldubufs when stopping and setting a fd = -1 as a backend. Instead of doing something convoluted in vhost_net, let's just get rid of the reset. Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> Fixes: commit 2751c9882b94 --- drivers/vhost/vhost.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d643260..08072a2 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) int r; bool is_le = vq->is_le; - if (!vq->private_data) { - vhost_reset_is_le(vq); + if (!vq->private_data) return 0; - } vhost_init_is_le(vq); -- 2.8.4
Michael S. Tsirkin
2017-Jan-26 19:20 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote:> > Hi! > > Recently I have been investigating some strange migration problems on > s390x. > > It turned out under certain circumstances vhost_net corrupts avail.idx by > using wrong endianness. > > I managed to track the problem down (I'm pretty sure). It boils down to > the following. > > When stopping vhost userspace (QEMU) calls vhost_net_set_backend with > the fd argument set to -1, this leads to is_le being reset in > vhost_vq_init_access. On a BE system resetting to legacy means resetting > to BE. Usually this is not a problem, but in the case when oldubufs is not > zero (which is not likely if no network stress applied) it is a problem. > That code path needs to write avail.idx, and ends up using wrong > endianness when doing that (but only on a BE system). > > That is the story in prose, now let's see the corresponding code annotated > with some comments. > > from drivers/vhost/net.c: > static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) > { > /* [..] some not too interesting stuff */ > sock = get_socket(fd); > /* fd == -1 --> sock == NULL */ > if (IS_ERR(sock)) { > r = PTR_ERR(sock); > goto err_vq; > } > > /* start polling new socket */ > oldsock = vq->private_data; > if (sock != oldsock) { > ubufs = vhost_net_ubuf_alloc(vq, > sock && vhost_sock_zcopy(sock)); > if (IS_ERR(ubufs)) { > r = PTR_ERR(ubufs); > goto err_ubufs; > } > > vhost_net_disable_vq(n, vq); > ==> vq->private_data = sock; > /* now vq->private_data is NULL */ > ==> r = vhost_vq_init_access(vq); > if (r) > goto err_used; > /* vq endianness has been reset to BE on s390 */ > r = vhost_net_enable_vq(n, vq); > if (r) > goto err_used; > > ==> oldubufs = nvq->ubufs; > /* here oldubufs might become != 0 */ > nvq->ubufs = ubufs; > > n->tx_packets = 0; > n->tx_zcopy_err = 0; > n->tx_flush = false; > } > mutex_unlock(&vq->mutex); > > if (oldubufs) { > vhost_net_ubuf_put_wait_and_free(oldubufs); > mutex_lock(&vq->mutex); > ==> vhost_zerocopy_signal_used(n, vq); > /* tries to update virtqueue structures; endianness is BE on s390 > * now (but should be LE for virtio-1) */ > mutex_unlock(&vq->mutex); > } > /*[..] rest of the function */ > }> > from drivers/vhost/vhost.c: > > int vhost_vq_init_access(struct vhost_virtqueue *vq) > { > __virtio16 last_used_idx; > int r; > bool is_le = vq->is_le; > > if (!vq->private_data) { > ==> vhost_reset_is_le(vq); > /* resets to native endianness and returns */ > return 0; > } > > ==> vhost_init_is_le(vq); > /* here we init is_le */ > > r = vhost_update_used_flags(vq); > if (r) > goto err; > vq->signalled_used_valid = false; > if (!vq->iotlb && > !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) { > r = -EFAULT; > goto err; > } > r = vhost_get_user(vq, last_used_idx, &vq->used->idx); > if (r) { > vq_err(vq, "Can't access used idx at %p\n", > &vq->used->idx); > goto err; > } > vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); > return 0; > > err: > vq->is_le = is_le; > return r; > } > > AFAIU this can be fixed very simply by omitting the reset. Below the > patch. I'm not sure though, the endianness handling ain't simple in > vhost. Am I going in the right direction? > > We have a test (on s390x only) running for a couple of hours now and so > far so good but it's a bit early to announce it is tested for s390x. > If the patch is reasonable, I'm intend to do a version with proper > attribution and stuff. > > By the way the reset was first introduced by > https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons > for the reset were discussed -- but no enlightenment for me). > > Finally I would like to credit Dave Gilbert for hinting that the > unreasonable avail.idx may be due to an endianness problem and Michael A. > Tebolt for reporting the bug and testing. > > -------------------------8<-------------- > >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 > From: Halil Pasic <pasic at linux.vnet.ibm.com> > Date: Thu, 26 Jan 2017 00:06:15 +0100 > Subject: [PATCH] vhost: remove useless/dangerous reset of is_le > > The reset of is_le does no good, but it contributes its fair share to a > bug in vhost_net, which occurs if we have some oldubufs when stopping and > setting a fd = -1 as a backend. Instead of doing something convoluted in > vhost_net, let's just get rid of the reset. > > Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> > Fixes: commit 2751c9882b94 > --- > drivers/vhost/vhost.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index d643260..08072a2 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) > int r; > bool is_le = vq->is_le; > > - if (!vq->private_data) { > - vhost_reset_is_le(vq); > + if (!vq->private_data) > return 0; > - } > > vhost_init_is_le(vq);I think you do need to reset it, just maybe within vhost_init_is_le. if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; else vhost_reset_is_le(vq);> -- > 2.8.4
Halil Pasic
2017-Jan-27 12:24 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
On 01/26/2017 08:20 PM, Michael S. Tsirkin wrote:> On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote: >> >> Hi! >> >> Recently I have been investigating some strange migration problems on >> s390x. >> >> It turned out under certain circumstances vhost_net corrupts avail.idx by >> using wrong endianness.[..]>> -------------------------8<-------------- >> >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 >> From: Halil Pasic <pasic at linux.vnet.ibm.com> >> Date: Thu, 26 Jan 2017 00:06:15 +0100 >> Subject: [PATCH] vhost: remove useless/dangerous reset of is_le >> >> The reset of is_le does no good, but it contributes its fair share to a >> bug in vhost_net, which occurs if we have some oldubufs when stopping and >> setting a fd = -1 as a backend. Instead of doing something convoluted in >> vhost_net, let's just get rid of the reset. >> >> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> >> Fixes: commit 2751c9882b94 >> --- >> drivers/vhost/vhost.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c >> index d643260..08072a2 100644 >> --- a/drivers/vhost/vhost.c >> +++ b/drivers/vhost/vhost.c >> @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) >> int r; >> bool is_le = vq->is_le; >> >> - if (!vq->private_data) { >> - vhost_reset_is_le(vq); >> + if (!vq->private_data) >> return 0; >> - } >> >> vhost_init_is_le(vq); > > > I think you do need to reset it, just maybe within vhost_init_is_le. > > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) > vq->is_le = true; > else > vhost_reset_is_le(vq); > >That is a very good point! I have overlooked that while the CONFIG_VHOST_CROSS_ENDIAN_LEGACY variant static void vhost_init_is_le(struct vhost_virtqueue *vq) { /* Note for legacy virtio: user_be is initialized at reset time * according to the host endianness. If userspace does not set an * explicit endianness, the default behavior is native endian, as * expected by legacy virtio. */ vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; } is fine the other variant static void vhost_init_is_le(struct vhost_virtqueue *vq) { if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; } is a very strange initializer (makes assumptions about the state to be initialized). I agree, setting native endianness there sounds very reasonable. I have a question regarding readability. IMHO the relationship of reset_is_le and int_is_le is a bit confusing, and I'm afraid it could become even more confusing with using reset in one of the init_is_le's. How about we do the following? static void vhost_init_is_le(struct vhost_virtqueue *vq) { if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; + else + vq->is_le = virtio_legacy_is_little_endian(); } static void vhost_reset_is_le(struct vhost_virtqueue *vq) { - vq->is_le = virtio_legacy_is_little_endian(); + vhost_init_is_le(vq); } That way we would have correct endianness both after reset and after init, I think :). Thank you very much! Halil
Reasonably Related Threads
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1