Halil Pasic
2017-Jan-26  17:39 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
Hi!
Recently I have been investigating some strange migration problems on
s390x.
It turned out under certain circumstances vhost_net corrupts avail.idx by
using wrong endianness.
I managed to track the problem down (I'm pretty sure). It boils down to
the following.
When stopping vhost userspace (QEMU) calls vhost_net_set_backend with
the fd argument set to -1, this leads to is_le being reset in
vhost_vq_init_access.  On a BE system resetting to legacy means resetting
to BE. Usually this is not a problem, but in the case when oldubufs is not
zero (which is not likely if no network stress applied) it is a problem.
That code path needs to write avail.idx, and ends up using wrong
endianness when doing that (but only on a BE system).
That is the story in prose, now let's see the corresponding code annotated
with some comments.
from drivers/vhost/net.c:
static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
{
/* [..] some not too interesting stuff  */
        sock = get_socket(fd);
/* fd == -1 --> sock == NULL */
        if (IS_ERR(sock)) {
                r = PTR_ERR(sock);
                goto err_vq;
        }
        /* start polling new socket */
        oldsock = vq->private_data;
        if (sock != oldsock) {
                ubufs = vhost_net_ubuf_alloc(vq,
                                             sock &&
vhost_sock_zcopy(sock));
                if (IS_ERR(ubufs)) {
                        r = PTR_ERR(ubufs);
                        goto err_ubufs;
                }
                vhost_net_disable_vq(n, vq);
==>             vq->private_data = sock; 
/* now vq->private_data is NULL */
==>             r = vhost_vq_init_access(vq);
                if (r)
                        goto err_used;
/* vq endianness has been reset to BE on s390 */
                r = vhost_net_enable_vq(n, vq);
                if (r)
                        goto err_used;
==>             oldubufs = nvq->ubufs;
/* here oldubufs might become != 0 */               
                nvq->ubufs = ubufs;
                n->tx_packets = 0;
                n->tx_zcopy_err = 0;
                n->tx_flush = false;
        }
        mutex_unlock(&vq->mutex);
        if (oldubufs) {
                vhost_net_ubuf_put_wait_and_free(oldubufs);
                mutex_lock(&vq->mutex);
==>             vhost_zerocopy_signal_used(n, vq);
/* tries to update virtqueue structures; endianness is BE on s390
 * now (but should be LE for virtio-1) */
                mutex_unlock(&vq->mutex);
        }
/*[..] rest of the function */
}
from drivers/vhost/vhost.c:
int vhost_vq_init_access(struct vhost_virtqueue *vq)
{
        __virtio16 last_used_idx;
        int r;
        bool is_le = vq->is_le;
        if (!vq->private_data) {
==>             vhost_reset_is_le(vq);
/* resets to native endianness and returns */
                return 0;
        }
==>      vhost_init_is_le(vq);
/* here we init is_le */
        r = vhost_update_used_flags(vq);
        if (r)
                goto err;
        vq->signalled_used_valid = false;
        if (!vq->iotlb &&
            !access_ok(VERIFY_READ, &vq->used->idx, sizeof
vq->used->idx)) {
                r = -EFAULT;
                goto err;
        }
        r = vhost_get_user(vq, last_used_idx, &vq->used->idx);
        if (r) {
                vq_err(vq, "Can't access used idx at %p\n",
                       &vq->used->idx);
                goto err;
        }
        vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx);
        return 0;
err:
        vq->is_le = is_le;
        return r;
}
AFAIU this can be fixed very simply by omitting the reset. Below the
patch. I'm not sure though, the endianness handling ain't simple in
vhost. Am I going in the right direction?
We have a test (on s390x only) running for a couple of hours now and so
far so good but it's a bit early to announce it is tested  for s390x.
If the patch is reasonable, I'm intend to do a version with proper
attribution and stuff.
By the way the reset was first introduced by 
https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons
for the reset were discussed -- but no enlightenment for me).
Finally I would like to credit Dave Gilbert for hinting that the
unreasonable avail.idx may be due to an endianness problem and Michael A.
Tebolt for reporting the bug and testing.
-------------------------8<-------------->From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic at linux.vnet.ibm.com>
Date: Thu, 26 Jan 2017 00:06:15 +0100
Subject: [PATCH] vhost: remove useless/dangerous reset of is_le
The reset of is_le does no good, but it contributes its fair share to a
bug in vhost_net, which occurs if we have some oldubufs when stopping and
setting a fd = -1 as a backend. Instead of doing something convoluted in
vhost_net, let's just get rid of the reset.
Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com>
Fixes: commit 2751c9882b94 
---
 drivers/vhost/vhost.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..08072a2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
        int r;
        bool is_le = vq->is_le;
-       if (!vq->private_data) {
-               vhost_reset_is_le(vq);
+       if (!vq->private_data)
                return 0;
-       }
        vhost_init_is_le(vq);
-- 
2.8.4
Michael S. Tsirkin
2017-Jan-26  19:20 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote:> > Hi! > > Recently I have been investigating some strange migration problems on > s390x. > > It turned out under certain circumstances vhost_net corrupts avail.idx by > using wrong endianness. > > I managed to track the problem down (I'm pretty sure). It boils down to > the following. > > When stopping vhost userspace (QEMU) calls vhost_net_set_backend with > the fd argument set to -1, this leads to is_le being reset in > vhost_vq_init_access. On a BE system resetting to legacy means resetting > to BE. Usually this is not a problem, but in the case when oldubufs is not > zero (which is not likely if no network stress applied) it is a problem. > That code path needs to write avail.idx, and ends up using wrong > endianness when doing that (but only on a BE system). > > That is the story in prose, now let's see the corresponding code annotated > with some comments. > > from drivers/vhost/net.c: > static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) > { > /* [..] some not too interesting stuff */ > sock = get_socket(fd); > /* fd == -1 --> sock == NULL */ > if (IS_ERR(sock)) { > r = PTR_ERR(sock); > goto err_vq; > } > > /* start polling new socket */ > oldsock = vq->private_data; > if (sock != oldsock) { > ubufs = vhost_net_ubuf_alloc(vq, > sock && vhost_sock_zcopy(sock)); > if (IS_ERR(ubufs)) { > r = PTR_ERR(ubufs); > goto err_ubufs; > } > > vhost_net_disable_vq(n, vq); > ==> vq->private_data = sock; > /* now vq->private_data is NULL */ > ==> r = vhost_vq_init_access(vq); > if (r) > goto err_used; > /* vq endianness has been reset to BE on s390 */ > r = vhost_net_enable_vq(n, vq); > if (r) > goto err_used; > > ==> oldubufs = nvq->ubufs; > /* here oldubufs might become != 0 */ > nvq->ubufs = ubufs; > > n->tx_packets = 0; > n->tx_zcopy_err = 0; > n->tx_flush = false; > } > mutex_unlock(&vq->mutex); > > if (oldubufs) { > vhost_net_ubuf_put_wait_and_free(oldubufs); > mutex_lock(&vq->mutex); > ==> vhost_zerocopy_signal_used(n, vq); > /* tries to update virtqueue structures; endianness is BE on s390 > * now (but should be LE for virtio-1) */ > mutex_unlock(&vq->mutex); > } > /*[..] rest of the function */ > }> > from drivers/vhost/vhost.c: > > int vhost_vq_init_access(struct vhost_virtqueue *vq) > { > __virtio16 last_used_idx; > int r; > bool is_le = vq->is_le; > > if (!vq->private_data) { > ==> vhost_reset_is_le(vq); > /* resets to native endianness and returns */ > return 0; > } > > ==> vhost_init_is_le(vq); > /* here we init is_le */ > > r = vhost_update_used_flags(vq); > if (r) > goto err; > vq->signalled_used_valid = false; > if (!vq->iotlb && > !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) { > r = -EFAULT; > goto err; > } > r = vhost_get_user(vq, last_used_idx, &vq->used->idx); > if (r) { > vq_err(vq, "Can't access used idx at %p\n", > &vq->used->idx); > goto err; > } > vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); > return 0; > > err: > vq->is_le = is_le; > return r; > } > > AFAIU this can be fixed very simply by omitting the reset. Below the > patch. I'm not sure though, the endianness handling ain't simple in > vhost. Am I going in the right direction? > > We have a test (on s390x only) running for a couple of hours now and so > far so good but it's a bit early to announce it is tested for s390x. > If the patch is reasonable, I'm intend to do a version with proper > attribution and stuff. > > By the way the reset was first introduced by > https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons > for the reset were discussed -- but no enlightenment for me). > > Finally I would like to credit Dave Gilbert for hinting that the > unreasonable avail.idx may be due to an endianness problem and Michael A. > Tebolt for reporting the bug and testing. > > -------------------------8<-------------- > >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 > From: Halil Pasic <pasic at linux.vnet.ibm.com> > Date: Thu, 26 Jan 2017 00:06:15 +0100 > Subject: [PATCH] vhost: remove useless/dangerous reset of is_le > > The reset of is_le does no good, but it contributes its fair share to a > bug in vhost_net, which occurs if we have some oldubufs when stopping and > setting a fd = -1 as a backend. Instead of doing something convoluted in > vhost_net, let's just get rid of the reset. > > Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> > Fixes: commit 2751c9882b94 > --- > drivers/vhost/vhost.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index d643260..08072a2 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) > int r; > bool is_le = vq->is_le; > > - if (!vq->private_data) { > - vhost_reset_is_le(vq); > + if (!vq->private_data) > return 0; > - } > > vhost_init_is_le(vq);I think you do need to reset it, just maybe within vhost_init_is_le. if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; else vhost_reset_is_le(vq);> -- > 2.8.4
Halil Pasic
2017-Jan-27  12:24 UTC
[BUG/RFC] vhost: net: big endian viring access despite virtio 1
On 01/26/2017 08:20 PM, Michael S. Tsirkin wrote:> On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote: >> >> Hi! >> >> Recently I have been investigating some strange migration problems on >> s390x. >> >> It turned out under certain circumstances vhost_net corrupts avail.idx by >> using wrong endianness.[..]>> -------------------------8<-------------- >> >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 >> From: Halil Pasic <pasic at linux.vnet.ibm.com> >> Date: Thu, 26 Jan 2017 00:06:15 +0100 >> Subject: [PATCH] vhost: remove useless/dangerous reset of is_le >> >> The reset of is_le does no good, but it contributes its fair share to a >> bug in vhost_net, which occurs if we have some oldubufs when stopping and >> setting a fd = -1 as a backend. Instead of doing something convoluted in >> vhost_net, let's just get rid of the reset. >> >> Signed-off-by: Halil Pasic <pasic at linux.vnet.ibm.com> >> Fixes: commit 2751c9882b94 >> --- >> drivers/vhost/vhost.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c >> index d643260..08072a2 100644 >> --- a/drivers/vhost/vhost.c >> +++ b/drivers/vhost/vhost.c >> @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) >> int r; >> bool is_le = vq->is_le; >> >> - if (!vq->private_data) { >> - vhost_reset_is_le(vq); >> + if (!vq->private_data) >> return 0; >> - } >> >> vhost_init_is_le(vq); > > > I think you do need to reset it, just maybe within vhost_init_is_le. > > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) > vq->is_le = true; > else > vhost_reset_is_le(vq); > >That is a very good point! I have overlooked that while the CONFIG_VHOST_CROSS_ENDIAN_LEGACY variant static void vhost_init_is_le(struct vhost_virtqueue *vq) { /* Note for legacy virtio: user_be is initialized at reset time * according to the host endianness. If userspace does not set an * explicit endianness, the default behavior is native endian, as * expected by legacy virtio. */ vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; } is fine the other variant static void vhost_init_is_le(struct vhost_virtqueue *vq) { if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; } is a very strange initializer (makes assumptions about the state to be initialized). I agree, setting native endianness there sounds very reasonable. I have a question regarding readability. IMHO the relationship of reset_is_le and int_is_le is a bit confusing, and I'm afraid it could become even more confusing with using reset in one of the init_is_le's. How about we do the following? static void vhost_init_is_le(struct vhost_virtqueue *vq) { if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; + else + vq->is_le = virtio_legacy_is_little_endian(); } static void vhost_reset_is_le(struct vhost_virtqueue *vq) { - vq->is_le = virtio_legacy_is_little_endian(); + vhost_init_is_le(vq); } That way we would have correct endianness both after reset and after init, I think :). Thank you very much! Halil
Possibly Parallel Threads
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1
- [BUG/RFC] vhost: net: big endian viring access despite virtio 1