Greg Kurz
2015-May-06 12:07 UTC
[PATCH RFC 0/7] vhost: cross-endian support (vhost-net only)
Hi, This series allows QEMU to use vhost with legacy virtio devices when host and target don't have the same endianness. Only network devices are covered for the moment. I had already posted a series some monthes ago but it never got reviewed. Moreover, the underlying kernel support was entirely re-written and is still waiting to be applied by Michael. I hence post as RFC. The corresponding kernel patches are available here: http://lists.linuxfoundation.org/pipermail/virtualization/2015-April/029885.html Please comment. --- C?dric Le Goater (1): vhost_net: re-enable when cross endian Greg Kurz (6): virtio: relax feature check linux-headers: sync vhost.h virtio: introduce virtio_legacy_is_cross_endian() vhost: set vring endianness for legacy virtio tap: add VNET_LE/VNET_BE operations vhost-net: tell tap backend about the vnet endianness hw/net/vhost_net.c | 50 +++++++++++++++++++++++-------------- hw/virtio/vhost.c | 50 ++++++++++++++++++++++++++++++++++++- include/hw/virtio/virtio-access.h | 13 ++++++++++ include/hw/virtio/virtio.h | 1 - include/net/net.h | 6 ++++ linux-headers/linux/vhost.h | 14 ++++++++++ net/net.c | 18 +++++++++++++ net/tap-linux.c | 34 +++++++++++++++++++++++++ net/tap-linux.h | 2 + net/tap.c | 16 ++++++++++++ net/tap_int.h | 2 + 11 files changed, 185 insertions(+), 21 deletions(-) -- Greg
Unlike with add and clear, there is no valid reason to abort when checking for a feature. It makes more sense to return false (i.e. the feature bit isn't set). This is exactly what __virtio_has_feature() does if fbit >= 32. This allows to introduce code that is aware about new 64-bit features like VIRTIO_F_VERSION_1, even if they are still not implemented. Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- include/hw/virtio/virtio.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index d95f8b6..6ef70f1 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -233,7 +233,6 @@ static inline void virtio_clear_feature(uint32_t *features, unsigned int fbit) static inline bool __virtio_has_feature(uint32_t features, unsigned int fbit) { - assert(fbit < 32); return !!(features & (1 << fbit)); }
This patch brings the cross-endian vhost API to QEMU. Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- linux-headers/linux/vhost.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index c656f61..ead86db 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -103,6 +103,20 @@ struct vhost_memory { /* Get accessor: reads index, writes value in num */ #define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state) +/* Set the vring byte order in num. Valid values are VHOST_VRING_LITTLE_ENDIAN + * or VHOST_VRING_BIG_ENDIAN (other values return -EINVAL). + * The byte order cannot be changed while the device is active: trying to do so + * returns -EBUSY. + * This is a legacy only API that is simply ignored when VIRTIO_F_VERSION_1 is + * set. + * Not all kernel configurations support this ioctl, but all configurations that + * support SET also support GET. + */ +#define VHOST_VRING_LITTLE_ENDIAN 0 +#define VHOST_VRING_BIG_ENDIAN 1 +#define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state) +#define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state) + /* The following ioctls use eventfd file descriptors to signal and poll * for events. */
Greg Kurz
2015-May-06 12:07 UTC
[PATCH RFC 3/7] virtio: introduce virtio_legacy_is_cross_endian()
This helper will be used by vhost and tap to detect cross-endianness in the legacy virtio case. Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- include/hw/virtio/virtio-access.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index 46456fd..caf0940 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -28,6 +28,19 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev) #endif } +static inline bool virtio_legacy_is_cross_endian(VirtIODevice *vdev) +{ +#ifdef TARGET_IS_BIENDIAN +#ifdef HOST_WORDS_BIGENDIAN + return !virtio_is_big_endian(vdev); +#else + return virtio_is_big_endian(vdev); +#endif +#else + return false; +#endif +} + static inline uint16_t virtio_lduw_phys(VirtIODevice *vdev, hwaddr pa) { if (virtio_access_is_big_endian(vdev)) {
Greg Kurz
2015-May-06 12:08 UTC
[PATCH RFC 4/7] vhost: set vring endianness for legacy virtio
Legacy virtio is native endian: if the guest and host endianness differ, we have to tell vhost so it can swap bytes where appropriate. This is done through a vhost ring ioctl. Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- hw/virtio/vhost.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 54851b7..1d7b939 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -17,9 +17,11 @@ #include "hw/hw.h" #include "qemu/atomic.h" #include "qemu/range.h" +#include "qemu/error-report.h" #include <linux/vhost.h> #include "exec/address-spaces.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" #include "migration/migration.h" static void vhost_dev_sync_region(struct vhost_dev *dev, @@ -647,6 +649,27 @@ static void vhost_log_stop(MemoryListener *listener, /* FIXME: implement */ } +static int vhost_virtqueue_set_vring_endian_legacy(struct vhost_dev *dev, + bool is_big_endian, + int vhost_vq_index) +{ + struct vhost_vring_state s = { + .index = vhost_vq_index, + .num = is_big_endian + }; + + if (!dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_ENDIAN, &s)) { + return 0; + } + + if (errno == ENOTTY) { + error_report("vhost does not support cross-endian"); + return -ENOSYS; + } + + return -errno; +} + static int vhost_virtqueue_start(struct vhost_dev *dev, struct VirtIODevice *vdev, struct vhost_virtqueue *vq, @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, return -errno; } + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) && + virtio_legacy_is_cross_endian(vdev)) { + r = vhost_virtqueue_set_vring_endian_legacy(dev, + virtio_is_big_endian(vdev), + vhost_vq_index); + if (r) { + return -errno; + } + } + s = l = virtio_queue_get_desc_size(vdev, idx); a = virtio_queue_get_desc_addr(vdev, idx); vq->desc = cpu_physical_memory_map(a, &l, 0); @@ -747,8 +780,9 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, struct vhost_virtqueue *vq, unsigned idx) { + int vhost_vq_index = idx - dev->vq_index; struct vhost_vring_state state = { - .index = idx - dev->vq_index + .index = vhost_vq_index, }; int r; assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); @@ -759,6 +793,20 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, } virtio_queue_set_last_avail_idx(vdev, idx, state.num); virtio_queue_invalidate_signalled_used(vdev, idx); + + /* In the cross-endian case, we need to reset the vring endianness to + * native as legacy devices expect so by default. + */ + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) && + virtio_legacy_is_cross_endian(vdev)) { + r = vhost_virtqueue_set_vring_endian_legacy(dev, + !virtio_is_big_endian(vdev), + vhost_vq_index); + if (r < 0) { + error_report("failed to reset vring endianness"); + } + } + assert (r >= 0); cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx), 0, virtio_queue_get_ring_size(vdev, idx));
The linux tap and macvtap backends can be told to parse vnet headers according to little or big endian. This is done through the TUNSETVNETLE and TUNSETVNETBE ioctls. This patch brings all the plumbing for QEMU to use these APIs. Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- include/net/net.h | 6 ++++++ net/net.c | 18 ++++++++++++++++++ net/tap-linux.c | 34 ++++++++++++++++++++++++++++++++++ net/tap-linux.h | 2 ++ net/tap.c | 16 ++++++++++++++++ net/tap_int.h | 2 ++ 6 files changed, 78 insertions(+) diff --git a/include/net/net.h b/include/net/net.h index 50ffcb9..86f57f7 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -55,6 +55,8 @@ typedef bool (HasVnetHdrLen)(NetClientState *, int); typedef void (UsingVnetHdr)(NetClientState *, bool); typedef void (SetOffload)(NetClientState *, int, int, int, int, int); typedef void (SetVnetHdrLen)(NetClientState *, int); +typedef int (SetVnetLE)(NetClientState *, bool); +typedef int (SetVnetBE)(NetClientState *, bool); typedef struct NetClientInfo { NetClientOptionsKind type; @@ -73,6 +75,8 @@ typedef struct NetClientInfo { UsingVnetHdr *using_vnet_hdr; SetOffload *set_offload; SetVnetHdrLen *set_vnet_hdr_len; + SetVnetLE *set_vnet_le; + SetVnetBE *set_vnet_be; } NetClientInfo; struct NetClientState { @@ -138,6 +142,8 @@ void qemu_using_vnet_hdr(NetClientState *nc, bool enable); void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn, int ufo); void qemu_set_vnet_hdr_len(NetClientState *nc, int len); +int qemu_set_vnet_le(NetClientState *nc, bool is_le); +int qemu_set_vnet_be(NetClientState *nc, bool is_be); void qemu_macaddr_default_if_unset(MACAddr *macaddr); int qemu_show_nic_models(const char *arg, const char *const *models); void qemu_check_nic_model(NICInfo *nd, const char *model); diff --git a/net/net.c b/net/net.c index 0be084d..eb8ef3e 100644 --- a/net/net.c +++ b/net/net.c @@ -452,6 +452,24 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len) nc->info->set_vnet_hdr_len(nc, len); } +int qemu_set_vnet_le(NetClientState *nc, bool is_le) +{ + if (!nc || !nc->info->set_vnet_le) { + return -ENOSYS; + } + + return nc->info->set_vnet_le(nc, is_le); +} + +int qemu_set_vnet_be(NetClientState *nc, bool is_be) +{ + if (!nc || !nc->info->set_vnet_be) { + return -ENOSYS; + } + + return nc->info->set_vnet_be(nc, is_be); +} + int qemu_can_send_packet(NetClientState *sender) { int vm_running = runstate_is_running(); diff --git a/net/tap-linux.c b/net/tap-linux.c index 812bf2d..15b57a7 100644 --- a/net/tap-linux.c +++ b/net/tap-linux.c @@ -198,6 +198,40 @@ void tap_fd_set_vnet_hdr_len(int fd, int len) } } +int tap_fd_set_vnet_le(int fd, int is_le) +{ + int arg = is_le ? 1 : 0; + + if (!ioctl(fd, TUNSETVNETLE, &arg)) { + return 0; + } + + /* Check if our kernel supports TUNSETVNETLE */ + if (errno == EINVAL) { + return -errno; + } + + error_report("TUNSETVNETLE ioctl() failed: %s.\n", strerror(errno)); + abort(); +} + +int tap_fd_set_vnet_be(int fd, int is_be) +{ + int arg = is_be ? 1 : 0; + + if (!ioctl(fd, TUNSETVNETBE, &arg)) { + return 0; + } + + /* Check if our kernel supports TUNSETVNETBE */ + if (errno == EINVAL) { + return -errno; + } + + error_report("TUNSETVNETBE ioctl() failed: %s.\n", strerror(errno)); + abort(); +} + void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo) { diff --git a/net/tap-linux.h b/net/tap-linux.h index 1cf35d4..01dc6f8 100644 --- a/net/tap-linux.h +++ b/net/tap-linux.h @@ -30,6 +30,8 @@ #define TUNGETVNETHDRSZ _IOR('T', 215, int) #define TUNSETVNETHDRSZ _IOW('T', 216, int) #define TUNSETQUEUE _IOW('T', 217, int) +#define TUNSETVNETLE _IOW('T', 220, int) +#define TUNSETVNETBE _IOW('T', 222, int) #endif diff --git a/net/tap.c b/net/tap.c index 968df46..c6f9a7d 100644 --- a/net/tap.c +++ b/net/tap.c @@ -274,6 +274,20 @@ static void tap_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr) s->using_vnet_hdr = using_vnet_hdr; } +static int tap_set_vnet_le(NetClientState *nc, bool is_le) +{ + TAPState *s = DO_UPCAST(TAPState, nc, nc); + + return tap_fd_set_vnet_le(s->fd, is_le); +} + +static int tap_set_vnet_be(NetClientState *nc, bool is_be) +{ + TAPState *s = DO_UPCAST(TAPState, nc, nc); + + return tap_fd_set_vnet_be(s->fd, is_be); +} + static void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn, int ufo) { @@ -335,6 +349,8 @@ static NetClientInfo net_tap_info = { .using_vnet_hdr = tap_using_vnet_hdr, .set_offload = tap_set_offload, .set_vnet_hdr_len = tap_set_vnet_hdr_len, + .set_vnet_le = tap_set_vnet_le, + .set_vnet_be = tap_set_vnet_be, }; static TAPState *net_tap_fd_init(NetClientState *peer, diff --git a/net/tap_int.h b/net/tap_int.h index 79afdf2..5cb79fc 100644 --- a/net/tap_int.h +++ b/net/tap_int.h @@ -40,6 +40,8 @@ int tap_probe_vnet_hdr_len(int fd, int len); int tap_probe_has_ufo(int fd); void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo); void tap_fd_set_vnet_hdr_len(int fd, int len); +int tap_fd_set_vnet_le(int fd, int vnet_is_le); +int tap_fd_set_vnet_be(int fd, int vnet_is_be); int tap_fd_enable(int fd); int tap_fd_disable(int fd); int tap_fd_get_ifname(int fd, char *ifname);
Greg Kurz
2015-May-06 12:08 UTC
[PATCH RFC 6/7] vhost-net: tell tap backend about the vnet endianness
The default behaviour for TAP/MACVTAP is to consider vnet as native endian. This patch handles the cases when this is not true: - virtio 1.0: always little-endian - legacy cross-endian Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- hw/net/vhost_net.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index cf23335..1884e59 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -38,6 +38,7 @@ #include "standard-headers/linux/virtio_ring.h" #include "hw/virtio/vhost.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" struct vhost_net { struct vhost_dev dev; @@ -194,6 +195,27 @@ static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index) net->dev.vq_index = vq_index; } +static int vhost_net_set_vnet_endian(VirtIODevice *dev, NetClientState *peer, + bool set) +{ + int r = 0; + + if (virtio_has_feature(dev, VIRTIO_F_VERSION_1) || + (virtio_legacy_is_cross_endian(dev) && !virtio_is_big_endian(dev))) { + r = qemu_set_vnet_le(peer, set); + if (r) { + error_report("backend does not support LE vnet headers"); + } + } else if (virtio_legacy_is_cross_endian(dev)) { + r = qemu_set_vnet_be(peer, set); + if (r) { + error_report("backend does not support BE vnet headers"); + } + } + + return r; +} + static int vhost_net_start_one(struct vhost_net *net, VirtIODevice *dev) { @@ -304,6 +326,11 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, goto err; } + r = vhost_net_set_vnet_endian(dev, ncs[0].peer, true); + if (r < 0) { + goto err; + } + for (i = 0; i < total_queues; i++) { vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2); } @@ -311,7 +338,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true); if (r < 0) { error_report("Error binding guest notifier: %d", -r); - goto err; + goto err_endian; } for (i = 0; i < total_queues; i++) { @@ -333,6 +360,8 @@ err_start: fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e); fflush(stderr); } +err_endian: + vhost_net_set_vnet_endian(dev, ncs[0].peer, false); err: return r; } @@ -355,6 +384,8 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs, fflush(stderr); } assert(r >= 0); + + assert(vhost_net_set_vnet_endian(dev, ncs[0].peer, false) >= 0); } void vhost_net_cleanup(struct vhost_net *net)
From: C?dric Le Goater <clg at fr.ibm.com> Cross-endianness is now checked by the core vhost code. revert 371df9f5e0f1 "vhost-net: disable when cross-endian" Signed-off-by: C?dric Le Goater <clg at fr.ibm.com> [ added commit message, Greg Kurz <gkurz at linux.vnet.ibm.com> ] Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> --- hw/net/vhost_net.c | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 1884e59..3e4b0f2 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -293,19 +293,6 @@ static void vhost_net_stop_one(struct vhost_net *net, vhost_dev_disable_notifiers(&net->dev, dev); } -static bool vhost_net_device_endian_ok(VirtIODevice *vdev) -{ -#ifdef TARGET_IS_BIENDIAN -#ifdef HOST_WORDS_BIGENDIAN - return virtio_is_big_endian(vdev); -#else - return !virtio_is_big_endian(vdev); -#endif -#else - return true; -#endif -} - int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, int total_queues) { @@ -314,12 +301,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); int r, e, i; - if (!vhost_net_device_endian_ok(dev)) { - error_report("vhost-net does not support cross-endian"); - r = -ENOSYS; - goto err; - } - if (!k->set_guest_notifiers) { error_report("binding does not support guest notifiers"); r = -ENOSYS;
Cornelia Huck
2015-May-12 13:14 UTC
[Qemu-devel] [PATCH RFC 1/7] virtio: relax feature check
On Wed, 06 May 2015 14:07:37 +0200 Greg Kurz <gkurz at linux.vnet.ibm.com> wrote:> Unlike with add and clear, there is no valid reason to abort when checking > for a feature. It makes more sense to return false (i.e. the feature bit > isn't set). This is exactly what __virtio_has_feature() does if fbit >= 32. > > This allows to introduce code that is aware about new 64-bit features like > VIRTIO_F_VERSION_1, even if they are still not implemented. > > Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> > --- > include/hw/virtio/virtio.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index d95f8b6..6ef70f1 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -233,7 +233,6 @@ static inline void virtio_clear_feature(uint32_t *features, unsigned int fbit) > > static inline bool __virtio_has_feature(uint32_t features, unsigned int fbit) > { > - assert(fbit < 32); > return !!(features & (1 << fbit)); > } > > >I must say I'm not very comfortable with knowingly passing out-of-rage values to this function. Can we perhaps apply at least the feature-bit-size extending patches prior to your patchset, if the remainder of the virtio-1 patchset still takes some time?
Cornelia Huck
2015-May-12 13:25 UTC
[Qemu-devel] [PATCH RFC 4/7] vhost: set vring endianness for legacy virtio
On Wed, 06 May 2015 14:08:02 +0200 Greg Kurz <gkurz at linux.vnet.ibm.com> wrote:> Legacy virtio is native endian: if the guest and host endianness differ, > we have to tell vhost so it can swap bytes where appropriate. This is > done through a vhost ring ioctl. > > Signed-off-by: Greg Kurz <gkurz at linux.vnet.ibm.com> > --- > hw/virtio/vhost.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 49 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index 54851b7..1d7b939 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c(...)> @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, > return -errno; > } > > + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&I think this should either go in after the virtio-1 base support (more feature bits etc.) or get a big fat comment and be touched up later. I'd prefer the first solution so it does not get forgotten, but I'm not sure when Michael plans to proceed with the virtio-1 patches (I think they're mostly fine already).> + virtio_legacy_is_cross_endian(vdev)) { > + r = vhost_virtqueue_set_vring_endian_legacy(dev, > + virtio_is_big_endian(vdev), > + vhost_vq_index); > + if (r) { > + return -errno; > + } > + } > + > s = l = virtio_queue_get_desc_size(vdev, idx); > a = virtio_queue_get_desc_addr(vdev, idx); > vq->desc = cpu_physical_memory_map(a, &l, 0);
Possibly Parallel Threads
- [PATCH RFC 0/7] vhost: cross-endian support (vhost-net only)
- [Qemu-devel] [PATCH RFC 4/7] vhost: set vring endianness for legacy virtio
- [Qemu-devel] [PATCH RFC 4/7] vhost: set vring endianness for legacy virtio
- [PATCH RFC 4/7] vhost: set vring endianness for legacy virtio
- [PATCH RFC v5 00/19] qemu: towards virtio-1 host support