Jason Wang
2012-Jun-25 10:04 UTC
[RFC V2 PATCH 0/4] Multiqueue support for tap and virtio-net/vhost
Hello all:

This series is an update of the last version of the multiqueue support patches, adding multiqueue capability to both tap and virtio-net. Some tap backends already have (macvtap in Linux) or will have (tap) multiqueue support. In such backends, each file descriptor of a tap is a queue, and ioctls are provided to attach an existing file descriptor to the tun/tap device. This series lets qemu use this kind of backend, so it can transmit and receive packets through multiple file descriptors.

Patch 1 introduces a new helper to get all matched options. After this patch, we can pass multiple file descriptors to a single netdev with:

  qemu -netdev tap,id=hn0,fd=10,fd=11,...

Patch 2 introduces generic helpers in tap to attach or detach a file descriptor from a tap device; emulated nics can use these helpers to enable/disable queues.

Patch 3 modifies NICState to allow multiple VLANClientStates to be stored in it. With this patch, qemu has basic support for multiqueue-capable tap backends.

Patch 4 converts virtio-net/vhost to be multiqueue capable. A vhost device is created per tx/rx queue pair, as before.

Changes from V1:
- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifier assignment/de-assignment
- change the command line to: qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

TODO:
- netdev_add
- bridge helper for multiqueue backends

References:
- V1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Please review and comment.
---

Jason Wang (4):
      option: introduce qemu_opt_get_all()
      tap: multiqueue support
      net: multiqueue support
      virtio-net: add multiqueue support

 hw/dp8393x.c         |    2 
 hw/mcf_fec.c         |    2 
 hw/qdev-properties.c |   33 +++-
 hw/qdev.h            |    3 
 hw/vhost.c           |   58 ++++--
 hw/vhost.h           |    1 
 hw/vhost_net.c       |    7 +
 hw/vhost_net.h       |    2 
 hw/virtio-net.c      |  461 +++++++++++++++++++++++++++++++++-----------------
 hw/virtio-net.h      |    3 
 net.c                |   62 ++++++-
 net.h                |   16 +-
 net/tap-aix.c        |   13 +
 net/tap-bsd.c        |   13 +
 net/tap-haiku.c      |   13 +
 net/tap-linux.c      |   55 ++++++
 net/tap-linux.h      |    3 
 net/tap-solaris.c    |   13 +
 net/tap-win32.c      |   11 +
 net/tap.c            |  189 +++++++++++++-------
 net/tap.h            |    7 +
 qemu-option.c        |   19 ++
 qemu-option.h        |    2 
 23 files changed, 714 insertions(+), 274 deletions(-)

--
Signature
Sometimes we need to pass options like -netdev tap,fd=100,fd=101,fd=102, which cannot be properly parsed by qemu_opt_find() because it only returns the first matched option. So qemu_opt_get_all() is introduced to fill an array of pointers with all matched options.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 qemu-option.c |   19 +++++++++++++++++++
 qemu-option.h |    2 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-option.c b/qemu-option.c
index bb3886c..9263125 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -545,6 +545,25 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name)
     return NULL;
 }
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+                     int max)
+{
+    QemuOpt *opt;
+    int index = 0;
+
+    QTAILQ_FOREACH_REVERSE(opt, &opts->head, QemuOptHead, next) {
+        if (strcmp(opt->name, name) == 0) {
+            if (index < max) {
+                optp[index++] = opt->str;
+            }
+            if (index == max) {
+                break;
+            }
+        }
+    }
+    return index;
+}
+
 const char *qemu_opt_get(QemuOpts *opts, const char *name)
 {
     QemuOpt *opt = qemu_opt_find(opts, name);

diff --git a/qemu-option.h b/qemu-option.h
index 951dec3..3c9a273 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -106,6 +106,8 @@ struct QemuOptsList {
     QemuOptDesc desc[];
 };
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+                     int max);
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval);
This patch adds basic support for multiqueue-capable tap devices. When multiqueue is enabled for a tap device, the user can attach/detach multiple files (sockets) to the device through TUNATTACHQUEUE/TUNDETACHQUEUE. Two helpers, tun_attach() and tun_detach(), are introduced to attach and detach a file descriptor. Platform-specific helpers are called, and only the Linux helper has real content, as multiqueue tap is only supported on Linux.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 net.c             |    4 +
 net/tap-aix.c     |   13 +++-
 net/tap-bsd.c     |   13 +++-
 net/tap-haiku.c   |   13 +++-
 net/tap-linux.c   |   55 +++++++++++++++
 net/tap-linux.h   |    3 +
 net/tap-solaris.c |   13 +++-
 net/tap-win32.c   |   11 +++
 net/tap.c         |  189 ++++++++++++++++++++++++++++++++++-------------------
 net/tap.h         |    7 ++
 10 files changed, 245 insertions(+), 76 deletions(-)

diff --git a/net.c b/net.c
index 4aa416c..eabe830 100644
--- a/net.c
+++ b/net.c
@@ -978,6 +978,10 @@ static const struct {
             .name = "vhostforce",
             .type = QEMU_OPT_BOOL,
             .help = "force vhost on for non-MSIX virtio guests",
+        }, {
+            .name = "queues",
+            .type = QEMU_OPT_NUMBER,
+            .help = "number of queues the backend can provide",
         },
 #endif /* _WIN32 */
         { /* end of list */ }

diff --git a/net/tap-aix.c b/net/tap-aix.c
index e19aaba..f111e0f 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int attach)
 {
     fprintf(stderr, "no tap on AIX\n");
     return -1;
@@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+    return -1;
+}

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 937a94b..44f3421 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -33,7 +33,8 @@
 #include <net/if_tap.h>
 #endif
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int attach)
 {
     int fd;
 #ifdef TAPGIFNAME
@@ -145,3 +146,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+    return -1;
+}

diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 91dda8e..6fb6719 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int attach)
 {
     fprintf(stderr, "no tap on Haiku\n");
     return -1;
@@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
                         int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+    return -1;
+}

diff --git a/net/tap-linux.c b/net/tap-linux.c
index 41d581b..5d74b53 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -35,7 +35,8 @@
 
 #define PATH_NET_TUN "/dev/net/tun"
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int attach)
 {
     struct ifreq ifr;
     int fd, ret;
@@ -47,6 +48,8 @@
     }
     memset(&ifr, 0, sizeof(ifr));
     ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+    if (!attach)
+        ifr.ifr_flags |= IFF_MULTI_QUEUE;
 
     if (*vnet_hdr) {
         unsigned int features;
@@ -71,7 +74,10 @@
         pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
     else
         pstrcpy(ifr.ifr_name, IFNAMSIZ, "tap%d");
-    ret = ioctl(fd, TUNSETIFF, (void *) &ifr);
+    if (attach)
+        ret = ioctl(fd, TUNATTACHQUEUE, (void *) &ifr);
+    else
+        ret = ioctl(fd, TUNSETIFF, (void *) &ifr);
     if (ret != 0) {
         if (ifname[0] != '\0') {
             error_report("could not configure %s (%s): %m", PATH_NET_TUN, ifr.ifr_name);
@@ -197,3 +203,48 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
         }
     }
 }
+
+/* Attach a file descriptor to a TUN/TAP device. This descriptor should have
+ * been detached before.
+ */
+int tap_fd_attach(int fd, const char *ifname)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+    pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
+
+    ret = ioctl(fd, TUNATTACHQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not attach to %s", ifname);
+    }
+
+    return ret;
+}
+
+/* Detach a file descriptor from a TUN/TAP device. This file descriptor must
+ * have been attached to a device.
+ */
+int tap_fd_detach(int fd, const char *ifname)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+    pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
+
+    ret = ioctl(fd, TUNDETACHQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not detach from %s", ifname);
+    }
+
+    return ret;
+}

diff --git a/net/tap-linux.h b/net/tap-linux.h
index 659e981..0f5e34e 100644
--- a/net/tap-linux.h
+++ b/net/tap-linux.h
@@ -29,6 +29,8 @@
 #define TUNSETSNDBUF   _IOW('T', 212, int)
 #define TUNGETVNETHDRSZ _IOR('T', 215, int)
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNATTACHQUEUE  _IOW('T', 217, int)
+#define TUNDETACHQUEUE  _IOW('T', 218, int)
 
 #endif
 
@@ -36,6 +38,7 @@
 #define IFF_TAP          0x0002
 #define IFF_NO_PI        0x1000
 #define IFF_VNET_HDR     0x4000
+#define IFF_MULTI_QUEUE  0x0100
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM 0x01 /* You can hand me unchecksummed packets.
*/ diff --git a/net/tap-solaris.c b/net/tap-solaris.c index cf76463..f7c8e8d 100644 --- a/net/tap-solaris.c +++ b/net/tap-solaris.c @@ -173,7 +173,8 @@ static int tap_alloc(char *dev, size_t dev_size) return tap_fd; } -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required) +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach) { char dev[10]=""; int fd; @@ -225,3 +226,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo) { } + +int tap_fd_attach(int fd, const char *ifname) +{ + return -1; +} + +int tap_fd_detach(int fd, const char *ifname) +{ + return -1; +} diff --git a/net/tap-win32.c b/net/tap-win32.c index a801a55..dae1c00 100644 --- a/net/tap-win32.c +++ b/net/tap-win32.c @@ -749,3 +749,14 @@ struct vhost_net *tap_get_vhost_net(VLANClientState *nc) { return NULL; } + +int tap_attach(VLANClientState *nc) +{ + return -1; +} + +int tap_detach(VLANClientState *nc) +{ + return -1; +} + diff --git a/net/tap.c b/net/tap.c index 5ac4ba3..2b9dcb5 100644 --- a/net/tap.c +++ b/net/tap.c @@ -53,11 +53,13 @@ typedef struct TAPState { int fd; char down_script[1024]; char down_script_arg[128]; + char ifname[128]; uint8_t buf[TAP_BUFSIZE]; unsigned int read_poll : 1; unsigned int write_poll : 1; unsigned int using_vnet_hdr : 1; unsigned int has_ufo: 1; + unsigned int enabled:1; VHostNetState *vhost_net; unsigned host_vnet_hdr_len; } TAPState; @@ -546,7 +548,7 @@ int net_init_bridge(QemuOpts *opts, const char *name, VLANState *vlan) return 0; } -static int net_tap_init(QemuOpts *opts, int *vnet_hdr) +static int net_tap_init(QemuOpts *opts, int *vnet_hdr, int attach) { int fd, vnet_hdr_required; char ifname[128] = {0,}; @@ -563,7 +565,9 @@ static int net_tap_init(QemuOpts *opts, int *vnet_hdr) vnet_hdr_required = 0; } - TFR(fd = tap_open(ifname, sizeof(ifname), vnet_hdr, vnet_hdr_required)); + TFR(fd = tap_open(ifname, sizeof(ifname), vnet_hdr, vnet_hdr_required, + attach)); + 
if (fd < 0) { return -1; } @@ -572,7 +576,7 @@ static int net_tap_init(QemuOpts *opts, int *vnet_hdr) if (setup_script && setup_script[0] != '\0' && strcmp(setup_script, "no") != 0 && - launch_script(setup_script, ifname, fd)) { + (!attach && launch_script(setup_script, ifname, fd))) { close(fd); return -1; } @@ -582,74 +586,11 @@ static int net_tap_init(QemuOpts *opts, int *vnet_hdr) return fd; } -int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan) +static int __net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, + VLANState *vlan, int fd, int vnet_hdr) { - TAPState *s; - int fd, vnet_hdr = 0; - const char *model; - - if (qemu_opt_get(opts, "fd")) { - if (qemu_opt_get(opts, "ifname") || - qemu_opt_get(opts, "script") || - qemu_opt_get(opts, "downscript") || - qemu_opt_get(opts, "vnet_hdr") || - qemu_opt_get(opts, "helper")) { - error_report("ifname=, script=, downscript=, vnet_hdr=, " - "and helper= are invalid with fd="); - return -1; - } - - fd = net_handle_fd_param(cur_mon, qemu_opt_get(opts, "fd")); - if (fd == -1) { - return -1; - } - - fcntl(fd, F_SETFL, O_NONBLOCK); - - vnet_hdr = tap_probe_vnet_hdr(fd); - - model = "tap"; - - } else if (qemu_opt_get(opts, "helper")) { - if (qemu_opt_get(opts, "ifname") || - qemu_opt_get(opts, "script") || - qemu_opt_get(opts, "downscript") || - qemu_opt_get(opts, "vnet_hdr")) { - error_report("ifname=, script=, downscript=, and vnet_hdr= " - "are invalid with helper="); - return -1; - } - - fd = net_bridge_run_helper(qemu_opt_get(opts, "helper"), - DEFAULT_BRIDGE_INTERFACE); - if (fd == -1) { - return -1; - } - - fcntl(fd, F_SETFL, O_NONBLOCK); - - vnet_hdr = tap_probe_vnet_hdr(fd); - - model = "bridge"; - - } else { - if (!qemu_opt_get(opts, "script")) { - qemu_opt_set(opts, "script", DEFAULT_NETWORK_SCRIPT); - } - - if (!qemu_opt_get(opts, "downscript")) { - qemu_opt_set(opts, "downscript", DEFAULT_NETWORK_DOWN_SCRIPT); - } + TAPState *s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr); - fd = 
net_tap_init(opts, &vnet_hdr); - if (fd == -1) { - return -1; - } - - model = "tap"; - } - - s = net_tap_fd_init(vlan, model, name, fd, vnet_hdr); if (!s) { close(fd); return -1; @@ -671,6 +612,7 @@ int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan) script = qemu_opt_get(opts, "script"); downscript = qemu_opt_get(opts, "downscript"); + pstrcpy(s->ifname, sizeof(s->ifname), ifname); snprintf(s->nc.info_str, sizeof(s->nc.info_str), "ifname=%s,script=%s,downscript=%s", ifname, script, downscript); @@ -704,6 +646,82 @@ int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan) return -1; } + s->enabled = 1; + return 0; +} + +int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan) +{ + int i, fd, vnet_hdr = 0; + int numqueues = qemu_opt_get_number(opts, "queues", 1); + + if (qemu_opt_get(opts, "fd")) { + const char *fdp[16]; + if (qemu_opt_get(opts, "ifname") || + qemu_opt_get(opts, "script") || + qemu_opt_get(opts, "downscript") || + qemu_opt_get(opts, "vnet_hdr") || + qemu_opt_get(opts, "helper")) { + error_report("ifname=, script=, downscript=, vnet_hdr=, " + "and helper= are invalid with fd="); + return -1; + } + + if (numqueues != qemu_opt_get_all(opts, "fd", fdp, 16)) { + error_report("the number of queue does not match the" + "number of fd passed"); + return -1; + } + + for (i = 0; i < numqueues; i++) { + fd = net_handle_fd_param(cur_mon, fdp[i]); + if (fd == -1) { + return -1; + } + + fcntl(fd, F_SETFL, O_NONBLOCK); + + vnet_hdr = tap_probe_vnet_hdr(fd); + + __net_init_tap(opts, cur_mon, name, vlan, fd, vnet_hdr); + } + } else if (qemu_opt_get(opts, "helper")) { + if (qemu_opt_get(opts, "ifname") || + qemu_opt_get(opts, "script") || + qemu_opt_get(opts, "downscript") || + qemu_opt_get(opts, "vnet_hdr")) { + error_report("ifname=, script=, downscript=, and vnet_hdr= " + "are invalid with helper="); + return -1; + } + + /* FIXME: multiqueue helper */ + fd = net_bridge_run_helper(qemu_opt_get(opts, "helper"), + 
DEFAULT_BRIDGE_INTERFACE); + if (fd == -1) { + return -1; + } + + fcntl(fd, F_SETFL, O_NONBLOCK); + + vnet_hdr = tap_probe_vnet_hdr(fd); + } else { + if (!qemu_opt_get(opts, "script")) { + qemu_opt_set(opts, "script", DEFAULT_NETWORK_SCRIPT); + } + + if (!qemu_opt_get(opts, "downscript")) { + qemu_opt_set(opts, "downscript", DEFAULT_NETWORK_DOWN_SCRIPT); + } + + for (i = 0; i < numqueues; i++) { + fd = net_tap_init(opts, &vnet_hdr, i != 0); + if (fd == -1) { + return -1; + } + __net_init_tap(opts, cur_mon, name, vlan, fd, vnet_hdr); + } + } return 0; } @@ -713,3 +731,36 @@ VHostNetState *tap_get_vhost_net(VLANClientState *nc) assert(nc->info->type == NET_CLIENT_TYPE_TAP); return s->vhost_net; } + +int tap_attach(VLANClientState *nc) +{ + TAPState *s = DO_UPCAST(TAPState, nc, nc); + int ret; + + if (s->enabled) { + return 0; + } else { + ret = tap_fd_attach(s->fd, s->ifname); + if (ret == 0) { + s->enabled = 1; + } + return ret; + } +} + +int tap_detach(VLANClientState *nc) +{ + TAPState *s = DO_UPCAST(TAPState, nc, nc); + int ret; + + if (s->enabled == 0) { + return 0; + } else { + ret = tap_fd_detach(s->fd, s->ifname); + if (ret == 0) { + s->enabled = 0; + } + return ret; + } +} + diff --git a/net/tap.h b/net/tap.h index b2a9450..cead7ca 100644 --- a/net/tap.h +++ b/net/tap.h @@ -34,7 +34,8 @@ int net_init_tap(QemuOpts *opts, const char *name, VLANState *vlan); -int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required); +int tap_open(char *ifname, int ifname_size, int *vnet_hdr, + int vnet_hdr_required, int attach); ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen); @@ -51,6 +52,10 @@ int tap_probe_vnet_hdr_len(int fd, int len); int tap_probe_has_ufo(int fd); void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo); void tap_fd_set_vnet_hdr_len(int fd, int len); +int tap_attach(VLANClientState *vc); +int tap_detach(VLANClientState *vc); +int tap_fd_attach(int fd, const char *ifname); +int tap_fd_detach(int 
fd, const char *ifname); int tap_get_fd(VLANClientState *vc);
This patch adds multiqueue support for emulated nics. Each VLANClientState pair is now abstracted as a queue instead of a nic, and multiple VLANClientState pointers are stored in the NICState. A queue_index is also introduced so the emulated nics know which queue a packet came from or was sent out on. virtio-net will be the first user.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 hw/dp8393x.c         |    2 +-
 hw/mcf_fec.c         |    2 +-
 hw/qdev-properties.c |   33 +++++++++++++++++++++++-----
 hw/qdev.h            |    3 ++-
 net.c                |   58 +++++++++++++++++++++++++++++++++++++++++++-------
 net.h                |   16 ++++++++++----
 6 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 017d074..483a868 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -900,7 +900,7 @@ void dp83932_init(NICInfo *nd, target_phys_addr_t base, int it_shift,
 
     s->conf.macaddr = nd->macaddr;
     s->conf.vlan = nd->vlan;
-    s->conf.peer = nd->netdev;
+    s->conf.peers[0] = nd->netdev;
 
     s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);

diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index ae37bef..69f508d 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -473,7 +473,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
 
     s->conf.macaddr = nd->macaddr;
     s->conf.vlan = nd->vlan;
-    s->conf.peer = nd->netdev;
+    s->conf.peers[0] = nd->netdev;
 
     s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf, nd->model, nd->name, s);

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 9ae3187..d45fcef 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -554,16 +554,37 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
 {
-    VLANClientState *netdev = qemu_find_netdev(str);
+    VLANClientState ***nc = (VLANClientState ***)ptr;
+    VLANClientState *vcs[MAX_QUEUE_NUM];
+    int queues, i = 0;
+    int ret;
 
-    if (netdev == NULL) {
-        return -ENOENT;
+    *nc = g_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *));
+    queues =
qemu_find_netdev_all(str, vcs, MAX_QUEUE_NUM); + if (queues == 0) { + ret = -ENOENT; + goto err; } - if (netdev->peer) { - return -EEXIST; + + for (i = 0; i < queues; i++) { + if (vcs[i] == NULL) { + ret = -ENOENT; + goto err; + } + + if (vcs[i]->peer) { + ret = -EEXIST; + goto err; + } + + (*nc)[i] = vcs[i]; } - *ptr = netdev; + return 0; + +err: + g_free(*nc); + return ret; } static const char *print_netdev(void *ptr) diff --git a/hw/qdev.h b/hw/qdev.h index 5386b16..1c023b4 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -248,6 +248,7 @@ extern PropertyInfo qdev_prop_blocksize; .defval = (bool)_defval, \ } + #define DEFINE_PROP_UINT8(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t) #define DEFINE_PROP_UINT16(_n, _s, _f, _d) \ @@ -274,7 +275,7 @@ extern PropertyInfo qdev_prop_blocksize; #define DEFINE_PROP_STRING(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*) #define DEFINE_PROP_NETDEV(_n, _s, _f) \ - DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*) + DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**) #define DEFINE_PROP_VLAN(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*) #define DEFINE_PROP_DRIVE(_n, _s, _f) \ diff --git a/net.c b/net.c index eabe830..026a03a 100644 --- a/net.c +++ b/net.c @@ -238,16 +238,40 @@ NICState *qemu_new_nic(NetClientInfo *info, { VLANClientState *nc; NICState *nic; + int i; assert(info->type == NET_CLIENT_TYPE_NIC); assert(info->size >= sizeof(NICState)); - nc = qemu_new_net_client(info, conf->vlan, conf->peer, model, name); + if (conf->peers) { + nc = qemu_new_net_client(info, NULL, conf->peers[0], model, name); + } else { + nc = qemu_new_net_client(info, conf->vlan, NULL, model, name); + } nic = DO_UPCAST(NICState, nc, nc); nic->conf = conf; nic->opaque = opaque; + /* For compatiablity with single queue nic */ + nic->ncs[0] = nc; + nc->opaque = nic; + + for (i = 1 ; i < conf->queues; i++) { + VLANClientState *vc = g_malloc0(sizeof(*vc)); + vc->opaque = 
nic; + nic->ncs[i] = vc; + vc->peer = conf->peers[i]; + vc->info = info; + vc->queue_index = i; + vc->peer->peer = vc; + QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next); + + vc->send_queue = qemu_new_net_queue(qemu_deliver_packet, + qemu_deliver_packet_iov, + vc); + } + return nic; } @@ -283,11 +307,10 @@ void qemu_del_vlan_client(VLANClientState *vc) { /* If there is a peer NIC, delete and cleanup client, but do not free. */ if (!vc->vlan && vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) { - NICState *nic = DO_UPCAST(NICState, nc, vc->peer); - if (nic->peer_deleted) { + if (vc->peer_deleted) { return; } - nic->peer_deleted = true; + vc->peer_deleted = true; /* Let NIC know peer is gone. */ vc->peer->link_down = true; if (vc->peer->info->link_status_changed) { @@ -299,8 +322,7 @@ void qemu_del_vlan_client(VLANClientState *vc) /* If this is a peer NIC and peer has already been deleted, free it now. */ if (!vc->vlan && vc->peer && vc->info->type == NET_CLIENT_TYPE_NIC) { - NICState *nic = DO_UPCAST(NICState, nc, vc); - if (nic->peer_deleted) { + if (vc->peer_deleted) { qemu_free_vlan_client(vc->peer); } } @@ -342,14 +364,14 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque) QTAILQ_FOREACH(nc, &non_vlan_clients, next) { if (nc->info->type == NET_CLIENT_TYPE_NIC) { - func(DO_UPCAST(NICState, nc, nc), opaque); + func((NICState *)nc->opaque, opaque); } } QTAILQ_FOREACH(vlan, &vlans, next) { QTAILQ_FOREACH(nc, &vlan->clients, next) { if (nc->info->type == NET_CLIENT_TYPE_NIC) { - func(DO_UPCAST(NICState, nc, nc), opaque); + func((NICState *)nc->opaque, opaque); } } } @@ -674,6 +696,26 @@ VLANClientState *qemu_find_netdev(const char *id) return NULL; } +int qemu_find_netdev_all(const char *id, VLANClientState **vcs, int max) +{ + VLANClientState *vc; + int ret = 0; + + QTAILQ_FOREACH(vc, &non_vlan_clients, next) { + if (vc->info->type == NET_CLIENT_TYPE_NIC) { + continue; + } + if (!strcmp(vc->name, id) && ret < max) { + vcs[ret++] = vc; + } + if (ret 
>= max) { + break; + } + } + + return ret; +} + static int nic_get_free_idx(void) { int index; diff --git a/net.h b/net.h index bdc2a06..40378ce 100644 --- a/net.h +++ b/net.h @@ -12,20 +12,24 @@ struct MACAddr { uint8_t a[6]; }; +#define MAX_QUEUE_NUM 32 + /* qdev nic properties */ typedef struct NICConf { MACAddr macaddr; VLANState *vlan; - VLANClientState *peer; + VLANClientState **peers; int32_t bootindex; + int32_t queues; } NICConf; #define DEFINE_NIC_PROPERTIES(_state, _conf) \ DEFINE_PROP_MACADDR("mac", _state, _conf.macaddr), \ DEFINE_PROP_VLAN("vlan", _state, _conf.vlan), \ - DEFINE_PROP_NETDEV("netdev", _state, _conf.peer), \ - DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1) + DEFINE_PROP_NETDEV("netdev", _state, _conf.peers), \ + DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1), \ + DEFINE_PROP_INT32("queues", _state, _conf.queues, 1) /* VLANs support */ @@ -72,13 +76,16 @@ struct VLANClientState { char *name; char info_str[256]; unsigned receive_disabled : 1; + unsigned int queue_index; + bool peer_deleted; + void *opaque; }; typedef struct NICState { VLANClientState nc; + VLANClientState *ncs[MAX_QUEUE_NUM]; NICConf *conf; void *opaque; - bool peer_deleted; } NICState; struct VLANState { @@ -90,6 +97,7 @@ struct VLANState { VLANState *qemu_find_vlan(int id, int allocate); VLANClientState *qemu_find_netdev(const char *id); +int qemu_find_netdev_all(const char *id, VLANClientState **vcs, int max); VLANClientState *qemu_new_net_client(NetClientInfo *info, VLANState *vlan, VLANClientState *peer,
This patch lets virtio-net transmit and receive packets through multiple VLANClientStates and exposes them to the guest as multiple virtqueues. A new parameter 'queues' is introduced to specify the number of queue pairs.

The main goal for vhost support is to let multiqueue be used without changes to the vhost code, so each vhost_net structure still tracks a single VLANClientState and two virtqueues, as before. Since multiple VLANClientStates are stored in the NICState, we can easily infer the corresponding VLANClientState from the NICState and queue_index.

Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 hw/vhost.c      |   58 ++++---
 hw/vhost.h      |    1 
 hw/vhost_net.c  |    7 +
 hw/vhost_net.h  |    2 
 hw/virtio-net.c |  461 +++++++++++++++++++++++++++++++++++++------------------
 hw/virtio-net.h |    3 
 6 files changed, 355 insertions(+), 177 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 43664e7..6318bb2 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -620,11 +620,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 {
     target_phys_addr_t s, l, a;
     int r;
+    int vhost_vq_index = (idx > 2 ? idx - 1 : idx) % dev->nvqs;
     struct vhost_vring_file file = {
-        .index = idx,
+        .index = vhost_vq_index
     };
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = vhost_vq_index
     };
     struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
@@ -670,11 +671,12 @@
         goto fail_alloc_ring;
     }
 
-    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+    r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
     if (r < 0) {
         r = -errno;
         goto fail_alloc;
     }
+
     file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
     r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
     if (r) {
@@ -715,7 +717,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
                                     unsigned idx)
 {
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = (idx > 2 ?
idx - 1 : idx) % dev->nvqs, }; int r; r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state); @@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) } for (i = 0; i < hdev->nvqs; ++i) { - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, true); + r = vdev->binding->set_host_notifier(vdev->binding_opaque, + hdev->start_idx + i, + true); if (r < 0) { fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r); goto fail_vq; @@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) return 0; fail_vq: while (--i >= 0) { - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false); + r = vdev->binding->set_host_notifier(vdev->binding_opaque, + hdev->start_idx + i, + false); if (r < 0) { fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r); fflush(stderr); @@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) int i, r; for (i = 0; i < hdev->nvqs; ++i) { - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false); + r = vdev->binding->set_host_notifier(vdev->binding_opaque, + hdev->start_idx + i, + false); if (r < 0) { fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r); fflush(stderr); @@ -874,15 +882,17 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) { int i, r; if (!vdev->binding->set_guest_notifiers) { - fprintf(stderr, "binding does not support guest notifiers\n"); + fprintf(stderr, "binding does not support guest notifier\n"); r = -ENOSYS; goto fail; } - r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true); - if (r < 0) { - fprintf(stderr, "Error binding guest notifier: %d\n", -r); - goto fail_notifiers; + if (hdev->start_idx == 0) { + r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true); + if (r < 0) { + fprintf(stderr, "Error binding guest notifier: %d\n", -r); + goto fail_notifiers; + } } r = vhost_dev_set_features(hdev, 
                                 hdev->log_enabled);
@@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         r = vhost_virtqueue_init(hdev,
                                  vdev,
                                  hdev->vqs + i,
-                                 i);
+                                 hdev->start_idx + i);
         if (r < 0) {
             goto fail_vq;
         }
@@ -925,11 +935,13 @@ fail_vq:
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->start_idx + i);
     }
+    i = hdev->nvqs;
 fail_mem:
 fail_features:
-    vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+    if (hdev->start_idx == 0)
+        vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
 fail_notifiers:
 fail:
     return r;
@@ -944,18 +956,22 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->start_idx + i);
     }
+
     for (i = 0; i < hdev->n_mem_sections; ++i) {
         vhost_sync_dirty_bitmap(hdev, &hdev->mem_sections[i],
                                 0, (target_phys_addr_t)~0x0ull);
     }
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
-    if (r < 0) {
-        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
-        fflush(stderr);
+
+    if (hdev->start_idx == 0) {
+        r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+        if (r < 0) {
+            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
+            fflush(stderr);
+        }
+        assert (r >= 0);
     }
-    assert (r >= 0);
 
     hdev->started = false;
     g_free(hdev->log);
diff --git a/hw/vhost.h b/hw/vhost.h
index 80e64df..fa5357a 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -34,6 +34,7 @@ struct vhost_dev {
     MemoryRegionSection *mem_sections;
     struct vhost_virtqueue *vqs;
     int nvqs;
+    int start_idx;
     unsigned long long features;
     unsigned long long acked_features;
     unsigned long long backend_features;
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index f672e9d..73a72bb 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -138,13 +138,15 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-                    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int start_idx)
 {
     struct vhost_vring_file file = { };
     int r;
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
+    net->dev.start_idx = start_idx;
 
     r = vhost_dev_enable_notifiers(&net->dev, dev);
     if (r < 0) {
@@ -227,7 +229,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-                    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int start_idx)
 {
     return -ENOSYS;
 }
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
index 91e40b1..79a4f09 100644
--- a/hw/vhost_net.h
+++ b/hw/vhost_net.h
@@ -9,7 +9,7 @@ typedef struct vhost_net VHostNetState;
 VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, bool force);
 
 bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
-int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
+int vhost_net_start(VHostNetState *net, VirtIODevice *dev, int start_idx);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
 
 void vhost_net_cleanup(VHostNetState *net);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 3f190d4..d42c4cc 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -26,34 +26,43 @@
 #define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
 
-typedef struct VirtIONet
+struct VirtIONet;
+
+typedef struct VirtIONetQueue
 {
-    VirtIODevice vdev;
-    uint8_t mac[ETH_ALEN];
-    uint16_t status;
     VirtQueue *rx_vq;
     VirtQueue *tx_vq;
-    VirtQueue *ctrl_vq;
-    NICState *nic;
     QEMUTimer *tx_timer;
     QEMUBH *tx_bh;
     uint32_t tx_timeout;
-    int32_t tx_burst;
     int tx_waiting;
-    uint32_t has_vnet_hdr;
-    uint8_t has_ufo;
     struct {
         VirtQueueElement elem;
         ssize_t len;
     } async_tx;
+    struct VirtIONet *n;
+    uint8_t vhost_started;
+} VirtIONetQueue;
+
+typedef struct VirtIONet
+{
+    VirtIODevice vdev;
+    uint8_t mac[ETH_ALEN];
+    uint16_t status;
+    VirtIONetQueue vqs[MAX_QUEUE_NUM];
+    VirtQueue *ctrl_vq;
+    NICState *nic;
+    int32_t tx_burst;
+    uint32_t has_vnet_hdr;
+    uint8_t has_ufo;
     int mergeable_rx_bufs;
+    int multiqueue;
     uint8_t promisc;
     uint8_t allmulti;
     uint8_t alluni;
     uint8_t nomulti;
     uint8_t nouni;
     uint8_t nobcast;
-    uint8_t vhost_started;
     struct {
         int in_use;
         int first_multi;
@@ -63,6 +72,7 @@ typedef struct VirtIONet
     } mac_table;
     uint32_t *vlans;
     DeviceState *qdev;
+    uint32_t queues;
 } VirtIONet;
 
 /* TODO
@@ -74,12 +84,25 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
     return (VirtIONet *)vdev;
 }
 
+static int vq_get_pair_index(VirtIONet *n, VirtQueue *vq)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        if (n->vqs[i].tx_vq == vq || n->vqs[i].rx_vq == vq) {
+            return i;
+        }
+    }
+    assert(0);
+    return -1;
+}
+
 static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = to_virtio_net(vdev);
     struct virtio_net_config netcfg;
 
     stw_p(&netcfg.status, n->status);
+    netcfg.queues = n->queues * 2;
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
     memcpy(config, &netcfg, sizeof(netcfg));
 }
@@ -103,78 +126,140 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
         (n->status & VIRTIO_NET_S_LINK_UP) && n->vdev.vm_running;
 }
 
-static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
+static void nc_vhost_status(VLANClientState *nc, VirtIONet *n,
+                            uint8_t status)
 {
-    if (!n->nic->nc.peer) {
+    int queue_index = nc->queue_index;
+    VLANClientState *peer = nc->peer;
+    VirtIONetQueue *netq = &n->vqs[nc->queue_index];
+
+    if (!peer) {
         return;
     }
-    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+    if (peer->info->type != NET_CLIENT_TYPE_TAP) {
         return;
     }
 
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(peer)) {
         return;
     }
-    if (!!n->vhost_started == virtio_net_started(n, status) &&
-        !n->nic->nc.peer->link_down) {
+    if (!!netq->vhost_started == virtio_net_started(n, status) &&
+        !peer->link_down) {
         return;
     }
-    if (!n->vhost_started) {
-        int r;
-        if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer), &n->vdev)) {
+    if (!netq->vhost_started) {
+        /* skip ctrl vq */
+        int r, start_idx = queue_index == 0 ? 0 : queue_index * 2 + 1;
+        if (!vhost_net_query(tap_get_vhost_net(peer), &n->vdev)) {
             return;
         }
-        r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+        r = vhost_net_start(tap_get_vhost_net(peer), &n->vdev, start_idx);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
         } else {
-            n->vhost_started = 1;
+            netq->vhost_started = 1;
         }
     } else {
-        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
-        n->vhost_started = 0;
+        vhost_net_stop(tap_get_vhost_net(peer), &n->vdev);
+        netq->vhost_started = 0;
+    }
+}
+
+static int peer_attach(VirtIONet *n, int index)
+{
+    if (!n->nic->ncs[index]->peer) {
+        return -1;
+    }
+
+    if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return -1;
+    }
+
+    return tap_attach(n->nic->ncs[index]->peer);
+}
+
+static int peer_detach(VirtIONet *n, int index)
+{
+    if (!n->nic->ncs[index]->peer) {
+        return -1;
+    }
+
+    if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return -1;
+    }
+
+    return tap_detach(n->nic->ncs[index]->peer);
+}
+
+static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        if (!n->multiqueue && i != 0)
+            status = 0;
+        nc_vhost_status(n->nic->ncs[i], n, status);
     }
 }
 
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int i;
 
     virtio_net_vhost_status(n, status);
 
-    if (!n->tx_waiting) {
-        return;
-    }
+    for (i = 0; i < n->queues; i++) {
+        VirtIONetQueue *netq = &n->vqs[i];
+        if (!netq->tx_waiting) {
+            continue;
+        }
+
+        if (!n->multiqueue && i != 0)
+            status = 0;
 
-    if (virtio_net_started(n, status) && !n->vhost_started) {
-        if (n->tx_timer) {
-            qemu_mod_timer(n->tx_timer,
-                           qemu_get_clock_ns(vm_clock) + n->tx_timeout);
-        } else {
-            qemu_bh_schedule(n->tx_bh);
-        }
-    } else {
-        if (n->tx_timer) {
-            qemu_del_timer(n->tx_timer);
-        } else {
-            qemu_bh_cancel(n->tx_bh);
+        if (virtio_net_started(n, status) && !netq->vhost_started) {
+            if (netq->tx_timer) {
+                qemu_mod_timer(netq->tx_timer,
+                               qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
+            } else {
+                qemu_bh_schedule(netq->tx_bh);
+            }
+        } else {
+            if (netq->tx_timer) {
+                qemu_del_timer(netq->tx_timer);
+            } else {
+                qemu_bh_cancel(netq->tx_bh);
+            }
         }
     }
 }
 
+static bool virtio_net_is_link_up(VirtIONet *n)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        if (n->nic->ncs[i]->link_down) {
+            return false;
+        }
+    }
+    return true;
+}
+
 static void virtio_net_set_link_status(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
     uint16_t old_status = n->status;
 
-    if (nc->link_down)
+    if (!virtio_net_is_link_up(n)) {
         n->status &= ~VIRTIO_NET_S_LINK_UP;
-    else
+    } else {
         n->status |= VIRTIO_NET_S_LINK_UP;
+    }
 
-    if (n->status != old_status)
+    if (n->status != old_status) {
         virtio_notify_config(&n->vdev);
+    }
 
     virtio_net_set_status(&n->vdev, n->vdev.status);
 }
@@ -202,13 +287,15 @@ static void virtio_net_reset(VirtIODevice *vdev)
 
 static int peer_has_vnet_hdr(VirtIONet *n)
 {
-    if (!n->nic->nc.peer)
+    if (!n->nic->ncs[0]->peer) {
         return 0;
+    }
 
-    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP)
+    if (n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
         return 0;
+    }
 
-    n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
+    n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->ncs[0]->peer);
 
     return n->has_vnet_hdr;
 }
@@ -218,7 +305,7 @@ static int peer_has_ufo(VirtIONet *n)
     if (!peer_has_vnet_hdr(n))
         return 0;
 
-    n->has_ufo = tap_has_ufo(n->nic->nc.peer);
+    n->has_ufo = tap_has_ufo(n->nic->ncs[0]->peer);
 
     return n->has_ufo;
 }
@@ -228,9 +315,13 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
     VirtIONet *n = to_virtio_net(vdev);
 
     features |= (1 << VIRTIO_NET_F_MAC);
+    features |= (1 << VIRTIO_NET_F_MULTIQUEUE);
 
     if (peer_has_vnet_hdr(n)) {
-        tap_using_vnet_hdr(n->nic->nc.peer, 1);
+        int i;
+        for (i = 0; i < n->queues; i++) {
+            tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
+        }
     } else {
         features &= ~(0x1 << VIRTIO_NET_F_CSUM);
         features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO4);
@@ -248,14 +339,15 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
         features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
     }
 
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+    if (!n->nic->ncs[0]->peer ||
+        n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
         return features;
     }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(n->nic->ncs[0]->peer)) {
         return features;
     }
-    return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
+    return vhost_net_get_features(tap_get_vhost_net(n->nic->ncs[0]->peer),
+                                  features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -276,25 +368,38 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
 static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int i, r;
 
     n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
+    n->multiqueue = !!(features & (1 << VIRTIO_NET_F_MULTIQUEUE));
 
-    if (n->has_vnet_hdr) {
-        tap_set_offload(n->nic->nc.peer,
-                        (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_ECN) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_UFO) & 1);
-    }
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
-        return;
-    }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
-        return;
+    for (i = 0; i < n->queues; i++) {
+        if (!n->multiqueue && i != 0) {
+            r = peer_detach(n, i);
+            assert(r == 0);
+        } else {
+            r = peer_attach(n, i);
+            assert(r == 0);
+
+            if (n->has_vnet_hdr) {
+                tap_set_offload(n->nic->ncs[i]->peer,
+                                (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+                                (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+                                (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+                                (features >> VIRTIO_NET_F_GUEST_ECN) & 1,
+                                (features >> VIRTIO_NET_F_GUEST_UFO) & 1);
+            }
+            if (!n->nic->ncs[i]->peer ||
+                n->nic->ncs[i]->peer->info->type != NET_CLIENT_TYPE_TAP) {
+                continue;
+            }
+            if (!tap_get_vhost_net(n->nic->ncs[i]->peer)) {
+                continue;
+            }
+            vhost_net_ack_features(tap_get_vhost_net(n->nic->ncs[i]->peer),
+                                   features);
+        }
     }
-    vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), features);
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -446,7 +551,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    qemu_flush_queued_packets(&n->nic->nc);
+    qemu_flush_queued_packets(n->nic->ncs[vq_get_pair_index(n, vq)]);
 
     /* We now have RX buffers, signal to the IO thread to break out of the
      * select to re-poll the tap file descriptor */
@@ -455,36 +560,37 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 
 static int virtio_net_can_receive(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    int queue_index = nc->queue_index;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
+
     if (!n->vdev.vm_running) {
         return 0;
    }
 
-    if (!virtio_queue_ready(n->rx_vq) ||
+    if (!virtio_queue_ready(n->vqs[queue_index].rx_vq) ||
         !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return 0;
 
     return 1;
 }
 
-static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
+static int virtio_net_has_buffers(VirtIONet *n, int bufsize, VirtQueue *vq)
 {
-    if (virtio_queue_empty(n->rx_vq) ||
-        (n->mergeable_rx_bufs &&
-         !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
-        virtio_queue_set_notification(n->rx_vq, 1);
+    if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs &&
+        !virtqueue_avail_bytes(vq, bufsize, 0))) {
+        virtio_queue_set_notification(vq, 1);
 
         /* To avoid a race condition where the guest has made some buffers
          * available after the above check but before notification was
         * enabled, check for available buffers again.
         */
-        if (virtio_queue_empty(n->rx_vq) ||
-            (n->mergeable_rx_bufs &&
-             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+        if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs &&
+            !virtqueue_avail_bytes(vq, bufsize, 0))) {
            return 0;
+        }
    }
 
-    virtio_queue_set_notification(n->rx_vq, 0);
+    virtio_queue_set_notification(vq, 0);
     return 1;
 }
@@ -595,12 +701,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 
 static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    int queue_index = nc->queue_index;
+    VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
+    VirtQueue *vq = n->vqs[queue_index].rx_vq;
     struct virtio_net_hdr_mrg_rxbuf *mhdr = NULL;
     size_t guest_hdr_len, offset, i, host_hdr_len;
 
-    if (!virtio_net_can_receive(&n->nic->nc))
+    if (!virtio_net_can_receive(n->nic->ncs[queue_index])) {
         return -1;
+    }
 
     /* hdr_len refers to the header we supply to the guest */
     guest_hdr_len = n->mergeable_rx_bufs ?
@@ -608,7 +717,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
     host_hdr_len = n->has_vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
 
-    if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len))
+    if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len, vq))
         return 0;
 
     if (!receive_filter(n, buf, size))
@@ -623,7 +732,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         total = 0;
 
-        if (virtqueue_pop(n->rx_vq, &elem) == 0) {
+        if (virtqueue_pop(vq, &elem) == 0) {
             if (i == 0)
                 return -1;
             error_report("virtio-net unexpected empty queue: "
@@ -675,47 +784,50 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
         }
 
         /* signal other side */
-        virtqueue_fill(n->rx_vq, &elem, total, i++);
+        virtqueue_fill(vq, &elem, total, i++);
     }
 
     if (mhdr) {
         stw_p(&mhdr->num_buffers, i);
     }
 
-    virtqueue_flush(n->rx_vq, i);
-    virtio_notify(&n->vdev, n->rx_vq);
+    virtqueue_flush(vq, i);
+    virtio_notify(&n->vdev, vq);
 
     return size;
 }
 
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *tvq);
 
 static void virtio_net_tx_complete(VLANClientState *nc, ssize_t len)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
+    VirtIONetQueue *netq = &n->vqs[nc->queue_index];
 
-    virtqueue_push(n->tx_vq, &n->async_tx.elem, n->async_tx.len);
-    virtio_notify(&n->vdev, n->tx_vq);
+    virtqueue_push(netq->tx_vq, &netq->async_tx.elem, netq->async_tx.len);
+    virtio_notify(&n->vdev, netq->tx_vq);
 
-    n->async_tx.elem.out_num = n->async_tx.len = 0;
+    netq->async_tx.elem.out_num = netq->async_tx.len = 0;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(netq->tx_vq, 1);
+    virtio_net_flush_tx(n, netq);
 }
 
 /* TX */
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *netq)
 {
     VirtQueueElement elem;
     int32_t num_packets = 0;
+    VirtQueue *vq = netq->tx_vq;
+
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return num_packets;
     }
 
     assert(n->vdev.vm_running);
 
-    if (n->async_tx.elem.out_num) {
-        virtio_queue_set_notification(n->tx_vq, 0);
+    if (netq->async_tx.elem.out_num) {
+        virtio_queue_set_notification(vq, 0);
         return num_packets;
     }
 
@@ -747,12 +859,12 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
             len += hdr_len;
         }
 
-        ret = qemu_sendv_packet_async(&n->nic->nc, out_sg, out_num,
-                                      virtio_net_tx_complete);
+        ret = qemu_sendv_packet_async(n->nic->ncs[vq_get_pair_index(n, vq)],
+                                      out_sg, out_num, virtio_net_tx_complete);
         if (ret == 0) {
-            virtio_queue_set_notification(n->tx_vq, 0);
-            n->async_tx.elem = elem;
-            n->async_tx.len = len;
+            virtio_queue_set_notification(vq, 0);
+            netq->async_tx.elem = elem;
+            netq->async_tx.len = len;
             return -EBUSY;
         }
 
@@ -771,22 +883,23 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *netq = &n->vqs[vq_get_pair_index(n, vq)];
 
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
-        n->tx_waiting = 1;
+        netq->tx_waiting = 1;
         return;
     }
 
-    if (n->tx_waiting) {
+    if (netq->tx_waiting) {
         virtio_queue_set_notification(vq, 1);
-        qemu_del_timer(n->tx_timer);
-        n->tx_waiting = 0;
-        virtio_net_flush_tx(n, vq);
+        qemu_del_timer(netq->tx_timer);
+        netq->tx_waiting = 0;
+        virtio_net_flush_tx(n, netq);
     } else {
-        qemu_mod_timer(n->tx_timer,
-                       qemu_get_clock_ns(vm_clock) + n->tx_timeout);
-        n->tx_waiting = 1;
+        qemu_mod_timer(netq->tx_timer,
+                       qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
+        netq->tx_waiting = 1;
         virtio_queue_set_notification(vq, 0);
     }
 }
@@ -794,48 +907,53 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *netq = &n->vqs[vq_get_pair_index(n, vq)];
 
-    if (unlikely(n->tx_waiting)) {
+    if (unlikely(netq->tx_waiting)) {
         return;
     }
-    n->tx_waiting = 1;
+    netq->tx_waiting = 1;
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
         return;
     }
     virtio_queue_set_notification(vq, 0);
-    qemu_bh_schedule(n->tx_bh);
+    qemu_bh_schedule(netq->tx_bh);
 }
 
 static void virtio_net_tx_timer(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *netq = opaque;
+    VirtIONet *n = netq->n;
+
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    netq->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(netq->tx_vq, 1);
+    virtio_net_flush_tx(n, netq);
 }
 
 static void virtio_net_tx_bh(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *netq = opaque;
+    VirtQueue *vq = netq->tx_vq;
+    VirtIONet *n = netq->n;
     int32_t ret;
 
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    netq->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (unlikely(!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)))
         return;
 
-    ret = virtio_net_flush_tx(n, n->tx_vq);
+    ret = virtio_net_flush_tx(n, netq);
     if (ret == -EBUSY) {
         return; /* Notification re-enable handled by tx_complete */
     }
@@ -843,33 +961,39 @@ static void virtio_net_tx_bh(void *opaque)
     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
     if (ret >= n->tx_burst) {
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+        qemu_bh_schedule(netq->tx_bh);
+        netq->tx_waiting = 1;
         return;
     }
 
     /* If less than a full burst, re-enable notification and flush
      * anything that may have come in while we weren't looking.  If
     * we find something, assume the guest is still active and reschedule */
-    virtio_queue_set_notification(n->tx_vq, 1);
-    if (virtio_net_flush_tx(n, n->tx_vq) > 0) {
-        virtio_queue_set_notification(n->tx_vq, 0);
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+    virtio_queue_set_notification(vq, 1);
+    if (virtio_net_flush_tx(n, netq) > 0) {
+        virtio_queue_set_notification(vq, 0);
+        qemu_bh_schedule(netq->tx_bh);
+        netq->tx_waiting = 1;
     }
 }
 
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
+    int i;
 
     /* At this point, backend must be stopped, otherwise
      * it might keep writing to memory. */
-    assert(!n->vhost_started);
+    for (i = 0; i < n->queues; i++) {
+        assert(!n->vqs[i].vhost_started);
+    }
     virtio_save(&n->vdev, f);
 
     qemu_put_buffer(f, n->mac, ETH_ALEN);
-    qemu_put_be32(f, n->tx_waiting);
+    qemu_put_be32(f, n->queues);
+    for (i = 0; i < n->queues; i++) {
+        qemu_put_be32(f, n->vqs[i].tx_waiting);
+    }
     qemu_put_be32(f, n->mergeable_rx_bufs);
     qemu_put_be16(f, n->status);
     qemu_put_byte(f, n->promisc);
@@ -902,7 +1026,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     qemu_get_buffer(f, n->mac, ETH_ALEN);
-    n->tx_waiting = qemu_get_be32(f);
+    n->queues = qemu_get_be32(f);
+    for (i = 0; i < n->queues; i++) {
+        n->vqs[i].tx_waiting = qemu_get_be32(f);
+    }
     n->mergeable_rx_bufs = qemu_get_be32(f);
 
     if (version_id >= 3)
@@ -930,7 +1057,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
             n->mac_table.in_use = 0;
         }
     }
-
+
     if (version_id >= 6)
         qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN >> 3);
@@ -941,13 +1068,16 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     if (n->has_vnet_hdr) {
-        tap_using_vnet_hdr(n->nic->nc.peer, 1);
-        tap_set_offload(n->nic->nc.peer,
-            (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
-            (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
-            (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
-            (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_ECN) & 1,
-            (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_UFO) & 1);
+        for (i = 0; i < n->queues; i++) {
+            tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
+            tap_set_offload(n->nic->ncs[i]->peer,
+                (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+                (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+                (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+                (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_ECN) & 1,
+                (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_UFO) & 1);
+        }
     }
 }
@@ -982,7 +1112,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
 static void virtio_net_cleanup(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
 
     n->nic = NULL;
 }
@@ -1000,6 +1130,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
                               virtio_net_conf *net)
 {
     VirtIONet *n;
+    int i;
 
     n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
                                         sizeof(struct virtio_net_config),
@@ -1012,7 +1143,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->vdev.bad_features = virtio_net_bad_features;
     n->vdev.reset = virtio_net_reset;
     n->vdev.set_status = virtio_net_set_status;
-    n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
 
     if (net->tx && strcmp(net->tx, "timer") && strcmp(net->tx, "bh")) {
         error_report("virtio-net: "
@@ -1021,15 +1151,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
         error_report("Defaulting to \"bh\"");
     }
 
-    if (net->tx && !strcmp(net->tx, "timer")) {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_timer);
-        n->tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer, n);
-        n->tx_timeout = net->txtimer;
-    } else {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
-        n->tx_bh = qemu_bh_new(virtio_net_tx_bh, n);
-    }
-    n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
     qemu_macaddr_default_if_unset(&conf->macaddr);
     memcpy(&n->mac[0], &conf->macaddr, sizeof(n->mac));
     n->status = VIRTIO_NET_S_LINK_UP;
@@ -1038,7 +1159,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 
     qemu_format_nic_info_str(&n->nic->nc, conf->macaddr.a);
 
-    n->tx_waiting = 0;
     n->tx_burst = net->txburst;
     n->mergeable_rx_bufs = 0;
     n->promisc = 1; /* for compatibility */
@@ -1046,6 +1166,32 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 
     n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
 
     n->vlans = g_malloc0(MAX_VLAN >> 3);
+    n->queues = conf->queues;
+
+    /* Allocate per rx/tx vq's */
+    for (i = 0; i < n->queues; i++) {
+        n->vqs[i].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+        if (net->tx && !strcmp(net->tx, "timer")) {
+            n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                               virtio_net_handle_tx_timer);
+            n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
+                                                   virtio_net_tx_timer,
+                                                   &n->vqs[i]);
+            n->vqs[i].tx_timeout = net->txtimer;
+        } else {
+            n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                               virtio_net_handle_tx_bh);
+            n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
+        }
+
+        n->vqs[i].tx_waiting = 0;
+        n->vqs[i].n = n;
+
+        if (i == 0) {
+            /* keep compatible with the spec and old guests */
+            n->ctrl_vq = virtio_add_queue(&n->vdev, 64,
+                                          virtio_net_handle_ctrl);
+        }
+    }
 
     n->qdev = dev;
     register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
@@ -1059,24 +1205,33 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 void virtio_net_exit(VirtIODevice *vdev)
 {
     VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+    int i;
 
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
 
-    qemu_purge_queued_packets(&n->nic->nc);
+    for (i = 0; i < n->queues; i++) {
+        qemu_purge_queued_packets(n->nic->ncs[i]);
+    }
 
     unregister_savevm(n->qdev, "virtio-net", n);
 
     g_free(n->mac_table.macs);
     g_free(n->vlans);
 
-    if (n->tx_timer) {
-        qemu_del_timer(n->tx_timer);
-        qemu_free_timer(n->tx_timer);
-    } else {
-        qemu_bh_delete(n->tx_bh);
+    for (i = 0; i < n->queues; i++) {
+        VirtIONetQueue *netq = &n->vqs[i];
+        if (netq->tx_timer) {
+            qemu_del_timer(netq->tx_timer);
+            qemu_free_timer(netq->tx_timer);
+        } else {
+            qemu_bh_delete(netq->tx_bh);
+        }
     }
 
-    qemu_del_vlan_client(&n->nic->nc);
     virtio_cleanup(&n->vdev);
+
+    for (i = 0; i < n->queues; i++) {
+        qemu_del_vlan_client(n->nic->ncs[i]);
+    }
 }
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 36aa463..b35ba5d 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,7 @@
 #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_MULTIQUEUE 22
 
 #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
 
@@ -72,6 +73,8 @@ struct virtio_net_config
     uint8_t mac[ETH_ALEN];
     /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
     uint16_t status;
+
+    uint16_t queues;
 } QEMU_PACKED;
 
 /* This is the first element of the scatter-gather list.  If you don't