Shriram Rajagopalan
2013-Jul-25 07:09 UTC
[PATCH 0 of 4 RFC] xl - Remus network buffering support
This patch series adds support for network buffering in the Remus codebase in xl/libxl. In previous emails, I had proposed for a script invocation to setup network buffering. After digging through libnl API, I managed to find most of what I needed (except for one command, which right now is executed through system() call). The patch series in its current state would allow xl to dynamically setup and teardown buffering devices, qdiscs, etc associated with the guest, instead of resorting to clunky one time configurations. The series is organized as follows: 1/4 - Network buffering setup functions - abstractions built on top of libnl3 API to implement functionality such as add/delete qdisc, interface up/down, search for free ifb devices, etc. 2/4 - xl cmdline utility uses these abstractions to setup network buffers and provides libxl with a list of ifb devices where packets would be buffered 3/4 - Libxl interaction with network buffer module in the kernel via libnl3. 4/4 - adds libnl3 (>= v3.2.17) dependency to autoconf scripts and linker flags in tools/libxl/Makefile. Functionality tested on debian squeeze (kernel 3.4) + openvswitch + 64-bit PV domU (kernel 3.4). Couple of things to note: 1. I am not well versed with the autoconf stuff. I fixed the configure.ac as best as I could. However, the libxl/Makefile patching is still clunky [hard coded -I/usr/local/include/libnl3..] 2. I have kept most of the setup related C code in a separate file, that I am also planning to submit to libvirt mailing list. Other toolstacks can choose to setup network buffers in their own way (scripts or a different implementation). As long as libxl gets as input a list of IFB devices to act on, things are good. The code to control the network buffer, when Remus is operational, is incorporated into libxl in the remus callbacks. This is something that only libxl can do and [should do]. thanks shriram
Shriram Rajagopalan
2013-Jul-25 07:09 UTC
[PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
The functions contained in xl_netbuf.c are built on top of libnl3 API. They setup the necessary infrastructure required to for network output buffering in Remus. In xend, the Remus python code performed this setup by invoking shell commands. This code is built purely on top of APIs supplied by libnl3.0. There are two public helper functions: 1) remus_install_netbuf_on_dev to install a network buffer on a vif. Its basically equivalent to: a)find free ifb (say ifb0) b)ip link set ifb0 up c)tc qdisc add dev vif1.0 ingress d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \ action mirred egress redirect dev ifb0 e)tc qdisc add dev ifb0 root plug f)get handle to plug qdisc and control it programmatically after suspending a VM/receiving checkpoint ack from remote 2) remus_uninstall_netbufs to remove all network buffers and ingress qdiscs from the supplied interface list. I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection) and it is pretty complicated to implement. So I am currently resorting to system() lib call to get the job done. N.B. This implementation is just xl (cmdline utility) specific. Other toolstacks may choose to reuse this code or do this setup in their own way (scripts/shell commands, etc.). Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 711fe8ed54e4 -r 3ae38cbe535c tools/libxl/xl.h --- a/tools/libxl/xl.h Wed Jul 24 22:48:09 2013 -0700 +++ b/tools/libxl/xl.h Wed Jul 24 22:55:00 2013 -0700 @@ -161,6 +161,9 @@ enum output_format { extern enum output_format default_output_format; extern void printf_info_sexp(int domid, libxl_domain_config *d_config); +extern char *remus_install_netbuf_on_dev(char *vifname); +extern void remus_uninstall_netbufs(char **vifs, int num_vifs, + char **ifbs, int num_ifbs); #define XL_GLOBAL_CONFIG XEN_CONFIG_DIR "/xl.conf" #define XL_LOCK_FILE XEN_LOCK_DIR "/xl" diff -r 711fe8ed54e4 -r 3ae38cbe535c tools/libxl/xl_netbuf.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/libxl/xl_netbuf.c Wed Jul 24 22:55:00 2013 -0700 @@ -0,0 +1,441 @@ +/* + * xl command line utility - helper functions for setting up + * remus network buffering. + */ + +#include "libxl_osdeps.h" + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <ctype.h> +#include <inttypes.h> +#include <limits.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/select.h> +#include <regex.h> +#include <xentoollog.h> + +#include <netlink/cache.h> +#include <netlink/socket.h> +#include <netlink/attr.h> +#include <netlink/route/link.h> +#include <netlink/route/route.h> +#include <netlink/route/qdisc.h> +#include <netlink/route/qdisc/plug.h> + +#include "libxl.h" +#include "libxl_utils.h" +#include "xl.h" + +#define RET_BAD_ARG(x, y) \ + do { \ + fprintf(stderr, "%s:%d bad argument "#x"\n",__FUNCTION__, __LINE__); \ + return y; \ + } while (0) + +static struct rtnl_link *get_free_ifbdev(struct nl_cache *link_cache, + struct nl_cache *qdisc_cache) +{ + struct nl_object *o = NULL; + struct rtnl_link *temp = NULL, *ifb = NULL; + struct rtnl_qdisc *qdisc = NULL; + char *ifbname = NULL; + regex_t ifbregex; + int ifindex; + + if (!link_cache) RET_BAD_ARG(link_cache, NULL); + if (!qdisc_cache) RET_BAD_ARG(qdisc_cache, NULL); + + if (regcomp(&ifbregex, "^ifb[0-9]+$", REG_EXTENDED|REG_NOSUB)) { + fprintf(stderr, "Failed to alloc regex while " + "searching for free ifbs\n"); + return NULL; + } + + for (o = nl_cache_get_first(link_cache); o; o = nl_cache_get_next(o)) { + + temp = (struct rtnl_link *)o; + ifbname = rtnl_link_get_name(temp); + + if (!ifbname || regexec(&ifbregex, ifbname, 0, NULL, 0)) + continue; + + ifindex = rtnl_link_get_ifindex(temp); + if (!ifindex) continue; + + /* found an IFB. check if it has a qdisc on root */ + qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT); + if (qdisc) { + char *kind = rtnl_tc_get_kind(TC_CAST(qdisc)); + rtnl_qdisc_put(qdisc); + + if (strcmp(kind, "pfifo_fast") && strcmp(kind, "mq") + && strcmp(kind, "ingress")) + continue; + } + + ifb = rtnl_link_get(link_cache, ifindex); + printf("Acquired IFB dev %s for network buffering\n", ifbname); + break; + } + + if (!ifb) + fprintf(stderr, "No free ifb device available\n"); + + regfree(&ifbregex); + return ifb; +} + +static int do_net_ifupdown(struct nl_sock *sock, char *ifname, int ifup) +{ + struct nl_msg *msg; + struct nlmsghdr *nlh; + struct ifinfomsg ifm; + int ret; + + if (!sock) RET_BAD_ARG(sock, -1); + if (!ifname || !strlen(ifname)) RET_BAD_ARG(ifname, -1); + + msg = nlmsg_alloc(); //allocates 4K msg by default + nlh = nlmsg_hdr(msg); + nlh->nlmsg_type = RTM_NEWLINK; + nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; + + ifm.ifi_family = AF_UNSPEC; + ifm.ifi_change = IFF_UP; + ifm.ifi_flags = ifup ? IFF_UP : ~IFF_UP; + + if (nlmsg_append(msg, &ifm, sizeof(ifm), NLMSG_ALIGNTO) < 0) + goto nla_put_failure; + NLA_PUT_STRING(msg, IFLA_IFNAME, ifname); + + ret = nl_send_sync(sock, msg); //this call also frees msg + + if (!ret) { + printf("IFB dev %s link %s\n", ifname, ifup ? "up" : "down"); + return 0; + } else { + fprintf(stderr, "Unable to bring %s interface %s : %s\n", + ifup ? "up" : "down", + ifname, nl_geterror(ret)); + return -1; + } + + nla_put_failure: + fprintf(stderr, "failed to create nl_msg for bringing %s interface %s\n", + ifup ? "up" : "down", ifname); + nlmsg_free(msg); + return -1; +} + +static int tc_redirect_traffic(char *fromdev, char *todev) +{ + /* command: + * tc filter add dev vif1.0 parent ffff: proto ip pref 10 + * u32 match u32 0 0 action mirred egress redirect dev ifb0 + */ + char *tc_cmd = NULL; + int ret = 0; + + if (!fromdev) RET_BAD_ARG(fromdev, -1); + if (!todev) RET_BAD_ARG(todev, -1); + + if (asprintf(&tc_cmd, + "exec tc filter add dev %s parent ffff: " + "proto ip pref 10 u32 match u32 0 0 " + "act mirred egress redirect dev %s", + fromdev, todev) < 0) { + fprintf(stderr, "Failed to compose tc redirection command (%s->%s)\n", + fromdev, todev); + return -1; + } + + ret = system(tc_cmd); + if (ret) ret = errno; + free(tc_cmd); + return -ret; +} + +static int do_qdisc_add_or_del(struct nl_sock *sock, + struct rtnl_link *interface, + char *kind, uint32_t parent, int add) +{ + struct rtnl_qdisc *qdisc = NULL; + int ret = -1; + + if (!sock) RET_BAD_ARG(sock, -1); + if (!interface) RET_BAD_ARG(interface, -1); + if (!kind) RET_BAD_ARG(kind, -1); + + qdisc = rtnl_qdisc_alloc(); + if (!qdisc) { + fprintf(stderr, "Failed to allocate %s qdisc\n", kind); + return -1; + } + + rtnl_tc_set_link(TC_CAST(qdisc), interface); + rtnl_tc_set_parent(TC_CAST(qdisc), parent); + rtnl_tc_set_kind(TC_CAST(qdisc), kind); + + /* Submit request to kernel and wait for response */ + if (add) + ret = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE|NLM_F_EXCL); + else + ret = rtnl_qdisc_delete(sock, qdisc); + rtnl_qdisc_put(qdisc); + + return ret; +} + +static int netdev_ifup(struct nl_sock *sock, char *ifname) +{ + return do_net_ifupdown(sock, ifname, 1); +} + +static int netdev_ifdown(struct nl_sock *sock, char *ifname) +{ + return do_net_ifupdown(sock, ifname, 0); +} + +/* Need to reimplement code from do_qdisc_add_or_del because for plug + * qdisc, we need to set an initial buffer size of 4MB atleast. The default + * buffer size of 100 packets (set by sch_plug kernel module) is too + * small to hold all packets from a VM during an epoch, resulting in + * packet drops. + */ +#define PLUG_QDISC_LIMIT (4 * 1024 * 1024) +static int add_plug_qdisc(struct nl_sock *sock, struct rtnl_link *ifb) +{ + struct rtnl_qdisc *qdisc = NULL; + int ret = -1; + char *kind = "plug"; + + if (!sock) RET_BAD_ARG(sock, -1); + if (!ifb) RET_BAD_ARG(ifb, -1); + + qdisc = rtnl_qdisc_alloc(); + if (!qdisc) { + fprintf(stderr, "Failed to allocate %s qdisc\n", kind); + return -1; + } + + rtnl_tc_set_link(TC_CAST(qdisc), ifb); + rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT); + rtnl_tc_set_kind(TC_CAST(qdisc), kind); + + if ((ret = rtnl_qdisc_plug_set_limit(qdisc, PLUG_QDISC_LIMIT)) < 0) { + fprintf(stderr, "Unable to change plug buffer size: %s\n", + nl_geterror(ret)); + rtnl_qdisc_put(qdisc); + return ret; + } + + /* Submit request to kernel and wait for response */ + ret = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE|NLM_F_EXCL); + rtnl_qdisc_put(qdisc); + + return ret; +} + +static int del_plug_qdisc(struct nl_sock *sock, struct rtnl_link *ifb) +{ + return do_qdisc_add_or_del(sock, ifb, "plug", TC_H_ROOT, 0); +} + +static int add_ingress_qdisc(struct nl_sock *sock, struct rtnl_link *vif) +{ + int ret; + ret = do_qdisc_add_or_del(sock, vif, "ingress", TC_H_INGRESS, 1); + if (ret == -NLE_EXIST) + ret = 0; /* ignore error, if ingress qdisc is already installed + on vif. */ + return ret; +} + +static int del_ingress_qdisc(struct nl_sock *sock, struct rtnl_link *vif) +{ + return do_qdisc_add_or_del(sock, vif, "ingress", TC_H_INGRESS, 0); +} + +/* + * When Remus (xend version) installs a network buffer on a guest vif, + * it does the following (using a mix of shell commands and netlink messages) + * + * ip link set dev ifb0 up + * tc qdisc add dev vif1.0 ingress + * tc filter add dev vif1.0 parent ffff: proto ip \ + * prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0 + * send netlink message to kernel to add plug_qdisc to ifb0 and get + * handle + * use handle to control operations of plug qdisc. + * + * So order of operations when installing a network buffer on vif1.0 + * 1. find a free ifb and bring up the device + * 2. add ingress qdisc to vif1.0 (to capture outgoing packets from guest) + * 3. redirect traffic from vif1.0 to ifb device + * 4. install plug_qdisc on ifb device, with which we can buffer/release + * guest''s network output from vif1.0 + * + * Return value: ifb dev name on success. NULL on failure. + */ +char *remus_install_netbuf_on_dev(char *vifname) +{ + struct nl_cache *link_cache = NULL, *qdisc_cache = NULL; + struct rtnl_link *vif = NULL; + struct rtnl_link *ifb = NULL; + struct nl_sock *sock = NULL; + char *ifbname = NULL; + int ret = -1; + + if (!vifname) return NULL; + + sock = nl_socket_alloc(); + if (!sock) { + fprintf(stderr, "failed to allocate libnl socket\n"); + return NULL; + } + nl_connect(sock, NETLINK_ROUTE); + + /* get the link link_cache (equivalent to ifconfig -a -s) */ + if ((ret = rtnl_link_alloc_cache(sock, AF_UNSPEC, &link_cache)) < 0) { + fprintf(stderr, "failed to allocate link cache : %s\n", nl_geterror(ret)); + goto end; + } + + /* similarly, get the qdisc cache, i.e. list of all qdiscs + * installed currently on various network interfaces. + */ + if ((ret = rtnl_qdisc_alloc_cache(sock, &qdisc_cache)) < 0) { + fprintf(stderr, "failed to allocate qdisc cache : %s\n", nl_geterror(ret)); + goto end; + } + + vif = rtnl_link_get_by_name(link_cache, vifname); + if (!vif) { + /* link does not exist */ + fprintf(stderr, "interface %s does not exist\n", vifname); + ret = -1; + goto end; + } + + ret = -1; + /* 1. find a free ifb and bring up the device */ + ifb = get_free_ifbdev(link_cache, qdisc_cache); + if (!ifb) goto end; + + ret = netdev_ifup(sock, rtnl_link_get_name(ifb)); + if (ret) goto end; + + /* 2. add ingress qdisc to vif1.0 (to catch outgoing packets from guest) */ + ret = add_ingress_qdisc(sock, vif); + if (ret) goto end; + + /* 3. redirect traffic from vif1.0 to ifb device */ + ret = tc_redirect_traffic(vifname, rtnl_link_get_name(ifb)); + if (ret) goto end; + + /* 4. install plug_qdisc on ifb device, with which we can buffer/release + * guest''s network output from vif1.0 + */ + ret = add_plug_qdisc(sock, ifb); + if (!ret) { + ifbname = strdup(rtnl_link_get_name(ifb)); + if (!ifbname) { + perror("Failed to allocate mem for ifbname!"); + exit(-1); + } + fprintf(stderr, "Added plug qdisc to %s\n", ifbname); + } + else + fprintf(stderr, "Unable to add plug qdisc: %s\n", nl_geterror(ret)); + +end: + if (ret) { + if (vif) { + del_ingress_qdisc(sock, vif); + if (ifb) { + /* + * To release ifb dev, first bring it down and then remove + * the qdisc. Otherwise, the kernel will replace plug with + * pfifo_fast and it will remain attached to the ifb even + * if we bring down the device. + */ + netdev_ifdown(sock, rtnl_link_get_name(ifb)); + del_plug_qdisc(sock, ifb); + } + } + } + + if (vif) rtnl_link_put(vif); + if (ifb) rtnl_link_put(ifb); + if (link_cache) nl_cache_free(link_cache); + if (qdisc_cache) nl_cache_free(qdisc_cache); + if (sock) nl_close(sock); + return ifbname; +} + +/* + * Cleanup code. Remove ingress qdisc from the supplied list of vif + * interfaces. Bring down the ifb devices and remove the plug qdiscs + * from each of them, thereby releasing the ifbs back into the ifb pool. + */ +void remus_uninstall_netbufs(char **vifs, int num_vifs, + char **ifbs, int num_ifbs) +{ + struct rtnl_link *netdev = NULL; + struct nl_sock *sock = NULL; + int i; + + if (!vifs && !ifbs) return; + + sock = nl_socket_alloc(); + if (!sock) { + fprintf(stderr, "failed to allocate libnl socket " + "while uninstalling netbufs\n"); + return; + } + nl_connect(sock, NETLINK_ROUTE); + + /* We cant use link caches when tearing down network buffering because + * someone may have destroyed the domU, thus tearing down the + * interfaces itself. As a safer alternative, we will resort to + * un-cached direct queries to the kernel for each vif interface. + */ + netdev = rtnl_link_alloc(); + if (!netdev) { + fprintf(stderr, "cannot allocate netdev\n"); + nl_close(sock); + return; + } + + for (i = 0; i < num_ifbs; i++) { + if (ifbs[i]) { + if (rtnl_link_get_kernel(sock, 0, ifbs[i], &netdev) < 0) + continue; + netdev_ifdown(sock, ifbs[i]); + del_plug_qdisc(sock, netdev); + } + } + + for (i = 0; i < num_vifs; i++) { + if (vifs[i]) { + if (rtnl_link_get_kernel(sock, 0, vifs[i], &netdev) < 0) + continue; + del_ingress_qdisc(sock, netdev); + } + } + + rtnl_link_put(netdev); + nl_close(sock); +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */
Shriram Rajagopalan
2013-Jul-25 07:09 UTC
[PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
Add appropriate code to xl_cmdline.c to setup network buffers for each vif belonging to the guest. Also provide a command line switch to explicitly "enable" network buffering. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl Wed Jul 24 22:55:00 2013 -0700 +++ b/tools/libxl/libxl_types.idl Thu Jul 25 00:02:19 2013 -0700 @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain ("interval", integer), ("blackhole", bool), ("compression", bool), + ("netbuf_iflist", libxl_string_list), ]) libxl_event_type = Enumeration("event_type", [ diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c Wed Jul 24 22:55:00 2013 -0700 +++ b/tools/libxl/xl_cmdimpl.c Thu Jul 25 00:02:19 2013 -0700 @@ -7039,10 +7039,109 @@ done: return ret; } +static char **get_guest_vifnames(uint32_t domid, int *num_vifs) +{ + char **viflist; + libxl_device_nic *nics; + libxl_nicinfo nicinfo; + int nb, i; + + nics = libxl_device_nic_list(ctx, domid, &nb); + if (!nics) { *num_vifs = 0; return NULL;} + + viflist = calloc((nb + 1), sizeof(char *)); + if (!viflist) { + perror("failed to allocate memory to hold vif names!"); + exit(-1); + } + + for (i = 0; i < nb; ++i) { + if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo)) { + if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) { + perror("Cannot alloc memory while getting guest vif names"); + exit(-1); + } + libxl_nicinfo_dispose(&nicinfo); + } + libxl_device_nic_dispose(&nics[i]); + } + free(nics); + + *num_vifs = nb; + return viflist; +} + +static int remus_setup_network_buffers(uint32_t domid, + char ***pifblist, int *num_ifbs) +{ + + char **viflist, **ifblist; + int nb, i, j; + + viflist = get_guest_vifnames(domid, &nb); + fprintf(stderr, "Domain %u has %d vifs\n", domid, nb); + + if (!viflist) { *num_ifbs = 0; *ifblist = NULL; return 0;} + + ifblist = calloc((nb + 1), sizeof(char *)); + if (!ifblist) { + perror("failed to allocate memory for ifb list!"); + exit(-1); + } + + /* For each vif, install the network buffer */ + for (i = 0; i < nb; ++i) { + ifblist[i] = remus_install_netbuf_on_dev(viflist[i]); + if (ifblist[i] == NULL) { + fprintf(stderr, "Failed to setup output buffer for interface %s\n", + viflist[i]); + break; + } + } + + if (i < nb) { + j = i; + remus_uninstall_netbufs(viflist, nb, ifblist, j); + for (i = 0; i < nb; i++) + free(viflist[i]); + free(viflist); + + for (i = 0; i < j; i++) + free(ifblist[i]); + free(ifblist); + *num_ifbs = 0; + *pifblist = NULL; + return -1; + } + + for (i = 0; i < nb; i++) + free(viflist[i]); + free(viflist); + + *num_ifbs = nb; + *pifblist = ifblist; + return 0; +} + +static void remus_teardown_network_buffers(uint32_t domid, char **ifblist, + int num_ifbs) +{ + int nb, i; + char **viflist; + + viflist = get_guest_vifnames(domid, &nb); + remus_uninstall_netbufs(viflist, nb, ifblist, num_ifbs); + + for (i = 0; i < nb; i++) + free(viflist[i]); + free(viflist); + +} + int main_remus(int argc, char **argv) { uint32_t domid; - int opt, rc, daemonize = 1; + int opt, rc, daemonize = 1, netbuf = 0, num_ifbs = 0; const char *ssh_command = "ssh"; char *host = NULL, *rune = NULL; libxl_domain_remus_info r_info; @@ -7057,7 +7156,7 @@ int main_remus(int argc, char **argv) r_info.blackhole = 0; r_info.compression = 1; - SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) { + SWITCH_FOREACH_OPT(opt, "bui:s:en", NULL, "remus", 2) { case ''i'': r_info.interval = atoi(optarg); break; @@ -7073,6 +7172,9 @@ int main_remus(int argc, char **argv) case ''e'': daemonize = 0; break; + case ''n'': + netbuf = 1; + break; } domid = find_domain(argv[optind]); @@ -7109,6 +7211,19 @@ int main_remus(int argc, char **argv) rune); } + if (netbuf) { + rc = remus_setup_network_buffers(domid, + (char ***)(&r_info.netbuf_iflist), + &num_ifbs); + if (rc) { + fprintf(stderr, "Failed to properly setup network " + "buffering. Exiting..\n"); + close(send_fd); + return -ERROR_FAIL; + } else + fprintf(stderr, "Network buffers setup on %d interfaces\n", num_ifbs); + } + /* Point of no return */ rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0); @@ -7126,6 +7241,9 @@ int main_remus(int argc, char **argv) libxl_domain_resume(ctx, domid, 1, 0); } + if (netbuf) + remus_teardown_network_buffers(domid, (char **)r_info.netbuf_iflist, num_ifbs); + close(send_fd); return -ERROR_FAIL; } diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c Wed Jul 24 22:55:00 2013 -0700 +++ b/tools/libxl/xl_cmdtable.c Thu Jul 25 00:02:19 2013 -0700 @@ -485,8 +485,8 @@ struct cmd_spec cmd_table[] = { " to sh. If empty, run <host> instead of \n" " ssh <host> xl migrate-receive -r [-e]\n" "-e Do not wait in the background (on <host>) for the death\n" - " of the domain." - + " of the domain.\n" + "-n Enable Remus network buffering\n" }, };
Shriram Rajagopalan
2013-Jul-25 07:09 UTC
[PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
This patch constitutes the core network buffering logic. Libxl would receive a list of IFB devices that collectively act as network buffering devices, for a given guest. Also, libxl expects that every ifb device in the list supplied by a toolstack (netbuf_iflist) should have a plug qdisc installed. This patch does the following: a) establish a dedicated remus context containing libnl related state (netlink sockets, qdisc caches, etc.,) b) Obtain handles to plug qdiscs installed on the supplied list of IFB devices c) add a new network buffer (i.e., create a new one) when the domain is suspended (remus_domain_suspend_callback) d) release the network buffer pertaining to the acknowledged checkpoint in remus_domain_checkpoint_dm_saved, which is invoked for both PV & HVM. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Thu Jul 25 00:02:19 2013 -0700 +++ b/tools/libxl/libxl_dom.c Thu Jul 25 00:02:22 2013 -0700 @@ -20,6 +20,7 @@ #include "libxl_internal.h" #include "libxl_arch.h" +#include <netlink/route/qdisc/plug.h> #include <xc_dom.h> #include <xen/hvm/hvm_info_table.h> #include <xen/hvm/hvm_xs_strings.h> @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid /*----- remus callbacks -----*/ +/* REMUS TODO: Issue disk checkpoint reqs. */ static int libxl__remus_domain_suspend_callback(void *data) { - /* REMUS TODO: Issue disk and network checkpoint reqs. */ - return libxl__domain_suspend_common_callback(data); + libxl__save_helper_state *shs = data; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + STATE_AO_GC(dss->ao); + + int is_suspended = 0, i, ret; + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + is_suspended = libxl__domain_suspend_common_callback(data); + if (!remus_ctx->num_netbufs) return is_suspended; + + if (is_suspended) { + for (i = 0; i < remus_ctx->num_netbufs; ++i) { + ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]); + if (!ret) + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i], + NLM_F_REQUEST); + if (ret) { + LOG(ERROR, "Cannot create new buffer on %s:%s", + dss->remus->netbuf_iflist[i], nl_geterror(ret)); + return 0; + } + } + } + + return is_suspended; } +/* REMUS TODO: Deal with disk. */ static int libxl__remus_domain_resume_callback(void *data) { libxl__save_helper_state *shs = data; @@ -1228,7 +1253,6 @@ static int libxl__remus_domain_resume_ca if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) return 0; - /* REMUS TODO: Deal with disk. Start a new network output buffer */ return 1; } @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi static void remus_checkpoint_dm_saved(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc) { - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ - /* REMUS TODO: make this asynchronous */ - assert(!rc); /* REMUS TODO handle this error properly */ - usleep(dss->interval * 1000); + /* REMUS TODO: Wait for disk and explicit memory ack (through restore + callback from remote) before release network buffer. */ + int i, ret; + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + STATE_AO_GC(dss->ao); + + assert(!rc); + + if (remus_ctx->num_netbufs > 0) { + for (i = 0; i < remus_ctx->num_netbufs; ++i) { + ret = rtnl_qdisc_plug_release_one(remus_ctx->netbuf_qdisc_list[i]); + if (!ret) + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i], + NLM_F_REQUEST); + if (ret) { + LOG(ERROR, "Cannot release buffer from %s:%s", + dss->remus->netbuf_iflist[i], nl_geterror(ret)); + ret= 0; + break; + } + } + } + + usleep(dss->remus_ctx->interval * 1000); libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); } +static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc, + const libxl_domain_remus_info *r_info) +{ + libxl__remus_ctx *remus_ctx = NULL; + int num_netbufs = 0, i; + struct nl_cache *qdisc_cache = NULL; + struct rtnl_link *ifb = NULL; + struct nl_sock *nlsock = NULL; + struct rtnl_qdisc *qdisc = NULL; + libxl_string_list l; + int ifindex; + + remus_ctx = libxl__zalloc(gc, sizeof(libxl__remus_ctx)); + remus_ctx->interval = r_info->interval; + + l = r_info->netbuf_iflist; + num_netbufs = libxl_string_list_length(&l); + if (!num_netbufs) return remus_ctx; + + nlsock = nl_socket_alloc(); + if (!nlsock) { + LOG(ERROR, "setup_remus_ctx: cannot allocate nl socket"); + return NULL; + } + + nl_connect(nlsock, NETLINK_ROUTE); + + if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) { + LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache"); + goto end; + } + + remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, + sizeof(struct rtnl_qdisc *)); + remus_ctx->num_netbufs = num_netbufs; + remus_ctx->nlsock = nlsock; + remus_ctx->qdisc_cache = qdisc_cache; + ifb = rtnl_link_alloc(); + + for (i = 0; i < num_netbufs; ++i) { + + if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) { + LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]); + goto end; + } + + ifindex = rtnl_link_get_ifindex(ifb); + if (!ifindex) { + LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]); + goto end; + } + + qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT); + if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) { + LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]); + goto end; + } + + remus_ctx->netbuf_qdisc_list[i] = qdisc; + } + + rtnl_link_put(ifb); + return remus_ctx; + + end: + if (ifb) rtnl_link_put(ifb); + if (qdisc_cache) nl_cache_free(qdisc_cache); + if (nlsock) nl_close(nlsock); + return NULL; +} + /*----- main code for suspending, in order of execution -----*/ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) @@ -1312,7 +1427,12 @@ void libxl__domain_suspend(libxl__egc *e dss->dm_savefile = libxl__device_model_savefile(gc, domid); if (r_info != NULL) { - dss->interval = r_info->interval; + /* This suspend is for Remus. We need to get a handle on + * the network output buffers and setup the remus_ctx; + */ + dss->remus_ctx = setup_remus_ctx(gc, r_info); + if (!dss->remus_ctx) + goto out; if (r_info->compression) dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; } diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h Thu Jul 25 00:02:19 2013 -0700 +++ b/tools/libxl/libxl_internal.h Thu Jul 25 00:02:22 2013 -0700 @@ -44,6 +44,13 @@ #include <sys/wait.h> #include <sys/socket.h> +#include <netlink/cache.h> +#include <netlink/socket.h> +#include <netlink/attr.h> +#include <netlink/route/link.h> +#include <netlink/route/route.h> +#include <netlink/route/qdisc.h> + #include <xenstore.h> #include <xenctrl.h> #include <xenguest.h> @@ -2242,6 +2249,18 @@ typedef struct libxl__logdirty_switch { libxl__ev_time timeout; } libxl__logdirty_switch; +typedef struct libxl__remus_ctx { + /* checkpoint interval */ + int interval; + /* array of plug qdisc pointers, that hold + * network output from the guest''s vifs. + */ + int num_netbufs; + struct rtnl_qdisc **netbuf_qdisc_list; + struct nl_sock *nlsock; + struct nl_cache *qdisc_cache; +} libxl__remus_ctx; + struct libxl__domain_suspend_state { /* set by caller of libxl__domain_suspend */ libxl__ao *ao; @@ -2260,7 +2279,7 @@ struct libxl__domain_suspend_state { int xcflags; int guest_responded; const char *dm_savefile; - int interval; /* checkpoint interval (for Remus) */ + libxl__remus_ctx *remus_ctx; libxl__save_helper_state shs; libxl__logdirty_switch logdirty; /* private for libxl__domain_save_device_model */
Shriram Rajagopalan
2013-Jul-25 07:09 UTC
[PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
Add dependency on libnl3 version 3.2.17 or higher to autoconf. Add include flags and link to relevant libraries in tools/libxl/Makefile. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 @@ -171,4 +171,12 @@ AC_SUBST(libiconv) # Checks for header files. AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) +# Checks for libnl3 libraries and headers. +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) +if (test "${have_libnl3}" = "yes"); then + CFLAGS+="-I$LIBNL3_CFLAGS" +else + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) +fi + AC_OUTPUT() diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 @@ -13,7 +13,7 @@ XLUMINOR = 0 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ -Wno-declaration-after-statement -Wformat-nonliteral -CFLAGS += -I. -fPIC +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/ ifeq ($(CONFIG_Linux),y) LIBUUID_LIBS += -luuid @@ -29,7 +29,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl) CFLAGS_LIBXL += -Wshadow CFLAGS += $(PTHREAD_CFLAGS) -LDFLAGS += $(PTHREAD_LDFLAGS) +LDFLAGS += $(PTHREAD_LDFLAGS) -L/usr/local/lib/libnl3/ LIBXL_LIBS += $(PTHREAD_LIBS) LIBXLU_LIBS @@ -68,6 +68,7 @@ ifeq ($(BISON),) endif LIBXL_LIBS += -lyajl +LIBXL_LIBS += -lnl-3 -lnl-route-3 LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ @@ -92,7 +93,7 @@ CLIENTS = xl testidl libxl-save-helper CFLAGS_XL += $(CFLAGS_libxenlight) CFLAGS_XL += -Wshadow -XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o +XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o xl_netbuf.o $(XL_OBJS) _libxl.api-for-check: \ CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h $(XL_OBJS): CFLAGS += $(CFLAGS_XL) @@ -189,7 +190,7 @@ libxlutil.a: $(LIBXLU_OBJS) $(AR) rcs libxlutil.a $^ xl: $(XL_OBJS) libxlutil.so libxenlight.so - $(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS) + $(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl -lnl-3 -lnl-route-3 $(APPEND_LDFLAGS) libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so $(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
Wen Congyang
2013-Jul-26 09:44 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
At 07/25/2013 03:09 PM, Shriram Rajagopalan Wrote:> Add dependency on libnl3 version 3.2.17 or higher to autoconf. > Add include flags and link to relevant libraries in tools/libxl/Makefile.Is it OK to use libnl, not libnl3? Some systems(For example: RHEL6) don''t have libnl3.> > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > > diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac > --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 > +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 > @@ -171,4 +171,12 @@ AC_SUBST(libiconv) > # Checks for header files. > AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) > > +# Checks for libnl3 libraries and headers. > +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) > +if (test "${have_libnl3}" = "yes"); then > + CFLAGS+="-I$LIBNL3_CFLAGS" > +else > + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) > +fi > + > AC_OUTPUT() > diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile > --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 > +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 > @@ -13,7 +13,7 @@ XLUMINOR = 0 > > CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ > -Wno-declaration-after-statement -Wformat-nonliteral > -CFLAGS += -I. -fPIC > +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/Hmm, why use /usr/local? Thanks Wen Congyang> > ifeq ($(CONFIG_Linux),y) > LIBUUID_LIBS += -luuid > @@ -29,7 +29,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl) > CFLAGS_LIBXL += -Wshadow > > CFLAGS += $(PTHREAD_CFLAGS) > -LDFLAGS += $(PTHREAD_LDFLAGS) > +LDFLAGS += $(PTHREAD_LDFLAGS) -L/usr/local/lib/libnl3/ > LIBXL_LIBS += $(PTHREAD_LIBS) > > LIBXLU_LIBS > @@ -68,6 +68,7 @@ ifeq ($(BISON),) > endif > > LIBXL_LIBS += -lyajl > +LIBXL_LIBS += -lnl-3 -lnl-route-3 > > LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ > libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ > @@ -92,7 +93,7 @@ CLIENTS = xl testidl libxl-save-helper > CFLAGS_XL += $(CFLAGS_libxenlight) > CFLAGS_XL += -Wshadow > > -XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o > +XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o xl_netbuf.o > $(XL_OBJS) _libxl.api-for-check: \ > CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h > $(XL_OBJS): CFLAGS += $(CFLAGS_XL) > @@ -189,7 +190,7 @@ libxlutil.a: $(LIBXLU_OBJS) > $(AR) rcs libxlutil.a $^ > > xl: $(XL_OBJS) libxlutil.so libxenlight.so > - $(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS) > + $(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl -lnl-3 -lnl-route-3 $(APPEND_LDFLAGS) > > libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so > $(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS) > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
David Vrabel
2013-Jul-26 09:56 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
On 25/07/13 08:09, Shriram Rajagopalan wrote:> Add dependency on libnl3 version 3.2.17 or higher to autoconf. > Add include flags and link to relevant libraries in tools/libxl/Makefile. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > > diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac > --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 > +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 > @@ -171,4 +171,12 @@ AC_SUBST(libiconv) > # Checks for header files. > AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) > > +# Checks for libnl3 libraries and headers. > +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) > +if (test "${have_libnl3}" = "yes"); then > + CFLAGS+="-I$LIBNL3_CFLAGS" > +else > + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) > +fiUsing pkg-config here...> AC_OUTPUT() > diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile > --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 > +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 > @@ -13,7 +13,7 @@ XLUMINOR = 0 > > CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ > -Wno-declaration-after-statement -Wformat-nonliteral > -CFLAGS += -I. -fPIC > +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/Means you should need to include the -I option here. David
Shriram Rajagopalan
2013-Jul-26 13:51 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
On Fri, Jul 26, 2013 at 5:44 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:> At 07/25/2013 03:09 PM, Shriram Rajagopalan Wrote: >> Add dependency on libnl3 version 3.2.17 or higher to autoconf. >> Add include flags and link to relevant libraries in tools/libxl/Makefile. > > Is it OK to use libnl, not libnl3? > Some systems(For example: RHEL6) don''t have libnl3. >Unfortunately no. libnl3 is where the network buffering support exists. And most of the API used to setup network buffering were introduced in libnl3.>> >> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> >> >> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac >> --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 >> +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 >> @@ -171,4 +171,12 @@ AC_SUBST(libiconv) >> # Checks for header files. >> AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) >> >> +# Checks for libnl3 libraries and headers. >> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) >> +if (test "${have_libnl3}" = "yes"); then >> + CFLAGS+="-I$LIBNL3_CFLAGS" >> +else >> + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) >> +fi >> + >> AC_OUTPUT() >> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile >> --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 >> +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 >> @@ -13,7 +13,7 @@ XLUMINOR = 0 >> >> CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ >> -Wno-declaration-after-statement -Wformat-nonliteral >> -CFLAGS += -I. -fPIC >> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/ > > Hmm, why use /usr/local? >That was a hack, as stated in the introductory email. I am not well versed with autoconf. Using pkg-config and adding the libraries to APPEND_INCLUDES didnt do the job. shriram
Shriram Rajagopalan
2013-Jul-26 13:56 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
On Fri, Jul 26, 2013 at 5:56 AM, David Vrabel <david.vrabel@citrix.com> wrote:> On 25/07/13 08:09, Shriram Rajagopalan wrote: >> Add dependency on libnl3 version 3.2.17 or higher to autoconf. >> Add include flags and link to relevant libraries in tools/libxl/Makefile. >> >> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> >> >> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac >> --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 >> +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 >> @@ -171,4 +171,12 @@ AC_SUBST(libiconv) >> # Checks for header files. >> AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) >> >> +# Checks for libnl3 libraries and headers. >> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) >> +if (test "${have_libnl3}" = "yes"); then >> + CFLAGS+="-I$LIBNL3_CFLAGS" >> +else >> + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) >> +fi > > Using pkg-config here... >The pkg-config idea was based on the suggestion here http://www.carisma.slowglass.com/~tgr/libnl/doc/core.html#_linking_to_this_library If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the include path to CFLAGS, such that I wouldnt have to manually specify -I flags in libxl/Makefile ?>> AC_OUTPUT() >> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile >> --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 >> +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 >> @@ -13,7 +13,7 @@ XLUMINOR = 0 >> >> CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ >> -Wno-declaration-after-statement -Wformat-nonliteral >> -CFLAGS += -I. -fPIC >> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/ > > Means you should need to include the -I option here. > > David >
Wen Congyang
2013-Jul-29 05:58 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
At 07/26/2013 09:56 PM, Shriram Rajagopalan Wrote:> On Fri, Jul 26, 2013 at 5:56 AM, David Vrabel <david.vrabel@citrix.com> wrote: >> On 25/07/13 08:09, Shriram Rajagopalan wrote: >>> Add dependency on libnl3 version 3.2.17 or higher to autoconf. >>> Add include flags and link to relevant libraries in tools/libxl/Makefile. >>> >>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> >>> >>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac >>> --- a/tools/configure.ac Thu Jul 25 00:02:22 2013 -0700 >>> +++ b/tools/configure.ac Thu Jul 25 00:02:33 2013 -0700 >>> @@ -171,4 +171,12 @@ AC_SUBST(libiconv) >>> # Checks for header files. >>> AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) >>> >>> +# Checks for libnl3 libraries and headers. >>> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no]) >>> +if (test "${have_libnl3}" = "yes"); then >>> + CFLAGS+="-I$LIBNL3_CFLAGS" >>> +else >>> + AC_MSG_ERROR([Need libnl version 3.2.17 or higher]) >>> +fi >> >> Using pkg-config here... >> > > The pkg-config idea was based on the suggestion here > http://www.carisma.slowglass.com/~tgr/libnl/doc/core.html#_linking_to_this_library > > If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the > include path to CFLAGS, such that I wouldnt have to manually specify > -I flags in libxl/Makefile ?No. AC_CHECK_HEADERS can check if the header file is under the default directory. If the header file is under the directory /usr/include, you can use it. But why your header file is under the directory /usr/local/include? Thanks Wen Congyang> >>> AC_OUTPUT() >>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile >>> --- a/tools/libxl/Makefile Thu Jul 25 00:02:22 2013 -0700 >>> +++ b/tools/libxl/Makefile Thu Jul 25 00:02:33 2013 -0700 >>> @@ -13,7 +13,7 @@ XLUMINOR = 0 >>> >>> CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \ >>> -Wno-declaration-after-statement -Wformat-nonliteral >>> -CFLAGS += -I. -fPIC >>> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/ >> >> Means you should need to include the -I option here. >> >> David >> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Shriram Rajagopalan
2013-Jul-29 13:07 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
On Mon, Jul 29, 2013 at 1:58 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:>> If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the >> include path to CFLAGS, such that I wouldnt have to manually specify >> -I flags in libxl/Makefile ? > > No. AC_CHECK_HEADERS can check if the header file is under the default directory. > If the header file is under the directory /usr/include, you can use it. > But why your header file is under the directory /usr/local/include? >I think its because I installed libnl3 in /usr/local, from the sources. If there is a proper way to specify custom install directories for certain libraries, then I dont think we need that hard coded -I /usr/local/include/libnl3 shriram
David Vrabel
2013-Jul-29 15:41 UTC
Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
On 29/07/13 14:07, Shriram Rajagopalan wrote:> On Mon, Jul 29, 2013 at 1:58 AM, Wen Congyang <wency@cn.fujitsu.com> wrote: >>> If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the >>> include path to CFLAGS, such that I wouldnt have to manually specify >>> -I flags in libxl/Makefile ? >> >> No. AC_CHECK_HEADERS can check if the header file is under the default directory. >> If the header file is under the directory /usr/include, you can use it. >> But why your header file is under the directory /usr/local/include? >> > > I think its because I installed libnl3 in /usr/local, from the sources. > If there is a proper way to specify custom install directories for > certain libraries, > then I dont think we need that hard coded -I /usr/local/include/libnl3You need to tell pkg-config where to find the .pc file. I don''t recall the details for how to do this. PKG_CONFIG_PATH perhaps? David
Ian Campbell
2013-Jul-29 15:42 UTC
Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:> The functions contained in xl_netbuf.c are built on top of libnl3 API. > They setup the necessary infrastructure required to for network output > buffering in Remus. In xend, the Remus python code performed this setup > by invoking shell commands. This code is built purely on top of APIs > supplied by libnl3.0. > > There are two public helper functions: > 1) remus_install_netbuf_on_dev to install a network buffer on a vif. > Its basically equivalent to: > a)find free ifb (say ifb0) > b)ip link set ifb0 up > c)tc qdisc add dev vif1.0 ingress > d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \ > action mirred egress redirect dev ifb0 > e)tc qdisc add dev ifb0 root plug > f)get handle to plug qdisc and control it programmatically after suspending > a VM/receiving checkpoint ack from remote > > > 2) remus_uninstall_netbufs to remove all > network buffers and ingress qdiscs from the supplied interface list. > > I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building > on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection) > and it is pretty complicated to implement. So I am currently resorting to system() lib call > to get the job done. > > N.B. This implementation is just xl (cmdline utility) specific.You haven''t included a Makefile change so I can''t easily tell if this is really part of the xl binary, or a separate helper or indeed part of libxl.> Other toolstacks may choose > to reuse this code or do this setup in their own way (scripts/shell commands, etc.).It''s not clear to me why this code wouldn''t be part of libxl to be called by xl and other toolstacks as needed. Why would a toolstack want to do this differently? The point of libxl is precisely to relieve toolstacks of the burden of doing these sorts of complex domain setup activities themselves by putting them in a common library, I can''t see why Remus setup would be an exception to this. I think we need to answer this question before reviewing this code, since doing it libxl would change the requirements of the code (from the API exposed, to logging, to memory managment, to not using system directly).> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>Ian.
Ian Campbell
2013-Jul-29 15:49 UTC
Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:> Add appropriate code to xl_cmdline.c to setup network buffers for > each vif belonging to the guest. Also provide a command line switch > to explicitly "enable" network buffering. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > > diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl > --- a/tools/libxl/libxl_types.idl Wed Jul 24 22:55:00 2013 -0700 > +++ b/tools/libxl/libxl_types.idl Thu Jul 25 00:02:19 2013 -0700 > @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain > ("interval", integer), > ("blackhole", bool), > ("compression", bool), > + ("netbuf_iflist", libxl_string_list), > ]) > > libxl_event_type = Enumeration("event_type", [ > diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c Wed Jul 24 22:55:00 2013 -0700 > +++ b/tools/libxl/xl_cmdimpl.c Thu Jul 25 00:02:19 2013 -0700 > @@ -7039,10 +7039,109 @@ done: > return ret; > } > > +static char **get_guest_vifnames(uint32_t domid, int *num_vifs) > +{ > + char **viflist; > + libxl_device_nic *nics; > + libxl_nicinfo nicinfo; > + int nb, i; > + > + nics = libxl_device_nic_list(ctx, domid, &nb); > + if (!nics) { *num_vifs = 0; return NULL;} > + > + viflist = calloc((nb + 1), sizeof(char *)); > + if (!viflist) { > + perror("failed to allocate memory to hold vif names!"); > + exit(-1); > + } > + > + for (i = 0; i < nb; ++i) { > + if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo)) { > + if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) {This doesn''t account for the ifname field of the vif. Also, I''m not sure how this is supposed to work when driver domains are in use either, if this were done via libxl then it would naturally get incorporated into Roger''s work to make hotplug scripts etc work properly with stub domains. Most of the other comments I had would become invalid/irrelevant when this was moved to libxl, since you''d naturally end up doing things differently anyway. Ian.
Ian Campbell
2013-Jul-29 16:06 UTC
Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:> This patch constitutes the core network buffering logic. > Libxl would receive a list of IFB devices that collectively act as > network buffering devices, for a given guest. Also, libxl expects that > every ifb device in the list supplied by a toolstack (netbuf_iflist) > should have a plug qdisc installed. > > This patch does the following: > a) establish a dedicated remus context containing libnl related > state (netlink sockets, qdisc caches, etc.,) > b) Obtain handles to plug qdiscs installed on the supplied list of > IFB devices > c) add a new network buffer (i.e., create a new one) when the domain is > suspended (remus_domain_suspend_callback) > d) release the network buffer pertaining to the acknowledged checkpoint > in remus_domain_checkpoint_dm_saved, which is invoked for both PV & HVM. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>Please can you also CC Ian.Jackson@eu.citrix.com on toolstack patches such as this. This really needs Ian''s review WRT the lifetime of the remus context object vs. the ao machinery. IOW I''m not sure if this is the appropriate gc to be using or if one associated with the ao context isn''t required. On the other hand by simply embedding the struct (rather than a pointer to the struct) a bunch of those worries might go away.> > diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c Thu Jul 25 00:02:19 2013 -0700 > +++ b/tools/libxl/libxl_dom.c Thu Jul 25 00:02:22 2013 -0700 > @@ -20,6 +20,7 @@ > #include "libxl_internal.h" > #include "libxl_arch.h" > > +#include <netlink/route/qdisc/plug.h> > #include <xc_dom.h> > #include <xen/hvm/hvm_info_table.h> > #include <xen/hvm/hvm_xs_strings.h> > @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid > > /*----- remus callbacks -----*/ > > +/* REMUS TODO: Issue disk checkpoint reqs. */ > static int libxl__remus_domain_suspend_callback(void *data) > { > - /* REMUS TODO: Issue disk and network checkpoint reqs. */ > - return libxl__domain_suspend_common_callback(data); > + libxl__save_helper_state *shs = data; > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > + STATE_AO_GC(dss->ao); > + > + int is_suspended = 0, i, ret; > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > + is_suspended = libxl__domain_suspend_common_callback(data); > + if (!remus_ctx->num_netbufs) return is_suspended; > + > + if (is_suspended) { > + for (i = 0; i < remus_ctx->num_netbufs; ++i) { > + ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]); > + if (!ret) > + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i], > + NLM_F_REQUEST); > + if (ret) { > + LOG(ERROR, "Cannot create new buffer on %s:%s", > + dss->remus->netbuf_iflist[i], nl_geterror(ret)); > + return 0; > + } > + } > + } > + > + return is_suspended; > } > > +/* REMUS TODO: Deal with disk. */ > static int libxl__remus_domain_resume_callback(void *data) > { > libxl__save_helper_state *shs = data; > @@ -1228,7 +1253,6 @@ static int libxl__remus_domain_resume_ca > if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) > return 0; > > - /* REMUS TODO: Deal with disk. Start a new network output buffer */ > return 1; > } > > @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi > static void remus_checkpoint_dm_saved(libxl__egc *egc, > libxl__domain_suspend_state *dss, int rc) > { > - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ > - /* REMUS TODO: make this asynchronous */ > - assert(!rc); /* REMUS TODO handle this error properly */ > - usleep(dss->interval * 1000); > + /* REMUS TODO: Wait for disk and explicit memory ack (through restore > + callback from remote) before release network buffer. */ > + int i, ret; > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > + STATE_AO_GC(dss->ao); > + > + assert(!rc); > + > + if (remus_ctx->num_netbufs > 0) { > + for (i = 0; i < remus_ctx->num_netbufs; ++i) { > + ret = rtnl_qdisc_plug_release_one(remus_ctx->netbuf_qdisc_list[i]); > + if (!ret) > + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i], > + NLM_F_REQUEST); > + if (ret) { > + LOG(ERROR, "Cannot release buffer from %s:%s", > + dss->remus->netbuf_iflist[i], nl_geterror(ret)); > + ret= 0; > + break; > + } > + } > + } > + > + usleep(dss->remus_ctx->interval * 1000); > libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); > } > > +static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc, > + const libxl_domain_remus_info *r_info) > +{ > + libxl__remus_ctx *remus_ctx = NULL; > + int num_netbufs = 0, i; > + struct nl_cache *qdisc_cache = NULL; > + struct rtnl_link *ifb = NULL; > + struct nl_sock *nlsock = NULL; > + struct rtnl_qdisc *qdisc = NULL; > + libxl_string_list l; > + int ifindex; > + > + remus_ctx = libxl__zalloc(gc, sizeof(libxl__remus_ctx)); > + remus_ctx->interval = r_info->interval; > + > + l = r_info->netbuf_iflist; > + num_netbufs = libxl_string_list_length(&l); > + if (!num_netbufs) return remus_ctx; > + > + nlsock = nl_socket_alloc(); > + if (!nlsock) { > + LOG(ERROR, "setup_remus_ctx: cannot allocate nl socket"); > + return NULL; > + } > + > + nl_connect(nlsock, NETLINK_ROUTE); > + > + if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) { > + LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache"); > + goto end; > + } > + > + remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, > + sizeof(struct rtnl_qdisc *)); > + remus_ctx->num_netbufs = num_netbufs; > + remus_ctx->nlsock = nlsock; > + remus_ctx->qdisc_cache = qdisc_cache; > + ifb = rtnl_link_alloc(); > + > + for (i = 0; i < num_netbufs; ++i) { > + > + if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) { > + LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]); > + goto end; > + } > + > + ifindex = rtnl_link_get_ifindex(ifb); > + if (!ifindex) { > + LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]); > + goto end; > + } > + > + qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT); > + if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) { > + LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]); > + goto end; > + } > + > + remus_ctx->netbuf_qdisc_list[i] = qdisc; > + } > + > + rtnl_link_put(ifb); > + return remus_ctx; > + > + end: > + if (ifb) rtnl_link_put(ifb); > + if (qdisc_cache) nl_cache_free(qdisc_cache); > + if (nlsock) nl_close(nlsock); > + return NULL; > +} > + > /*----- main code for suspending, in order of execution -----*/ > > void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) > @@ -1312,7 +1427,12 @@ void libxl__domain_suspend(libxl__egc *e > dss->dm_savefile = libxl__device_model_savefile(gc, domid); > > if (r_info != NULL) { > - dss->interval = r_info->interval; > + /* This suspend is for Remus. We need to get a handle on > + * the network output buffers and setup the remus_ctx; > + */ > + dss->remus_ctx = setup_remus_ctx(gc, r_info); > + if (!dss->remus_ctx) > + goto out; > if (r_info->compression) > dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; > } > diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_internal.h > --- a/tools/libxl/libxl_internal.h Thu Jul 25 00:02:19 2013 -0700 > +++ b/tools/libxl/libxl_internal.h Thu Jul 25 00:02:22 2013 -0700 > @@ -44,6 +44,13 @@ > #include <sys/wait.h> > #include <sys/socket.h> > > +#include <netlink/cache.h> > +#include <netlink/socket.h> > +#include <netlink/attr.h> > +#include <netlink/route/link.h> > +#include <netlink/route/route.h> > +#include <netlink/route/qdisc.h> > + > #include <xenstore.h> > #include <xenctrl.h> > #include <xenguest.h> > @@ -2242,6 +2249,18 @@ typedef struct libxl__logdirty_switch { > libxl__ev_time timeout; > } libxl__logdirty_switch; > > +typedef struct libxl__remus_ctx { > + /* checkpoint interval */ > + int interval; > + /* array of plug qdisc pointers, that hold > + * network output from the guest''s vifs. > + */ > + int num_netbufs; > + struct rtnl_qdisc **netbuf_qdisc_list; > + struct nl_sock *nlsock; > + struct nl_cache *qdisc_cache; > +} libxl__remus_ctx; > + > struct libxl__domain_suspend_state { > /* set by caller of libxl__domain_suspend */ > libxl__ao *ao; > @@ -2260,7 +2279,7 @@ struct libxl__domain_suspend_state { > int xcflags; > int guest_responded; > const char *dm_savefile; > - int interval; /* checkpoint interval (for Remus) */ > + libxl__remus_ctx *remus_ctx; > libxl__save_helper_state shs; > libxl__logdirty_switch logdirty; > /* private for libxl__domain_save_device_model */
Shriram Rajagopalan
2013-Jul-29 18:00 UTC
Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
On Mon, Jul 29, 2013 at 11:42 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:> On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote: >> The functions contained in xl_netbuf.c are built on top of libnl3 API. >> They setup the necessary infrastructure required to for network output >> buffering in Remus. In xend, the Remus python code performed this setup >> by invoking shell commands. This code is built purely on top of APIs >> supplied by libnl3.0. >> >> There are two public helper functions: >> 1) remus_install_netbuf_on_dev to install a network buffer on a vif. >> Its basically equivalent to: >> a)find free ifb (say ifb0) >> b)ip link set ifb0 up >> c)tc qdisc add dev vif1.0 ingress >> d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \ >> action mirred egress redirect dev ifb0 >> e)tc qdisc add dev ifb0 root plug >> f)get handle to plug qdisc and control it programmatically after suspending >> a VM/receiving checkpoint ack from remote >> >> >> 2) remus_uninstall_netbufs to remove all >> network buffers and ingress qdiscs from the supplied interface list. >> >> I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building >> on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection) >> and it is pretty complicated to implement. So I am currently resorting to system() lib call >> to get the job done. >> >> N.B. This implementation is just xl (cmdline utility) specific. > > You haven''t included a Makefile change so I can''t easily tell if this is > really part of the xl binary, or a separate helper or indeed part of > libxl. >The Makefile changes were clumped together in patch 4, along with changes to configure.ac.>> Other toolstacks may choose >> to reuse this code or do this setup in their own way (scripts/shell commands, etc.). > > It''s not clear to me why this code wouldn''t be part of libxl to be > called by xl and other toolstacks as needed. Why would a toolstack want > to do this differently? > > The point of libxl is precisely to relieve toolstacks of the burden of > doing these sorts of complex domain setup activities themselves by > putting them in a common library, I can''t see why Remus setup would be > an exception to this. > > I think we need to answer this question before reviewing this code, > since doing it libxl would change the requirements of the code (from the > API exposed, to logging, to memory managment, to not using system > directly). >If every one had to do this in the same way, then I agree that this code should be moved into libxl. The setup can be done using a a few lines of shell script or in a 100-200 line C code (like this patch). Given that there is a choice, I didnt see any point in forcing all toolstacks to go the libxl way (resorting to its internal c code for setting up the buffers). But if you think that we might as well make a sane choice and abstract this complexity from other toolstacks, I would be happy to push this into libxl itself. shriram
Shriram Rajagopalan
2013-Jul-29 19:00 UTC
Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
On Mon, Jul 29, 2013 at 11:49 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:> On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote: >> Add appropriate code to xl_cmdline.c to setup network buffers for >> each vif belonging to the guest. Also provide a command line switch >> to explicitly "enable" network buffering. >> >> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> >> >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl >> --- a/tools/libxl/libxl_types.idl Wed Jul 24 22:55:00 2013 -0700 >> +++ b/tools/libxl/libxl_types.idl Thu Jul 25 00:02:19 2013 -0700 >> @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain >> ("interval", integer), >> ("blackhole", bool), >> ("compression", bool), >> + ("netbuf_iflist", libxl_string_list), >> ]) >> >> libxl_event_type = Enumeration("event_type", [ >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c >> --- a/tools/libxl/xl_cmdimpl.c Wed Jul 24 22:55:00 2013 -0700 >> +++ b/tools/libxl/xl_cmdimpl.c Thu Jul 25 00:02:19 2013 -0700 >> @@ -7039,10 +7039,109 @@ done: >> return ret; >> } >> >> +static char **get_guest_vifnames(uint32_t domid, int *num_vifs) >> +{ >> + char **viflist; >> + libxl_device_nic *nics; >> + libxl_nicinfo nicinfo; >> + int nb, i; >> + >> + nics = libxl_device_nic_list(ctx, domid, &nb); >> + if (!nics) { *num_vifs = 0; return NULL;} >> + >> + viflist = calloc((nb + 1), sizeof(char *)); >> + if (!viflist) { >> + perror("failed to allocate memory to hold vif names!"); >> + exit(-1); >> + } >> + >> + for (i = 0; i < nb; ++i) { >> + if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo)) { >> + if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) { > > This doesn''t account for the ifname field of the vif. >Ah thanks. good catch.. Will add a check for that.> Also, I''m not sure how this is supposed to work when driver domains are > in use either,Remus wont work with driver domains, unless there are agents in each of the driver domains, that are co-ordinated by the memory checkpoint code in dom0. Irrespective of the hotplug scripts, Remus needs to control the IFB network interface attached to the Guest''s vifs/tap devices. Given that all these interfaces will be inside a driver domain, which does not have a fast (not xenstore) communication channel to dom0, there is no way the memory checkpointing code can co-ordinate with the driver domain in order to buffer/release its network packets after each checkpoint. One alternative would be to have a network agent running inside each of these driver domains, assuming that the driver domains would have network access (practical ? ). The memory checkpoint code would have to control the IFB devices via the network agents. All this is in the long run.. The immediate goal is to get network/disk buffering to work with xl.> if this were done via libxl then it would naturally get > incorporated into Roger''s work to make hotplug scripts etc work properly > with stub domains. > > Most of the other comments I had would become invalid/irrelevant when > this was moved to libxl, since you''d naturally end up doing things > differently anyway. >Even if the code moved to libxl, most of the setup would remain the same, i.e. setup on demand using C code or something. It does not make sense to setup an IFB device for every domain right from the boot. You will eventually run out of ifb devices in the system. If you are suggesting that we invoke a hotplug script when "xl remus" command is issued, I dont mind doing so either. The code in libxl (to control the plug qdisc) is not going to go away in either case. shriram
Ian Campbell
2013-Jul-30 10:44 UTC
Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
On Mon, 2013-07-29 at 14:00 -0400, Shriram Rajagopalan wrote:> On Mon, Jul 29, 2013 at 11:42 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote: > >> The functions contained in xl_netbuf.c are built on top of libnl3 API. > >> They setup the necessary infrastructure required to for network output > >> buffering in Remus. In xend, the Remus python code performed this setup > >> by invoking shell commands. This code is built purely on top of APIs > >> supplied by libnl3.0. > >> > >> There are two public helper functions: > >> 1) remus_install_netbuf_on_dev to install a network buffer on a vif. > >> Its basically equivalent to: > >> a)find free ifb (say ifb0) > >> b)ip link set ifb0 up > >> c)tc qdisc add dev vif1.0 ingress > >> d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \ > >> action mirred egress redirect dev ifb0 > >> e)tc qdisc add dev ifb0 root plug > >> f)get handle to plug qdisc and control it programmatically after suspending > >> a VM/receiving checkpoint ack from remote > >> > >> > >> 2) remus_uninstall_netbufs to remove all > >> network buffers and ingress qdiscs from the supplied interface list. > >> > >> I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building > >> on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection) > >> and it is pretty complicated to implement. So I am currently resorting to system() lib call > >> to get the job done. > >> > >> N.B. This implementation is just xl (cmdline utility) specific. > > > > You haven''t included a Makefile change so I can''t easily tell if this is > > really part of the xl binary, or a separate helper or indeed part of > > libxl. > > > > The Makefile changes were clumped together in patch 4, along with changes to > configure.ac.Please put these sorts of changes alongside the code changes, that might mean reordering your series etc.> >> Other toolstacks may choose > >> to reuse this code or do this setup in their own way (scripts/shell commands, etc.). > > > > It''s not clear to me why this code wouldn''t be part of libxl to be > > called by xl and other toolstacks as needed. Why would a toolstack want > > to do this differently? > > > > The point of libxl is precisely to relieve toolstacks of the burden of > > doing these sorts of complex domain setup activities themselves by > > putting them in a common library, I can''t see why Remus setup would be > > an exception to this. > > > > I think we need to answer this question before reviewing this code, > > since doing it libxl would change the requirements of the code (from the > > API exposed, to logging, to memory managment, to not using system > > directly). > > > > If every one had to do this in the same way, then I agree that this > code should be moved into libxl. > > The setup can be done using a a few lines of shell script or in a 100-200 > line C code (like this patch). Given that there is a choice, I didnt > see any point > in forcing all toolstacks to go the libxl way (resorting to its > internal c code for setting > up the buffers). But if you think that we might as well make a sane > choice and abstract > this complexity from other toolstacks, I would be happy to push this > into libxl itself.If you think the sane choice is for everyone to do this the same way then please do it in libxl. Realistically I don''t imagine other toolstacks will implement Remus unless you either make it a no-brainer (by making it a trivial to use libxl interface) or you do it for them. Ian.
Ian Campbell
2013-Jul-30 10:50 UTC
Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
On Mon, 2013-07-29 at 15:00 -0400, Shriram Rajagopalan wrote:> On Mon, Jul 29, 2013 at 11:49 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote: > >> Add appropriate code to xl_cmdline.c to setup network buffers for > >> each vif belonging to the guest. Also provide a command line switch > >> to explicitly "enable" network buffering. > >> > >> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > >> > >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl > >> --- a/tools/libxl/libxl_types.idl Wed Jul 24 22:55:00 2013 -0700 > >> +++ b/tools/libxl/libxl_types.idl Thu Jul 25 00:02:19 2013 -0700 > >> @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain > >> ("interval", integer), > >> ("blackhole", bool), > >> ("compression", bool), > >> + ("netbuf_iflist", libxl_string_list), > >> ]) > >> > >> libxl_event_type = Enumeration("event_type", [ > >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c > >> --- a/tools/libxl/xl_cmdimpl.c Wed Jul 24 22:55:00 2013 -0700 > >> +++ b/tools/libxl/xl_cmdimpl.c Thu Jul 25 00:02:19 2013 -0700 > >> @@ -7039,10 +7039,109 @@ done: > >> return ret; > >> } > >> > >> +static char **get_guest_vifnames(uint32_t domid, int *num_vifs) > >> +{ > >> + char **viflist; > >> + libxl_device_nic *nics; > >> + libxl_nicinfo nicinfo; > >> + int nb, i; > >> + > >> + nics = libxl_device_nic_list(ctx, domid, &nb); > >> + if (!nics) { *num_vifs = 0; return NULL;} > >> + > >> + viflist = calloc((nb + 1), sizeof(char *)); > >> + if (!viflist) { > >> + perror("failed to allocate memory to hold vif names!"); > >> + exit(-1); > >> + } > >> + > >> + for (i = 0; i < nb; ++i) { > >> + if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo)) { > >> + if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) { > > > > This doesn''t account for the ifname field of the vif. > > > > Ah thanks. good catch.. Will add a check for that.This will probably be easier once you move it into libxl since you can do it at a place where this sort of information is already in hand, I''d think.> > Also, I''m not sure how this is supposed to work when driver domains are > > in use either, > > Remus wont work with driver domains, unless there are agents in each of the > driver domains, that are co-ordinated by the memory checkpoint code in dom0.That doesn''t seem to be to hard to arrange. We are already thinking of a "xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in order to reduce our dependency on udev (in dom0 the daemon may actually be libxl directly).> Irrespective of the hotplug scripts, Remus needs to control the IFB > network interface > attached to the Guest''s vifs/tap devices. Given that all these > interfaces will be inside > a driver domain, which does not have a fast (not xenstore) > communication channel to dom0, > there is no way the memory checkpointing code can co-ordinate with the > driver domain in order > to buffer/release its network packets after each checkpoint.It doesn''t seem implausible that the toolstack and that daemon might not also negotiate a memory-checkpoint evtchn or something like that for a given domain.> One alternative would be to have a network agent running inside each > of these driver domains, > assuming that the driver domains would have network access (practical ? ). > The memory checkpoint code would have to control the IFB devices via > the network agents. > > All this is in the long run.. The immediate goal is to get > network/disk buffering to work with xl.We still need to consider the future when designing the libxl API since that is stable and we avoid changing it once committed (there are mechanisms to evolve the interface, see libxl.h, but they are best avoided).> > > > if this were done via libxl then it would naturally get > > incorporated into Roger''s work to make hotplug scripts etc work properly > > with stub domains. > > > > Most of the other comments I had would become invalid/irrelevant when > > this was moved to libxl, since you''d naturally end up doing things > > differently anyway. > > > > Even if the code moved to libxl, most of the setup would remain the > same, i.e. setup on demand > using C code or something. It does not make sense to setup an IFB > device for every domain > right from the boot. You will eventually run out of ifb devices in the system.Uhm, I don''t think I suggested setting up an IFB at boot here.> If you are suggesting that we invoke a hotplug script when "xl remus" > command is issued, > I dont mind doing so either. The code in libxl (to control the plug > qdisc) is not going to go away > in either case.Would doing the setup in a shell script be easier/cleaner etc? I''m still not sure I understand why finding an ifb and binding it to a vif cannot be done by the vif hotplug script and the ifb communicated back to the toolstack via xenstore.
Shriram Rajagopalan
2013-Jul-30 15:25 UTC
Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
> >> > Also, I''m not sure how this is supposed to work when driver domains are >> > in use either, >> >> Remus wont work with driver domains, unless there are agents in each of the >> driver domains, that are co-ordinated by the memory checkpoint code in dom0. > > That doesn''t seem to be to hard to arrange. We are already thinking of a > "xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in > order to reduce our dependency on udev (in dom0 the daemon may actually > be libxl directly). > >> Irrespective of the hotplug scripts, Remus needs to control the IFB >> network interface >> attached to the Guest''s vifs/tap devices. Given that all these >> interfaces will be inside >> a driver domain, which does not have a fast (not xenstore) >> communication channel to dom0, >> there is no way the memory checkpointing code can co-ordinate with the >> driver domain in order >> to buffer/release its network packets after each checkpoint. > > It doesn''t seem implausible that the toolstack and that daemon might not > also negotiate a memory-checkpoint evtchn or something like that for a > given domain. > >> One alternative would be to have a network agent running inside each >> of these driver domains, >> assuming that the driver domains would have network access (practical ? ). >> The memory checkpoint code would have to control the IFB devices via >> the network agents. >> >> All this is in the long run.. The immediate goal is to get >> network/disk buffering to work with xl. > > We still need to consider the future when designing the libxl API since > that is stable and we avoid changing it once committed (there are > mechanisms to evolve the interface, see libxl.h, but they are best > avoided). >Assuming that we push all of the remus setup into libxl, to ease the pain off other toolstacks, then the external API to activate/deactivate remus will not change. (Also assuming that libxl.h is the libxl API you are talking about). The API is already there. libxl_domain_remus_start with the remus_info structure. So, even if we add driver domain support later, external toolstacks would not be impacted. I will certainly add stub code to handle driver domains. However, adding any piece of mechanism in there would be premature unless we know how xenbackendd is going to work and the nature of communication channels.>> If you are suggesting that we invoke a hotplug script when "xl remus" >> command is issued, >> I dont mind doing so either. The code in libxl (to control the plug >> qdisc) is not going to go away >> in either case. > > Would doing the setup in a shell script be easier/cleaner etc? >Neither. The current C-code (xl_netbuf) can do everything that a shell script does. However, if we were to invoke an external "hotplug" style script, it would give us more flexibility. For e.g., people could tweak the script to add other traffic shaping stuff onto the VM''s traffic. Or, with openvswitch/SDN, other fun stuff could be done.> I''m still not sure I understand why finding an ifb and binding it to a > vif cannot be done by the vif hotplug script and the ifb communicated > back to the toolstack via xenstore. >To make sure we are on the same page: I am under the assumption that the vif hotplug script is invoked during the domain boot. I am also assuming that you are talking about existing vif hotplug scripts (vif-setup, vif-bridge, etc). So, if we lock onto an IFB and bind it to the vif (i.e. redirect all egress traffic via the IFB), then what we are basically doing is to acquire an IFB device for every vif in the VM, for every VM, that may be run under Remus sometime in future. This basically means we would be setting up an IFB device for every domain right from boot and not when Remus is started for the domain. In case you mean something like a "vif-remus" hotplug script that is invoked when one invokes libxl_domain_remus_start API, then, you are right. all the setup can be done in this hotplug script and the chosen ifbs can be communicated via xenstore back to libxl. Hope that clears things up. shriram
Ian Campbell
2013-Jul-30 15:39 UTC
Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
On Tue, 2013-07-30 at 11:25 -0400, Shriram Rajagopalan wrote:> > > >> > Also, I''m not sure how this is supposed to work when driver domains are > >> > in use either, > >> > >> Remus wont work with driver domains, unless there are agents in each of the > >> driver domains, that are co-ordinated by the memory checkpoint code in dom0. > > > > That doesn''t seem to be to hard to arrange. We are already thinking of a > > "xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in > > order to reduce our dependency on udev (in dom0 the daemon may actually > > be libxl directly). > > > >> Irrespective of the hotplug scripts, Remus needs to control the IFB > >> network interface > >> attached to the Guest''s vifs/tap devices. Given that all these > >> interfaces will be inside > >> a driver domain, which does not have a fast (not xenstore) > >> communication channel to dom0, > >> there is no way the memory checkpointing code can co-ordinate with the > >> driver domain in order > >> to buffer/release its network packets after each checkpoint. > > > > It doesn''t seem implausible that the toolstack and that daemon might not > > also negotiate a memory-checkpoint evtchn or something like that for a > > given domain. > > > >> One alternative would be to have a network agent running inside each > >> of these driver domains, > >> assuming that the driver domains would have network access (practical ? ). > >> The memory checkpoint code would have to control the IFB devices via > >> the network agents. > >> > >> All this is in the long run.. The immediate goal is to get > >> network/disk buffering to work with xl. > > > > We still need to consider the future when designing the libxl API since > > that is stable and we avoid changing it once committed (there are > > mechanisms to evolve the interface, see libxl.h, but they are best > > avoided). > > > > Assuming that we push all of the remus setup into libxl, to ease the pain off > other toolstacks, then the external API to activate/deactivate remus > will not change. > (Also assuming that libxl.h is the libxl API you are talking about). > > The API is already there. libxl_domain_remus_start with the remus_info > structure. So, even if we add driver domain support later, external > toolstacks would > not be impacted. I will certainly add stub code to handle driver > domains. However, > adding any piece of mechanism in there would be premature unless we know how > xenbackendd is going to work and the nature of communication channels. > > >> If you are suggesting that we invoke a hotplug script when "xl remus" > >> command is issued, > >> I dont mind doing so either. The code in libxl (to control the plug > >> qdisc) is not going to go away > >> in either case. > > > > Would doing the setup in a shell script be easier/cleaner etc? > > > > Neither. The current C-code (xl_netbuf) can do everything that a shell > script does. > However, if we were to invoke an external "hotplug" style script, it > would give us > more flexibility. > For e.g., people could tweak the script to add other traffic shaping > stuff onto the VM''s traffic. Or, with openvswitch/SDN, other fun > stuff could be done. > > > I''m still not sure I understand why finding an ifb and binding it to a > > vif cannot be done by the vif hotplug script and the ifb communicated > > back to the toolstack via xenstore. > > > > To make sure we are on the same page: > I am under the assumption that the vif hotplug script is invoked > during the domain boot. > I am also assuming that you are talking about existing vif hotplug > scripts (vif-setup, vif-bridge, etc).This is what I meant, but...> > So, if we lock onto an IFB and bind it to the vif (i.e. redirect all > egress traffic via the IFB), > then what we are basically doing is to acquire an IFB device for > every vif in the VM, for every VM, > that may be run under Remus sometime in future. > This basically means we would be setting up an IFB device for every > domain right from boot > and not when Remus is started for the domain.... this. For some reason I thought that remus domains were started as such and had the inherent property, which is obviously not the case, and I knew that really.> In case you mean something like a "vif-remus" hotplug script that is > invoked when one invokes > libxl_domain_remus_start API,Or a new "action" for the existing vif-foo, but yes. Depends on whether routed remus is different to bridged remus I suppose.> then, you are right. all the setup can > be done in this hotplug script > and the chosen ifbs can be communicated via xenstore back to libxl.This might be worth it for the flexibility etc you mention above. Anyone got any other thoughts? Ian.
Roger Pau Monné
2013-Jul-30 16:11 UTC
Re: [PATCH 0 of 4 RFC] xl - Remus network buffering support
On 25/07/13 09:09, Shriram Rajagopalan wrote:> This patch series adds support for network buffering in the Remus > codebase in xl/libxl. In previous emails, I had proposed for a script > invocation to setup network buffering. After digging through libnl API, > I managed to find most of what I needed (except for one command, which > right now is executed through system() call). > > The patch series in its current state would allow xl to dynamically setup > and teardown buffering devices, qdiscs, etc associated with the guest, > instead of resorting to clunky one time configurations. > > The series is organized as follows: > 1/4 - Network buffering setup functions - abstractions built on top of libnl3 API > to implement functionality such as add/delete qdisc, interface up/down, > search for free ifb devices, etc. > > 2/4 - xl cmdline utility uses these abstractions to setup network buffers and > provides libxl with a list of ifb devices where packets would be buffered > > 3/4 - Libxl interaction with network buffer module in the kernel via libnl3. > > 4/4 - adds libnl3 (>= v3.2.17) dependency to autoconf scripts and linker flags > in tools/libxl/Makefile. > > Functionality tested on debian squeeze (kernel 3.4) + openvswitch + 64-bit PV domU (kernel 3.4).I''ve just skimmed quickly through the patches, but it seems like they add a bunch of Linux specific code to general parts of libxl, which would prevent building or using libxl in any OS different than Linux. If you have to add Linux specific code there''s libxl_linux.c, but please remember to also add the necessary stubs to libxl_netbsd.c in order to not break libxl for OSes different than Linux.
Ian Campbell
2013-Jul-31 08:33 UTC
Re: [PATCH 0 of 4 RFC] xl - Remus network buffering support
On Tue, 2013-07-30 at 18:11 +0200, Roger Pau Monné wrote:> I've just skimmed quickly through the patches, but it seems like they > add a bunch of Linux specific code to general parts of libxl, which > would prevent building or using libxl in any OS different than Linux. If > you have to add Linux specific code there's libxl_linux.c, but please > remember to also add the necessary stubs to libxl_netbsd.c in order to > not break libxl for OSes different than Linux.Good point. Since this looks like it might be a reasonable amount of code I think it might be better to put it into a more semantic rather than OS based location, e..g libxl_netlink.c or libxl_remus.c and the put the stubs in libxl_no<xxx>.c and conditionally compile the right one (based on OS). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Aug-07 15:38 UTC
Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks"):> This patch constitutes the core network buffering logic. > Libxl would receive a list of IFB devices that collectively act as > network buffering devices, for a given guest. Also, libxl expects that > every ifb device in the list supplied by a toolstack (netbuf_iflist) > should have a plug qdisc installed.This seems to have many of the required pieces and is going in the right direction.> diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c Thu Jul 25 00:02:19 2013 -0700 > +++ b/tools/libxl/libxl_dom.c Thu Jul 25 00:02:22 2013 -0700 > @@ -20,6 +20,7 @@ > #include "libxl_internal.h" > #include "libxl_arch.h" > > +#include <netlink/route/qdisc/plug.h>We need a configure option to disable this, in case the host doesn''t have it.> #include <xc_dom.h> > #include <xen/hvm/hvm_info_table.h> > #include <xen/hvm/hvm_xs_strings.h> > @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid > > /*----- remus callbacks -----*/ > > +/* REMUS TODO: Issue disk checkpoint reqs. */ > static int libxl__remus_domain_suspend_callback(void *data)...> + if (!remus_ctx->num_netbufs) return is_suspended; > + > + if (is_suspended) { > + for (i = 0; i < remus_ctx->num_netbufs; ++i) { > + ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]); > + if (!ret) > + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],Needs the line length reducing to 70-75 characters, maximum. (Multiple occcurrences of this problem in the patch.)> @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi > static void remus_checkpoint_dm_saved(libxl__egc *egc, > libxl__domain_suspend_state *dss, int rc) > { > - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ > - /* REMUS TODO: make this asynchronous */ > - assert(!rc); /* REMUS TODO handle this error properly */...> + assert(!rc);Does this TODO not need to remain ?> + if (!ret) > + ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i], > + NLM_F_REQUEST); > + if (ret) { > + LOG(ERROR, "Cannot release buffer from %s:%s", > + dss->remus->netbuf_iflist[i], nl_geterror(ret)); > + ret= 0;Missing space after "=". ...> + usleep(dss->remus_ctx->interval * 1000);This is still pretty bad, surely ?> +static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc, > + const libxl_domain_remus_info *r_info) > +{...> + libxl_string_list l;...> + l = r_info->netbuf_iflist;Is this a convenience shorthand ? If so I think it might be better to say const libxl_string_list *l = &r_info->netbuf_iflist; ...> + nl_connect(nlsock, NETLINK_ROUTE);Does this not return an error value ?> + if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) { > + LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache"); > + goto end; > + } > + > + remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, > + sizeof(struct rtnl_qdisc *));Why num_netbufs+1 ?> + remus_ctx->num_netbufs = num_netbufs; > + remus_ctx->nlsock = nlsock; > + remus_ctx->qdisc_cache = qdisc_cache;It would probably be better to use the fields in the remus_ctx directly, rather than having a separate set of local variables.> + ifb = rtnl_link_alloc(); > + > + for (i = 0; i < num_netbufs; ++i) { > + > + if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) { > + LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]); > + goto end;Do these functions not provide errno values, or the moral equivalent ?> + } > + > + ifindex = rtnl_link_get_ifindex(ifb); > + if (!ifindex) { > + LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]); > + goto end; > + } > + > + qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT); > + if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) {Can rtnl_tc_get_kind fail ?> + LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]); > + goto end; > + } > + > + remus_ctx->netbuf_qdisc_list[i] = qdisc; > + } > + > + rtnl_link_put(ifb); > + return remus_ctx; > + > + end: > + if (ifb) rtnl_link_put(ifb); > + if (qdisc_cache) nl_cache_free(qdisc_cache); > + if (nlsock) nl_close(nlsock);I think this can perhaps leak remus_ctx and qdisc. Thanks, Ian.
Ian Jackson
2013-Aug-07 15:41 UTC
Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
Ian Campbell writes ("Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks"):> This really needs Ian''s review WRT the lifetime of the remus context > object vs. the ao machinery. IOW I''m not sure if this is the appropriate > gc to be using or if one associated with the ao context isn''t required.AFAICT the remus_ctx is allocated from the ao''s gc because of this:> > void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)... STATE_AO_GC(dss->ao); ...> > + dss->remus_ctx = setup_remus_ctx(gc, r_info);So "gc" in libxl__domain_suspend is the ao gc, and that ends up being used for everything. Ian.
Shriram Rajagopalan
2013-Aug-07 21:51 UTC
Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
On Wed, Aug 7, 2013 at 11:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> > > > +#include <netlink/route/qdisc/plug.h> > > We need a configure option to disable this, in case the host doesn''t > have it. > > > #include <xc_dom.h> > > #include <xen/hvm/hvm_info_table.h> > > #include <xen/hvm/hvm_xs_strings.h> > > @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid > > > > /*----- remus callbacks -----*/ > > > > +/* REMUS TODO: Issue disk checkpoint reqs. */ > > static int libxl__remus_domain_suspend_callback(void *data) > ... > > + if (!remus_ctx->num_netbufs) return is_suspended; > > + > > + if (is_suspended) { > > + for (i = 0; i < remus_ctx->num_netbufs; ++i) { > > + ret > rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]); > > + if (!ret) > > + ret = rtnl_qdisc_add(remus_ctx->nlsock, > remus_ctx->netbuf_qdisc_list[i], > > Needs the line length reducing to 70-75 characters, maximum. > (Multiple occcurrences of this problem in the patch.) > > > @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi > > static void remus_checkpoint_dm_saved(libxl__egc *egc, > > libxl__domain_suspend_state *dss, > int rc) > > { > > - /* REMUS TODO: Wait for disk and memory ack, release network buffer > */ > > - /* REMUS TODO: make this asynchronous */ > > - assert(!rc); /* REMUS TODO handle this error properly */ > ... > > + assert(!rc); > > Does this TODO not need to remain ? > > > + if (!ret) > > + ret = rtnl_qdisc_add(remus_ctx->nlsock, > remus_ctx->netbuf_qdisc_list[i], > > + NLM_F_REQUEST); > > + if (ret) { > > + LOG(ERROR, "Cannot release buffer from %s:%s", > > + dss->remus->netbuf_iflist[i], nl_geterror(ret)); > > + ret= 0; > > Missing space after "=". > > ... > > + usleep(dss->remus_ctx->interval * 1000); > > This is still pretty bad, surely ? > >Was thinking of updating this in a separate patch.. But I guess since this line is changing, might as well do it in this patch.> > + rtnl_link_put(ifb); > > + return remus_ctx; > > + > > + end: > > + if (ifb) rtnl_link_put(ifb); > > + if (qdisc_cache) nl_cache_free(qdisc_cache); > > + if (nlsock) nl_close(nlsock); > > I think this can perhaps leak remus_ctx and qdisc. > >There is no issue of leaking qdisc. Because it simply obtains references from qdisc_cache, which is freed if things fail. WRT remus_ctx, I had intentionally skipped the free call. Its allocated in the GC context. [the same one used by domain_suspend, etc]. And if this call fails, then the parent call libxl_domain_remus_start will fail. I assumed that when libxl_free_all was called at this point, remus_ctx would also be freed. Am I missing something in the overall control flow, that would cause this leak ? Thanks,> Ian. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Aug-08 11:07 UTC
Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks"):> On Wed, Aug 7, 2013 at 11:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:> > + usleep(dss->remus_ctx->interval * 1000); > > This is still pretty bad, surely ?...> Was thinking of updating this in a separate patch.. But I guess since this line > is changing, might as well do it in this patch.I''m happy with it in a separate patch. Doing it in a separate patch probably makes things clearer. This patch does have a TODO about it.> > + rtnl_link_put(ifb); > > + return remus_ctx; > > + > > + end: > > + if (ifb) rtnl_link_put(ifb); > > + if (qdisc_cache) nl_cache_free(qdisc_cache); > > + if (nlsock) nl_close(nlsock); > > I think this can perhaps leak remus_ctx and qdisc....> There is no issue of leaking qdisc. Because it simply obtains references from > qdisc_cache, which is freed if things fail.Perhaps this is worth a comment ?> WRT remus_ctx, I had intentionally skipped the free call. > Its allocated in the GC context. [the same one used by domain_suspend, etc]. > And if this call fails, then the parent call libxl_domain_remus_start will > fail.Sorry, yes.> I assumed that when libxl_free_all was called at this point, remus_ctx would > also be > freed. Am I missing something in the overall control flow, that would cause > this leak ?No, I was confused. Ian.