Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 0 of 5 V3] Remus/Libxl: Network buffering support
This patch series adds support for network buffering in the Remus codebase in libxl.

Changes in V3:
[1/5] Fix redundant checks in configure scripts (based on Ian Campbell's suggestions)
[2/5] Introduce locking in the script during IFB setup. Add xenstore paths used by netbuf scripts to xenstore-paths.markdown
[3/5] Hotplug script setup/teardown invocations are now asynchronous, following IanJ's feedback. However, the invocations are still sequential.
[5/5] Allow per-domain specification of netbuffer scripts in the xl remus command.
And minor nits throughout the series based on feedback from the last version.

Changes in V2:
[1/5] The configure script will automatically enable/disable network buffer support depending on the availability of the appropriate libnl3 version. (If libnl3 is unavailable, a warning message will be printed to let the user know that the feature has been disabled.) Use macros from pkg.m4 instead of pkg-config commands; removed redundant checks for libnl3 libraries.
[3,4/5] Minor nits.

Version 1:
[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags to the libxl Makefile.
[2/5] External script to set up/tear down network buffering using libnl3's CLI. This script will be invoked by libxl before starting Remus. The script's main job is to bring up an IFB device with a plug qdisc attached to it. It then re-routes egress traffic from the guest's vif to the IFB device.
[3/5] Libxl code to invoke the external setup script, followed by netlink-related setup to obtain a handle on the output buffers attached to each vif.
[4/5] Libxl interaction with the network buffer module in the kernel via the libnl3 API.
[5/5] xl command-line switch to explicitly enable network buffering when starting Remus.

A few things to note:
a) Based on previous email discussions, the setup/teardown task has been moved to a hotplug-style shell script which can be customized as desired, instead of being implemented as C code inside libxl.
b) Libnl3 is not available on NetBSD, nor is it available on CentOS (Linux). So I have made network buffering support an optional feature that can be disabled if desired.
c) Since NetBSD does not have libnl3, I have put the setup script under the tools/hotplug/Linux folder.

thanks
shriram
Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 1 of 5 V3] remus: add libnl3 dependency to autoconf scripts
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1381540581 25200 # Node ID 357655a1598cba8f8460758714885563dd2b4a9c # Parent 808aba98084edee7b6dc78c7333da71d2be70386 remus: add libnl3 dependency to autoconf scripts Libnl3 is required for controlling Remus network buffering. This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts. Also provide ability to configure tools without libnl3 support, that is without network buffering support. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 808aba98084e -r 357655a1598c config/Tools.mk.in --- a/config/Tools.mk.in Fri Oct 11 18:16:11 2013 -0700 +++ b/config/Tools.mk.in Fri Oct 11 18:16:21 2013 -0700 @@ -37,6 +37,8 @@ PTHREAD_LIBS := @PTHREAD_LIBS@ PTYFUNCS_LIBS := @PTYFUNCS_LIBS@ +LIBNL3_LIBS := @LIBNL3_LIBS@ +LIBNL3_CFLAGS := @LIBNL3_CFLAGS@ # Download GIT repositories via HTTP or GIT''s own protocol? # GIT''s protocol is faster and more robust, when it works at all (firewalls # may block it). We make it the default, but if your GIT repository downloads @@ -55,6 +57,7 @@ CONFIG_QEMU_TRAD := @qemu_traditional CONFIG_QEMU_XEN := @qemu_xen@ CONFIG_XEND := @xend@ CONFIG_BLKTAP1 := @blktap1@ +CONFIG_REMUS_NETBUF := @remus_netbuf@ #System options ZLIB := @zlib@ diff -r 808aba98084e -r 357655a1598c tools/configure.ac --- a/tools/configure.ac Fri Oct 11 18:16:11 2013 -0700 +++ b/tools/configure.ac Fri Oct 11 18:16:21 2013 -0700 @@ -219,6 +219,24 @@ AC_SUBST(libiconv) # Checks for header files. AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) +# Check for libnl3 >=3.2.8. If present enable remus network buffering. +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8], + [libnl3_lib="y"], [libnl3_lib="n"]) + +# Check for libnl3 command line utilities +PKG_CHECK_EXISTS([libnl-cli-3.0 >= 3.2.8], [libnl3_cli="y"], [libnl3_cli="n"]) + +AS_IF([test "x$libnl3_lib" = "xn" || test "x$libnl3_cli" = "xn"], [ + AC_MSG_WARN([Disabling support for Remus network buffering. + Please install libnl3 libraries, command line tools and devel + headers - version 3.2.8 or higher]) + AC_SUBST(remus_netbuf, [n]) + ],[ + AC_SUBST(LIBNL3_LIBS) + AC_SUBST(LIBNL3_CFLAGS) + AC_SUBST(remus_netbuf, [y]) +]) + AC_OUTPUT() AS_IF([test "x$xend" = "xy" ], [ diff -r 808aba98084e -r 357655a1598c tools/libxl/Makefile --- a/tools/libxl/Makefile Fri Oct 11 18:16:11 2013 -0700 +++ b/tools/libxl/Makefile Fri Oct 11 18:16:21 2013 -0700 @@ -21,11 +21,13 @@ endif LIBXL_LIBS LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS) +LIBXL_LIBS += $(LIBNL3_LIBS) CFLAGS_LIBXL += $(CFLAGS_libxenctrl) CFLAGS_LIBXL += $(CFLAGS_libxenguest) CFLAGS_LIBXL += $(CFLAGS_libxenstore) CFLAGS_LIBXL += $(CFLAGS_libblktapctl) +CFLAGS_LIBXL += $(LIBNL3_CFLAGS) CFLAGS_LIBXL += -Wshadow CFLAGS += $(PTHREAD_CFLAGS) diff -r 808aba98084e -r 357655a1598c tools/remus/README --- a/tools/remus/README Fri Oct 11 18:16:11 2013 -0700 +++ b/tools/remus/README Fri Oct 11 18:16:21 2013 -0700 @@ -2,3 +2,9 @@ Remus provides fault tolerance for virtu checkpoints to a backup, which will activate if the target VM fails. See the website at http://nss.cs.ubc.ca/remus/ for details. + +Using Remus with libxl on Xen 4.4 and higher: + To enable network buffering, you need libnl 3.2.8 + or higher along with the development headers and command line utilities. 
+ If your distro does not have the appropriate libnl3 version, you can find + the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
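As an aside, the requirement encoded by the PKG_CHECK_MODULES call above can be checked by hand with pkg-config before building (this is only an illustrative check, not part of the patch; the module names are the ones the configure fragment uses):

    pkg-config --exists 'libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8' \
        && echo "libnl3 ok" || echo "libnl3 missing or too old"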
Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 2 of 5 V3] tools/hotplug: Remus network buffering setup scripts
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1382274772 25200 # Node ID 0001a8222a785865b753acf8bcdf97c10c9e3819 # Parent 357655a1598cba8f8460758714885563dd2b4a9c tools/hotplug: Remus network buffering setup scripts This patch introduces remus-netbuf-setup hotplug script responsible for setting up and tearing down the necessary infrastructure required for network output buffering in Remus. This script is intended to be invoked by libxl for each guest interface, when starting or stopping Remus. Apart from returning success/failure indication via the usual hotplug entries in xenstore, this script also writes to xenstore, the name of the IFB device to be used to control the vif''s network output. The script relies on libnl3 command line utilities to perform various setup/teardown functions. The script is confined to Linux platforms only since NetBSD does not seem to have libnl3. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 357655a1598c -r 0001a8222a78 docs/misc/xenstore-paths.markdown --- a/docs/misc/xenstore-paths.markdown Fri Oct 11 18:16:21 2013 -0700 +++ b/docs/misc/xenstore-paths.markdown Sun Oct 20 06:12:52 2013 -0700 @@ -356,6 +356,10 @@ The guest''s virtual time offset from UTC The device model version for a domain. +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL] + +IFB device used by Remus to buffer network output from the associated vif. + [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html diff -r 357655a1598c -r 0001a8222a78 tools/hotplug/Linux/Makefile --- a/tools/hotplug/Linux/Makefile Fri Oct 11 18:16:21 2013 -0700 +++ b/tools/hotplug/Linux/Makefile Sun Oct 20 06:12:52 2013 -0700 @@ -16,6 +16,7 @@ XEN_SCRIPTS += network-nat vif-nat XEN_SCRIPTS += vif-openvswitch XEN_SCRIPTS += vif2 XEN_SCRIPTS += vif-setup +XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup XEN_SCRIPTS += block XEN_SCRIPTS += block-enbd block-nbd XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap diff -r 357655a1598c -r 0001a8222a78 tools/hotplug/Linux/remus-netbuf-setup --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/hotplug/Linux/remus-netbuf-setup Sun Oct 20 06:12:52 2013 -0700 @@ -0,0 +1,187 @@ +#!/bin/bash +#===========================================================================+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup +# +# Script for attaching a network buffer to the specified vif (in any mode). +# The hotplugging system will call this script when starting remus via libxl +# API, libxl_domain_remus_start. +# +# Usage: +# remus-netbuf-setup (setup|teardown) +# +# Environment vars: +# vifname vif interface name (required). +# XENBUS_PATH path in Xenstore, where the IFB device details will be stored +# or read from (required). +# (libxl passes /libxl/<domid>/remus/netbuf/<devid>) +# IFB ifb interface to be cleaned up (required). [for teardown op only] + +# Written to the store: (setup operation) +# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving +# as the intermediate buffer through which the interface''s network output +# can be controlled. 
+# +# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0) +# we need to do the following +# +# ip link set dev ifb0 up +# tc qdisc add dev vif1.0 ingress +# tc filter add dev vif1.0 parent ffff: proto ip \ +# prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0 +# nl-qdisc-add --dev=ifb0 --parent root plug +# nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000 +# (10MB limit on buffer) +# +# So order of operations when installing a network buffer on vif1.0 +# 1. find a free ifb and bring up the device +# 2. redirect traffic from vif1.0 to ifb: +# 2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest) +# 2.2 use tc filter command with actions mirred egress + redirect +# 3. install plug_qdisc on ifb device, with which we can buffer/release +# guest''s network output from vif1.0 +# +# + +#===========================================================================+ +# Unlike other vif scripts, vif-common is not needed here as it executes vif +#specific setup code such as renaming. +dir=$(dirname "$0") +. "$dir/xen-hotplug-common.sh" + +findCommand "$@" + +if [ "$command" != "setup" -a "$command" != "teardown" ] +then + echo "Invalid command: $command" + log err "Invalid command: $command" + exit 1 +fi + +evalVariables "$@" + +: ${vifname?} +: ${XENBUS_PATH?} + +check_libnl_tools() { + if ! command -v nl-qdisc-list > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-list tool" + fi + if ! command -v nl-qdisc-add > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-add tool" + fi + if ! command -v nl-qdisc-delete > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-delete tool" + fi +} + +# We only check for modules. We don''t load them. +# User/Admin is supposed to load ifb during boot time, +# ensuring that there are enough free ifbs in the system. +# Other modules will be loaded automatically by tc commands. +check_modules() { + for m in ifb sch_plug sch_ingress act_mirred cls_u32 + do + if ! modinfo $m > /dev/null 2>&1; then + fatal "Unable to find $m kernel module" + fi + done +} + +setup_ifb() { + + for ifb in `ifconfig -a -s|egrep ^ifb|cut -d '' '' -f1` + do + local installed=`nl-qdisc-list -d $ifb` + [ -n "$installed" ] && continue + IFB="$ifb" + break + done + + if [ -z "$IFB" ] + then + fatal "Unable to find a free IFB device for $vifname" + fi + + do_or_die ip link set dev "$IFB" up +} + +redirect_vif_traffic() { + local vif=$1 + local ifb=$2 + + do_or_die tc qdisc add dev "$vif" ingress + + tc filter add dev "$vif" parent ffff: proto ip prio 10 \ + u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1 + + if [ $? -ne 0 ] + then + do_without_error tc qdisc del dev "$vif" ingress + fatal "Failed to redirect traffic from $vif to $ifb" + fi +} + +add_plug_qdisc() { + local vif=$1 + local ifb=$2 + + nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1 + if [ $? -ne 0 ] + then + do_without_error tc qdisc del dev "$vif" ingress + fatal "Failed to add plug qdisc to $ifb" + fi + + #set ifb buffering limit in bytes. 
Its okay if this command + nl-qdisc-add --dev="$ifb" --parent root \ + --update plug --limit=10000000 >/dev/null 2>&1 +} + +teardown_netbuf() { + local vif=$1 + local ifb=$2 + + if [ "$ifb" ]; then + do_without_error ip link set dev "$ifb" down + do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1 + xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true + fi + do_without_error tc qdisc del dev "$vif" ingress + xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true +} + +xs_write_failed() { + local vif=$1 + local ifb=$2 + teardown_netbuf "$vifname" "$IFB" + fatal "failed to write ifb name to xenstore" +} + +case "$command" in + setup) + check_libnl_tools + check_modules + + claim_lock "pickifb" + setup_ifb + redirect_vif_traffic "$vifname" "$IFB" + add_plug_qdisc "$vifname" "$IFB" + release_lock "pickifb" + + #not using xenstore_write that automatically exits on error + # because we need to cleanup + _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB" + ;; + teardown) + : ${IFB?} + teardown_netbuf "$vifname" "$IFB" + ;; +esac + +log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB." + +if [ "$command" = "setup" ] +then + success +fi
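For anyone who wants to exercise the script by hand outside libxl, its documented interface boils down to something like the following. This is only a sketch of a manual test: the vif name, the domid/devid components of the xenstore path and the ifb name are placeholders, and xenstored must be running with the corresponding /libxl path writable:

    # setup: picks a free ifb, redirects vif1.0's egress to it, installs the
    # plug qdisc, and writes the chosen ifb name to $XENBUS_PATH/ifb
    vifname=vif1.0 XENBUS_PATH=/libxl/1/remus/netbuf/0 \
        /etc/xen/scripts/remus-netbuf-setup setup

    # teardown: the IFB chosen during setup must be passed back in
    vifname=vif1.0 XENBUS_PATH=/libxl/1/remus/netbuf/0 IFB=ifb0 \
        /etc/xen/scripts/remus-netbuf-setup teardown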
Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 3 of 5 V3] tools/libxl: setup/teardown Remus network buffering
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1382295266 25200 # Node ID d3f088236c550213fc04ed982df47b4771b28d2f # Parent 0001a8222a785865b753acf8bcdf97c10c9e3819 tools/libxl: setup/teardown Remus network buffering Setup/teardown remus network buffering for a given guest, when libxl_domain_remus_start API is invoked. This patch does the following during setup: a) call the hotplug script for each vif to setup its network buffer b) establish a dedicated remus context containing libnl related state (netlink sockets, qdisc caches, etc.,) c) Obtain handles to plug qdiscs installed on the IFB devices chosen by the hotplug scripts. During teardown, the netlink resources are released, followed by invocation of hotplug scripts to remove the IFB devices. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/Makefile --- a/tools/libxl/Makefile Sun Oct 20 06:12:52 2013 -0700 +++ b/tools/libxl/Makefile Sun Oct 20 11:54:26 2013 -0700 @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o else LIBXL_OBJS-y += libxl_noblktap2.o endif + +ifeq ($(CONFIG_REMUS_NETBUF),y) +LIBXL_OBJS-y += libxl_netbuffer.o +else +LIBXL_OBJS-y += libxl_nonetbuffer.o +endif + LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o LIBXL_OBJS-$(CONFIG_IA64) += libxl_nocpuid.o libxl_noarch.o LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_noarch.o diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl.c --- a/tools/libxl/libxl.c Sun Oct 20 06:12:52 2013 -0700 +++ b/tools/libxl/libxl.c Sun Oct 20 11:54:26 2013 -0700 @@ -698,8 +698,9 @@ libxl_vminfo * libxl_list_vm(libxl_ctx * return ptr; } -static void remus_failover_cb(libxl__egc *egc, - libxl__domain_suspend_state *dss, int rc); +static void remus_replication_failure_cb(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info, @@ -718,40 +719,57 @@ int libxl_domain_remus_start(libxl_ctx * GCNEW(dss); dss->ao = ao; - dss->callback = remus_failover_cb; + dss->callback = remus_replication_failure_cb; dss->domid = domid; dss->fd = send_fd; /* TODO do something with recv_fd */ dss->type = type; dss->live = 1; dss->debug = 0; - dss->remus = info; assert(info); - /* TBD: Remus setup - i.e. 
attach qdisc, enable disk buffering, etc */ + GCNEW(dss->remus_ctx); + + /* convenience shorthand */ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + remus_ctx->blackhole = info->blackhole; + remus_ctx->interval = info->interval; + remus_ctx->compression = info->compression; + remus_ctx->dss = dss; + + /* TODO: enable disk buffering */ + + /* Setup network buffering */ + if (info->netbuf) { + if (!libxl__netbuffer_enabled(gc)) { + LOG(ERROR, "Remus: No support for network buffering"); + goto out; + } + + if (info->netbufscript) { + remus_ctx->netbufscript + libxl__strdup(gc, info->netbufscript); + } else { + remus_ctx->netbufscript + GCSPRINTF("%s/remus-netbuf-setup", + libxl__xen_script_dir_path()); + } + } /* Point of no return */ - libxl__domain_suspend(egc, dss); + libxl__remus_setup_initiate(egc, dss); return AO_INPROGRESS; out: return AO_ABORT(rc); } -static void remus_failover_cb(libxl__egc *egc, - libxl__domain_suspend_state *dss, int rc) +static void remus_replication_failure_cb(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) { STATE_AO_GC(dss->ao); - /* - * With Remus, if we reach this point, it means either - * backup died or some network error occurred preventing us - * from sending checkpoints. - */ - - /* TBD: Remus cleanup - i.e. detach qdisc, release other - * resources. - */ libxl__ao_complete(egc, ao, rc); } diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Sun Oct 20 06:12:52 2013 -0700 +++ b/tools/libxl/libxl_dom.c Sun Oct 20 11:54:26 2013 -0700 @@ -1211,6 +1211,50 @@ int libxl__toolstack_save(uint32_t domid return 0; } +/*----- remus setup/teardown code -----*/ +void libxl__remus_setup_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + if (!dss->remus_ctx->netbufscript) + libxl__remus_setup_done(egc, dss, 0); + else + libxl__remus_netbuf_setup(egc, dss); +} + +void libxl__remus_setup_done(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) +{ + STATE_AO_GC(dss->ao); + if (!rc) { + libxl__domain_suspend(egc, dss); + return; + } + + LOG(ERROR, "Remus: failed to setup network buffering" + " for guest with domid %u", dss->domid); + domain_suspend_done(egc, dss, rc); +} + +void libxl__remus_teardown_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) +{ + /* stash rc somewhere before invoking teardown ops. 
*/ + dss->remus_ctx->saved_rc = rc; + + if (!dss->remus_ctx->netbuf_ctx) + libxl__remus_teardown_done(egc, dss); + else + libxl__remus_netbuf_teardown(egc, dss); +} + +void libxl__remus_teardown_done(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + dss->callback(egc, dss, dss->remus_ctx->saved_rc); +} + /*----- remus callbacks -----*/ static int libxl__remus_domain_suspend_callback(void *data) @@ -1259,7 +1303,7 @@ static void remus_checkpoint_dm_saved(li /* REMUS TODO: Wait for disk and memory ack, release network buffer */ /* REMUS TODO: make this asynchronous */ assert(!rc); /* REMUS TODO handle this error properly */ - usleep(dss->interval * 1000); + usleep(dss->remus_ctx->interval * 1000); libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); } @@ -1277,7 +1321,6 @@ void libxl__domain_suspend(libxl__egc *e const libxl_domain_type type = dss->type; const int live = dss->live; const int debug = dss->debug; - const libxl_domain_remus_info *const r_info = dss->remus; libxl__srm_save_autogen_callbacks *const callbacks &dss->shs.callbacks.save.a; @@ -1312,11 +1355,8 @@ void libxl__domain_suspend(libxl__egc *e dss->guest_responded = 0; dss->dm_savefile = libxl__device_model_savefile(gc, domid); - if (r_info != NULL) { - dss->interval = r_info->interval; - if (r_info->compression) - dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; - } + if (dss->remus_ctx && dss->remus_ctx->compression) + dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; dss->xce = xc_evtchn_open(NULL, 0); if (dss->xce == NULL) @@ -1335,7 +1375,7 @@ void libxl__domain_suspend(libxl__egc *e } memset(callbacks, 0, sizeof(*callbacks)); - if (r_info != NULL) { + if (dss->remus_ctx != NULL) { callbacks->suspend = libxl__remus_domain_suspend_callback; callbacks->postcopy = libxl__remus_domain_resume_callback; callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; @@ -1495,6 +1535,17 @@ static void domain_suspend_done(libxl__e if (dss->xce != NULL) xc_evtchn_close(dss->xce); + if (dss->remus_ctx) { + /* + * With Remus, if we reach this point, it means either + * backup died or some network error occurred preventing us + * from sending checkpoints. Teardown the network buffers and + * release netlink resources. This is an async op. 
+ */ + libxl__remus_teardown_initiate(egc, dss, rc); + return; + } + dss->callback(egc, dss, rc); } diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h Sun Oct 20 06:12:52 2013 -0700 +++ b/tools/libxl/libxl_internal.h Sun Oct 20 11:54:26 2013 -0700 @@ -2242,6 +2242,50 @@ typedef struct libxl__logdirty_switch { libxl__ev_time timeout; } libxl__logdirty_switch; +typedef struct libxl__remus_ctx { + /* checkpoint interval */ + int interval; + int blackhole; + int compression; + int saved_rc; + /* Script to setup/teardown network buffers */ + const char *netbufscript; + /* Opaque context containing network buffer related stuff */ + void *netbuf_ctx; + libxl__domain_suspend_state *dss; + libxl__ev_time timeout; + libxl__ev_child child; + int num_exec; +} libxl__remus_ctx; + +_hidden int libxl__netbuffer_enabled(libxl__gc *gc); + +_hidden void libxl__remus_setup_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_setup_done(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); + +_hidden void libxl__remus_teardown_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); + +_hidden void libxl__remus_teardown_done(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx); + +_hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx); + struct libxl__domain_suspend_state { /* set by caller of libxl__domain_suspend */ libxl__ao *ao; @@ -2252,7 +2296,7 @@ struct libxl__domain_suspend_state { libxl_domain_type type; int live; int debug; - const libxl_domain_remus_info *remus; + libxl__remus_ctx *remus_ctx; /* private */ xc_evtchn *xce; /* event channel handle */ int suspend_eventchn; @@ -2260,7 +2304,6 @@ struct libxl__domain_suspend_state { int xcflags; int guest_responded; const char *dm_savefile; - int interval; /* checkpoint interval (for Remus) */ libxl__save_helper_state shs; libxl__logdirty_switch logdirty; /* private for libxl__domain_save_device_model */ diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_netbuffer.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/libxl/libxl_netbuffer.c Sun Oct 20 11:54:26 2013 -0700 @@ -0,0 +1,516 @@ +/* + * Copyright (C) 2013 + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include "libxl_osdeps.h" /* must come before any other headers */ + +#include "libxl_internal.h" + +#include <netlink/cache.h> +#include <netlink/socket.h> +#include <netlink/attr.h> +#include <netlink/route/link.h> +#include <netlink/route/route.h> +#include <netlink/route/qdisc.h> +#include <netlink/route/qdisc/plug.h> + +typedef struct libxl__remus_netbuf_ctx { + struct rtnl_qdisc **netbuf_qdisc_list; + struct nl_sock *nlsock; + struct nl_cache *qdisc_cache; + char **vif_list; + char **ifb_list; + uint32_t num_netbufs; + uint32_t unused; +} libxl__remus_netbuf_ctx; + +/* If the device has a vifname, then use that instead of + * the vifX.Y format. + */ +static char *get_vifname(libxl__gc *gc, uint32_t domid, + libxl_device_nic *nic) +{ + const char *vifname = NULL; + vifname = libxl__xs_read(gc, XBT_NULL, + libxl__sprintf(gc, + "%s/backend/vif/%d/%d/vifname", + libxl__xs_get_dompath(gc, 0), + domid, nic->devid)); + if (!vifname) { + vifname = libxl__device_nic_devname(gc, domid, + nic->devid, + nic->nictype); + } + + return libxl__strdup(gc, vifname); +} + +static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid, + int *num_vifs) +{ + libxl_device_nic *nics = NULL; + int nb, i = 0; + char **vif_list = NULL; + + *num_vifs = 0; + nics = libxl_device_nic_list(CTX, domid, &nb); + if (!nics) return NULL; + + /* Ensure that none of the vifs are backed by driver domains */ + for (i = 0; i < nb; i++) { + if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) { + LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n" + "Network buffering is not supported with driver domains", + get_vifname(gc, domid, &nics[i]), nics[i].backend_domid); + *num_vifs = -1; + goto end; + } + } + + vif_list = libxl__calloc(gc, nb, sizeof(char *)); + for (i = 0; i < nb; ++i) + vif_list[i] = get_vifname(gc, domid, &nics[i]); + *num_vifs = nb; + + end: + for (i = 0; i < nb; i++) + libxl_device_nic_dispose(&nics[i]); + free(nics); + return vif_list; +} + +static int init_qdiscs(libxl__gc *gc, + libxl__remus_ctx *remus_ctx); + +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx, + char *op, libxl__ev_child_callback *death); + +static void netbuf_setup_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status); + +static void netbuf_teardown_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status); + +static void netbuf_setup_timeout_cb(libxl__egc *egc, + libxl__ev_time *ev, + const struct timeval *requested_abs); + +static int init_qdiscs(libxl__gc *gc, + libxl__remus_ctx *remus_ctx) +{ + int i, ret, ifindex; + struct rtnl_link *ifb = NULL; + struct rtnl_qdisc *qdisc = NULL; + + /* convenience vars */ + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + int num_netbufs = netbuf_ctx->num_netbufs; + char **ifb_list = netbuf_ctx->ifb_list; + + /* Now that we have brought up IFB devices with plug qdisc for + * each vif, lets get a netlink handle on the plug qdisc for use + * during checkpointing. + */ + netbuf_ctx->nlsock = nl_socket_alloc(); + if (!netbuf_ctx->nlsock) { + LOG(ERROR, "cannot allocate nl socket"); + goto end; + } + + ret = nl_connect(netbuf_ctx->nlsock, NETLINK_ROUTE); + if (ret) { + LOG(ERROR, "failed to open netlink socket: %s", + nl_geterror(ret)); + goto end; + } + + /* get list of all qdiscs installed on network devs. 
*/ + ret = rtnl_qdisc_alloc_cache(netbuf_ctx->nlsock, + &netbuf_ctx->qdisc_cache); + if (ret) { + LOG(ERROR, "failed to allocate qdisc cache: %s", + nl_geterror(ret)); + goto end; + } + + /* list of handles to plug qdiscs */ + netbuf_ctx->netbuf_qdisc_list + libxl__calloc(gc, num_netbufs, + sizeof(struct rtnl_qdisc *)); + + ifb = rtnl_link_alloc(); + + for (i = 0; i < num_netbufs; ++i) { + + /* get a handle to the IFB interface */ + ret = rtnl_link_get_kernel(netbuf_ctx->nlsock, 0, + ifb_list[i], &ifb); + if (ret) { + LOG(ERROR, "cannot obtain handle for %s: %s", ifb_list[i], + nl_geterror(ret)); + goto end; + } + + ifindex = rtnl_link_get_ifindex(ifb); + if (!ifindex) { + LOG(ERROR, "interface %s has no index", ifb_list[i]); + goto end; + } + + /* Get a reference to the root qdisc installed on the IFB, by + * querying the qdisc list we obtained earlier. The netbufscript + * sets up the plug qdisc as the root qdisc, so we don''t have to + * search the entire qdisc tree on the IFB dev. + + * There is no need to explicitly free this qdisc as its just a + * reference from the qdisc cache we allocated earlier. + */ + qdisc = rtnl_qdisc_get_by_parent(netbuf_ctx->qdisc_cache, ifindex, + TC_H_ROOT); + + if (qdisc) { + const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc)); + /* Sanity check: Ensure that the root qdisc is a plug qdisc. */ + if (!tc_kind || strcmp(tc_kind, "plug")) { + LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]); + goto end; + } + netbuf_ctx->netbuf_qdisc_list[i] = qdisc; + } else { + LOG(ERROR, "Cannot get qdisc handle from ifb %s", ifb_list[i]); + goto end; + } + } + + rtnl_link_put(ifb); + return 0; + + end: + if (ifb) rtnl_link_put(ifb); + return ERROR_FAIL; +} + +/* the script needs the following env & args + * $vifname + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/) + * $IFB (for teardown) + * setup/teardown as command line arg. 
+ * In return, the script writes the name of IFB device (during setup) to be + * used for output buffering into XENBUS_PATH/ifb + */ +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx, + char *op, libxl__ev_child_callback *death) +{ + int arraysize, nr = 0; + char **env = NULL, **args = NULL; + pid_t pid; + libxl__ev_child *child = &remus_ctx->child; + libxl__ev_time *timeout = &remus_ctx->timeout; + char *script = libxl__strdup(gc, remus_ctx->netbufscript); + uint32_t domid = remus_ctx->dss->domid; + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + char *ifb = netbuf_ctx->ifb_list[devid]; + + if (!strcmp(op, "setup")) + arraysize = 5; + else + arraysize = 7; + + GCNEW_ARRAY(env, arraysize); + env[nr++] = "vifname"; + env[nr++] = vif; + env[nr++] = "XENBUS_PATH"; + env[nr++] = GCSPRINTF("%s/remus/netbuf/%d", + libxl__xs_libxl_path(gc, domid), devid); + if (!strcmp(op, "teardown")) { + env[nr++] = "IFB"; + env[nr++] = ifb; + } + env[nr++] = NULL; + assert(nr <= arraysize); + + arraysize = 3; nr = 0; + GCNEW_ARRAY(args, arraysize); + args[nr++] = script; + args[nr++] = op; + args[nr++] = NULL; + assert(nr == arraysize); + + libxl__ev_child_init(child); + + /* Set hotplug timeout */ + if (libxl__ev_time_register_rel(gc, timeout, + netbuf_setup_timeout_cb, + LIBXL_HOTPLUG_TIMEOUT * 1000)) { + LOG(ERROR, "unable to register timeout for " + "netbuf setup script %s on vif %s", script, vif); + return ERROR_FAIL; + } + + LOG(DEBUG, "Calling netbuf script: %s %s on vif %s", + script, op, vif); + + /* Fork and exec netbuf script */ + pid = libxl__ev_child_fork(gc, child, death); + if (pid == -1) { + LOG(ERROR, "unable to fork netbuf script %s", script); + return ERROR_FAIL; + } + + if (!pid) { + /* child: Launch netbuf script */ + libxl__exec(gc, -1, -1, -1, args[0], args, env); + /* notreached */ + abort(); + } + + return 0; +} + +static void netbuf_teardown_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child); + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + + STATE_AO_GC(remus_ctx->dss->ao); + + if (status) { + libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR, + remus_ctx->netbufscript, + pid, status); + } + + remus_ctx->num_exec++; + if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) { + if (exec_netbuf_script(gc, remus_ctx, + "teardown", netbuf_teardown_script_cb)) + goto end; + return; + } + + end: + libxl__remus_teardown_done(egc, remus_ctx->dss); +} + +static void netbuf_setup_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child); + char *out_path_base, *hotplug_error; + uint32_t domid = remus_ctx->dss->domid; + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + char **ifb = &netbuf_ctx->ifb_list[devid]; + int rc = ERROR_FAIL; + + STATE_AO_GC(remus_ctx->dss->ao); + + libxl__ev_time_deregister(gc, &remus_ctx->timeout); + + out_path_base = GCSPRINTF("%s/remus/netbuf/%d", + libxl__xs_libxl_path(gc, domid), devid); + + hotplug_error = libxl__xs_read(gc, XBT_NULL, + GCSPRINTF("%s/hotplug-error", + out_path_base)); + if (hotplug_error) { + LOG(ERROR, "netbuf script %s setup failed for vif %s: %s", + remus_ctx->netbufscript, + netbuf_ctx->vif_list[devid], hotplug_error); + goto end; + } 
+ + if (status) { + libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR, + remus_ctx->netbufscript, + pid, status); + goto end; + } + + *ifb = libxl__xs_read(gc, XBT_NULL, + GCSPRINTF("%s/remus/netbuf/%d/ifb", + libxl__xs_libxl_path(gc, domid), + devid)); + if (!(*ifb)) { + LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s", + domid, vif); + goto end; + } + + LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif); + remus_ctx->num_exec++; + if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) { + if (exec_netbuf_script(gc, remus_ctx, + "setup", netbuf_setup_script_cb)) + goto end; + return; + } + + rc = init_qdiscs(gc, remus_ctx); + end: + libxl__remus_setup_done(egc, remus_ctx->dss, rc); +} + +static void netbuf_setup_timeout_cb(libxl__egc *egc, + libxl__ev_time *ev, + const struct timeval *requested_abs) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(ev, *remus_ctx, timeout); + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + + STATE_AO_GC(remus_ctx->dss->ao); + + libxl__ev_time_deregister(gc, &remus_ctx->timeout); + assert(libxl__ev_child_inuse(&remus_ctx->child)); + + LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout", + remus_ctx->netbufscript, vif); + + if (kill(remus_ctx->child.pid, SIGKILL)) { + LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]", + remus_ctx->netbufscript, + (unsigned long)remus_ctx->child.pid); + } + + return; +} + +int libxl__netbuffer_enabled(libxl__gc *gc) +{ + return 1; +} + +/* Scan through the list of vifs belonging to domid and + * invoke the netbufscript to setup the IFB device & plug qdisc + * for each vif. Then scan through the list of IFB devices to obtain + * a handle on the plug qdisc installed on these IFB devices. + * Network output buffering is controlled via these qdiscs. + */ +void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + libxl__remus_netbuf_ctx *netbuf_ctx = NULL; + uint32_t domid = dss->domid; + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + int num_netbufs = 0; + int rc = ERROR_FAIL; + + STATE_AO_GC(dss->ao); + + netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx)); + netbuf_ctx->vif_list = get_guest_vif_list(gc, domid, &num_netbufs); + if (!num_netbufs) { + rc = 0; + goto end; + } + + if (num_netbufs < 0) goto end; + + netbuf_ctx->ifb_list = libxl__calloc(gc, num_netbufs, + sizeof(char *)); + netbuf_ctx->num_netbufs = num_netbufs; + remus_ctx->netbuf_ctx = netbuf_ctx; + remus_ctx->num_exec = 0; //start at devid == 0 + if (exec_netbuf_script(gc, remus_ctx, "setup", + netbuf_setup_script_cb)) + goto end; + return; + + end: + libxl__remus_setup_done(egc, dss, rc); +} + +/* Note: This function will be called in the same gc context as + * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start + * API call. 
+ */ +void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + STATE_AO_GC(dss->ao); + + if (netbuf_ctx->qdisc_cache) + nl_cache_free(netbuf_ctx->qdisc_cache); + + if (netbuf_ctx->nlsock) + nl_close(netbuf_ctx->nlsock); + + remus_ctx->num_exec = 0; //start at devid == 0 + if (exec_netbuf_script(gc, remus_ctx, "teardown", + netbuf_teardown_script_cb)) + libxl__remus_teardown_done(egc, dss); +} + +#define TC_BUFFER_START 1 +#define TC_BUFFER_RELEASE 2 +static int remus_netbuf_op(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx, + int buffer_op) +{ + int i, ret; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + + for (i = 0; i < netbuf_ctx->num_netbufs; ++i) { + if (buffer_op == TC_BUFFER_START) + ret = rtnl_qdisc_plug_buffer(netbuf_ctx->netbuf_qdisc_list[i]); + else + ret = rtnl_qdisc_plug_release_one(netbuf_ctx->netbuf_qdisc_list[i]); + + if (!ret) + ret = rtnl_qdisc_add(netbuf_ctx->nlsock, + netbuf_ctx->netbuf_qdisc_list[i], + NLM_F_REQUEST); + if (ret) { + LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s", + ((buffer_op == TC_BUFFER_START) ? + "start_new_epoch" : "release_prev_epoch"), + netbuf_ctx->ifb_list[i], nl_geterror(ret)); + return ERROR_FAIL; + } + } + + return 0; +} + +int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_START); +} + +int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_RELEASE); +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_nonetbuffer.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/libxl/libxl_nonetbuffer.c Sun Oct 20 11:54:26 2013 -0700 @@ -0,0 +1,56 @@ +/* + * Copyright (C) 2013 + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include "libxl_osdeps.h" /* must come before any other headers */ + +#include "libxl_internal.h" + +int libxl__netbuffer_enabled(libxl__gc *gc) +{ + return 0; +} + +/* Remus network buffer related stubs */ +void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ +} + +void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ +} + +int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + LOG(ERROR, "Remus: No support for network buffering"); + return ERROR_FAIL; +} + +int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + LOG(ERROR, "Remus: No support for network buffering"); + return ERROR_FAIL; +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl Sun Oct 20 06:12:52 2013 -0700 +++ b/tools/libxl/libxl_types.idl Sun Oct 20 11:54:26 2013 -0700 @@ -539,6 +539,8 @@ libxl_domain_remus_info = Struct("domain ("interval", integer), ("blackhole", bool), ("compression", bool), + ("netbuf", bool), + ("netbufscript", string), ]) libxl_event_type = Enumeration("event_type", [
Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1382295546 25200 # Node ID a8deb9499e9dcce9869025fa1c02cf2e0d58612a # Parent d3f088236c550213fc04ed982df47b4771b28d2f tools/libxl: Control network buffering in remus callbacks This patch constitutes the core network buffering logic. and does the following: a) create a new network buffer when the domain is suspended (remus_domain_suspend_callback) b) release the previous network buffer pertaining to the committed checkpoint (remus_domain_checkpoint_dm_saved) Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r d3f088236c55 -r a8deb9499e9d tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Sun Oct 20 11:54:26 2013 -0700 +++ b/tools/libxl/libxl_dom.c Sun Oct 20 11:59:06 2013 -0700 @@ -1259,8 +1259,24 @@ void libxl__remus_teardown_done(libxl__e static int libxl__remus_domain_suspend_callback(void *data) { - /* REMUS TODO: Issue disk and network checkpoint reqs. */ - return libxl__domain_suspend_common_callback(data); + /* REMUS TODO: Issue disk checkpoint reqs. */ + libxl__save_helper_state *shs = data; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + bool is_suspended; + STATE_AO_GC(dss->ao); + + is_suspended = !!libxl__domain_suspend_common_callback(data); + + if (!remus_ctx->netbuf_ctx) return is_suspended; + + if (is_suspended) { + if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, + remus_ctx)) + return !is_suspended; + } + + return is_suspended; } static int libxl__remus_domain_resume_callback(void *data) @@ -1273,7 +1289,7 @@ static int libxl__remus_domain_resume_ca if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) return 0; - /* REMUS TODO: Deal with disk. Start a new network output buffer */ + /* REMUS TODO: Deal with disk. */ return 1; } @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi static void remus_checkpoint_dm_saved(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc) { - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ - /* REMUS TODO: make this asynchronous */ - assert(!rc); /* REMUS TODO handle this error properly */ - usleep(dss->remus_ctx->interval * 1000); - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); + /* + * REMUS TODO: Wait for disk and explicit memory ack (through restore + * callback from remote) before releasing network buffer. + */ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + struct timespec epoch; + int do_next_iter = 0; + STATE_AO_GC(dss->ao); + + if (rc) { + LOG(ERROR, "Failed to save device model. Terminating Remus.."); + goto out; + } + + if (remus_ctx->netbuf_ctx) { + rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid, + remus_ctx); + if (rc) { + LOG(ERROR, "Failed to release network buffer." + " Terminating Remus.."); + goto out; + } + } + + epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */ + epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L; + nanosleep(&epoch, 0); + /* + * Set return value to 1, so that the infinite checkpoint cycle + * continues. See xc_domain_save.c: xc_domain_save() + */ + do_next_iter = 1; + + out: + libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, + do_next_iter); } /*----- main code for suspending, in order of execution -----*/
Shriram Rajagopalan
2013-Oct-21 05:58 UTC
[PATCH 5 of 5 V3] tools/xl: Remus - Network buffering cmdline switch
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1382295556 25200 # Node ID f4eea1e0ac3ed92ed17bbd74b0705743d20b302d # Parent a8deb9499e9dcce9869025fa1c02cf2e0d58612a tools/xl: Remus - Network buffering cmdline switch Command line switch to ''xl remus'' command, to enable network buffering. Pass on this flag to libxl so that it can act accordingly. Also update man pages to reflect the addition of a new option to ''xl remus'' command. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r a8deb9499e9d -r f4eea1e0ac3e docs/man/xl.conf.pod.5 --- a/docs/man/xl.conf.pod.5 Sun Oct 20 11:59:06 2013 -0700 +++ b/docs/man/xl.conf.pod.5 Sun Oct 20 11:59:16 2013 -0700 @@ -105,6 +105,12 @@ Configures the default gateway device to Default: C<None> +=item B<remus.default.netbufscript="PATH"> + +Configures the default script used by Remus to setup network buffering. + +Default: C</etc/xen/scripts/remus-netbuf-setup> + =item B<output_format="json|sxp"> Configures the default output format used by xl when printing "machine diff -r a8deb9499e9d -r f4eea1e0ac3e docs/man/xl.pod.1 --- a/docs/man/xl.pod.1 Sun Oct 20 11:59:06 2013 -0700 +++ b/docs/man/xl.pod.1 Sun Oct 20 11:59:16 2013 -0700 @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th Enable Remus HA for domain. By default B<xl> relies on ssh as a transport mechanism between the two hosts. -N.B: Remus support in xl is still in experimental (proof-of-concept) phase. - There is no support for network or disk buffering at the moment. +N.B: There is no support for disk buffering at the moment. B<OPTIONS> @@ -418,6 +417,13 @@ Generally useful for debugging. Disable memory checkpoint compression. +=item B<-n> + +Enable network output buffering. The default script used to configure +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to +use a custom script, set the global variable "remus.default.netbufscript" +in /etc/xen/xl.conf to point to your script. + =item B<-s> I<sshcommand> Use <sshcommand> instead of ssh. String will be passed to sh. 
diff -r a8deb9499e9d -r f4eea1e0ac3e tools/libxl/xl.c --- a/tools/libxl/xl.c Sun Oct 20 11:59:06 2013 -0700 +++ b/tools/libxl/xl.c Sun Oct 20 11:59:16 2013 -0700 @@ -46,6 +46,7 @@ char *default_vifscript = NULL; char *default_bridge = NULL; char *default_gatewaydev = NULL; char *default_vifbackend = NULL; +char *default_remus_netbufscript = NULL; enum output_format default_output_format = OUTPUT_FORMAT_JSON; int claim_mode = 1; @@ -177,6 +178,9 @@ static void parse_global_config(const ch if (!xlu_cfg_get_long (config, "claim_mode", &l, 0)) claim_mode = l; + xlu_cfg_replace_string (config, "remus.default.netbufscript", + &default_remus_netbufscript, 0); + xlu_cfg_destroy(config); } diff -r a8deb9499e9d -r f4eea1e0ac3e tools/libxl/xl.h --- a/tools/libxl/xl.h Sun Oct 20 11:59:06 2013 -0700 +++ b/tools/libxl/xl.h Sun Oct 20 11:59:16 2013 -0700 @@ -152,6 +152,7 @@ extern char *default_vifscript; extern char *default_bridge; extern char *default_gatewaydev; extern char *default_vifbackend; +extern char *default_remus_netbufscript; extern char *blkdev_start; enum output_format { diff -r a8deb9499e9d -r f4eea1e0ac3e tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c Sun Oct 20 11:59:06 2013 -0700 +++ b/tools/libxl/xl_cmdimpl.c Sun Oct 20 11:59:16 2013 -0700 @@ -7111,8 +7111,9 @@ int main_remus(int argc, char **argv) r_info.interval = 200; r_info.blackhole = 0; r_info.compression = 1; - - SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) { + r_info.netbuf = 0; + + SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) { case ''i'': r_info.interval = atoi(optarg); break; @@ -7122,6 +7123,12 @@ int main_remus(int argc, char **argv) case ''u'': r_info.compression = 0; break; + case ''n'': + r_info.netbuf = 1; + break; + case ''N'': + r_info.netbufscript = optarg; + break; case ''s'': ssh_command = optarg; break; @@ -7133,6 +7140,11 @@ int main_remus(int argc, char **argv) domid = find_domain(argv[optind]); host = argv[optind + 1]; + if (r_info.netbuf && !r_info.netbufscript) { + if (default_remus_netbufscript) + r_info.netbufscript = default_remus_netbufscript; + } + if (r_info.blackhole) { send_fd = open("/dev/null", O_RDWR, 0644); if (send_fd < 0) { @@ -7170,13 +7182,10 @@ int main_remus(int argc, char **argv) /* Point of no return */ rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0); - /* If we are here, it means backup has failed/domain suspend failed. - * Try to resume the domain and exit gracefully. + /* If we are here, it means remus setup/domain suspend/backup has + * failed. Try to resume the domain and exit gracefully. * TODO: Split-Brain check. */ - fprintf(stderr, "remus sender: libxl_domain_suspend failed" - " (rc=%d)\n", rc); - if (rc == ERROR_GUEST_TIMEDOUT) fprintf(stderr, "Failed to suspend domain at primary.\n"); else { diff -r a8deb9499e9d -r f4eea1e0ac3e tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c Sun Oct 20 11:59:06 2013 -0700 +++ b/tools/libxl/xl_cmdtable.c Sun Oct 20 11:59:16 2013 -0700 @@ -481,6 +481,9 @@ struct cmd_spec cmd_table[] = { "-i MS Checkpoint domain memory every MS milliseconds (def. 200ms).\n" "-b Replicate memory checkpoints to /dev/null (blackhole)\n" "-u Disable memory checkpoint compression.\n" + "-n Enable network output buffering.\n" + "-N <netbufscript> Use netbufscript to setup network buffering instead of the\n" + " instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n" "-s <sshcommand> Use <sshcommand> instead of ssh. String will be passed\n" " to sh. 
If empty, run <host> instead of \n" " ssh <host> xl migrate-receive -r [-e]\n"
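For completeness, with this series applied network buffering would be enabled along these lines (domain and host names are placeholders; -N is only needed to override the default script):

    xl remus -n -i 200 mydomain backup-host

    # or, with a custom setup script:
    xl remus -n -N /usr/local/etc/my-netbuf-setup -i 200 mydomain backup-host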
Shriram Rajagopalan
2013-Oct-30 23:05 UTC
Re: [PATCH 0 of 5 V3] Remus/Libxl: Network buffering support
On Sun, Oct 20, 2013 at 10:58 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:
> This patch series adds support for network buffering in the Remus
> codebase in libxl.
> [...]

ping!
Ian Campbell
2013-Oct-31 20:13 UTC
Re: [PATCH 1 of 5 V3] remus: add libnl3 dependency to autoconf scripts
On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote:
> diff -r 808aba98084e -r 357655a1598c tools/configure.ac
> --- a/tools/configure.ac Fri Oct 11 18:16:11 2013 -0700
> +++ b/tools/configure.ac Fri Oct 11 18:16:21 2013 -0700
> @@ -219,6 +219,24 @@ AC_SUBST(libiconv)
>  # Checks for header files.
>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>
> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
> +    [libnl3_lib="y"], [libnl3_lib="n"])
> +
> +# Check for libnl3 command line utilities

These cli utils are needed on the runtime host but not the build host, right? (For the benefit of the hotplug scripts, I suppose?)

This check only runs on the build host, so it will fail needlessly if the tools aren't there, while at the same time not catching the case where they are missing from the runtime host.

Best bet is to omit this check and simply list them in the README, in the list of optional tools (and reference tools/remus/README for more info if you like).

Ian.
Ian Campbell
2013-Oct-31 20:21 UTC
Re: [PATCH 2 of 5 V3] tools/hotplug: Remus network buffering setup scripts
On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote:
> +check_libnl_tools() {
> +    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-list tool"
> +    fi
> +    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-add tool"
> +    fi
> +    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-delete tool"
> +    fi
> +}

This probably suffices in place of the configure check I commented on earlier.

> +add_plug_qdisc() {
> +    local vif=$1
> +    local ifb=$2
> +
> +    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
> +    if [ $? -ne 0 ]
> +    then
> +        do_without_error tc qdisc del dev "$vif" ingress
> +        fatal "Failed to add plug qdisc to $ifb"
> +    fi
> +
> +    #set ifb buffering limit in bytes. Its okay if this command

... "fails"?

> +
> +case "$command" in
> +    setup)
> +        check_libnl_tools
> +        check_modules
> +
> +        claim_lock "pickifb"
> +        setup_ifb
> +        redirect_vif_traffic "$vifname" "$IFB"
> +        add_plug_qdisc "$vifname" "$IFB"
> +        release_lock "pickifb"
> +
> +        #not using xenstore_write that automatically exits on error
> +        # because we need to cleanup

whitespace inconsistency.

> +        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
> +        ;;
> +    teardown)
> +        : ${IFB?}

Do you mean log debug or something here?

> +        teardown_netbuf "$vifname" "$IFB"
> +        ;;
> +esac
> +
> +log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
> +
> +if [ "$command" = "setup" ]
> +then
> +    success
> +fi

Why not put this in the case for setup?

Ian.
Ian Campbell
2013-Oct-31 20:28 UTC
Re: [PATCH 3 of 5 V3] tools/libxl: setup/teardown Remus network buffering
On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote:
> # HG changeset patch
> # User Shriram Rajagopalan <rshriram@cs.ubc.ca>
> # Date 1382295266 25200
> # Node ID d3f088236c550213fc04ed982df47b4771b28d2f
> # Parent 0001a8222a785865b753acf8bcdf97c10c9e3819
> tools/libxl: setup/teardown Remus network buffering
>
> Setup/teardown remus network buffering for a given guest, when
> libxl_domain_remus_start API is invoked.
>
> This patch does the following during setup:
> a) call the hotplug script for each vif to setup its network buffer
>
> b) establish a dedicated remus context containing libnl related
>    state (netlink sockets, qdisc caches, etc.,)
>
> c) Obtain handles to plug qdiscs installed on the IFB devices
>    chosen by the hotplug scripts.
>
> During teardown, the netlink resources are released, followed by
> invocation of hotplug scripts to remove the IFB devices.
>
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

All this async stuff needs Ian J's eye on it really, not mine. It looks vaguely plausible to my untrained eye.

> diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/Makefile
> --- a/tools/libxl/Makefile Sun Oct 20 06:12:52 2013 -0700
> +++ b/tools/libxl/Makefile Sun Oct 20 11:54:26 2013 -0700
> @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
> else
> LIBXL_OBJS-y += libxl_noblktap2.o
> endif
> +
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_OBJS-y += libxl_netbuffer.o

Should the addition of the libnl stuff to LIBS and CFLAGS have been under here too?

> diff -r 0001a8222a78 -r d3f088236c55 tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl Sun Oct 20 06:12:52 2013 -0700
> +++ b/tools/libxl/libxl_types.idl Sun Oct 20 11:54:26 2013 -0700
> @@ -539,6 +539,8 @@ libxl_domain_remus_info = Struct("domain
>     ("interval", integer),
>     ("blackhole", bool),
>     ("compression", bool),
> +    ("netbuf", bool),

I guess libxl_defbool doesn't make sense here.

> +    ("netbufscript", string),

This requires a LIBXL_HAVE #define in libxl.h so users know it is available. One for the overall netbuf feature should do, I think.

Ian.
Ian Campbell
2013-Oct-31 20:31 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks
On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote:
> # HG changeset patch
> # User Shriram Rajagopalan <rshriram@cs.ubc.ca>
> # Date 1382295546 25200
> # Node ID a8deb9499e9dcce9869025fa1c02cf2e0d58612a
> # Parent d3f088236c550213fc04ed982df47b4771b28d2f
> tools/libxl: Control network buffering in remus callbacks
>
> This patch constitutes the core network buffering logic
> and does the following:
> a) create a new network buffer when the domain is suspended
>    (remus_domain_suspend_callback)
> b) release the previous network buffer pertaining to the
>    committed checkpoint (remus_domain_checkpoint_dm_saved)
>
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Another one where Ian J's opinion matters more than mine, but it looks ok to me.

Ian.
Ian Campbell
2013-Oct-31 20:38 UTC
Re: [PATCH 5 of 5 V3] tools/xl: Remus - Network buffering cmdline switch
On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote:
> diff -r a8deb9499e9d -r f4eea1e0ac3e docs/man/xl.pod.1
> --- a/docs/man/xl.pod.1	Sun Oct 20 11:59:06 2013 -0700
> +++ b/docs/man/xl.pod.1	Sun Oct 20 11:59:16 2013 -0700
> @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th
>  Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
>  mechanism between the two hosts.
>
> -N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
> -     There is no support for network or disk buffering at the moment.
> +N.B: There is no support for disk buffering at the moment.
>
>  B<OPTIONS>
>
> @@ -418,6 +417,13 @@ Generally useful for debugging.
>
>  Disable memory checkpoint compression.
>
> +=item B<-n>
> +
> +Enable network output buffering. The default script used to configure
> +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to
> +use a custom script, set the global variable "remus.default.netbufscript"
> +in /etc/xen/xl.conf to point to your script.

No docs for -N ?

> @@ -7133,6 +7140,11 @@ int main_remus(int argc, char **argv)
>      domid = find_domain(argv[optind]);
>      host = argv[optind + 1];
>
> +    if (r_info.netbuf && !r_info.netbufscript) {

Will the code do anything if netbuf==False but netbufscript!=NULL?

> +        if (default_remus_netbufscript)

This if is redundant isn't it, it just avoids assigning NULL to a
variable which is already NULL.

Given the above two comments this whole block could be
    if (!r_info.netbufscript)
        r_info.netbufscript = default...
[...]

> -        fprintf(stderr, "remus sender: libxl_domain_suspend failed"
> -                " (rc=%d)\n", rc);

Was this removed on purpose? It seemed useful looking and wasn't
mentioned in the commit message.

Ian.
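Spelled out, the simplification being suggested is roughly the sketch below, using the r_info and default_remus_netbufscript identifiers from the quoted hunk; whether a script passed without -n should take effect is the separate question raised above:

    /* Sketch for main_remus(): fall back to the xl.conf default only when
     * no script was given on the command line. */
    if (!r_info.netbufscript)
        r_info.netbufscript = default_remus_netbufscript;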
Shriram Rajagopalan
2013-Oct-31 21:06 UTC
Re: [PATCH 2 of 5 V3] tools/hotplug: Remus network buffering setup scripts
On Thu, Oct 31, 2013 at 1:21 PM, Ian Campbell <Ian.Campbell@citrix.com>wrote:> > + > > +case "$command" in > > + setup) > > + check_libnl_tools > > + check_modules > > + > > + claim_lock "pickifb" > > + setup_ifb > > + redirect_vif_traffic "$vifname" "$IFB" > > + add_plug_qdisc "$vifname" "$IFB" > > + release_lock "pickifb" > > + > > + #not using xenstore_write that automatically exits on error > > + # because we need to cleanup > > whitespace inconsistency. > >I seem to get this wrong again and again.. Did you mean the space trailing the second # ? or should the code block inside setup/teardown be indented with two tabs instead of one ?> > + _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed > "$vifname" "$IFB" > > + ;; > > + teardown) > > + : ${IFB?} > > Do you mean log debug or something here? > >This was to just make sure that the IFB variable was supplied as part of the environment.. Just like the two checks on top of this script.. " : ${vifname?} : ${XENBUS_PATH?} " _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Shriram Rajagopalan
2013-Oct-31 21:47 UTC
Re: [PATCH 5 of 5 V3] tools/xl: Remus - Network buffering cmdline switch
On Thu, Oct 31, 2013 at 1:38 PM, Ian Campbell <Ian.Campbell@citrix.com>wrote:> On Sun, 2013-10-20 at 22:58 -0700, Shriram Rajagopalan wrote: > > diff -r a8deb9499e9d -r f4eea1e0ac3e docs/man/xl.pod.1 > > --- a/docs/man/xl.pod.1 Sun Oct 20 11:59:06 2013 -0700 > > +++ b/docs/man/xl.pod.1 Sun Oct 20 11:59:16 2013 -0700 > > @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th > > Enable Remus HA for domain. By default B<xl> relies on ssh as a > transport > > mechanism between the two hosts. > > > > -N.B: Remus support in xl is still in experimental (proof-of-concept) > phase. > > - There is no support for network or disk buffering at the moment. > > +N.B: There is no support for disk buffering at the moment. > > > > B<OPTIONS> > > > > @@ -418,6 +417,13 @@ Generally useful for debugging. > > > > Disable memory checkpoint compression. > > > > +=item B<-n> > > + > > +Enable network output buffering. The default script used to configure > > +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to > > +use a custom script, set the global variable > "remus.default.netbufscript" > > +in /etc/xen/xl.conf to point to your script. > > No docs for -N ? > > > @@ -7133,6 +7140,11 @@ int main_remus(int argc, char **argv) > > domid = find_domain(argv[optind]); > > host = argv[optind + 1]; > > > > + if (r_info.netbuf && !r_info.netbufscript) { > > Will the code do anything if netbuf==False but netbufscript!=NULL? > > > + if (default_remus_netbufscript) > > This if is redundant isn''t it, it just avoids assigning NULL to a > variable which is already NULL. > > Given the above two comments this whole block could be > if (!r_info.netbufscript) > r_info.netbufscript = default... > [...] > > - fprintf(stderr, "remus sender: libxl_domain_suspend failed" > > - " (rc=%d)\n", rc); > > Was this removed on purpose? It seemed useful looking and wasn''t > mentioned in the commit message. > >Yes, because the error message was not representative. That code block currently looks like this (after removing the fprintf), which is bit more informative. /* If we are here, it means remus setup/domain suspend/backup has * failed. Try to resume the domain and exit gracefully. * TODO: Split-Brain check. */ if (rc == ERROR_GUEST_TIMEDOUT) fprintf(stderr, "Failed to suspend domain at primary.\n"); else { fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n"); libxl_domain_resume(ctx, domid, 1, 0); }> Ian. > > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Oct-31 22:25 UTC
Re: [PATCH 2 of 5 V3] tools/hotplug: Remus network buffering setup scripts
On Thu, 2013-10-31 at 14:06 -0700, Shriram Rajagopalan wrote:
> On Thu, Oct 31, 2013 at 1:21 PM, Ian Campbell
> <Ian.Campbell@citrix.com> wrote:
> > +
> > +case "$command" in
> > +    setup)
> > +        check_libnl_tools
> > +        check_modules
> > +
> > +        claim_lock "pickifb"
> > +        setup_ifb
> > +        redirect_vif_traffic "$vifname" "$IFB"
> > +        add_plug_qdisc "$vifname" "$IFB"
> > +        release_lock "pickifb"
> > +
> > +        #not using xenstore_write that automatically exits on error
> > +        # because we need to cleanup
> >
> > whitespace inconsistency.
>
> I seem to get this wrong again and again..
> Did you mean the space trailing the second # ? or should the code
> block inside setup/teardown be indented with two tabs instead of one ?

I meant "#foo" (1st line) vs "# foo" (2nd line) which just caught my eye.
They should be the same. More normal would be a space after the # I think.

I don't think we have a particularly strict coding style for shell
scripts, but code being consistent with itself is a good start.

> > +        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
> > +        ;;
> > +    teardown)
> > +        : ${IFB?}
> >
> > Do you mean log debug or something here?
>
> This was to just make sure that the IFB variable was supplied as part
> of the environment.. Just like the two checks on top of this script..
> "
> : ${vifname?}
> : ${XENBUS_PATH?}
> "

Do these result in useful error reporting to the end user? Bearing in
mind that the end user might be using libxl but not xl and therefore not
see stdout/err (which might be going into some daemon's log file).

Also, either I'm missing it or the bash manpage only documents
${parameter:?} and not ${parameter?}. Are both actually valid?

Ian.
Ian Campbell
2013-Oct-31 22:29 UTC
Re: [PATCH 5 of 5 V3] tools/xl: Remus - Network buffering cmdline switch
On Thu, 2013-10-31 at 14:47 -0700, Shriram Rajagopalan wrote:
> Yes, because the error message was not representative. That code block
> currently looks like this (after removing the fprintf), which is bit
> more informative.

Makes sense, thanks.

>     /* If we are here, it means remus setup/domain suspend/backup has
>      * failed. Try to resume the domain and exit gracefully.
>      * TODO: Split-Brain check.
>      */
>     if (rc == ERROR_GUEST_TIMEDOUT)
>         fprintf(stderr, "Failed to suspend domain at primary.\n");
>     else {
>         fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
>         libxl_domain_resume(ctx, domid, 1, 0);
>     }

Ian.
Ian Jackson
2013-Nov-01 18:28 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks
Shriram Rajagopalan writes ("[PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks"):
> tools/libxl: Control network buffering in remus callbacks
...
> static int libxl__remus_domain_suspend_callback(void *data)
> {
> -    /* REMUS TODO: Issue disk and network checkpoint reqs. */
> -    return libxl__domain_suspend_common_callback(data);
> +    /* REMUS TODO: Issue disk checkpoint reqs. */
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    bool is_suspended;
> +    STATE_AO_GC(dss->ao);
> +
> +    is_suspended = !!libxl__domain_suspend_common_callback(data);

I think this function would be less confusing if the return value
variable was called "ok" or something similar. It's really just an
error flag.

Also AFAICT the !! has no function here. The return value is
essentially a boolean.

> +    if (!remus_ctx->netbuf_ctx) return is_suspended;
> +
> +    if (is_suspended) {
> +        if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
> +                                                remus_ctx))
> +            return !is_suspended;

This construction is rather odd. Why return !is_suspended when you
know that !!is_suspended ?

> @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi
>  static void remus_checkpoint_dm_saved(libxl__egc *egc,
>                                        libxl__domain_suspend_state *dss, int rc)
>  {
> -    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
> -    /* REMUS TODO: make this asynchronous */
> -    assert(!rc); /* REMUS TODO handle this error properly */
> -    usleep(dss->remus_ctx->interval * 1000);
> -    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
> +    /*
> +     * REMUS TODO: Wait for disk and explicit memory ack (through restore
> +     * callback from remote) before releasing network buffer.
> +     */
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    struct timespec epoch;
> +    int do_next_iter = 0;

Again, this variable is probably best called "ok".

> +    epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */
> +    epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L;
> +    nanosleep(&epoch, 0);

This is no good, I'm afraid. You should be using a libxl event loop
timer. (Also why are you changing your existing usleep to this new
nanosleep which seems functionally very similar?)

Ian.
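An editorial aside on the quoted interval conversion: however the wait ends up being implemented, a millisecond interval split into a struct timespec should carry only the sub-second remainder in tv_nsec; the quoted hunk stores the whole interval there, which double-counts the seconds and exceeds the valid tv_nsec range once the interval reaches 1000ms. A minimal sketch of the conversion, assuming only that the interval is in milliseconds:

    #include <time.h>

    /* Sketch: split a checkpoint interval given in milliseconds into a
     * struct timespec suitable for nanosleep(). */
    static struct timespec interval_to_timespec(int interval_ms)
    {
        struct timespec ts;
        ts.tv_sec  = interval_ms / 1000;
        ts.tv_nsec = (long)(interval_ms % 1000) * 1000L * 1000L;
        return ts;
    }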
Shriram Rajagopalan
2013-Nov-01 19:57 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks
On Fri, Nov 1, 2013 at 11:28 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> Shriram Rajagopalan writes ("[PATCH 4 of 5 V3] tools/libxl: Control > network buffering in remus callbacks"): > > tools/libxl: Control network buffering in remus callbacks > ... > > static int libxl__remus_domain_suspend_callback(void *data) > > { > > - /* REMUS TODO: Issue disk and network checkpoint reqs. */ > > - return libxl__domain_suspend_common_callback(data); > > + /* REMUS TODO: Issue disk checkpoint reqs. */ > > + libxl__save_helper_state *shs = data; > > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + bool is_suspended; > > + STATE_AO_GC(dss->ao); > > + > > + is_suspended = !!libxl__domain_suspend_common_callback(data); > > I think this function would be less confusing if the return value > variable was called "ok" or something similar. It''s really just an > error flag. > > Also AFAICT the !! has no function here. The return value is > essentially a boolean. > > > + if (!remus_ctx->netbuf_ctx) return is_suspended; > > + > > + if (is_suspended) { > > + if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, > > + remus_ctx)) > > + return !is_suspended; > > This construction is rather odd. Why return !is_suspended when you > know that !!is_suspended ? > > > @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi > > static void remus_checkpoint_dm_saved(libxl__egc *egc, > > libxl__domain_suspend_state *dss, > int rc) > > { > > - /* REMUS TODO: Wait for disk and memory ack, release network buffer > */ > > - /* REMUS TODO: make this asynchronous */ > > - assert(!rc); /* REMUS TODO handle this error properly */ > > - usleep(dss->remus_ctx->interval * 1000); > > - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); > > + /* > > + * REMUS TODO: Wait for disk and explicit memory ack (through > restore > > + * callback from remote) before releasing network buffer. > > + */ > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + struct timespec epoch; > > + int do_next_iter = 0; > > Again, this variable is probably best called "ok". > > > + epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */ > > + epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L; > > + nanosleep(&epoch, 0); > > This is no good, I''m afraid. You should be using a libxl event loop > timer. (Also why are you changing your existing usleep to this new > nanosleep which seems functionally very similar?) > > Ian. > >Nanosleep was more of my personal preference, just as the ''!!'' above was Andrew''s suggestion, due to the fact that libxl__domain_suspend_common_callback() does not strictly return boolean. But I agree that renaming that variable makes more sense. As far as using the event loop timers, I couldn''t find anything suitable in libxl_event.h/c that I could use instead of usleep/nanosleep. Specifically, they were all asynchronous, while the caller of these functions is libxc''s xc_domain_save infinite loop (synchronous). _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Shriram Rajagopalan
2013-Nov-01 20:04 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks
On Fri, Nov 1, 2013 at 11:28 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> Shriram Rajagopalan writes ("[PATCH 4 of 5 V3] tools/libxl: Control > network buffering in remus callbacks"): > > tools/libxl: Control network buffering in remus callbacks > ... > > static int libxl__remus_domain_suspend_callback(void *data) > > { > > - /* REMUS TODO: Issue disk and network checkpoint reqs. */ > > - return libxl__domain_suspend_common_callback(data); > > + /* REMUS TODO: Issue disk checkpoint reqs. */ > > + libxl__save_helper_state *shs = data; > > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + bool is_suspended; > > + STATE_AO_GC(dss->ao); > > + > > + is_suspended = !!libxl__domain_suspend_common_callback(data); > > I think this function would be less confusing if the return value > variable was called "ok" or something similar. It''s really just an > error flag. > > Also AFAICT the !! has no function here. The return value is > essentially a boolean. > > > + if (!remus_ctx->netbuf_ctx) return is_suspended; > > + > > + if (is_suspended) { > > + if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, > > + remus_ctx)) > > + return !is_suspended; > > This construction is rather odd. Why return !is_suspended when you > know that !!is_suspended ? > >If a new network buffer cannot be created (for the next epoch), then return an error. This ends up terminating the checkpointing process.> > @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi > > static void remus_checkpoint_dm_saved(libxl__egc *egc, > > libxl__domain_suspend_state *dss, > int rc) > > { > > - /* REMUS TODO: Wait for disk and memory ack, release network buffer > */ > > - /* REMUS TODO: make this asynchronous */ > > - assert(!rc); /* REMUS TODO handle this error properly */ > > - usleep(dss->remus_ctx->interval * 1000); > > - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); > > + /* > > + * REMUS TODO: Wait for disk and explicit memory ack (through > restore > > + * callback from remote) before releasing network buffer. > > + */ > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + struct timespec epoch; > > + int do_next_iter = 0; > > Again, this variable is probably best called "ok". > > > + epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */ > > + epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L; > > + nanosleep(&epoch, 0); > > This is no good, I''m afraid. You should be using a libxl event loop > timer. (Also why are you changing your existing usleep to this new > nanosleep which seems functionally very similar?) > > Ian. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-04 12:12 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks"):
> Nanosleep was more of my personal preference,

I don't think that's a good enough reason for the churn, but as I say
this really ought to be replaced with use of a timeout event.

> just as the '!!' above was Andrew's suggestion, due to the fact that
> libxl__domain_suspend_common_callback() does not strictly return
> boolean.

Are you looking at a different tree to me ? I have checked the return
statements in libxl__domain_suspend_common_callback and they are all
"return 0" or "return 1".

> But I agree that renaming that variable makes more sense.

Good :-). Thanks.

> As far as using the event loop timers, I couldn't find anything
> suitable in libxl_event.h/c that I could use instead of
> usleep/nanosleep. Specifically, they were all asynchronous, while
> the caller of these functions is libxc's xc_domain_save infinite
> loop (synchronous).

No, you need to make it an asynchronous call.

The libxl save helper machinery arranges to run xc_domain_save in a
subprocess so that the main program can continue asynchronously.

It's true that the suspend checkpoint is currently synchronous. It
needs to be made asynchronous by telling the stub generator
(libxl_save_msgs_gen.pl) to generate an asynchronous binding, like it
does for switch_qemu_logdirty.

Perhaps it would be helpful if I provided a pre-patch to make that
change for you.

> > +    if (!remus_ctx->netbuf_ctx) return is_suspended;
> > +
> > +    if (is_suspended) {
> > +        if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
> > +                                                remus_ctx))
> > +            return !is_suspended;
> >
> > This construction is rather odd. Why return !is_suspended when you
> > know that !!is_suspended ?
...
> If a new network buffer cannot be created (for the next epoch), then
> return an error. This ends up terminating the checkpointing process.

Yes. But why not "return 0" ?

Thanks,
Ian.
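Putting the two review points together (a plain success flag, and returning 0 on any failure), the suspend callback could be sketched as below. The identifiers come from the quoted hunks, but the function name and exact shape here are illustrative, not the code that was actually posted:

    /* Sketch of the suspend callback with a plain success flag; returning 0
     * on any failure terminates the checkpoint stream. */
    static int remus_suspend_callback_sketch(void *data)
    {
        libxl__save_helper_state *shs = data;
        libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
        libxl__remus_ctx *remus_ctx = dss->remus_ctx;
        STATE_AO_GC(dss->ao);

        int ok = libxl__domain_suspend_common_callback(data);
        if (!ok) return 0;

        /* Only touch the network buffer when netbuf support is active. */
        if (remus_ctx->netbuf_ctx &&
            libxl__remus_netbuf_start_new_epoch(gc, dss->domid, remus_ctx))
            return 0;   /* could not open a buffer for the next epoch */

        return 1;
    }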
Shriram Rajagopalan
2013-Nov-04 15:17 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
On Mon, Nov 4, 2013 at 6:12 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control > network buffering in remus callbacks"): > > Nanosleep was more of my personal preference, > > I don''t think that''s a good enough reason for the churn, but as I say > this really ought to be replaced with use of a timeout event. > >Fair enough. My question is what is the overhead of setting up, firing and tearing down a timeout event using the event gen framework, if I wish to checkpoint the VM, say every 20ms ? If you happen to have the numbers off the top of your head, it would help. Or if you are sure that the system can easily handle this rate of events with very little overhead (<0.2ms per event)> > As far as using the event loop timers, I couldn''t find anything > > suitable in libxl_event.h/c that I could use instead of > > usleep/nanosleep. Specifically, they were all asynchronous, while > > the caller of these functions is libxc''s xc_domain_save infinite > > loop (synchronous). > > No, you need to make it an asynchronous call. > > The libxl save helper machinery arranges to run xc_domain_save in a > subprocess so that the main program can continue asynchronously. > > It''s true that the suspend checkpoint is currently synchronous. It > needs to be made synchronous by telling the stub generator > (libxl_save_msgs_gen.pl) to generate an asynchronous binding, like it > does for switch_qemu_logdirty. > > Perhaps it would be helpful if I provided a pre-patch to make that > change for you. > >Yes, that would be pretty helpful. Thanks! shriram _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Nov-04 15:32 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
On Mon, 2013-11-04 at 09:17 -0600, Shriram Rajagopalan wrote:
> On Mon, Nov 4, 2013 at 6:12 AM, Ian Jackson
> <Ian.Jackson@eu.citrix.com> wrote:
> > Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl:
> > Control network buffering in remus callbacks"):
> > > Nanosleep was more of my personal preference,
> >
> > I don't think that's a good enough reason for the churn, but as I say
> > this really ought to be replaced with use of a timeout event.
>
> Fair enough. My question is what is the overhead of setting up, firing
> and tearing down a timeout event using the event gen framework, if I
> wish to checkpoint the VM, say every 20ms ?
> If you happen to have the numbers off the top of your head, it would
> help. Or if you are sure that the system can easily handle this rate of
> events with very little overhead (<0.2ms per event)

Regardless of the answer here, would it make sense to do some/all of the
checkpoint processing in the helper subprocess anyway and only signal
the eventual failover up to the libxl process?

This async op is potentially quite long running I think compared to a
normal one i.e. if the guest doesn't die it is expected that the ao
lives "forever". Since the associated gc's persist until the ao ends
this might end up accumulating lots of allocations? Ian had a similar
concern about Roger's hotplug daemon series and suggested creating a per
iteration gc or something.

Ian.
Ian Jackson
2013-Nov-04 16:06 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> Regardless of the answer here, would it make sense to do some/all of the
> checkpoint processing in the helper subprocess anyway and only signal
> the eventual failover up to the libxl process?

It might do. Mind you, the code in libxc is tangled enough as it is
and is due for a rewrite. Perhaps this could be done in the helper
executable, although there isn't currently any way to easily
intersperse code in there.

> This async op is potentially quite long running I think compared to a
> normal one i.e. if the guest doesn't die it is expected that the ao
> lives "forever". Since the associated gc's persist until the ao ends
> this might end up accumulating lots of allocations? Ian had a similar
> concern about Roger's hotplug daemon series and suggested creating a per
> iteration gc or something.

Yes, this is indeed a problem. Well spotted.

Which of the xc_domain_save (and _restore) callbacks are called each
remus iteration ?

I think introducing a per-iteration gc here is going to involve taking
some care, since we need to be sure not to put
per-iteration-gc-allocated objects into data structures which are used
by subsequent iterations.

Shriram writes:
> Fair enough. My question is what is the overhead of setting up, firing
> and tearing down a timeout event using the event gen framework, if I
> wish to checkpoint the VM, say every 20ms ?

The ultimate cost of going back into the event loop to wait for a
timeout will depend on what else the process is doing. If the process
is doing nothing else, it's about two calls to gettimeofday and one to
poll. Plus a bit of in-process computation, but that's going to be
swamped by system call overhead.

Having said that, libxl is not performance-optimised. Indeed the
callback mechanism involves context switching, and IPC, between the
save/restore helper and libxl proper. Probably not too much to be
doing every 20ms for a single domain, but if you have a lot of these
it's going to end up taking a lot of dom0 cpu etc.

I assume you're not doing this for HVM domains, which involve saving
the qemu state each time too.

Ian.
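For concreteness, a rough sketch of the timeout-event approach in remus_checkpoint_dm_saved, using libxl's internal timer events. The libxl__ev_time_* signatures are recalled from the libxl_internal.h of that era and the checkpoint_timeout field on the suspend state is hypothetical, so treat this as an assumption rather than a description of any posted patch:

    /* Hypothetical addition to libxl__domain_suspend_state:
     *     libxl__ev_time checkpoint_timeout;
     */

    static void checkpoint_interval_done(libxl__egc *egc, libxl__ev_time *ev,
                                         const struct timeval *requested_abs)
    {
        libxl__domain_suspend_state *dss =
            CONTAINER_OF(ev, *dss, checkpoint_timeout);
        STATE_AO_GC(dss->ao);

        libxl__ev_time_deregister(gc, ev);
        /* Kick the save helper into the next checkpoint iteration. */
        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
    }

    static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                          libxl__domain_suspend_state *dss,
                                          int rc)
    {
        STATE_AO_GC(dss->ao);

        /* ... release the network buffer of the committed checkpoint ... */

        /* Arm a one-shot timer for the configured interval (in ms) and
         * return to the event loop instead of sleeping in the callback. */
        if (libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
                                        checkpoint_interval_done,
                                        dss->remus_ctx->interval))
            libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
    }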
Ian Jackson
2013-Nov-04 16:40 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):> On Mon, Nov 4, 2013 at 6:12 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote> Perhaps it would be helpful if I provided a pre-patch to make that > change for you. > > Yes, that would be pretty helpful. Thanks!See below. I have compiled this but not tested it. It should be safe but I can''t rule out having perpetrated some howler of a bug. Please let me know if it doesn''t work. Ian Jackson writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):> Shriram writes: > > Fair enough. My question is what is the overhead of setting up, firing > > and tearing down a timeout event using the event gen framework, if I > > wish to checkpoint the VM, say every 20ms ? > > The ultimate cost of going back into the event loop to wait for a > timeout will depend on what else the process is doing. If the process > is doing nothing else, it''s about two calls to gettimeofday and one to > poll. Plus a bit of in-process computation, but that''s going to be > swamped by system call overhead. > > Having said that, libxl is not performance-optimised. Indeed the > callback mechanism involves context switching, and IPC, between the > save/restore helper and libxl proper. Probably not too much to be > doing every 20ms for a single domain, but if you have a lot of these > it''s going to end up taking a lot of dom0 cpu etc. > > I assume you''re not doing this for HVM domains, which involve saving > the qemu state each time too.I guess another way to look at this is that changing this one timeout from a synchronous to asynchronous version is not going to make any noticeable difference to the performance of the whole thing. You''re already using all of the asynchronous save/restore helper machinery and the libxl event loop. So if the performance of your V3 patches is acceptable, this will be fine too. Ian. From 46b08918302a8c1d2e470b7af7045557f73afde9 Mon Sep 17 00:00:00 2001 From: Ian Jackson <ian.jackson@eu.citrix.com> Date: Mon, 4 Nov 2013 16:27:53 +0000 Subject: [PATCH] libxl: make libxl__domain_suspend_callback be asynchronous Mark the suspend callback as asynchronous in the helper stub generator (libxl_save_msgs_gen.pl). Remus is going to want to provide an asynchronous version of this function. libxl__domain_suspend_common_callback, the common synchronous core, which used to be provided directly as the callback function for the helper machinery, becomes libxl__domain_suspend_callback_common. It can now take a typesafe parameter. For now, provide two very similar asynchronous wrappers for it. Each is a simple function which contains only boilerplate, calls the common synchronous core, and returns the asynchronous response. Essentially, we have just moved (in the case of suspend callbacks) the call site of libxl__srm_callout_callback_complete. It was in the switch statement in the autogenerated _libxl_save_msgs_callout.c, and is now in the handwritten libxl_dom.c. No functional change. 
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca> Cc: Ian Campbell <ian.campbell@citrix.com> --- tools/libxl/libxl_dom.c | 25 +++++++++++++++++++------ tools/libxl/libxl_internal.h | 2 +- tools/libxl/libxl_save_msgs_gen.pl | 2 +- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 1812bdc..b5cde42 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1003,10 +1003,8 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid) return 0; } -int libxl__domain_suspend_common_callback(void *user) +int libxl__domain_suspend_callback_common(libxl__domain_suspend_state *dss) { - libxl__save_helper_state *shs = user; - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); STATE_AO_GC(dss->ao); unsigned long hvm_s_state = 0, hvm_pvdrv = 0; int ret; @@ -1225,12 +1223,27 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, return 0; } +static void libxl__domain_suspend_callback(void *data) +{ + libxl__save_helper_state *shs = data; + libxl__egc *egc = shs->egc; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + + int ok = libxl__domain_suspend_callback_common(dss); + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); +} + /*----- remus callbacks -----*/ -static int libxl__remus_domain_suspend_callback(void *data) +static void libxl__remus_domain_suspend_callback(void *data) { + libxl__save_helper_state *shs = data; + libxl__egc *egc = shs->egc; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + /* REMUS TODO: Issue disk and network checkpoint reqs. */ - return libxl__domain_suspend_common_callback(data); + int ok = libxl__domain_suspend_callback_common(dss); + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); } static int libxl__remus_domain_resume_callback(void *data) @@ -1354,7 +1367,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) callbacks->postcopy = libxl__remus_domain_resume_callback; callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; } else - callbacks->suspend = libxl__domain_suspend_common_callback; + callbacks->suspend = libxl__domain_suspend_callback; callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 4f92522..79eb8f8 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2551,7 +2551,7 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void, void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, libxl__save_helper_state *shs, int return_value); -_hidden int libxl__domain_suspend_common_callback(void *data); +_hidden int libxl__domain_suspend_callback_common(libxl__domain_suspend_state*); _hidden void libxl__domain_suspend_common_switch_qemu_logdirty (int domid, unsigned int enable, void *data); _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf, diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl index ee126c7..3c6bd57 100755 --- a/tools/libxl/libxl_save_msgs_gen.pl +++ b/tools/libxl/libxl_save_msgs_gen.pl @@ -23,7 +23,7 @@ our @msgs = ( STRING doing_what), ''unsigned long'', ''done'', ''unsigned long'', ''total''] ], - [ 3, ''scxW'', "suspend", [] ], + [ 3, ''scxA'', "suspend", [] ], [ 4, ''scxW'', "postcopy", [] ], [ 5, ''scxA'', "checkpoint", 
[] ], [ 6, ''scxA'', "switch_qemu_logdirty", [qw(int domid -- 1.7.10.4
Ian Campbell
2013-Nov-04 16:45 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
On Mon, 2013-11-04 at 16:06 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> > Regardless of the answer here, would it make sense to do some/all of the
> > checkpoint processing in the helper subprocess anyway and only signal
> > the eventual failover up to the libxl process?
>
> It might do. Mind you, the code in libxc is tangled enough as it is
> and is due for a rewrite. Perhaps this could be done in the helper
> executable,

That's where I meant.

> although there isn't currently any way to easily
> intersperse code in there.

A simple matter of programming surely ;-)

Ian.
Shriram Rajagopalan
2013-Nov-04 16:47 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
On Mon, Nov 4, 2013 at 10:06 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network > buffering in remus callbacks [and 1 more messages]"): > > Regardless of the answer here, would it make sense to do some/all of the > > checkpoint processing in the helper subprocess anyway and only signal > > the eventual failover up to the libxl process? > > It might do. Mind you, the code in libxc is tangled enough as it is > and is due for a rewrite. Perhaps this could be done in the helper > executable, although there isn''t currently any way to easily > intersperse code in there. > > > This async op is potentially quite long running I think compared to a > > normal one i.e. if the guest doesn''t die it is expected that the ao > > lives "forever". Since the associated gc''s persist until the ao ends > > this might end up accumulating lots of allocations? Ian had a similar > > concern about Roger''s hotplug daemon series and suggested creating a per > > iteration gc or something. >Yes, this is indeed a problem. Well spotted.>Which of the xc_domain_save (and _restore) callbacks are called each> remus iteration ? > >Almost all of them on the xc_domain_save side. (suspend, resume, save_qemu state, checkpoint). xc_domain_restore doesn''t have any callbacks AFAIK. And remus as of now does not have a component on the restore side. It piggybacks on live migration''s restore framework.> I think introducing a per-iteration gc here is going to involve taking > some care, since we need to be sure not to put > per-iteraton-gc-allocated objects into data structures which are used > by subsequent iterations. > >FWIW, the remus related code that executes per iteration does not allocate anything. All allocations happen only during setup and I was under the impression that no other allocations are taking place everytime xc_domain_save calls back into libxl. However, it may be possible that other parts of the AO machinery (and there are a lot of them) are allocating stuff per iteration. And if that is the case, it could easily lead to OOMs since Remus technically runs as long as the domain lives. Shriram writes:> > Fair enough. My question is what is the overhead of setting up, firing > > and tearing down a timeout event using the event gen framework, if I > > wish to checkpoint the VM, say every 20ms ? > > The ultimate cost of going back into the event loop to wait for a > timeout will depend on what else the process is doing. If the process > is doing nothing else, it''s about two calls to gettimeofday and one to > poll. Plus a bit of in-process computation, but that''s going to be > swamped by system call overhead. > > Having said that, libxl is not performance-optimised. Indeed the > callback mechanism involves context switching, and IPC, between the > save/restore helper and libxl proper. Probably not too much to be > doing every 20ms for a single domain, but if you have a lot of these > it''s going to end up taking a lot of dom0 cpu etc. > >Yes and that is a problem. Xend+Remus avoided this by linking the libcheckpoint library that interfaced with both the python & libxc code.> I assume you''re not doing this for HVM domains, which involve saving > the qemu state each time too. > >It includes HVM domains too. Although in that case, xenstore based suspend takes about 5ms. So the checkpoint interval is typically 50ms or so. If there is a latency sensitive task running inside the VM, lower checkpoint interval leads to better performance. 
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-04 17:01 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 10:06 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > Which of the xc_domain_save (and _restore) callbacks are called each
> > remus iteration ?
>
> Almost all of them on the xc_domain_save side. (suspend, resume,
> save_qemu state, checkpoint).

Right.

> xc_domain_restore doesn't have any
> callbacks AFAIK. And remus as of now does not have a component on
> the restore side. It piggybacks on live migration's restore
> framework.

But the libxl memory management in the restore code is currently
written to assume a finite lifetime for the ao. So I think this needs
to be improved.

Perhaps all the suspend/restore callbacks should each get one of the
nested ao type things that Roger needs for his driver domain daemon.

> FWIW, the remus related code that executes per iteration does not
> allocate anything. All allocations happen only during setup and I
> was under the impression that no other allocations are taking place
> everytime xc_domain_save calls back into libxl.

If this is true, then good, because we don't need to do anything, but
there is a lot of code there and I would want to check.

> However, it may be possible that other parts of the AO machinery
> (and there are a lot of them) are allocating stuff per
> iteration. And if that is the case, it could easily lead to OOMs
> since Remus technically runs as long as the domain lives.

The ao and event machinery doesn't do much allocation itself.

> > Having said that, libxl is not performance-optimised. Indeed the
> > callback mechanism involves context switching, and IPC, between the
> > save/restore helper and libxl proper. Probably not too much to be
> > doing every 20ms for a single domain, but if you have a lot of these
> > it's going to end up taking a lot of dom0 cpu etc.
>
> Yes and that is a problem. Xend+Remus avoided this by linking
> the libcheckpoint library that interfaced with both the python & libxc code.

Have you observed whether the performance is acceptable with your V3
patches ?

> > I assume you're not doing this for HVM domains, which involve saving
> > the qemu state each time too.
...
> It includes HVM domains too. Although in that case, xenstore based suspend
> takes about 5ms. So the checkpoint interval is typically 50ms or so.

Right.

> If there is a latency sensitive task running inside
> the VM, lower checkpoint interval leads to better performance.

Yes.

Ian.
Shriram Rajagopalan
2013-Nov-04 17:23 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
On Mon, Nov 4, 2013 at 11:01 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > Having said that, libxl is not performance-optimised. Indeed the
> > callback mechanism involves context switching, and IPC, between the
> > save/restore helper and libxl proper. Probably not too much to be
> > doing every 20ms for a single domain, but if you have a lot of these
> > it's going to end up taking a lot of dom0 cpu etc.
>
> > Yes and that is a problem. Xend+Remus avoided this by linking
> > the libcheckpoint library that interfaced with both the python & libxc code.
>
> Have you observed whether the performance is acceptable with your V3
> patches ?

Frankly I don't know. At the moment, I am trying to make sure that the
libxl-remus implementation is correct before I attempt to make it fast.

With/without these patches, there is a huge overhead per checkpoint. I
cannot get to a 20ms checkpoint interval. Time to suspend/resume a domain
is on the order of tens of milliseconds even if it was a PV domain with
fast suspend support -- where it used to take under 1ms to suspend,
checkpoint and resume an idle PV domain on Xend+Remus.

This was one of the issues I raised a long time back. And this is
currently the second element in my queue, the first being to get the
patches mainline, then slowly tighten up relevant code for performance.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-04 17:33 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> Frankly I don't know. At the moment, I am trying to make sure that
> the libxl-remus implementation is correct before I attempt to make
> it fast.
>
> With/without these patches, there is a huge overhead per
> checkpoint. I cannot get to a 20ms checkpoint interval. Time to
> suspend/resume a domain is on the order of tens of milliseconds even
> if it was a PV domain with fast suspend support -- where it used to
> take under 1ms to suspend, checkpoint and resume an idle PV domain
> on Xend+Remus.

Oh dear.

> This was one of the issues I raised a long time back. And this is
> currently the second element in my queue, the first being to get the
> patches mainline, then slowly tighten up relevant code for
> performance.

I think this is probably a bad approach. Under the circumstances, it
seems like you really need to move the relevant code lower in the call
stack. In particular, you want everything to be handled synchronously
in the same process as is doing the save/restore, at least for PV
domains. That's likely to involve substantially rewriting it.

Ian.
Shriram Rajagopalan
2013-Nov-11 17:56 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
On Mon, Nov 4, 2013 at 10:40 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control > network buffering in remus callbacks [and 1 more messages]"): > > On Mon, Nov 4, 2013 at 6:12 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> > wrote> Perhaps it would be helpful if I provided a pre-patch to make > that > > change for you. > > > > Yes, that would be pretty helpful. Thanks! > > See below. I have compiled this but not tested it. It should be safe > but I can''t rule out having perpetrated some howler of a bug. Please > let me know if it doesn''t work. > > > Ian Jackson writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network > buffering in remus callbacks [and 1 more messages]"): > > Shriram writes: > > > Fair enough. My question is what is the overhead of setting up, firing > > > and tearing down a timeout event using the event gen framework, if I > > > wish to checkpoint the VM, say every 20ms ? > > > > The ultimate cost of going back into the event loop to wait for a > > timeout will depend on what else the process is doing. If the process > > is doing nothing else, it''s about two calls to gettimeofday and one to > > poll. Plus a bit of in-process computation, but that''s going to be > > swamped by system call overhead. > > > > Having said that, libxl is not performance-optimised. Indeed the > > callback mechanism involves context switching, and IPC, between the > > save/restore helper and libxl proper. Probably not too much to be > > doing every 20ms for a single domain, but if you have a lot of these > > it''s going to end up taking a lot of dom0 cpu etc. > > > > I assume you''re not doing this for HVM domains, which involve saving > > the qemu state each time too. > > I guess another way to look at this is that changing this one timeout > from a synchronous to asynchronous version is not going to make any > noticeable difference to the performance of the whole thing. You''re > already using all of the asynchronous save/restore helper machinery > and the libxl event loop. > > So if the performance of your V3 patches is acceptable, this will be > fine too. > > Ian. > > > >From 46b08918302a8c1d2e470b7af7045557f73afde9 Mon Sep 17 00:00:00 2001 > From: Ian Jackson <ian.jackson@eu.citrix.com> > Date: Mon, 4 Nov 2013 16:27:53 +0000 > Subject: [PATCH] libxl: make libxl__domain_suspend_callback be asynchronous > > Mark the suspend callback as asynchronous in the helper stub generator > (libxl_save_msgs_gen.pl). Remus is going to want to provide an > asynchronous version of this function. > > libxl__domain_suspend_common_callback, the common synchronous core, > which used to be provided directly as the callback function for the > helper machinery, becomes libxl__domain_suspend_callback_common. It > can now take a typesafe parameter. > > For now, provide two very similar asynchronous wrappers for it. Each > is a simple function which contains only boilerplate, calls the common > synchronous core, and returns the asynchronous response. > > Essentially, we have just moved (in the case of suspend callbacks) the > call site of libxl__srm_callout_callback_complete. It was in the > switch statement in the autogenerated _libxl_save_msgs_callout.c, and > is now in the handwritten libxl_dom.c. > > No functional change. 
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca> > Cc: Ian Campbell <ian.campbell@citrix.com> > --- > tools/libxl/libxl_dom.c | 25 +++++++++++++++++++------ > tools/libxl/libxl_internal.h | 2 +- > tools/libxl/libxl_save_msgs_gen.pl | 2 +- > 3 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 1812bdc..b5cde42 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -1003,10 +1003,8 @@ int libxl__domain_resume_device_model(libxl__gc > *gc, uint32_t domid) > return 0; > } > > -int libxl__domain_suspend_common_callback(void *user) > +int libxl__domain_suspend_callback_common(libxl__domain_suspend_state > *dss) > { > - libxl__save_helper_state *shs = user; > - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > STATE_AO_GC(dss->ao); > unsigned long hvm_s_state = 0, hvm_pvdrv = 0; > int ret; > @@ -1225,12 +1223,27 @@ int libxl__toolstack_save(uint32_t domid, uint8_t > **buf, > return 0; > } > > +static void libxl__domain_suspend_callback(void *data) > +{ > + libxl__save_helper_state *shs = data; > + libxl__egc *egc = shs->egc; > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > + > + int ok = libxl__domain_suspend_callback_common(dss); > + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); > +} > + > /*----- remus callbacks -----*/ > > -static int libxl__remus_domain_suspend_callback(void *data) > +static void libxl__remus_domain_suspend_callback(void *data) > { > + libxl__save_helper_state *shs = data; > + libxl__egc *egc = shs->egc; > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > + > /* REMUS TODO: Issue disk and network checkpoint reqs. 
*/ > - return libxl__domain_suspend_common_callback(data); > + int ok = libxl__domain_suspend_callback_common(dss); > + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); > } > > static int libxl__remus_domain_resume_callback(void *data) > @@ -1354,7 +1367,7 @@ void libxl__domain_suspend(libxl__egc *egc, > libxl__domain_suspend_state *dss) > callbacks->postcopy = libxl__remus_domain_resume_callback; > callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; > } else > - callbacks->suspend = libxl__domain_suspend_common_callback; > + callbacks->suspend = libxl__domain_suspend_callback; > > callbacks->switch_qemu_logdirty > libxl__domain_suspend_common_switch_qemu_logdirty; > dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index 4f92522..79eb8f8 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -2551,7 +2551,7 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, > void *dss_void, > void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, > libxl__save_helper_state *shs, int > return_value); > > -_hidden int libxl__domain_suspend_common_callback(void *data); > +_hidden int > libxl__domain_suspend_callback_common(libxl__domain_suspend_state*); > _hidden void libxl__domain_suspend_common_switch_qemu_logdirty > (int domid, unsigned int enable, void > *data); > _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf, > diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/ > libxl_save_msgs_gen.pl > index ee126c7..3c6bd57 100755 > --- a/tools/libxl/libxl_save_msgs_gen.pl > +++ b/tools/libxl/libxl_save_msgs_gen.pl > @@ -23,7 +23,7 @@ our @msgs = ( > STRING doing_what), > ''unsigned long'', ''done'', > ''unsigned long'', ''total''] > ], > - [ 3, ''scxW'', "suspend", [] ], > + [ 3, ''scxA'', "suspend", [] ], > [ 4, ''scxW'', "postcopy", [] ], > [ 5, ''scxA'', "checkpoint", [] ], > [ 6, ''scxA'', "switch_qemu_logdirty", [qw(int domid > -- > 1.7.10.4 > >The patch seems harmless enough. How do you want to go about this? Do you want to post/commit this patch ? Because I have to modify my patches accordingly. Or should I post this along with my patch series, avoiding the need to wait on you before I post mine ? shriram _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Nov-12 09:48 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
On Mon, 2013-11-11 at 11:56 -0600, Shriram Rajagopalan wrote:
> The patch seems harmless enough. How do you want to go about this?
> Do you want to post/commit this patch ? Because I have to modify my
> patches accordingly. Or should I post this along with my patch series,
> avoiding the need to wait on you before I post mine ?

I recommend doing both, i.e. include it at the head of your series but
also be prepared for it to go in independently, that way nobody is
blocked.

Ian.
Ian Jackson
2013-Nov-12 15:38 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 10:40 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
...
> The patch seems harmless enough. How do you want to go about this?

If you think this is the right approach, you should start on it right
away based on my patch. The reason I posted it quickly was because I
wanted to unblock you.

> Do you want to post/commit this patch ? Because I have to modify my
> patches accordingly. Or should I post this along with my patch
> series, avoiding the need to wait on you before I post mine ?

You can certainly take it into the start of your patch series, yes.
Like what Roger did with the nested ao patch.

Currently there isn't any other reason to make the change in this
patch, so I don't think it should be committed right away. But if for
some reason it does get committed to staging, you or we can just drop
it from the start of your series.

Regards,
Ian.
Shriram Rajagopalan
2013-Nov-12 16:24 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
On Tue, Nov 12, 2013 at 9:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control
> network buffering in remus callbacks [and 1 more messages] [and 1 more
> messages]"):
> > On Mon, Nov 4, 2013 at 10:40 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> ...
> > The patch seems harmless enough. How do you want to go about this?
>
> If you think this is the right approach, you should start on it right
> away based on my patch. The reason I posted it quickly was because I
> wanted to unblock you.

The nested-ao patch makes sense for Remus, even without fixing this
timeout issue. I can modify my stuff accordingly. Probably create a
nested-ao per iteration and drop it at the start of the next iteration.

However, the timeout part is not convincing enough. For example,
libxl__domain_suspend_common_callback [the version before your patch]
has two 6 second wait loops in the worst case..

    LOG(DEBUG, "issuing %s suspend request via XenBus control node",
        dss->hvm ? "PVHVM" : "PV");

    libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");

    LOG(DEBUG, "wait for the guest to acknowledge suspend request");
    watchdog = 60;
    while (!strcmp(state, "suspend") && watchdog > 0) {
        usleep(100000);

        state = libxl__domain_pvcontrol_read(gc, XBT_NULL, domid);
        if (!state) state = "";

        watchdog--;
    }

and then once again

    LOG(DEBUG, "wait for the guest to suspend");
    watchdog = 60;
    while (watchdog > 0) {
        xc_domaininfo_t info;

        usleep(100000);
        ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);

Now I know where the 200ms overhead per checkpoint comes from.

Shouldn't this also be made into an event loop? Irrespective of whether
it is invoked in Remus' context or normal
suspend/resume/save/restore/migrate context.

And if this remains, the Remus checkpoint interval is much lower
compared to this. Typically between 25-100ms.

> > Do you want to post/commit this patch ? Because I have to modify my
> > patches accordingly. Or should I post this along with my patch
> > series, avoiding the need to wait on you before I post mine ?
>
> You can certainly take it into the start of your patch series, yes.
> Like what Roger did with the nested ao patch.
>
> Currently there isn't any other reason to make the change in this
> patch, so I don't think it should be committed right away. But if for
> some reason it does get committed to staging, you or we can just drop
> it from the start of your series.

The only reason it might get committed to staging without other remus
patches would be to fix the issue I cited above.

cheers
shriram

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-12 16:38 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]"):
> The nested-ao patch makes sense for Remus, even without fixing this
> timeout issue. I can modify my stuff accordingly. Probably create a
> nested-ao per iteration and drop it at the start of the next
> iteration.

Right. Great.

> However, the timeout part is not convincing enough. For example,
> libxl__domain_suspend_common_callback [the version before your patch]
> has two 6 second wait loops in the worst case..
...
>     LOG(DEBUG, "wait for the guest to acknowledge suspend request");
>     watchdog = 60;
>     while (!strcmp(state, "suspend") && watchdog > 0) {
>         usleep(100000);
...
> and then once again
...
>         usleep(100000);

Oh dear. That is very poor.

> Now I know where the 200ms overhead per checkpoint comes from.
>
> Shouldn't this also be made into an event loop? Irrespective of
> whether it is invoked in Remus' context or normal
> suspend/resume/save/restore/migrate context.

Yes, you are entirely correct.

Both of these loops should be replaced with timeout/event/callback
approaches.

Do you want to attempt this or would you like me to do it ?

> Currently there isn't any other reason to make the change in this
> patch, so I don't think it should be committed right away. But if for
> some reason it does get committed to staging, you or we can just drop
> it from the start of your series.
...
> The only reason it might get committed to staging without other remus patches
> would be to fix the issue I cited above.

Yes.

Ian.
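As a sketch of the direction agreed here, the first polling loop could become event-driven with a xenstore watch plus a timeout registered on the event loop. The pvcontrol_watch and guest_timeout fields are hypothetical, and the libxl__ev_xswatch_*/libxl__ev_time_* signatures are recalled from memory of libxl_internal.h, so this is an assumption rather than the eventual implementation:

    /* Sketch: instead of polling every 100ms, watch the domain's control
     * node and arm a 6-second timeout; the event loop calls back whichever
     * fires first.  Registration (elsewhere) would look roughly like:
     *     libxl__ev_xswatch_register(gc, &dss->pvcontrol_watch,
     *                                suspend_ack_changed, control_path);
     *     libxl__ev_time_register_rel(gc, &dss->guest_timeout,
     *                                 suspend_ack_timeout, 6000);
     */
    static void suspend_ack_changed(libxl__egc *egc, libxl__ev_xswatch *w,
                                    const char *watch_path,
                                    const char *event_path)
    {
        libxl__domain_suspend_state *dss =
            CONTAINER_OF(w, *dss, pvcontrol_watch);
        STATE_AO_GC(dss->ao);

        const char *state =
            libxl__domain_pvcontrol_read(gc, XBT_NULL, dss->domid);
        if (!state || strcmp(state, "suspend")) {
            /* Guest acknowledged (or the node vanished): stop the watch and
             * the timeout, then move on to waiting for the actual suspend. */
            libxl__ev_xswatch_deregister(gc, &dss->pvcontrol_watch);
            libxl__ev_time_deregister(gc, &dss->guest_timeout);
            /* ... proceed to the "wait for the guest to suspend" step ... */
        }
    }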
Shriram Rajagopalan
2013-Nov-12 16:43 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
On Tue, Nov 12, 2013 at 10:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control
> network buffering in remus callbacks [and 1 more messages] [and 1 more
> messages]"):
> > The nested-ao patch makes sense for Remus, even without fixing this
> > timeout issue. I can modify my stuff accordingly. Probably create a
> > nested-ao per iteration and drop it at the start of the next
> > iteration.
>
> Right. Great.
>
> > However, the timeout part is not convincing enough. For example,
> > libxl__domain_suspend_common_callback [the version before your patch]
> > has two 6 second wait loops in the worst case..
> ...
> > LOG(DEBUG, "wait for the guest to acknowledge suspend request");
> > watchdog = 60;
> > while (!strcmp(state, "suspend") && watchdog > 0) {
> >     usleep(100000);
> ...
> > and then once again
> ...
> >     usleep(100000);
>
> Oh dear. That is very poor.
>
> > Now I know where the 200ms overhead per checkpoint comes from.
> >
> > Shouldn't this also be made into an event loop? Irrespective of
> > whether it is invoked in Remus' context or normal
> > suspend/resume/save/restore/migrate context.
>
> Yes, you are entirely correct.
>
> Both of these loops should be replaced with timeout/event/callback
> approaches.
>
> Do you want to attempt this or would you like me to do it ?

I can take a crack at it.

Thanks
Shriram

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-12 17:00 UTC
Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]"):
> On Tue, Nov 12, 2013 at 10:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
> > Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control
> > network buffering in remus callbacks [and 1 more messages] [and 1 more
> > messages]"):
> ...
> > Both of these loops should be replaced with timeout/event/callback
> > approaches.
> >
> > Do you want to attempt this or would you like me to do it ?
>
> I can take a crack at it.

OK. Good luck. If you get stuck please do shout for help.

Also note that feature freeze is quite soon!

Ian.
Shriram Rajagopalan
2013-Nov-14 03:55 UTC
Re: [PATCH 2 of 5 V3] tools/hotplug: Remus network buffering setup scripts
On Thu, Oct 31, 2013 at 5:25 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > This was to just make sure that the IFB variable was supplied as part
> > of the environment.. Just like the two checks on top of this script..
> > "
> > : ${vifname?}
> > : ${XENBUS_PATH?}
> > "
>
> Do these result in useful error reporting to the end user?

It displays a message "<script name>:lineno: vifname: parameter not set
or null" on stderr.

> Bearing in mind that the end user might be using libxl but not xl and
> therefore not see stdout/err (which might be going into some daemon's
> log file).

These parameters are supposed to be set by libxl before fork+exec-ing the
script. And their values are not obtained from the user or the command
line. They are generated inside libxl, and checked for non-null status.
So I don't think we need to worry about libxl not being able to see the
error.

> Also, either I'm missing it or the bash manpage only documents
> ${parameter:?} and not ${parameter?}. Are both actually valid?

You are right. The man page documents only the ${parameter:?} format,
although the "${parameter?}" also seems to work for me. I will stick to
the documented format.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel