Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 0 of 7 V4] Remus/Libxl: Network buffering support
This patch series adds support for network buffering in the Remus codebase in libxl and makes the suspend callback asynchronous. I have tested the series with PV guests (with and without suspend event channel). Would appreciate if anyone could test the series with PV-HVM guests and pure HVM guests. Changes in V4: [1/7] *new*. This Ian Jackson''s patch -- making libxl__domain_suspend_callback asynchronous [2/7] [was 1/5 in previous versions] Remove check for libnl command line utils [3/7] [was 2/5 previously] minor nits [4/7] [was 3/5 previously] define LIBXL_HAVE_REMUS_NETBUF in libxl.h [5/7] [was 4/5 previously] clean ups. Make the usleep in checkpoint callback asynchronous [6/7] [was 5/5 previously] minor nits [7/7] *new*. Refactor libxl__domain_suspend_common_callback to substitute usleep based wait loops with libxl''s timer events. Changes in V3: [1/5] Fix redundant checks in configure scripts (based on Ian Campbell''s suggestions) [2/5] Introduce locking in the script, during IFB setup. Add xenstore paths used by netbuf scripts to xenstore-paths.markdown [3/5] Hotplug scripts setup/teardown invocations are now asynchronous following IanJ''s feedback. However, the invocations are still sequential. [5/5] Allow per-domain specification of netbuffer scripts in xl remus commmand. And minor nits throughout the series based on feedback from the last version Changes in V2: [1/5] Configure script will automatically enable/disable network buffer support depending on the availability of the appropriate libnl3 version. [If libnl3 is unavailable, a warning message will be printed to let the user know that the feature has been disabled.] use macros from pkg.m4 instead of pkg-config commands removed redundant checks for libnl3 libraries. [3,4/5] - Minor nits. Version 1: [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags to libxl Makefile. [2/5] External script to setup/teardown network buffering using libnl3''s CLI. This script will be invoked by libxl before starting Remus. The script''s main job is to bring up an IFB device with plug qdisc attached to it. It then re-routes egress traffic from the guest''s vif to the IFB device. [3/5] Libxl code to invoke the external setup script, followed by netlink related setup to obtain a handle on the output buffers attached to each vif. [4/5] Libxl interaction with network buffer module in the kernel via libnl3 API. [5/5] xl cmdline switch to explicitly enable network buffering when starting remus. Few things to note: a) Based on previous email discussions, the setup/teardown task has been moved to a hotplug style shell script which can be customized as desired, instead of implementing it as C code inside libxl. b) Libnl3 is not available on NetBSD. Nor is it available on CentOS (Linux). So I have made network buffering support an optional feature so that it can be disabled if desired. c) NetBSD does not have libnl3. So I have put the setup script under tools/hotplug/Linux folder. thanks shriram
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 1 of 7 V4] [PATCH] libxl: make libxl__domain_suspend_callback be asynchronous
# HG changeset patch # User Ian Jackson <ian.jackson@eu.citrix.com> # Date 1384374958 28800 # Node ID 3143bed377a08bb9e87c7636e004dc371bb20338 # Parent b90864157a670771462fb05918b0829889dde86c [PATCH] libxl: make libxl__domain_suspend_callback be asynchronous Date: Mon, 4 Nov 2013 16:27:53 +0000 Mark the suspend callback as asynchronous in the helper stub generator (libxl_save_msgs_gen.pl). Remus is going to want to provide an asynchronous version of this function. libxl__domain_suspend_common_callback, the common synchronous core, which used to be provided directly as the callback function for the helper machinery, becomes libxl__domain_suspend_callback_common. It can now take a typesafe parameter. For now, provide two very similar asynchronous wrappers for it. Each is a simple function which contains only boilerplate, calls the common synchronous core, and returns the asynchronous response. Essentially, we have just moved (in the case of suspend callbacks) the call site of libxl__srm_callout_callback_complete. It was in the switch statement in the autogenerated _libxl_save_msgs_callout.c, and is now in the handwritten libxl_dom.c. No functional change. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca> Cc: Ian Campbell <ian.campbell@citrix.com> --- tools/libxl/libxl_dom.c | 25 +++++++++++++++++++------ tools/libxl/libxl_internal.h | 2 +- tools/libxl/libxl_save_msgs_gen.pl | 2 +- 3 files changed, 21 insertions(+), 8 deletions(-) diff -r b90864157a67 -r 3143bed377a0 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Wed Nov 13 12:34:19 2013 -0800 +++ b/tools/libxl/libxl_dom.c Wed Nov 13 12:35:58 2013 -0800 @@ -1003,10 +1003,8 @@ int libxl__domain_resume_device_model(li return 0; } -int libxl__domain_suspend_common_callback(void *user) +int libxl__domain_suspend_callback_common(libxl__domain_suspend_state *dss) { - libxl__save_helper_state *shs = user; - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); STATE_AO_GC(dss->ao); unsigned long hvm_s_state = 0, hvm_pvdrv = 0; int ret; @@ -1225,12 +1223,27 @@ int libxl__toolstack_save(uint32_t domid return 0; } +static void libxl__domain_suspend_callback(void *data) +{ + libxl__save_helper_state *shs = data; + libxl__egc *egc = shs->egc; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + + int ok = libxl__domain_suspend_callback_common(dss); + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); +} + /*----- remus callbacks -----*/ -static int libxl__remus_domain_suspend_callback(void *data) +static void libxl__remus_domain_suspend_callback(void *data) { + libxl__save_helper_state *shs = data; + libxl__egc *egc = shs->egc; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + /* REMUS TODO: Issue disk and network checkpoint reqs. */ - return libxl__domain_suspend_common_callback(data); + int ok = libxl__domain_suspend_callback_common(dss); + libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); } static int libxl__remus_domain_resume_callback(void *data) @@ -1354,7 +1367,7 @@ void libxl__domain_suspend(libxl__egc *e callbacks->postcopy = libxl__remus_domain_resume_callback; callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; } else - callbacks->suspend = libxl__domain_suspend_common_callback; + callbacks->suspend = libxl__domain_suspend_callback; callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; diff -r b90864157a67 -r 3143bed377a0 tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h Wed Nov 13 12:34:19 2013 -0800 +++ b/tools/libxl/libxl_internal.h Wed Nov 13 12:35:58 2013 -0800 @@ -2597,7 +2597,7 @@ _hidden void libxl__xc_domain_save_done( void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, libxl__save_helper_state *shs, int return_value); -_hidden int libxl__domain_suspend_common_callback(void *data); +_hidden int libxl__domain_suspend_callback_common(libxl__domain_suspend_state*); _hidden void libxl__domain_suspend_common_switch_qemu_logdirty (int domid, unsigned int enable, void *data); _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf, diff -r b90864157a67 -r 3143bed377a0 tools/libxl/libxl_save_msgs_gen.pl --- a/tools/libxl/libxl_save_msgs_gen.pl Wed Nov 13 12:34:19 2013 -0800 +++ b/tools/libxl/libxl_save_msgs_gen.pl Wed Nov 13 12:35:58 2013 -0800 @@ -23,7 +23,7 @@ our @msgs = ( STRING doing_what), ''unsigned long'', ''done'', ''unsigned long'', ''total''] ], - [ 3, ''scxW'', "suspend", [] ], + [ 3, ''scxA'', "suspend", [] ], [ 4, ''scxW'', "postcopy", [] ], [ 5, ''scxA'', "checkpoint", [] ], [ 6, ''scxA'', "switch_qemu_logdirty", [qw(int domid
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 2 of 7 V4] remus: add libnl3 dependency to autoconf scripts
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492673 28800 # Node ID f14fb4982ec89b30c04c6f4684de5056450c27a0 # Parent 3143bed377a08bb9e87c7636e004dc371bb20338 remus: add libnl3 dependency to autoconf scripts Libnl3 is required for controlling Remus network buffering. This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts. Also provide ability to configure tools without libnl3 support, that is without network buffering support. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 3143bed377a0 -r f14fb4982ec8 README --- a/README Wed Nov 13 12:35:58 2013 -0800 +++ b/README Thu Nov 14 21:17:53 2013 -0800 @@ -71,6 +71,10 @@ disabled at compile time: includes the alternative ocaml xenstored. * cmake (if building vtpm stub domains) * markdown + * Development install of libnl3 (e.g., libnl-3-200, + libnl-3-dev, etc). Required if network buffering is desired + when using Remus with libxl. See tools/remus/README for detailed + information. Second, you need to acquire a suitable kernel for use in domain 0. If possible you should use a kernel provided by your OS distributor. If diff -r 3143bed377a0 -r f14fb4982ec8 config/Tools.mk.in --- a/config/Tools.mk.in Wed Nov 13 12:35:58 2013 -0800 +++ b/config/Tools.mk.in Thu Nov 14 21:17:53 2013 -0800 @@ -38,6 +38,8 @@ PTHREAD_LIBS := @PTHREAD_LIBS@ PTYFUNCS_LIBS := @PTYFUNCS_LIBS@ +LIBNL3_LIBS := @LIBNL3_LIBS@ +LIBNL3_CFLAGS := @LIBNL3_CFLAGS@ # Download GIT repositories via HTTP or GIT''s own protocol? # GIT''s protocol is faster and more robust, when it works at all (firewalls # may block it). We make it the default, but if your GIT repository downloads @@ -56,6 +58,7 @@ CONFIG_QEMU_TRAD := @qemu_traditional CONFIG_QEMU_XEN := @qemu_xen@ CONFIG_XEND := @xend@ CONFIG_BLKTAP1 := @blktap1@ +CONFIG_REMUS_NETBUF := @remus_netbuf@ #System options ZLIB := @zlib@ diff -r 3143bed377a0 -r f14fb4982ec8 tools/configure.ac --- a/tools/configure.ac Wed Nov 13 12:35:58 2013 -0800 +++ b/tools/configure.ac Thu Nov 14 21:17:53 2013 -0800 @@ -230,6 +230,21 @@ AC_SUBST(libiconv) # Checks for header files. AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h]) +# Check for libnl3 >=3.2.8. If present enable remus network buffering. +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8], + [libnl3_lib="y"], [libnl3_lib="n"]) + +AS_IF([test "x$libnl3_lib" = "xn" ], [ + AC_MSG_WARN([Disabling support for Remus network buffering. + Please install libnl3 libraries, command line tools and devel + headers - version 3.2.8 or higher]) + AC_SUBST(remus_netbuf, [n]) + ],[ + AC_SUBST(LIBNL3_LIBS) + AC_SUBST(LIBNL3_CFLAGS) + AC_SUBST(remus_netbuf, [y]) +]) + AC_OUTPUT() AS_IF([test "x$xend" = "xy" ], [ diff -r 3143bed377a0 -r f14fb4982ec8 tools/libxl/Makefile --- a/tools/libxl/Makefile Wed Nov 13 12:35:58 2013 -0800 +++ b/tools/libxl/Makefile Thu Nov 14 21:17:53 2013 -0800 @@ -21,11 +21,13 @@ endif LIBXL_LIBS LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS) +LIBXL_LIBS += $(LIBNL3_LIBS) CFLAGS_LIBXL += $(CFLAGS_libxenctrl) CFLAGS_LIBXL += $(CFLAGS_libxenguest) CFLAGS_LIBXL += $(CFLAGS_libxenstore) CFLAGS_LIBXL += $(CFLAGS_libblktapctl) +CFLAGS_LIBXL += $(LIBNL3_CFLAGS) CFLAGS_LIBXL += -Wshadow CFLAGS += $(PTHREAD_CFLAGS) diff -r 3143bed377a0 -r f14fb4982ec8 tools/remus/README --- a/tools/remus/README Wed Nov 13 12:35:58 2013 -0800 +++ b/tools/remus/README Thu Nov 14 21:17:53 2013 -0800 @@ -2,3 +2,9 @@ Remus provides fault tolerance for virtu checkpoints to a backup, which will activate if the target VM fails. See the website at http://nss.cs.ubc.ca/remus/ for details. + +Using Remus with libxl on Xen 4.4 and higher: + To enable network buffering, you need libnl 3.2.8 + or higher along with the development headers and command line utilities. + If your distro does not have the appropriate libnl3 version, you can find + the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 3 of 7 V4] tools/hotplug: Remus network buffering setup scripts
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492674 28800 # Node ID 85a6d32b9e261f051bcc1776481ff5f4b7af1286 # Parent f14fb4982ec89b30c04c6f4684de5056450c27a0 tools/hotplug: Remus network buffering setup scripts This patch introduces remus-netbuf-setup hotplug script responsible for setting up and tearing down the necessary infrastructure required for network output buffering in Remus. This script is intended to be invoked by libxl for each guest interface, when starting or stopping Remus. Apart from returning success/failure indication via the usual hotplug entries in xenstore, this script also writes to xenstore, the name of the IFB device to be used to control the vif''s network output. The script relies on libnl3 command line utilities to perform various setup/teardown functions. The script is confined to Linux platforms only since NetBSD does not seem to have libnl3. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r f14fb4982ec8 -r 85a6d32b9e26 docs/misc/xenstore-paths.markdown --- a/docs/misc/xenstore-paths.markdown Thu Nov 14 21:17:53 2013 -0800 +++ b/docs/misc/xenstore-paths.markdown Thu Nov 14 21:17:54 2013 -0800 @@ -356,6 +356,10 @@ The guest''s virtual time offset from UTC The device model version for a domain. +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL] + +IFB device used by Remus to buffer network output from the associated vif. + [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html diff -r f14fb4982ec8 -r 85a6d32b9e26 tools/hotplug/Linux/Makefile --- a/tools/hotplug/Linux/Makefile Thu Nov 14 21:17:53 2013 -0800 +++ b/tools/hotplug/Linux/Makefile Thu Nov 14 21:17:54 2013 -0800 @@ -16,6 +16,7 @@ XEN_SCRIPTS += network-nat vif-nat XEN_SCRIPTS += vif-openvswitch XEN_SCRIPTS += vif2 XEN_SCRIPTS += vif-setup +XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup XEN_SCRIPTS += block XEN_SCRIPTS += block-enbd block-nbd XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap diff -r f14fb4982ec8 -r 85a6d32b9e26 tools/hotplug/Linux/remus-netbuf-setup --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/hotplug/Linux/remus-netbuf-setup Thu Nov 14 21:17:54 2013 -0800 @@ -0,0 +1,183 @@ +#!/bin/bash +#===========================================================================+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup +# +# Script for attaching a network buffer to the specified vif (in any mode). +# The hotplugging system will call this script when starting remus via libxl +# API, libxl_domain_remus_start. +# +# Usage: +# remus-netbuf-setup (setup|teardown) +# +# Environment vars: +# vifname vif interface name (required). +# XENBUS_PATH path in Xenstore, where the IFB device details will be stored +# or read from (required). +# (libxl passes /libxl/<domid>/remus/netbuf/<devid>) +# IFB ifb interface to be cleaned up (required). [for teardown op only] + +# Written to the store: (setup operation) +# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving +# as the intermediate buffer through which the interface''s network output +# can be controlled. +# +# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0) +# we need to do the following +# +# ip link set dev ifb0 up +# tc qdisc add dev vif1.0 ingress +# tc filter add dev vif1.0 parent ffff: proto ip \ +# prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0 +# nl-qdisc-add --dev=ifb0 --parent root plug +# nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000 +# (10MB limit on buffer) +# +# So order of operations when installing a network buffer on vif1.0 +# 1. find a free ifb and bring up the device +# 2. redirect traffic from vif1.0 to ifb: +# 2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest) +# 2.2 use tc filter command with actions mirred egress + redirect +# 3. install plug_qdisc on ifb device, with which we can buffer/release +# guest''s network output from vif1.0 +# +# + +#===========================================================================+ +# Unlike other vif scripts, vif-common is not needed here as it executes vif +#specific setup code such as renaming. +dir=$(dirname "$0") +. "$dir/xen-hotplug-common.sh" + +findCommand "$@" + +if [ "$command" != "setup" -a "$command" != "teardown" ] +then + echo "Invalid command: $command" + log err "Invalid command: $command" + exit 1 +fi + +evalVariables "$@" + +: ${vifname:?} +: ${XENBUS_PATH:?} + +check_libnl_tools() { + if ! command -v nl-qdisc-list > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-list tool" + fi + if ! command -v nl-qdisc-add > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-add tool" + fi + if ! command -v nl-qdisc-delete > /dev/null 2>&1; then + fatal "Unable to find nl-qdisc-delete tool" + fi +} + +# We only check for modules. We don''t load them. +# User/Admin is supposed to load ifb during boot time, +# ensuring that there are enough free ifbs in the system. +# Other modules will be loaded automatically by tc commands. +check_modules() { + for m in ifb sch_plug sch_ingress act_mirred cls_u32 + do + if ! modinfo $m > /dev/null 2>&1; then + fatal "Unable to find $m kernel module" + fi + done +} + +setup_ifb() { + + for ifb in `ifconfig -a -s|egrep ^ifb|cut -d '' '' -f1` + do + local installed=`nl-qdisc-list -d $ifb` + [ -n "$installed" ] && continue + IFB="$ifb" + break + done + + if [ -z "$IFB" ] + then + fatal "Unable to find a free IFB device for $vifname" + fi + + do_or_die ip link set dev "$IFB" up +} + +redirect_vif_traffic() { + local vif=$1 + local ifb=$2 + + do_or_die tc qdisc add dev "$vif" ingress + + tc filter add dev "$vif" parent ffff: proto ip prio 10 \ + u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1 + + if [ $? -ne 0 ] + then + do_without_error tc qdisc del dev "$vif" ingress + fatal "Failed to redirect traffic from $vif to $ifb" + fi +} + +add_plug_qdisc() { + local vif=$1 + local ifb=$2 + + nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1 + if [ $? -ne 0 ] + then + do_without_error tc qdisc del dev "$vif" ingress + fatal "Failed to add plug qdisc to $ifb" + fi + + #set ifb buffering limit in bytes. Its okay if this command fails + nl-qdisc-add --dev="$ifb" --parent root \ + --update plug --limit=10000000 >/dev/null 2>&1 +} + +teardown_netbuf() { + local vif=$1 + local ifb=$2 + + if [ "$ifb" ]; then + do_without_error ip link set dev "$ifb" down + do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1 + xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true + fi + do_without_error tc qdisc del dev "$vif" ingress + xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true +} + +xs_write_failed() { + local vif=$1 + local ifb=$2 + teardown_netbuf "$vifname" "$IFB" + fatal "failed to write ifb name to xenstore" +} + +case "$command" in + setup) + check_libnl_tools + check_modules + + claim_lock "pickifb" + setup_ifb + redirect_vif_traffic "$vifname" "$IFB" + add_plug_qdisc "$vifname" "$IFB" + release_lock "pickifb" + + #not using xenstore_write that automatically exits on error + #because we need to cleanup + _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB" + success + ;; + teardown) + : ${IFB:?} + teardown_netbuf "$vifname" "$IFB" + ;; +esac + +log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 4 of 7 V4] tools/libxl: setup/teardown Remus network buffering
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492676 28800 # Node ID d3d7da1c2289749b2a1c5b8baaf9d9c18a914e9b # Parent 85a6d32b9e261f051bcc1776481ff5f4b7af1286 tools/libxl: setup/teardown Remus network buffering Setup/teardown remus network buffering for a given guest, when libxl_domain_remus_start API is invoked. This patch does the following during setup: a) call the hotplug script for each vif to setup its network buffer b) establish a dedicated remus context containing libnl related state (netlink sockets, qdisc caches, etc.,) c) Obtain handles to plug qdiscs installed on the IFB devices chosen by the hotplug scripts. During teardown, the netlink resources are released, followed by invocation of hotplug scripts to remove the IFB devices. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/Makefile --- a/tools/libxl/Makefile Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/Makefile Thu Nov 14 21:17:56 2013 -0800 @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o else LIBXL_OBJS-y += libxl_noblktap2.o endif + +ifeq ($(CONFIG_REMUS_NETBUF),y) +LIBXL_OBJS-y += libxl_netbuffer.o +else +LIBXL_OBJS-y += libxl_nonetbuffer.o +endif + LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o LIBXL_OBJS-$(CONFIG_IA64) += libxl_nocpuid.o libxl_noarch.o LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_noarch.o diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl.c --- a/tools/libxl/libxl.c Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/libxl.c Thu Nov 14 21:17:56 2013 -0800 @@ -698,8 +698,9 @@ libxl_vminfo * libxl_list_vm(libxl_ctx * return ptr; } -static void remus_failover_cb(libxl__egc *egc, - libxl__domain_suspend_state *dss, int rc); +static void remus_replication_failure_cb(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info, @@ -718,40 +719,57 @@ int libxl_domain_remus_start(libxl_ctx * GCNEW(dss); dss->ao = ao; - dss->callback = remus_failover_cb; + dss->callback = remus_replication_failure_cb; dss->domid = domid; dss->fd = send_fd; /* TODO do something with recv_fd */ dss->type = type; dss->live = 1; dss->debug = 0; - dss->remus = info; assert(info); - /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */ + GCNEW(dss->remus_ctx); + + /* convenience shorthand */ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + remus_ctx->blackhole = info->blackhole; + remus_ctx->interval = info->interval; + remus_ctx->compression = info->compression; + remus_ctx->dss = dss; + + /* TODO: enable disk buffering */ + + /* Setup network buffering */ + if (info->netbuf) { + if (!libxl__netbuffer_enabled(gc)) { + LOG(ERROR, "Remus: No support for network buffering"); + goto out; + } + + if (info->netbufscript) { + remus_ctx->netbufscript + libxl__strdup(gc, info->netbufscript); + } else { + remus_ctx->netbufscript + GCSPRINTF("%s/remus-netbuf-setup", + libxl__xen_script_dir_path()); + } + } /* Point of no return */ - libxl__domain_suspend(egc, dss); + libxl__remus_setup_initiate(egc, dss); return AO_INPROGRESS; out: return AO_ABORT(rc); } -static void remus_failover_cb(libxl__egc *egc, - libxl__domain_suspend_state *dss, int rc) +static void remus_replication_failure_cb(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) { STATE_AO_GC(dss->ao); - /* - * With Remus, if we reach this point, it means either - * backup died or some network error occurred preventing us - * from sending checkpoints. - */ - - /* TBD: Remus cleanup - i.e. detach qdisc, release other - * resources. - */ libxl__ao_complete(egc, ao, rc); } diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl.h --- a/tools/libxl/libxl.h Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/libxl.h Thu Nov 14 21:17:56 2013 -0800 @@ -369,6 +369,19 @@ */ #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1 +/* + * LIBXL_HAVE_REMUS_NETBUF 1 + * + * If this is defined, then the libxl_domain_remus_info structure will + * have a boolean field (netbuf) and a string field (netbufscript). + * + * netbuf, if true, indicates that network buffering should be enabled. + * + * netbufscript, if set, indicates the path to the hotplug script to + * setup or teardown network buffers. + */ +#define LIBXL_HAVE_REMUS_NETBUF 1 + /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be * called from within libxl itself. Callers outside libxl, who * do not #include libxl_internal.h, are fine. */ diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/libxl_dom.c Thu Nov 14 21:17:56 2013 -0800 @@ -1233,6 +1233,51 @@ static void libxl__domain_suspend_callba libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); } +/*----- remus setup/teardown code -----*/ +void libxl__remus_setup_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + libxl__ev_time_init(&dss->remus_ctx->timeout); + if (!dss->remus_ctx->netbufscript) + libxl__remus_setup_done(egc, dss, 0); + else + libxl__remus_netbuf_setup(egc, dss); +} + +void libxl__remus_setup_done(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) +{ + STATE_AO_GC(dss->ao); + if (!rc) { + libxl__domain_suspend(egc, dss); + return; + } + + LOG(ERROR, "Remus: failed to setup network buffering" + " for guest with domid %u", dss->domid); + domain_suspend_done(egc, dss, rc); +} + +void libxl__remus_teardown_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc) +{ + /* stash rc somewhere before invoking teardown ops. */ + dss->remus_ctx->saved_rc = rc; + + if (!dss->remus_ctx->netbuf_ctx) + libxl__remus_teardown_done(egc, dss); + else + libxl__remus_netbuf_teardown(egc, dss); +} + +void libxl__remus_teardown_done(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + dss->callback(egc, dss, dss->remus_ctx->saved_rc); +} + /*----- remus callbacks -----*/ static void libxl__remus_domain_suspend_callback(void *data) @@ -1286,7 +1331,7 @@ static void remus_checkpoint_dm_saved(li /* REMUS TODO: Wait for disk and memory ack, release network buffer */ /* REMUS TODO: make this asynchronous */ assert(!rc); /* REMUS TODO handle this error properly */ - usleep(dss->interval * 1000); + usleep(dss->remus_ctx->interval * 1000); libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); } @@ -1304,7 +1349,6 @@ void libxl__domain_suspend(libxl__egc *e const libxl_domain_type type = dss->type; const int live = dss->live; const int debug = dss->debug; - const libxl_domain_remus_info *const r_info = dss->remus; libxl__srm_save_autogen_callbacks *const callbacks &dss->shs.callbacks.save.a; @@ -1339,11 +1383,8 @@ void libxl__domain_suspend(libxl__egc *e dss->guest_responded = 0; dss->dm_savefile = libxl__device_model_savefile(gc, domid); - if (r_info != NULL) { - dss->interval = r_info->interval; - if (r_info->compression) - dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; - } + if (dss->remus_ctx && dss->remus_ctx->compression) + dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS; dss->xce = xc_evtchn_open(NULL, 0); if (dss->xce == NULL) @@ -1362,7 +1403,7 @@ void libxl__domain_suspend(libxl__egc *e } memset(callbacks, 0, sizeof(*callbacks)); - if (r_info != NULL) { + if (dss->remus_ctx != NULL) { callbacks->suspend = libxl__remus_domain_suspend_callback; callbacks->postcopy = libxl__remus_domain_resume_callback; callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; @@ -1522,6 +1563,17 @@ static void domain_suspend_done(libxl__e if (dss->xce != NULL) xc_evtchn_close(dss->xce); + if (dss->remus_ctx) { + /* + * With Remus, if we reach this point, it means either + * backup died or some network error occurred preventing us + * from sending checkpoints. Teardown the network buffers and + * release netlink resources. This is an async op. + */ + libxl__remus_teardown_initiate(egc, dss, rc); + return; + } + dss->callback(egc, dss, rc); } diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/libxl_internal.h Thu Nov 14 21:17:56 2013 -0800 @@ -2290,6 +2290,50 @@ typedef struct libxl__logdirty_switch { libxl__ev_time timeout; } libxl__logdirty_switch; +typedef struct libxl__remus_ctx { + /* checkpoint interval */ + int interval; + int blackhole; + int compression; + int saved_rc; + /* Script to setup/teardown network buffers */ + const char *netbufscript; + /* Opaque context containing network buffer related stuff */ + void *netbuf_ctx; + libxl__domain_suspend_state *dss; + libxl__ev_time timeout; + libxl__ev_child child; + int num_exec; +} libxl__remus_ctx; + +_hidden int libxl__netbuffer_enabled(libxl__gc *gc); + +_hidden void libxl__remus_setup_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_setup_done(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); + +_hidden void libxl__remus_teardown_initiate(libxl__egc *egc, + libxl__domain_suspend_state *dss, + int rc); + +_hidden void libxl__remus_teardown_done(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss); + +_hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx); + +_hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx); + struct libxl__domain_suspend_state { /* set by caller of libxl__domain_suspend */ libxl__ao *ao; @@ -2300,7 +2344,7 @@ struct libxl__domain_suspend_state { libxl_domain_type type; int live; int debug; - const libxl_domain_remus_info *remus; + libxl__remus_ctx *remus_ctx; /* private */ xc_evtchn *xce; /* event channel handle */ int suspend_eventchn; @@ -2308,7 +2352,6 @@ struct libxl__domain_suspend_state { int xcflags; int guest_responded; const char *dm_savefile; - int interval; /* checkpoint interval (for Remus) */ libxl__save_helper_state shs; libxl__logdirty_switch logdirty; /* private for libxl__domain_save_device_model */ diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl_netbuffer.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/libxl/libxl_netbuffer.c Thu Nov 14 21:17:56 2013 -0800 @@ -0,0 +1,516 @@ +/* + * Copyright (C) 2013 + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#include "libxl_osdeps.h" /* must come before any other headers */ + +#include "libxl_internal.h" + +#include <netlink/cache.h> +#include <netlink/socket.h> +#include <netlink/attr.h> +#include <netlink/route/link.h> +#include <netlink/route/route.h> +#include <netlink/route/qdisc.h> +#include <netlink/route/qdisc/plug.h> + +typedef struct libxl__remus_netbuf_ctx { + struct rtnl_qdisc **netbuf_qdisc_list; + struct nl_sock *nlsock; + struct nl_cache *qdisc_cache; + char **vif_list; + char **ifb_list; + uint32_t num_netbufs; + uint32_t unused; +} libxl__remus_netbuf_ctx; + +/* If the device has a vifname, then use that instead of + * the vifX.Y format. + */ +static char *get_vifname(libxl__gc *gc, uint32_t domid, + libxl_device_nic *nic) +{ + const char *vifname = NULL; + vifname = libxl__xs_read(gc, XBT_NULL, + libxl__sprintf(gc, + "%s/backend/vif/%d/%d/vifname", + libxl__xs_get_dompath(gc, 0), + domid, nic->devid)); + if (!vifname) { + vifname = libxl__device_nic_devname(gc, domid, + nic->devid, + nic->nictype); + } + + return libxl__strdup(gc, vifname); +} + +static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid, + int *num_vifs) +{ + libxl_device_nic *nics = NULL; + int nb, i = 0; + char **vif_list = NULL; + + *num_vifs = 0; + nics = libxl_device_nic_list(CTX, domid, &nb); + if (!nics) return NULL; + + /* Ensure that none of the vifs are backed by driver domains */ + for (i = 0; i < nb; i++) { + if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) { + LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n" + "Network buffering is not supported with driver domains", + get_vifname(gc, domid, &nics[i]), nics[i].backend_domid); + *num_vifs = -1; + goto end; + } + } + + vif_list = libxl__calloc(gc, nb, sizeof(char *)); + for (i = 0; i < nb; ++i) + vif_list[i] = get_vifname(gc, domid, &nics[i]); + *num_vifs = nb; + + end: + for (i = 0; i < nb; i++) + libxl_device_nic_dispose(&nics[i]); + free(nics); + return vif_list; +} + +static int init_qdiscs(libxl__gc *gc, + libxl__remus_ctx *remus_ctx); + +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx, + char *op, libxl__ev_child_callback *death); + +static void netbuf_setup_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status); + +static void netbuf_teardown_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status); + +static void netbuf_setup_timeout_cb(libxl__egc *egc, + libxl__ev_time *ev, + const struct timeval *requested_abs); + +static int init_qdiscs(libxl__gc *gc, + libxl__remus_ctx *remus_ctx) +{ + int i, ret, ifindex; + struct rtnl_link *ifb = NULL; + struct rtnl_qdisc *qdisc = NULL; + + /* convenience vars */ + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + int num_netbufs = netbuf_ctx->num_netbufs; + char **ifb_list = netbuf_ctx->ifb_list; + + /* Now that we have brought up IFB devices with plug qdisc for + * each vif, lets get a netlink handle on the plug qdisc for use + * during checkpointing. + */ + netbuf_ctx->nlsock = nl_socket_alloc(); + if (!netbuf_ctx->nlsock) { + LOG(ERROR, "cannot allocate nl socket"); + goto end; + } + + ret = nl_connect(netbuf_ctx->nlsock, NETLINK_ROUTE); + if (ret) { + LOG(ERROR, "failed to open netlink socket: %s", + nl_geterror(ret)); + goto end; + } + + /* get list of all qdiscs installed on network devs. */ + ret = rtnl_qdisc_alloc_cache(netbuf_ctx->nlsock, + &netbuf_ctx->qdisc_cache); + if (ret) { + LOG(ERROR, "failed to allocate qdisc cache: %s", + nl_geterror(ret)); + goto end; + } + + /* list of handles to plug qdiscs */ + netbuf_ctx->netbuf_qdisc_list + libxl__calloc(gc, num_netbufs, + sizeof(struct rtnl_qdisc *)); + + ifb = rtnl_link_alloc(); + + for (i = 0; i < num_netbufs; ++i) { + + /* get a handle to the IFB interface */ + ret = rtnl_link_get_kernel(netbuf_ctx->nlsock, 0, + ifb_list[i], &ifb); + if (ret) { + LOG(ERROR, "cannot obtain handle for %s: %s", ifb_list[i], + nl_geterror(ret)); + goto end; + } + + ifindex = rtnl_link_get_ifindex(ifb); + if (!ifindex) { + LOG(ERROR, "interface %s has no index", ifb_list[i]); + goto end; + } + + /* Get a reference to the root qdisc installed on the IFB, by + * querying the qdisc list we obtained earlier. The netbufscript + * sets up the plug qdisc as the root qdisc, so we don''t have to + * search the entire qdisc tree on the IFB dev. + + * There is no need to explicitly free this qdisc as its just a + * reference from the qdisc cache we allocated earlier. + */ + qdisc = rtnl_qdisc_get_by_parent(netbuf_ctx->qdisc_cache, ifindex, + TC_H_ROOT); + + if (qdisc) { + const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc)); + /* Sanity check: Ensure that the root qdisc is a plug qdisc. */ + if (!tc_kind || strcmp(tc_kind, "plug")) { + LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]); + goto end; + } + netbuf_ctx->netbuf_qdisc_list[i] = qdisc; + } else { + LOG(ERROR, "Cannot get qdisc handle from ifb %s", ifb_list[i]); + goto end; + } + } + + rtnl_link_put(ifb); + return 0; + + end: + if (ifb) rtnl_link_put(ifb); + return ERROR_FAIL; +} + +/* the script needs the following env & args + * $vifname + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/) + * $IFB (for teardown) + * setup/teardown as command line arg. + * In return, the script writes the name of IFB device (during setup) to be + * used for output buffering into XENBUS_PATH/ifb + */ +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx, + char *op, libxl__ev_child_callback *death) +{ + int arraysize, nr = 0; + char **env = NULL, **args = NULL; + pid_t pid; + libxl__ev_child *child = &remus_ctx->child; + libxl__ev_time *timeout = &remus_ctx->timeout; + char *script = libxl__strdup(gc, remus_ctx->netbufscript); + uint32_t domid = remus_ctx->dss->domid; + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + char *ifb = netbuf_ctx->ifb_list[devid]; + + if (!strcmp(op, "setup")) + arraysize = 5; + else + arraysize = 7; + + GCNEW_ARRAY(env, arraysize); + env[nr++] = "vifname"; + env[nr++] = vif; + env[nr++] = "XENBUS_PATH"; + env[nr++] = GCSPRINTF("%s/remus/netbuf/%d", + libxl__xs_libxl_path(gc, domid), devid); + if (!strcmp(op, "teardown")) { + env[nr++] = "IFB"; + env[nr++] = ifb; + } + env[nr++] = NULL; + assert(nr <= arraysize); + + arraysize = 3; nr = 0; + GCNEW_ARRAY(args, arraysize); + args[nr++] = script; + args[nr++] = op; + args[nr++] = NULL; + assert(nr == arraysize); + + libxl__ev_child_init(child); + + /* Set hotplug timeout */ + if (libxl__ev_time_register_rel(gc, timeout, + netbuf_setup_timeout_cb, + LIBXL_HOTPLUG_TIMEOUT * 1000)) { + LOG(ERROR, "unable to register timeout for " + "netbuf setup script %s on vif %s", script, vif); + return ERROR_FAIL; + } + + LOG(DEBUG, "Calling netbuf script: %s %s on vif %s", + script, op, vif); + + /* Fork and exec netbuf script */ + pid = libxl__ev_child_fork(gc, child, death); + if (pid == -1) { + LOG(ERROR, "unable to fork netbuf script %s", script); + return ERROR_FAIL; + } + + if (!pid) { + /* child: Launch netbuf script */ + libxl__exec(gc, -1, -1, -1, args[0], args, env); + /* notreached */ + abort(); + } + + return 0; +} + +static void netbuf_teardown_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child); + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + + STATE_AO_GC(remus_ctx->dss->ao); + + if (status) { + libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR, + remus_ctx->netbufscript, + pid, status); + } + + remus_ctx->num_exec++; + if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) { + if (exec_netbuf_script(gc, remus_ctx, + "teardown", netbuf_teardown_script_cb)) + goto end; + return; + } + + end: + libxl__remus_teardown_done(egc, remus_ctx->dss); +} + +static void netbuf_setup_script_cb(libxl__egc *egc, + libxl__ev_child *child, + pid_t pid, int status) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child); + char *out_path_base, *hotplug_error; + uint32_t domid = remus_ctx->dss->domid; + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + char **ifb = &netbuf_ctx->ifb_list[devid]; + int rc = ERROR_FAIL; + + STATE_AO_GC(remus_ctx->dss->ao); + + libxl__ev_time_deregister(gc, &remus_ctx->timeout); + + out_path_base = GCSPRINTF("%s/remus/netbuf/%d", + libxl__xs_libxl_path(gc, domid), devid); + + hotplug_error = libxl__xs_read(gc, XBT_NULL, + GCSPRINTF("%s/hotplug-error", + out_path_base)); + if (hotplug_error) { + LOG(ERROR, "netbuf script %s setup failed for vif %s: %s", + remus_ctx->netbufscript, + netbuf_ctx->vif_list[devid], hotplug_error); + goto end; + } + + if (status) { + libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR, + remus_ctx->netbufscript, + pid, status); + goto end; + } + + *ifb = libxl__xs_read(gc, XBT_NULL, + GCSPRINTF("%s/remus/netbuf/%d/ifb", + libxl__xs_libxl_path(gc, domid), + devid)); + if (!(*ifb)) { + LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s", + domid, vif); + goto end; + } + + LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif); + remus_ctx->num_exec++; + if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) { + if (exec_netbuf_script(gc, remus_ctx, + "setup", netbuf_setup_script_cb)) + goto end; + return; + } + + rc = init_qdiscs(gc, remus_ctx); + end: + libxl__remus_setup_done(egc, remus_ctx->dss, rc); +} + +static void netbuf_setup_timeout_cb(libxl__egc *egc, + libxl__ev_time *ev, + const struct timeval *requested_abs) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(ev, *remus_ctx, timeout); + int devid = remus_ctx->num_exec; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + char *vif = netbuf_ctx->vif_list[devid]; + + STATE_AO_GC(remus_ctx->dss->ao); + + libxl__ev_time_deregister(gc, &remus_ctx->timeout); + assert(libxl__ev_child_inuse(&remus_ctx->child)); + + LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout", + remus_ctx->netbufscript, vif); + + if (kill(remus_ctx->child.pid, SIGKILL)) { + LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]", + remus_ctx->netbufscript, + (unsigned long)remus_ctx->child.pid); + } + + return; +} + +int libxl__netbuffer_enabled(libxl__gc *gc) +{ + return 1; +} + +/* Scan through the list of vifs belonging to domid and + * invoke the netbufscript to setup the IFB device & plug qdisc + * for each vif. Then scan through the list of IFB devices to obtain + * a handle on the plug qdisc installed on these IFB devices. + * Network output buffering is controlled via these qdiscs. + */ +void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + libxl__remus_netbuf_ctx *netbuf_ctx = NULL; + uint32_t domid = dss->domid; + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + int num_netbufs = 0; + int rc = ERROR_FAIL; + + STATE_AO_GC(dss->ao); + + netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx)); + netbuf_ctx->vif_list = get_guest_vif_list(gc, domid, &num_netbufs); + if (!num_netbufs) { + rc = 0; + goto end; + } + + if (num_netbufs < 0) goto end; + + netbuf_ctx->ifb_list = libxl__calloc(gc, num_netbufs, + sizeof(char *)); + netbuf_ctx->num_netbufs = num_netbufs; + remus_ctx->netbuf_ctx = netbuf_ctx; + remus_ctx->num_exec = 0; //start at devid == 0 + if (exec_netbuf_script(gc, remus_ctx, "setup", + netbuf_setup_script_cb)) + goto end; + return; + + end: + libxl__remus_setup_done(egc, dss, rc); +} + +/* Note: This function will be called in the same gc context as + * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start + * API call. + */ +void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + STATE_AO_GC(dss->ao); + + if (netbuf_ctx->qdisc_cache) + nl_cache_free(netbuf_ctx->qdisc_cache); + + if (netbuf_ctx->nlsock) + nl_close(netbuf_ctx->nlsock); + + remus_ctx->num_exec = 0; //start at devid == 0 + if (exec_netbuf_script(gc, remus_ctx, "teardown", + netbuf_teardown_script_cb)) + libxl__remus_teardown_done(egc, dss); +} + +#define TC_BUFFER_START 1 +#define TC_BUFFER_RELEASE 2 +static int remus_netbuf_op(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx, + int buffer_op) +{ + int i, ret; + libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx; + + for (i = 0; i < netbuf_ctx->num_netbufs; ++i) { + if (buffer_op == TC_BUFFER_START) + ret = rtnl_qdisc_plug_buffer(netbuf_ctx->netbuf_qdisc_list[i]); + else + ret = rtnl_qdisc_plug_release_one(netbuf_ctx->netbuf_qdisc_list[i]); + + if (!ret) + ret = rtnl_qdisc_add(netbuf_ctx->nlsock, + netbuf_ctx->netbuf_qdisc_list[i], + NLM_F_REQUEST); + if (ret) { + LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s", + ((buffer_op == TC_BUFFER_START) ? + "start_new_epoch" : "release_prev_epoch"), + netbuf_ctx->ifb_list[i], nl_geterror(ret)); + return ERROR_FAIL; + } + } + + return 0; +} + +int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_START); +} + +int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_RELEASE); +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl_nonetbuffer.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/libxl/libxl_nonetbuffer.c Thu Nov 14 21:17:56 2013 -0800 @@ -0,0 +1,56 @@ +/* + * Copyright (C) 2013 + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#include "libxl_osdeps.h" /* must come before any other headers */ + +#include "libxl_internal.h" + +int libxl__netbuffer_enabled(libxl__gc *gc) +{ + return 0; +} + +/* Remus network buffer related stubs */ +void libxl__remus_netbuf_setup(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ +} + +void libxl__remus_netbuf_teardown(libxl__egc *egc, + libxl__domain_suspend_state *dss) +{ +} + +int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + LOG(ERROR, "Remus: No support for network buffering"); + return ERROR_FAIL; +} + +int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid, + libxl__remus_ctx *remus_ctx) +{ + LOG(ERROR, "Remus: No support for network buffering"); + return ERROR_FAIL; +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff -r 85a6d32b9e26 -r d3d7da1c2289 tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl Thu Nov 14 21:17:54 2013 -0800 +++ b/tools/libxl/libxl_types.idl Thu Nov 14 21:17:56 2013 -0800 @@ -557,6 +557,8 @@ libxl_domain_remus_info = Struct("domain ("interval", integer), ("blackhole", bool), ("compression", bool), + ("netbuf", bool), + ("netbufscript", string), ]) libxl_event_type = Enumeration("event_type", [
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 5 of 7 V4] tools/libxl: Control network buffering in remus callbacks
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492677 28800 # Node ID 8e5366a6cd958c00a9b7a726c149a552a02a7af6 # Parent d3d7da1c2289749b2a1c5b8baaf9d9c18a914e9b tools/libxl: Control network buffering in remus callbacks This patch constitutes the core network buffering logic. and does the following: a) create a new network buffer when the domain is suspended (remus_domain_suspend_callback) b) release the previous network buffer pertaining to the committed checkpoint (remus_domain_checkpoint_dm_saved) Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r d3d7da1c2289 -r 8e5366a6cd95 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Thu Nov 14 21:17:56 2013 -0800 +++ b/tools/libxl/libxl_dom.c Thu Nov 14 21:17:57 2013 -0800 @@ -1285,9 +1285,23 @@ static void libxl__remus_domain_suspend_ libxl__save_helper_state *shs = data; libxl__egc *egc = shs->egc; libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + STATE_AO_GC(dss->ao); - /* REMUS TODO: Issue disk and network checkpoint reqs. */ + /* REMUS TODO: Issue disk checkpoint reqs. */ int ok = libxl__domain_suspend_callback_common(dss); + + if (!remus_ctx->netbuf_ctx || !ok) goto out; + + /* The domain was suspended successfully. Start a new network + * buffer for the next epoch. If this operation fails, then act + * as though domain suspend failed -- libxc exits its infinite + * loop and ultimately, the replication stops. + */ + if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, + remus_ctx)) + ok = 0; + out: libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); } @@ -1301,7 +1315,7 @@ static int libxl__remus_domain_resume_ca if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) return 0; - /* REMUS TODO: Deal with disk. Start a new network output buffer */ + /* REMUS TODO: Deal with disk. */ return 1; } @@ -1310,6 +1324,9 @@ static int libxl__remus_domain_resume_ca static void remus_checkpoint_dm_saved(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc); +static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs); + static void libxl__remus_domain_checkpoint_callback(void *data) { libxl__save_helper_state *shs = data; @@ -1328,10 +1345,51 @@ static void libxl__remus_domain_checkpoi static void remus_checkpoint_dm_saved(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc) { - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ - /* REMUS TODO: make this asynchronous */ - assert(!rc); /* REMUS TODO handle this error properly */ - usleep(dss->remus_ctx->interval * 1000); + /* + * REMUS TODO: Wait for disk and explicit memory ack (through restore + * callback from remote) before releasing network buffer. + */ + libxl__remus_ctx *remus_ctx = dss->remus_ctx; + STATE_AO_GC(dss->ao); + + if (rc) { + LOG(ERROR, "Failed to save device model. Terminating Remus.."); + goto out; + } + + if (remus_ctx->netbuf_ctx) { + rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid, + remus_ctx); + if (rc) { + LOG(ERROR, "Failed to release network buffer." + " Terminating Remus.."); + goto out; + } + } + + /* Set checkpoint interval timeout */ + rc = libxl__ev_time_register_rel(gc, &remus_ctx->timeout, + remus_next_checkpoint, + dss->remus_ctx->interval); + if (rc) { + LOG(ERROR, "unable to register timeout for next epoch." + " Terminating Remus.."); + goto out; + } + return; + + out: + libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0); +} + +static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs) +{ + libxl__remus_ctx *remus_ctx = CONTAINER_OF(ev, *remus_ctx, timeout); + libxl__domain_suspend_state *dss = remus_ctx->dss; + STATE_AO_GC(dss->ao); + + libxl__ev_time_deregister(gc, &remus_ctx->timeout); libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); }
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 6 of 7 V4] tools/xl: Remus - Network buffering cmdline switch
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492678 28800 # Node ID 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 # Parent 8e5366a6cd958c00a9b7a726c149a552a02a7af6 tools/xl: Remus - Network buffering cmdline switch Command line switch to ''xl remus'' command, to enable network buffering. Pass on this flag to libxl so that it can act accordingly. Also update man pages to reflect the addition of a new option to ''xl remus'' command. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 8e5366a6cd95 -r 6a2a1bfffc66 docs/man/xl.conf.pod.5 --- a/docs/man/xl.conf.pod.5 Thu Nov 14 21:17:57 2013 -0800 +++ b/docs/man/xl.conf.pod.5 Thu Nov 14 21:17:58 2013 -0800 @@ -105,6 +105,12 @@ Configures the default gateway device to Default: C<None> +=item B<remus.default.netbufscript="PATH"> + +Configures the default script used by Remus to setup network buffering. + +Default: C</etc/xen/scripts/remus-netbuf-setup> + =item B<output_format="json|sxp"> Configures the default output format used by xl when printing "machine diff -r 8e5366a6cd95 -r 6a2a1bfffc66 docs/man/xl.pod.1 --- a/docs/man/xl.pod.1 Thu Nov 14 21:17:57 2013 -0800 +++ b/docs/man/xl.pod.1 Thu Nov 14 21:17:58 2013 -0800 @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th Enable Remus HA for domain. By default B<xl> relies on ssh as a transport mechanism between the two hosts. -N.B: Remus support in xl is still in experimental (proof-of-concept) phase. - There is no support for network or disk buffering at the moment. +N.B: There is no support for disk buffering at the moment. B<OPTIONS> @@ -418,6 +417,18 @@ Generally useful for debugging. Disable memory checkpoint compression. +=item B<-n> + +Enable network output buffering. The default script used to configure +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to +use a custom script, use the I<-N> option or set the global variable +I<remus.default.netbufscript> in /etc/xen/xl.conf to point to your script. + +=item B<-N> I<netbufscript> + +Use <netbufscript> to setup network buffering instead of the instead of +the default (/etc/xen/scripts/remus-netbuf-setup). + =item B<-s> I<sshcommand> Use <sshcommand> instead of ssh. String will be passed to sh. diff -r 8e5366a6cd95 -r 6a2a1bfffc66 tools/libxl/xl.c --- a/tools/libxl/xl.c Thu Nov 14 21:17:57 2013 -0800 +++ b/tools/libxl/xl.c Thu Nov 14 21:17:58 2013 -0800 @@ -46,6 +46,7 @@ char *default_vifscript = NULL; char *default_bridge = NULL; char *default_gatewaydev = NULL; char *default_vifbackend = NULL; +char *default_remus_netbufscript = NULL; enum output_format default_output_format = OUTPUT_FORMAT_JSON; int claim_mode = 1; @@ -177,6 +178,9 @@ static void parse_global_config(const ch if (!xlu_cfg_get_long (config, "claim_mode", &l, 0)) claim_mode = l; + xlu_cfg_replace_string (config, "remus.default.netbufscript", + &default_remus_netbufscript, 0); + xlu_cfg_destroy(config); } diff -r 8e5366a6cd95 -r 6a2a1bfffc66 tools/libxl/xl.h --- a/tools/libxl/xl.h Thu Nov 14 21:17:57 2013 -0800 +++ b/tools/libxl/xl.h Thu Nov 14 21:17:58 2013 -0800 @@ -152,6 +152,7 @@ extern char *default_vifscript; extern char *default_bridge; extern char *default_gatewaydev; extern char *default_vifbackend; +extern char *default_remus_netbufscript; extern char *blkdev_start; enum output_format { diff -r 8e5366a6cd95 -r 6a2a1bfffc66 tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c Thu Nov 14 21:17:57 2013 -0800 +++ b/tools/libxl/xl_cmdimpl.c Thu Nov 14 21:17:58 2013 -0800 @@ -7115,8 +7115,9 @@ int main_remus(int argc, char **argv) r_info.interval = 200; r_info.blackhole = 0; r_info.compression = 1; - - SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) { + r_info.netbuf = 0; + + SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) { case ''i'': r_info.interval = atoi(optarg); break; @@ -7126,6 +7127,12 @@ int main_remus(int argc, char **argv) case ''u'': r_info.compression = 0; break; + case ''n'': + r_info.netbuf = 1; + break; + case ''N'': + r_info.netbufscript = optarg; + break; case ''s'': ssh_command = optarg; break; @@ -7137,6 +7144,9 @@ int main_remus(int argc, char **argv) domid = find_domain(argv[optind]); host = argv[optind + 1]; + if(!r_info.netbufscript) + r_info.netbufscript = default_remus_netbufscript; + if (r_info.blackhole) { send_fd = open("/dev/null", O_RDWR, 0644); if (send_fd < 0) { @@ -7174,13 +7184,10 @@ int main_remus(int argc, char **argv) /* Point of no return */ rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0); - /* If we are here, it means backup has failed/domain suspend failed. - * Try to resume the domain and exit gracefully. + /* If we are here, it means remus setup/domain suspend/backup has + * failed. Try to resume the domain and exit gracefully. * TODO: Split-Brain check. */ - fprintf(stderr, "remus sender: libxl_domain_suspend failed" - " (rc=%d)\n", rc); - if (rc == ERROR_GUEST_TIMEDOUT) fprintf(stderr, "Failed to suspend domain at primary.\n"); else { diff -r 8e5366a6cd95 -r 6a2a1bfffc66 tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c Thu Nov 14 21:17:57 2013 -0800 +++ b/tools/libxl/xl_cmdtable.c Thu Nov 14 21:17:58 2013 -0800 @@ -481,6 +481,9 @@ struct cmd_spec cmd_table[] = { "-i MS Checkpoint domain memory every MS milliseconds (def. 200ms).\n" "-b Replicate memory checkpoints to /dev/null (blackhole)\n" "-u Disable memory checkpoint compression.\n" + "-n Enable network output buffering.\n" + "-N <netbufscript> Use netbufscript to setup network buffering instead of the\n" + " instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n" "-s <sshcommand> Use <sshcommand> instead of ssh. String will be passed\n" " to sh. If empty, run <host> instead of \n" " ssh <host> xl migrate-receive -r [-e]\n"
Shriram Rajagopalan
2013-Nov-15 05:47 UTC
[PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
# HG changeset patch # User Shriram Rajagopalan <rshriram@cs.ubc.ca> # Date 1384492684 28800 # Node ID c9b550e435d8dce5302e77609849bf007b4f3c2e # Parent 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 tools/libxl: refactor domain_suspend_callback code to be fully asynchronous libxl__domain_suspend_callback_common uses usleep calls, while the caller libxl_domain_suspend_callback is asynchronous. This patch refactors the libxl__domain_suspend__common code to use AO facilities like libxl event loop timers instead of usleep calls. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> diff -r 6a2a1bfffc66 -r c9b550e435d8 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c Thu Nov 14 21:17:58 2013 -0800 +++ b/tools/libxl/libxl_dom.c Thu Nov 14 21:18:04 2013 -0800 @@ -1003,49 +1003,237 @@ int libxl__domain_resume_device_model(li return 0; } -int libxl__domain_suspend_callback_common(libxl__domain_suspend_state *dss) +static int cancel_xenbus_suspend_request(libxl__gc *gc, uint32_t domid); +static void wait_for_ack_timeout(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs); +static void wait_for_suspend_timeout(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs); +static void wait_for_suspend_req_ack(libxl__domain_suspend_state *dss); +static void wait_for_guest_suspend(libxl__domain_suspend_state *dss); +static void guest_suspended(libxl__domain_suspend_state *dss); + +/* Returns 0 if suspend request was cancelled and the guest did not + * respond during the cancellation process (see comments in function for + * explanation of race conditions). + * Returns 1 if guest had finally acked suspend request, during the + * cancellation process. + */ +static int cancel_xenbus_suspend_request(libxl__gc *gc, uint32_t domid) { + char *state = NULL; + xs_transaction_t t; + + /* + * Guest appears to not be responding. Cancel the suspend + * request. + * + * We re-read the suspend node and clear it within a + * transaction in order to handle the case where we race + * against the guest catching up and acknowledging the request + * at the last minute. + */ + LOG(ERROR, "guest didn''t acknowledge suspend, cancelling request"); + retry_transaction: + t = xs_transaction_start(CTX->xsh); + + state = libxl__domain_pvcontrol_read(gc, t, domid); + if (!state) state = ""; + + if (!strcmp(state, "suspend")) + libxl__domain_pvcontrol_write(gc, t, domid, ""); + + if (!xs_transaction_end(CTX->xsh, t, 0)) + if (errno == EAGAIN) + goto retry_transaction; + + /* + * Final check for guest acknowledgement. The guest may have + * acknowledged while we were cancelling the request in which + * case we lost the race while cancelling and should continue. + */ + if (!strcmp(state, "suspend")) { + LOG(ERROR, "guest didn''t acknowledge suspend, request cancelled"); + return 0; + } + + return 1; +} + +static void wait_for_ack_timeout(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs) +{ + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, timeout); STATE_AO_GC(dss->ao); - unsigned long hvm_s_state = 0, hvm_pvdrv = 0; + + libxl__ev_time_deregister(gc, &dss->timeout); + dss->watchdog--; + wait_for_suspend_req_ack(dss); +} + +static void wait_for_suspend_req_ack(libxl__domain_suspend_state *dss) +{ + char *state = NULL; int ret; - char *state = "suspend"; - int watchdog; - xs_transaction_t t; + int wait_time; + STATE_AO_GC(dss->ao); + + state = libxl__domain_pvcontrol_read(gc, XBT_NULL, dss->domid); + + if (!state) state = ""; + + if (!strcmp(state, "suspend") && dss->watchdog > 0) { + /* The first timeout is a short one (10ms), hoping that the + * guest would respond immediately. The subsequent timeouts + * are longer (40ms). + */ + wait_time = dss->watchdog_starting ? 10 : 40; + dss->watchdog_starting = 0; + ret = libxl__ev_time_register_rel(gc, &dss->timeout, + wait_for_ack_timeout, wait_time); + if (ret) { + LOG(ERROR, "unable to register timeout event to wait for" + " guest to suspend ack. Cancelling suspend request"); + goto cancel_req; + } + return; + } + + if (strcmp(state, "suspend")) { + ack: + LOG(DEBUG, "guest acknowledged suspend request"); + dss->guest_responded = 1; + dss->watchdog = 60; + dss->watchdog_starting = 1; + wait_for_guest_suspend(dss); + return; + } + + cancel_req: //either timer reg. failed or watchdog loop ended + if (cancel_xenbus_suspend_request(gc, dss->domid)) + goto ack; + + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, &dss->shs, 0); +} + +static void wait_for_suspend_timeout(libxl__egc *egc, libxl__ev_time *ev, + const struct timeval *requested_abs) +{ + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, timeout); + STATE_AO_GC(dss->ao); + + libxl__ev_time_deregister(gc, &dss->timeout); + dss->watchdog--; + wait_for_guest_suspend(dss); +} + +static void wait_for_guest_suspend(libxl__domain_suspend_state *dss) +{ + int ret; + int wait_time; + xc_domaininfo_t info; + STATE_AO_GC(dss->ao); + + ret = xc_domain_getinfolist(CTX->xch, dss->domid, 1, &info); + if (ret == 1 && info.domain == dss->domid && + (info.flags & XEN_DOMINF_shutdown)) { + int shutdown_reason; + + shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift) + & XEN_DOMINF_shutdownmask; + if (shutdown_reason == SHUTDOWN_suspend) { + LOG(DEBUG, "guest has suspended"); + guest_suspended(dss); + return; + } + } + + if (dss->watchdog > 0) { + /* The first timeout is a short one (10ms), hoping that the + * guest would respond immediately. The subsequent timeouts + * are longer (40ms). + */ + wait_time = dss->watchdog_starting ? 10 : 40; + dss->watchdog_starting = 0; + ret = libxl__ev_time_register_rel(gc, &dss->timeout, + wait_for_suspend_timeout, wait_time); + if (ret) { + LOG(ERROR, "unable to register timeout event to wait for" + " guest to suspend."); + goto err; + } + return; + } + + LOG(ERROR, "guest did not suspend"); + err: + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, + &dss->shs, 0); +} + +static int libxl__remus_domain_suspend_callback(libxl__domain_suspend_state *dss); +static void guest_suspended(libxl__domain_suspend_state *dss) +{ + int ret, ok = 1; + STATE_AO_GC(dss->ao); + if (dss->hvm) { + ret = libxl__domain_suspend_device_model(gc, dss); + if (ret) { + LOG(ERROR, "libxl__domain_suspend_device_model failed " + "ret=%d", ret); + ok = 0; + goto end; + } + } + + if (dss->remus_ctx) + ok = libxl__remus_domain_suspend_callback(dss); + + end: + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, + &dss->shs, ok); +} + +static void libxl__domain_suspend_callback(void *data) +{ + libxl__save_helper_state *shs = data; + libxl__egc *egc = shs->egc; + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); + int ret; + STATE_AO_GC(dss->ao); /* Convenience aliases */ const uint32_t domid = dss->domid; - if (dss->hvm) { - xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv); - xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state); - } - - if ((hvm_s_state == 0) && (dss->suspend_eventchn >= 0)) { + if ((dss->hvm_s_state == 0) && (dss->suspend_eventchn >= 0)) { LOG(DEBUG, "issuing %s suspend request via event channel", dss->hvm ? "PVHVM" : "PV"); ret = xc_evtchn_notify(dss->xce, dss->suspend_eventchn); if (ret < 0) { LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret); - return 0; + goto err; } ret = xc_await_suspend(CTX->xch, dss->xce, dss->suspend_eventchn); if (ret < 0) { LOG(ERROR, "xc_await_suspend failed ret=%d", ret); - return 0; + goto err; } dss->guest_responded = 1; - goto guest_suspended; + guest_suspended(dss); + return; } - if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) { + dss->watchdog = 60; + dss->watchdog_starting = 1; + if (dss->hvm && (!dss->hvm_pvdrv || dss->hvm_s_state)) { LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain"); ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend); if (ret < 0) { LOGE(ERROR, "xc_domain_shutdown failed"); - return 0; + goto err; } /* The guest does not (need to) respond to this sort of request. */ dss->guest_responded = 1; + wait_for_guest_suspend(dss); } else { LOG(DEBUG, "issuing %s suspend request via XenBus control node", dss->hvm ? "PVHVM" : "PV"); @@ -1053,90 +1241,12 @@ int libxl__domain_suspend_callback_commo libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend"); LOG(DEBUG, "wait for the guest to acknowledge suspend request"); - watchdog = 60; - while (!strcmp(state, "suspend") && watchdog > 0) { - usleep(100000); + wait_for_suspend_req_ack(dss); + } + return; - state = libxl__domain_pvcontrol_read(gc, XBT_NULL, domid); - if (!state) state = ""; - - watchdog--; - } - - /* - * Guest appears to not be responding. Cancel the suspend - * request. - * - * We re-read the suspend node and clear it within a - * transaction in order to handle the case where we race - * against the guest catching up and acknowledging the request - * at the last minute. - */ - if (!strcmp(state, "suspend")) { - LOG(ERROR, "guest didn''t acknowledge suspend, cancelling request"); - retry_transaction: - t = xs_transaction_start(CTX->xsh); - - state = libxl__domain_pvcontrol_read(gc, t, domid); - if (!state) state = ""; - - if (!strcmp(state, "suspend")) - libxl__domain_pvcontrol_write(gc, t, domid, ""); - - if (!xs_transaction_end(CTX->xsh, t, 0)) - if (errno == EAGAIN) - goto retry_transaction; - - } - - /* - * Final check for guest acknowledgement. The guest may have - * acknowledged while we were cancelling the request in which - * case we lost the race while cancelling and should continue. - */ - if (!strcmp(state, "suspend")) { - LOG(ERROR, "guest didn''t acknowledge suspend, request cancelled"); - return 0; - } - - LOG(DEBUG, "guest acknowledged suspend request"); - dss->guest_responded = 1; - } - - LOG(DEBUG, "wait for the guest to suspend"); - watchdog = 60; - while (watchdog > 0) { - xc_domaininfo_t info; - - usleep(100000); - ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info); - if (ret == 1 && info.domain == domid && - (info.flags & XEN_DOMINF_shutdown)) { - int shutdown_reason; - - shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift) - & XEN_DOMINF_shutdownmask; - if (shutdown_reason == SHUTDOWN_suspend) { - LOG(DEBUG, "guest has suspended"); - goto guest_suspended; - } - } - - watchdog--; - } - - LOG(ERROR, "guest did not suspend"); - return 0; - - guest_suspended: - if (dss->hvm) { - ret = libxl__domain_suspend_device_model(gc, dss); - if (ret) { - LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret); - return 0; - } - } - return 1; + err: + libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0); } static inline char *physmap_path(libxl__gc *gc, uint32_t domid, @@ -1223,16 +1333,6 @@ int libxl__toolstack_save(uint32_t domid return 0; } -static void libxl__domain_suspend_callback(void *data) -{ - libxl__save_helper_state *shs = data; - libxl__egc *egc = shs->egc; - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); - - int ok = libxl__domain_suspend_callback_common(dss); - libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); -} - /*----- remus setup/teardown code -----*/ void libxl__remus_setup_initiate(libxl__egc *egc, libxl__domain_suspend_state *dss) @@ -1280,18 +1380,14 @@ void libxl__remus_teardown_done(libxl__e /*----- remus callbacks -----*/ -static void libxl__remus_domain_suspend_callback(void *data) +static int libxl__remus_domain_suspend_callback(libxl__domain_suspend_state *dss) { - libxl__save_helper_state *shs = data; - libxl__egc *egc = shs->egc; - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); libxl__remus_ctx *remus_ctx = dss->remus_ctx; STATE_AO_GC(dss->ao); /* REMUS TODO: Issue disk checkpoint reqs. */ - int ok = libxl__domain_suspend_callback_common(dss); - - if (!remus_ctx->netbuf_ctx || !ok) goto out; + if (!remus_ctx->netbuf_ctx) + return 1; /* The domain was suspended successfully. Start a new network * buffer for the next epoch. If this operation fails, then act @@ -1300,9 +1396,9 @@ static void libxl__remus_domain_suspend_ */ if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, remus_ctx)) - ok = 0; - out: - libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); + return 0; + + return 1; } static int libxl__remus_domain_resume_callback(void *data) @@ -1411,6 +1507,7 @@ void libxl__domain_suspend(libxl__egc *e &dss->shs.callbacks.save.a; logdirty_init(&dss->logdirty); + libxl__ev_time_init(&dss->timeout); switch (type) { case LIBXL_DOMAIN_TYPE_HVM: { @@ -1423,6 +1520,10 @@ void libxl__domain_suspend(libxl__egc *e vm_generationid_addr = (addr) ? strtoul(addr, NULL, 0) : 0; dss->hvm = 1; + xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, + &dss->hvm_pvdrv); + xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, + &dss->hvm_s_state); break; } case LIBXL_DOMAIN_TYPE_PV: @@ -1439,6 +1540,7 @@ void libxl__domain_suspend(libxl__egc *e dss->suspend_eventchn = -1; dss->guest_responded = 0; + dss->watchdog = 0; dss->dm_savefile = libxl__device_model_savefile(gc, domid); if (dss->remus_ctx && dss->remus_ctx->compression) @@ -1461,12 +1563,12 @@ void libxl__domain_suspend(libxl__egc *e } memset(callbacks, 0, sizeof(*callbacks)); + + callbacks->suspend = libxl__domain_suspend_callback; if (dss->remus_ctx != NULL) { - callbacks->suspend = libxl__remus_domain_suspend_callback; callbacks->postcopy = libxl__remus_domain_resume_callback; callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; - } else - callbacks->suspend = libxl__domain_suspend_callback; + } callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; diff -r 6a2a1bfffc66 -r c9b550e435d8 tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h Thu Nov 14 21:17:58 2013 -0800 +++ b/tools/libxl/libxl_internal.h Thu Nov 14 21:18:04 2013 -0800 @@ -2351,6 +2351,11 @@ struct libxl__domain_suspend_state { int hvm; int xcflags; int guest_responded; + int watchdog; + int watchdog_starting; + unsigned long hvm_s_state; + unsigned long hvm_pvdrv; + libxl__ev_time timeout; const char *dm_savefile; libxl__save_helper_state shs; libxl__logdirty_switch logdirty; @@ -2640,7 +2645,6 @@ _hidden void libxl__xc_domain_save_done( void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, libxl__save_helper_state *shs, int return_value); -_hidden int libxl__domain_suspend_callback_common(libxl__domain_suspend_state*); _hidden void libxl__domain_suspend_common_switch_qemu_logdirty (int domid, unsigned int enable, void *data); _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
Shriram Rajagopalan
2013-Nov-15 18:27 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
On Thu, Nov 14, 2013 at 11:47 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca>wrote:> # HG changeset patch > # User Shriram Rajagopalan <rshriram@cs.ubc.ca> > # Date 1384492684 28800 > # Node ID c9b550e435d8dce5302e77609849bf007b4f3c2e > # Parent 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 > tools/libxl: refactor domain_suspend_callback code to be fully asynchronous > > libxl__domain_suspend_callback_common uses usleep calls, > while the caller libxl_domain_suspend_callback is asynchronous. > This patch refactors the libxl__domain_suspend__common code to use > AO facilities like libxl event loop timers instead of usleep calls. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > > diff -r 6a2a1bfffc66 -r c9b550e435d8 tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c Thu Nov 14 21:17:58 2013 -0800 > +++ b/tools/libxl/libxl_dom.c Thu Nov 14 21:18:04 2013 -0800 > @@ -1003,49 +1003,237 @@ int libxl__domain_resume_device_model(li > return 0; > } > > -int libxl__domain_suspend_callback_common(libxl__domain_suspend_state > *dss) > +static int cancel_xenbus_suspend_request(libxl__gc *gc, uint32_t domid); > +static void wait_for_ack_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs); > +static void wait_for_suspend_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs); > +static void wait_for_suspend_req_ack(libxl__domain_suspend_state *dss); > +static void wait_for_guest_suspend(libxl__domain_suspend_state *dss); > +static void guest_suspended(libxl__domain_suspend_state *dss); > + > +/* Returns 0 if suspend request was cancelled and the guest did not > + * respond during the cancellation process (see comments in function for > + * explanation of race conditions). > + * Returns 1 if guest had finally acked suspend request, during the > + * cancellation process. > + */ > +static int cancel_xenbus_suspend_request(libxl__gc *gc, uint32_t domid) > { > + char *state = NULL; > + xs_transaction_t t; > + > + /* > + * Guest appears to not be responding. Cancel the suspend > + * request. > + * > + * We re-read the suspend node and clear it within a > + * transaction in order to handle the case where we race > + * against the guest catching up and acknowledging the request > + * at the last minute. > + */ > + LOG(ERROR, "guest didn''t acknowledge suspend, cancelling request"); > + retry_transaction: > + t = xs_transaction_start(CTX->xsh); > + > + state = libxl__domain_pvcontrol_read(gc, t, domid); > + if (!state) state = ""; > + > + if (!strcmp(state, "suspend")) > + libxl__domain_pvcontrol_write(gc, t, domid, ""); > + > + if (!xs_transaction_end(CTX->xsh, t, 0)) > + if (errno == EAGAIN) > + goto retry_transaction; > + > + /* > + * Final check for guest acknowledgement. The guest may have > + * acknowledged while we were cancelling the request in which > + * case we lost the race while cancelling and should continue. > + */ > + if (!strcmp(state, "suspend")) { > + LOG(ERROR, "guest didn''t acknowledge suspend, request cancelled"); > + return 0; > + } > + > + return 1; > +} > + > +static void wait_for_ack_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs) > +{ > + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, timeout); > STATE_AO_GC(dss->ao); > - unsigned long hvm_s_state = 0, hvm_pvdrv = 0; > + > + libxl__ev_time_deregister(gc, &dss->timeout); > + dss->watchdog--; > + wait_for_suspend_req_ack(dss); > +} > + > +static void wait_for_suspend_req_ack(libxl__domain_suspend_state *dss) > +{ > + char *state = NULL; > int ret; > - char *state = "suspend"; > - int watchdog; > - xs_transaction_t t; > + int wait_time; > + STATE_AO_GC(dss->ao); > + > + state = libxl__domain_pvcontrol_read(gc, XBT_NULL, dss->domid); > + > + if (!state) state = ""; > + > + if (!strcmp(state, "suspend") && dss->watchdog > 0) { > + /* The first timeout is a short one (10ms), hoping that the > + * guest would respond immediately. The subsequent timeouts > + * are longer (40ms). > + */ > + wait_time = dss->watchdog_starting ? 10 : 40; > + dss->watchdog_starting = 0; > + ret = libxl__ev_time_register_rel(gc, &dss->timeout, > + wait_for_ack_timeout, > wait_time); > + if (ret) { > + LOG(ERROR, "unable to register timeout event to wait for" > + " guest to suspend ack. Cancelling suspend request"); > + goto cancel_req; > + } > + return; > + } > + > + if (strcmp(state, "suspend")) { > + ack: > + LOG(DEBUG, "guest acknowledged suspend request"); > + dss->guest_responded = 1; > + dss->watchdog = 60; > + dss->watchdog_starting = 1; > + wait_for_guest_suspend(dss); > + return; > + } > + > + cancel_req: //either timer reg. failed or watchdog loop ended > + if (cancel_xenbus_suspend_request(gc, dss->domid)) > + goto ack; > + > + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, > &dss->shs, 0); > +} > + > +static void wait_for_suspend_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs) > +{ > + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, timeout); > + STATE_AO_GC(dss->ao); > + > + libxl__ev_time_deregister(gc, &dss->timeout); > + dss->watchdog--; > + wait_for_guest_suspend(dss); > +} > + > +static void wait_for_guest_suspend(libxl__domain_suspend_state *dss) > +{ > + int ret; > + int wait_time; > + xc_domaininfo_t info; > + STATE_AO_GC(dss->ao); > + > + ret = xc_domain_getinfolist(CTX->xch, dss->domid, 1, &info); > + if (ret == 1 && info.domain == dss->domid && > + (info.flags & XEN_DOMINF_shutdown)) { > + int shutdown_reason; > + > + shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift) > + & XEN_DOMINF_shutdownmask; > + if (shutdown_reason == SHUTDOWN_suspend) { > + LOG(DEBUG, "guest has suspended"); > + guest_suspended(dss); > + return; > + } > + } > + > + if (dss->watchdog > 0) { > + /* The first timeout is a short one (10ms), hoping that the > + * guest would respond immediately. The subsequent timeouts > + * are longer (40ms). > + */ > + wait_time = dss->watchdog_starting ? 10 : 40; > + dss->watchdog_starting = 0; > + ret = libxl__ev_time_register_rel(gc, &dss->timeout, > + wait_for_suspend_timeout, > wait_time); > + if (ret) { > + LOG(ERROR, "unable to register timeout event to wait for" > + " guest to suspend."); > + goto err; > + } > + return; > + } > + > + LOG(ERROR, "guest did not suspend"); > + err: > + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, > + &dss->shs, 0); > +} > + > +static int > libxl__remus_domain_suspend_callback(libxl__domain_suspend_state *dss); > +static void guest_suspended(libxl__domain_suspend_state *dss) > +{ > + int ret, ok = 1; > + STATE_AO_GC(dss->ao); > + if (dss->hvm) { > + ret = libxl__domain_suspend_device_model(gc, dss); > + if (ret) { > + LOG(ERROR, "libxl__domain_suspend_device_model failed " > + "ret=%d", ret); > + ok = 0; > + goto end; > + } > + } > + > + if (dss->remus_ctx) > + ok = libxl__remus_domain_suspend_callback(dss); > + > + end: > + libxl__xc_domain_saverestore_async_callback_done(dss->shs.egc, > + &dss->shs, ok); > +} > + > +static void libxl__domain_suspend_callback(void *data) > +{ > + libxl__save_helper_state *shs = data; > + libxl__egc *egc = shs->egc; > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > + int ret; > + STATE_AO_GC(dss->ao); > > /* Convenience aliases */ > const uint32_t domid = dss->domid; > > - if (dss->hvm) { > - xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, > &hvm_pvdrv); > - xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, > &hvm_s_state); > - } > - > - if ((hvm_s_state == 0) && (dss->suspend_eventchn >= 0)) { > + if ((dss->hvm_s_state == 0) && (dss->suspend_eventchn >= 0)) { > LOG(DEBUG, "issuing %s suspend request via event channel", > dss->hvm ? "PVHVM" : "PV"); > ret = xc_evtchn_notify(dss->xce, dss->suspend_eventchn); > if (ret < 0) { > LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret); > - return 0; > + goto err; > } > ret = xc_await_suspend(CTX->xch, dss->xce, dss->suspend_eventchn); > if (ret < 0) { > LOG(ERROR, "xc_await_suspend failed ret=%d", ret); > - return 0; > + goto err; > } > dss->guest_responded = 1; > - goto guest_suspended; > + guest_suspended(dss); > + return; > } > > - if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) { > + dss->watchdog = 60; > + dss->watchdog_starting = 1; > + if (dss->hvm && (!dss->hvm_pvdrv || dss->hvm_s_state)) { > LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain"); > ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend); > if (ret < 0) { > LOGE(ERROR, "xc_domain_shutdown failed"); > - return 0; > + goto err; > } > /* The guest does not (need to) respond to this sort of request. > */ > dss->guest_responded = 1; > + wait_for_guest_suspend(dss); > } else { > LOG(DEBUG, "issuing %s suspend request via XenBus control node", > dss->hvm ? "PVHVM" : "PV"); > @@ -1053,90 +1241,12 @@ int libxl__domain_suspend_callback_commo > libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend"); > > LOG(DEBUG, "wait for the guest to acknowledge suspend request"); > - watchdog = 60; > - while (!strcmp(state, "suspend") && watchdog > 0) { > - usleep(100000); > + wait_for_suspend_req_ack(dss); > + } > + return; > > - state = libxl__domain_pvcontrol_read(gc, XBT_NULL, domid); > - if (!state) state = ""; > - > - watchdog--; > - } > - > - /* > - * Guest appears to not be responding. Cancel the suspend > - * request. > - * > - * We re-read the suspend node and clear it within a > - * transaction in order to handle the case where we race > - * against the guest catching up and acknowledging the request > - * at the last minute. > - */ > - if (!strcmp(state, "suspend")) { > - LOG(ERROR, "guest didn''t acknowledge suspend, cancelling > request"); > - retry_transaction: > - t = xs_transaction_start(CTX->xsh); > - > - state = libxl__domain_pvcontrol_read(gc, t, domid); > - if (!state) state = ""; > - > - if (!strcmp(state, "suspend")) > - libxl__domain_pvcontrol_write(gc, t, domid, ""); > - > - if (!xs_transaction_end(CTX->xsh, t, 0)) > - if (errno == EAGAIN) > - goto retry_transaction; > - > - } > - > - /* > - * Final check for guest acknowledgement. The guest may have > - * acknowledged while we were cancelling the request in which > - * case we lost the race while cancelling and should continue. > - */ > - if (!strcmp(state, "suspend")) { > - LOG(ERROR, "guest didn''t acknowledge suspend, request > cancelled"); > - return 0; > - } > - > - LOG(DEBUG, "guest acknowledged suspend request"); > - dss->guest_responded = 1; > - } > - > - LOG(DEBUG, "wait for the guest to suspend"); > - watchdog = 60; > - while (watchdog > 0) { > - xc_domaininfo_t info; > - > - usleep(100000); > - ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info); > - if (ret == 1 && info.domain == domid && > - (info.flags & XEN_DOMINF_shutdown)) { > - int shutdown_reason; > - > - shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift) > - & XEN_DOMINF_shutdownmask; > - if (shutdown_reason == SHUTDOWN_suspend) { > - LOG(DEBUG, "guest has suspended"); > - goto guest_suspended; > - } > - } > - > - watchdog--; > - } > - > - LOG(ERROR, "guest did not suspend"); > - return 0; > - > - guest_suspended: > - if (dss->hvm) { > - ret = libxl__domain_suspend_device_model(gc, dss); > - if (ret) { > - LOG(ERROR, "libxl__domain_suspend_device_model failed > ret=%d", ret); > - return 0; > - } > - } > - return 1; > + err: > + libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0); > } > > static inline char *physmap_path(libxl__gc *gc, uint32_t domid, > @@ -1223,16 +1333,6 @@ int libxl__toolstack_save(uint32_t domid > return 0; > } > > -static void libxl__domain_suspend_callback(void *data) > -{ > - libxl__save_helper_state *shs = data; > - libxl__egc *egc = shs->egc; > - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > - > - int ok = libxl__domain_suspend_callback_common(dss); > - libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); > -} > - > /*----- remus setup/teardown code -----*/ > void libxl__remus_setup_initiate(libxl__egc *egc, > libxl__domain_suspend_state *dss) > @@ -1280,18 +1380,14 @@ void libxl__remus_teardown_done(libxl__e > > /*----- remus callbacks -----*/ > > -static void libxl__remus_domain_suspend_callback(void *data) > +static int > libxl__remus_domain_suspend_callback(libxl__domain_suspend_state *dss) > { > - libxl__save_helper_state *shs = data; > - libxl__egc *egc = shs->egc; > - libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > libxl__remus_ctx *remus_ctx = dss->remus_ctx; > STATE_AO_GC(dss->ao); > > /* REMUS TODO: Issue disk checkpoint reqs. */ > - int ok = libxl__domain_suspend_callback_common(dss); > - > - if (!remus_ctx->netbuf_ctx || !ok) goto out; > + if (!remus_ctx->netbuf_ctx) > + return 1; > > /* The domain was suspended successfully. Start a new network > * buffer for the next epoch. If this operation fails, then act > @@ -1300,9 +1396,9 @@ static void libxl__remus_domain_suspend_ > */ > if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, > remus_ctx)) > - ok = 0; > - out: > - libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok); > + return 0; > + > + return 1; > } > > static int libxl__remus_domain_resume_callback(void *data) > @@ -1411,6 +1507,7 @@ void libxl__domain_suspend(libxl__egc *e > &dss->shs.callbacks.save.a; > > logdirty_init(&dss->logdirty); > + libxl__ev_time_init(&dss->timeout); > > switch (type) { > case LIBXL_DOMAIN_TYPE_HVM: { > @@ -1423,6 +1520,10 @@ void libxl__domain_suspend(libxl__egc *e > > vm_generationid_addr = (addr) ? strtoul(addr, NULL, 0) : 0; > dss->hvm = 1; > + xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, > + &dss->hvm_pvdrv); > + xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, > + &dss->hvm_s_state); > break; > } > case LIBXL_DOMAIN_TYPE_PV: > @@ -1439,6 +1540,7 @@ void libxl__domain_suspend(libxl__egc *e > > dss->suspend_eventchn = -1; > dss->guest_responded = 0; > + dss->watchdog = 0; > dss->dm_savefile = libxl__device_model_savefile(gc, domid); > > if (dss->remus_ctx && dss->remus_ctx->compression) > @@ -1461,12 +1563,12 @@ void libxl__domain_suspend(libxl__egc *e > } > > memset(callbacks, 0, sizeof(*callbacks)); > + > + callbacks->suspend = libxl__domain_suspend_callback; > if (dss->remus_ctx != NULL) { > - callbacks->suspend = libxl__remus_domain_suspend_callback; > callbacks->postcopy = libxl__remus_domain_resume_callback; > callbacks->checkpoint = libxl__remus_domain_checkpoint_callback; > - } else > - callbacks->suspend = libxl__domain_suspend_callback; > + } > > callbacks->switch_qemu_logdirty > libxl__domain_suspend_common_switch_qemu_logdirty; > dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; > diff -r 6a2a1bfffc66 -r c9b550e435d8 tools/libxl/libxl_internal.h > --- a/tools/libxl/libxl_internal.h Thu Nov 14 21:17:58 2013 -0800 > +++ b/tools/libxl/libxl_internal.h Thu Nov 14 21:18:04 2013 -0800 > @@ -2351,6 +2351,11 @@ struct libxl__domain_suspend_state { > int hvm; > int xcflags; > int guest_responded; > + int watchdog; > + int watchdog_starting; > + unsigned long hvm_s_state; > + unsigned long hvm_pvdrv; > + libxl__ev_time timeout; > const char *dm_savefile; > libxl__save_helper_state shs; > libxl__logdirty_switch logdirty; > @@ -2640,7 +2645,6 @@ _hidden void libxl__xc_domain_save_done( > void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, > libxl__save_helper_state *shs, int > return_value); > > -_hidden int > libxl__domain_suspend_callback_common(libxl__domain_suspend_state*); > _hidden void libxl__domain_suspend_common_switch_qemu_logdirty > (int domid, unsigned int enable, void > *data); > _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf, > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >ping! If there is feedback, I can get going right away. Meanwhile, I am trying to make the code use the nested-ao stuff, but I can send it as a separate patch on top of this. shriram _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Shriram Rajagopalan
2013-Nov-18 13:13 UTC
Re: [PATCH 0 of 7 V4] Remus/Libxl: Network buffering support
On Nov 14, 2013 11:52 PM, "Shriram Rajagopalan" <rshriram@cs.ubc.ca> wrote:> > This patch series adds support for network buffering in the Remus > codebase in libxl and makes the suspend callback asynchronous. > > I have tested the series with PV guests (with and without suspend > event channel). Would appreciate if anyone could test the series > with PV-HVM guests and pure HVM guests. > > Changes in V4: > > [1/7] *new*. This Ian Jackson''s patch -- making > libxl__domain_suspend_callback asynchronous > > [2/7] [was 1/5 in previous versions] Remove check for libnl > command line utils > > [3/7] [was 2/5 previously] minor nits > > [4/7] [was 3/5 previously] define LIBXL_HAVE_REMUS_NETBUF > in libxl.h > > [5/7] [was 4/5 previously] clean ups. Make the usleep in checkpoint > callback asynchronous > > [6/7] [was 5/5 previously] minor nits > > [7/7] *new*. Refactor libxl__domain_suspend_common_callback to > substitute usleep based wait loops with libxl''s timer events. > > Changes in V3: > [1/5] Fix redundant checks in configure scripts > (based on Ian Campbell''s suggestions) > > [2/5] Introduce locking in the script, during IFB setup. > Add xenstore paths used by netbuf scripts > to xenstore-paths.markdown > > [3/5] Hotplug scripts setup/teardown invocations are now asynchronous > following IanJ''s feedback. However, the invocations are still > sequential. > > [5/5] Allow per-domain specification of netbuffer scripts in xl remus > commmand. > > And minor nits throughout the series based on feedback from > the last version > > Changes in V2: > [1/5] Configure script will automatically enable/disable network > buffer support depending on the availability of the appropriate > libnl3 version. [If libnl3 is unavailable, a warning message will be > printed to let the user know that the feature has been disabled.] > > use macros from pkg.m4 instead of pkg-config commands > removed redundant checks for libnl3 libraries. > > [3,4/5] - Minor nits. > > Version 1: > > [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags > to libxl Makefile. > > [2/5] External script to setup/teardown network buffering using libnl3''s > CLI. This script will be invoked by libxl before starting Remus. > The script''s main job is to bring up an IFB device with plug qdisc > attached to it. It then re-routes egress traffic from the guest''s > vif to the IFB device. > > [3/5] Libxl code to invoke the external setup script, followed by netlink > related setup to obtain a handle on the output buffers attached > to each vif. > > [4/5] Libxl interaction with network buffer module in the kernel via > libnl3 API. > > [5/5] xl cmdline switch to explicitly enable network buffering when > starting remus. > > > Few things to note: > > a) Based on previous email discussions, the setup/teardown task has > been moved to a hotplug style shell script which can be customized as > desired, instead of implementing it as C code inside libxl. > > b) Libnl3 is not available on NetBSD. Nor is it available on CentOS > (Linux). So I have made network buffering support an optional feature > so that it can be disabled if desired. > > c) NetBSD does not have libnl3. So I have put the setup script under > tools/hotplug/Linux folder. > > thanks > shriram >Ping!> _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Nov-18 16:31 UTC
Re: [PATCH 2 of 7 V4] remus: add libnl3 dependency to autoconf scripts
Shriram Rajagopalan writes ("[PATCH 2 of 7 V4] remus: add libnl3 dependency to autoconf scripts"):> # HG changeset patch > # User Shriram Rajagopalan <rshriram@cs.ubc.ca> > # Date 1384492673 28800 > # Node ID f14fb4982ec89b30c04c6f4684de5056450c27a0 > # Parent 3143bed377a08bb9e87c7636e004dc371bb20338 > remus: add libnl3 dependency to autoconf scripts > > Libnl3 is required for controlling Remus network buffering. > This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts. > Also provide ability to configure tools without libnl3 support, that > is without network buffering support. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>I think this looks OK. The committer should rerun autoconf. Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Nov-18 16:34 UTC
Re: [PATCH 3 of 7 V4] tools/hotplug: Remus network buffering setup scripts
Shriram Rajagopalan writes ("[PATCH 3 of 7 V4] tools/hotplug: Remus network buffering setup scripts"):> tools/hotplug: Remus network buffering setup scripts > > This patch introduces remus-netbuf-setup hotplug script responsible for > setting up and tearing down the necessary infrastructure required for > network output buffering in Remus. This script is intended to be invoked > by libxl for each guest interface, when starting or stopping Remus. > > Apart from returning success/failure indication via the usual hotplug > entries in xenstore, this script also writes to xenstore, the name of > the IFB device to be used to control the vif''s network output. > > The script relies on libnl3 command line utilities to perform various > setup/teardown functions. The script is confined to Linux platforms only > since NetBSD does not seem to have libnl3.Thanks for this. I think this script should be introduced in the same patch as the arrangements which call it. Also both this, and the calling arrangements, should be CC''d to Roger Pau Monne who has been doing some work on hotplug script arrangements, so I have CC''d him on this message. Thanks, Ian.
Shriram Rajagopalan
2013-Nov-18 17:36 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
On Thu, Nov 14, 2013 at 11:47 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:> > # HG changeset patch > # User Shriram Rajagopalan <rshriram@cs.ubc.ca> > # Date 1384492684 28800 > # Node ID c9b550e435d8dce5302e77609849bf007b4f3c2e > # Parent 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 > tools/libxl: refactor domain_suspend_callback code to be fully asynchronous > > libxl__domain_suspend_callback_common uses usleep calls, > while the caller libxl_domain_suspend_callback is asynchronous. > This patch refactors the libxl__domain_suspend__common code to use > AO facilities like libxl event loop timers instead of usleep calls. > > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> >Ian, would you like this to be lumped together with your patch on asynchronous domain suspend, and sent out as a separate patch series?
Ian Jackson
2013-Nov-18 17:45 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous"):> On Thu, Nov 14, 2013 at 11:47 PM, Shriram Rajagopalan > <rshriram@cs.ubc.ca> wrote: > > # HG changeset patch > > # User Shriram Rajagopalan <rshriram@cs.ubc.ca> > > # Date 1384492684 28800 > > # Node ID c9b550e435d8dce5302e77609849bf007b4f3c2e > > # Parent 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 > > tools/libxl: refactor domain_suspend_callback code to be fully asynchronous > > > > libxl__domain_suspend_callback_common uses usleep calls, > > while the caller libxl_domain_suspend_callback is asynchronous. > > This patch refactors the libxl__domain_suspend__common code to use > > AO facilities like libxl event loop timers instead of usleep calls....> Ian, would you like this to be lumped together with your patch on > asynchronous domain suspend, and sent out as a separate patch > series?If you''d like to see it in 4.4, yes, please. I suspect that your full Remus series isn''t going to make it, I''m afraid. But these two together are a bugfix so are less in trouble due to the codefreeze. Thanks, Ian.
Ian Jackson
2013-Nov-18 17:49 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
Shriram Rajagopalan writes ("[PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous"):> tools/libxl: refactor domain_suspend_callback code to be fully asynchronous > > libxl__domain_suspend_callback_common uses usleep calls, > while the caller libxl_domain_suspend_callback is asynchronous. > This patch refactors the libxl__domain_suspend__common code to use > AO facilities like libxl event loop timers instead of usleep calls.Thanks. This mixture of code motion, semantic changes, and changes to function boundaries, is very difficult to review. Do you think you can split this patch up into a small subseries somehow ? For example by adding pre-patches or post-patches (or both) containing the code motion, and separate patches with functional change but no bulk motion ? Ian.
Shriram Rajagopalan
2013-Nov-18 17:52 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
On Mon, Nov 18, 2013 at 11:49 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:> Shriram Rajagopalan writes ("[PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous"): >> tools/libxl: refactor domain_suspend_callback code to be fully asynchronous >> >> libxl__domain_suspend_callback_common uses usleep calls, >> while the caller libxl_domain_suspend_callback is asynchronous. >> This patch refactors the libxl__domain_suspend__common code to use >> AO facilities like libxl event loop timers instead of usleep calls. > > Thanks. This mixture of code motion, semantic changes, and changes > to function boundaries, is very difficult to review. > > Do you think you can split this patch up into a small subseries > somehow ? For example by adding pre-patches or post-patches (or both) > containing the code motion, and separate patches with functional > change but no bulk motion ? >Yep sure.> Ian. >
Shriram Rajagopalan
2013-Nov-18 17:55 UTC
Re: [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous
On Mon, Nov 18, 2013 at 11:45 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:> Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 7 of 7 V4] tools/libxl: refactor domain_suspend_callback code to be fully asynchronous"): >> On Thu, Nov 14, 2013 at 11:47 PM, Shriram Rajagopalan >> <rshriram@cs.ubc.ca> wrote: >> > # HG changeset patch >> > # User Shriram Rajagopalan <rshriram@cs.ubc.ca> >> > # Date 1384492684 28800 >> > # Node ID c9b550e435d8dce5302e77609849bf007b4f3c2e >> > # Parent 6a2a1bfffc666d5c7aaaa3532c709f2e38a86d05 >> > tools/libxl: refactor domain_suspend_callback code to be fully asynchronous >> > >> > libxl__domain_suspend_callback_common uses usleep calls, >> > while the caller libxl_domain_suspend_callback is asynchronous. >> > This patch refactors the libxl__domain_suspend__common code to use >> > AO facilities like libxl event loop timers instead of usleep calls. > ... >> Ian, would you like this to be lumped together with your patch on >> asynchronous domain suspend, and sent out as a separate patch >> series? > > If you''d like to see it in 4.4, yes, please. I suspect that your full > Remus series isn''t going to make it, I''m afraid.Why? I had posted three versions these way before the freeze date (oct 18). And the final feedback I had on the Remus series was to make the checkpoint sleep function use event timers, which has been addressed in V4.