Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
blktap3 is a disk back-end driver. It is based on blktap2 but does not require the blktap/blkback kernel modules as it allows tapdisk to talk directly to blkfront. This primarily simplifies maintenance, and _may_ lead to performance improvements. blktap3 is based on a blktap2 fork maintained mostly by Citrix (it lives in github), so these changes are also imported, apart from the blktap3 ones. I''ve organised my upstream effort as follows: 1. Upstream the smallest possible subset of blktap3 that will allow guest VMs to use RAW images backed by blktap3. This will enable early testing on the bits introduced by blktap3. 2. Upstream the remaining of blktap3, most notably back-end drivers, e.g. VHD etc. 3. Import bug fixes from blktap2 living in github. 4. Import new features and optimisations from blktap2 living in github, e.g. the mirroring plug-in. blktap3 is made of the following components: 1. blkfront (not a blktap3 component and already upstream): a virtual block device driver in the guest VM that receives block I/O requests and forwards them to tapdisk via the shared ring. 2. tapdisk: a user space process that receives block I/O requests from blkfront, translates them to whatever the current backing file format is (i.e. RAW, VHD, qcow etc.), and performs the actual I/O. Apart from block I/O requests, the tapdisk also allows basic management of each virtual block device, e.g. a device may be temporarily paused. tapdisk listens to a loopback socket for such commands. The tap-ctl utility (explained later) can be used for managing the tapdisk. 3. libtapback: a user space library that implements the functionality required to access the shared ring. It is used by tapdisk to obtain the block I/O requests forwarded by blkfront, and to produce the corresponding responses. This is the very "heart" of blktap3, it''s architecture will be thoroughly explained by the patch series that introduces it. 4. tapback: a user space daemon that acts as the back-end of each virtual block device: it monitors XenStore for the block front-end''s state changes, creates/destroys the shared ring, and instructs the tapdisk to connect to/disconnect from the shared ring. It also communicates to the block front-end required parameters (e.g. block device size in sectors) via XenStore. 5. libblktapctl: a user space library where the tapdisk management functions are implemented. 6. tap-ctl: a user space utility that allows management of the tapdisk, uses libblktapctl. The tapdisk is spawned/destroyed by libxl when a domain is created/destroyed, in the exact same way as in blktap2. libxl uses libblktapctl for this. This patch series introduces a small subset of files required by tapback (the tapback daemon is introduced by the next patch series): - basic blktap3 header files - a rudimentary implementation of libblktapctl. Only the bits required by tapback to manage the tapdisk are introduced, the rest of this library will be introduced by later patches. --- Changed since v1: * In all patches the patch message has been improved. * Patches 1, 5, and 6 use GPLv2. * Patch 0: Basic explanation of blktap3''s fundamental components. * Patch 9: Improved tools/blktap3/control/Makefile by moving hard coded paths to config/StdGNU.mk.
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 1 of 9 RFC v2] blktap3: Introduce blktap3 headers
This patch introduces basic blktap3 header files. diff --git a/tools/blktap3/include/blktap3.h b/tools/blktap3/include/blktap3.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/include/blktap3.h @@ -0,0 +1,47 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + * + * Commonly used headers and definitions. + */ + +#ifndef __BLKTAP_3_H__ +#define __BLKTAP_3_H__ + +#include "compiler.h" + +/* TODO remove from other files */ +#include <xen-external/bsd-sys-queue.h> + +#define BLKTAP3_CONTROL_NAME "blktap-control" +#define BLKTAP3_CONTROL_DIR "/var/run/"BLKTAP3_CONTROL_NAME +#define BLKTAP3_CONTROL_SOCKET "ctl" + +#define BLKTAP3_ENOSPC_SIGNAL_FILE "/var/run/tapdisk3-enospc" + +/* + * TODO They may have to change due to macro namespacing. + */ +#define TAILQ_MOVE_HEAD(node, src, dst, entry) \ + TAILQ_REMOVE(src, node, entry); \ + TAILQ_INSERT_HEAD(dst, node, entry); + +#define TAILQ_MOVE_TAIL(node, src, dst, entry) \ + TAILQ_REMOVE(src, node, entry); \ + TAILQ_INSERT_TAIL(dst, node, entry); + +#endif /* __BLKTAP_3_H__ */ diff --git a/tools/blktap3/include/compiler.h b/tools/blktap3/include/compiler.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/include/compiler.h @@ -0,0 +1,42 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 + * USA + */ + +#ifndef __COMPILER_H__ +#define __COMPILER_H__ + +#define likely(_cond) __builtin_expect(!!(_cond), 1) +#define unlikely(_cond) __builtin_expect(!!(_cond), 0) + +/* + * FIXME taken from list.h, do we need to mention anything about the license? + */ +#define containerof(_ptr, _type, _memb) \ + ((_type*)((void*)(_ptr) - offsetof(_type, _memb))) + +#define __printf(a, b) __attribute__((format(printf, a, b))) +#define __scanf(_f, _a) __attribute__((format (scanf, _f, _a))) + +#ifndef ARRAY_SIZE +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) +#endif /* ARRAY_SIZE */ + +#define UNUSED_PARAMETER(x) \ + (void)(x); + +#endif /* __COMPILER_H__ */
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 2 of 9 RFC v2] blktap3/libblktapctl: Introduce tapdisk message types and structures
This patch introduces function prototypes and structures that are implemented by libblktapctl (the library that allows libxl, tap-ctl, and the tapback daemon to manage a running tapdisk process). This file is based on the existing blktap2 file, with some changes coming from blktap2 living in github (the STATS message, support for mirroring). tapdisk_message_name is now neater and uses a look up table instead of a big switch. blktap3 introduces the following messages: - DISK_INFO: used by xenio to communicate to blkfront via XenStore the number of sectors and the sector size so that it can create the virtual block device. - XENBLKIF_CONNECT/DISCONNECT: used by the xenio daemon to instruct a running tapdisk process to connect to the ring. The tapdisk_message_blkif structure is used to convey such messages. The ATTACH message has been removed since it is now superseded by XENBLKIF_CONNECT (this probably means that DETACH must be removed as well in favour of XENBLKIF_DISCONNECT). However, it would probably be nicer to keep the ATTACH/DETACH identifiers in order to minimize interface changes. diff --git a/tools/blktap2/include/tapdisk-message.h b/tools/blktap3/include/tapdisk-message.h copy from tools/blktap2/include/tapdisk-message.h copy to tools/blktap3/include/tapdisk-message.h --- a/tools/blktap2/include/tapdisk-message.h +++ b/tools/blktap3/include/tapdisk-message.h @@ -36,29 +36,35 @@ #define TAPDISK_MESSAGE_MAX_MINORS \ ((TAPDISK_MESSAGE_MAX_PATH_LENGTH / sizeof(int)) - 1) -#define TAPDISK_MESSAGE_FLAG_SHARED 0x01 -#define TAPDISK_MESSAGE_FLAG_RDONLY 0x02 -#define TAPDISK_MESSAGE_FLAG_ADD_CACHE 0x04 -#define TAPDISK_MESSAGE_FLAG_VHD_INDEX 0x08 -#define TAPDISK_MESSAGE_FLAG_LOG_DIRTY 0x10 +#define TAPDISK_MESSAGE_FLAG_SHARED 0x001 +#define TAPDISK_MESSAGE_FLAG_RDONLY 0x002 +#define TAPDISK_MESSAGE_FLAG_ADD_CACHE 0x004 +#define TAPDISK_MESSAGE_FLAG_VHD_INDEX 0x008 +#define TAPDISK_MESSAGE_FLAG_LOG_DIRTY 0x010 +#define TAPDISK_MESSAGE_FLAG_ADD_LCACHE 0x020 +#define TAPDISK_MESSAGE_FLAG_REUSE_PRT 0x040 +#define TAPDISK_MESSAGE_FLAG_SECONDARY 0x080 +#define TAPDISK_MESSAGE_FLAG_STANDBY 0x100 typedef struct tapdisk_message tapdisk_message_t; -typedef uint8_t tapdisk_message_flag_t; +typedef uint32_t tapdisk_message_flag_t; typedef struct tapdisk_message_image tapdisk_message_image_t; typedef struct tapdisk_message_params tapdisk_message_params_t; typedef struct tapdisk_message_string tapdisk_message_string_t; typedef struct tapdisk_message_response tapdisk_message_response_t; typedef struct tapdisk_message_minors tapdisk_message_minors_t; typedef struct tapdisk_message_list tapdisk_message_list_t; +typedef struct tapdisk_message_stat tapdisk_message_stat_t; +typedef struct tapdisk_message_blkif tapdisk_message_blkif_t; struct tapdisk_message_params { tapdisk_message_flag_t flags; - uint8_t storage; uint32_t devnum; uint32_t domid; - uint16_t path_len; char path[TAPDISK_MESSAGE_MAX_PATH_LENGTH]; + uint32_t prt_devnum; + char secondary[TAPDISK_MESSAGE_MAX_PATH_LENGTH]; }; struct tapdisk_message_image { @@ -88,6 +94,55 @@ struct tapdisk_message_list { char path[TAPDISK_MESSAGE_MAX_PATH_LENGTH]; }; +struct tapdisk_message_stat { + uint16_t type; + uint16_t cookie; + size_t length; +}; + +/** + * Tapdisk message containing all the necessary information required for the + * tapdisk to connect to a guest''s blkfront. + */ +struct tapdisk_message_blkif { + /** + * The domain ID of the guest to connect to. + */ + uint32_t domid; + + /** + * The device ID of the virtual block device. + */ + uint32_t devid; + + /** + * Grant references for the shared ring. + * TODO Why 8 specifically? + */ + uint32_t gref[8]; + + /** + * Number of pages in the ring, expressed as a page order. + */ + uint32_t order; + + /** + * Protocol to use: native, 32 bit, or 64 bit. + * TODO Is this used for supporting a 32 bit guest on a 64 bit hypervisor? + */ + uint32_t proto; + + /** + * TODO Page pool? Can be NULL. + */ + char pool[TAPDISK_MESSAGE_STRING_LENGTH]; + + /** + * The event channel port. + */ + uint32_t port; +}; + struct tapdisk_message { uint16_t type; uint16_t cookie; @@ -100,10 +155,15 @@ struct tapdisk_message { tapdisk_message_minors_t minors; tapdisk_message_response_t response; tapdisk_message_list_t list; + tapdisk_message_stat_t info; + tapdisk_message_blkif_t blkif; } u; }; enum tapdisk_message_id { + /* + * TODO Why start from 1 and not from 0? + */ TAPDISK_MESSAGE_ERROR = 1, TAPDISK_MESSAGE_RUNTIME_ERROR, TAPDISK_MESSAGE_PID, @@ -120,84 +180,70 @@ enum tapdisk_message_id { TAPDISK_MESSAGE_CLOSE_RSP, TAPDISK_MESSAGE_DETACH, TAPDISK_MESSAGE_DETACH_RSP, - TAPDISK_MESSAGE_LIST_MINORS, - TAPDISK_MESSAGE_LIST_MINORS_RSP, + TAPDISK_MESSAGE_LIST_MINORS, /* TODO still valid? */ + TAPDISK_MESSAGE_LIST_MINORS_RSP, /* TODO still valid? */ TAPDISK_MESSAGE_LIST, TAPDISK_MESSAGE_LIST_RSP, + TAPDISK_MESSAGE_STATS, + TAPDISK_MESSAGE_STATS_RSP, TAPDISK_MESSAGE_FORCE_SHUTDOWN, + TAPDISK_MESSAGE_DISK_INFO, + TAPDISK_MESSAGE_DISK_INFO_RSP, + TAPDISK_MESSAGE_XENBLKIF_CONNECT, + TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP, + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT, + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT_RSP, TAPDISK_MESSAGE_EXIT, }; -static inline char * -tapdisk_message_name(enum tapdisk_message_id id) -{ - switch (id) { - case TAPDISK_MESSAGE_ERROR: - return "error"; +#define TAPDISK_MESSAGE_MAX TAPDISK_MESSAGE_EXIT - case TAPDISK_MESSAGE_PID: - return "pid"; +/** + * Retrieves a message''s human-readable representation. + * + * @param id the message ID to translate + * @return the name of the message + */ +static inline char const * +tapdisk_message_name(const enum tapdisk_message_id id) { + static char const *msg_names[(TAPDISK_MESSAGE_MAX + 1)] = { + [TAPDISK_MESSAGE_ERROR] = "error", + [TAPDISK_MESSAGE_RUNTIME_ERROR] = "runtime error", + [TAPDISK_MESSAGE_PID] = "pid", + [TAPDISK_MESSAGE_PID_RSP] = "pid response", + [TAPDISK_MESSAGE_OPEN] = "open", + [TAPDISK_MESSAGE_OPEN_RSP] = "open response", + [TAPDISK_MESSAGE_PAUSE] = "pause", + [TAPDISK_MESSAGE_PAUSE_RSP] = "pause response", + [TAPDISK_MESSAGE_RESUME] = "resume", + [TAPDISK_MESSAGE_RESUME_RSP] = "resume response", + [TAPDISK_MESSAGE_CLOSE] = "close", + [TAPDISK_MESSAGE_FORCE_SHUTDOWN] = "force shutdown", + [TAPDISK_MESSAGE_CLOSE_RSP] = "close response", + [TAPDISK_MESSAGE_ATTACH] = "attach", + [TAPDISK_MESSAGE_ATTACH_RSP] = "attach response", + [TAPDISK_MESSAGE_DETACH] = "detach", + [TAPDISK_MESSAGE_DETACH_RSP] = "detach response", + [TAPDISK_MESSAGE_LIST_MINORS] = "list minors", + [TAPDISK_MESSAGE_LIST_MINORS_RSP] = "list minors response", + [TAPDISK_MESSAGE_LIST] = "list", + [TAPDISK_MESSAGE_LIST_RSP] = "list response", + [TAPDISK_MESSAGE_STATS] = "stats", + [TAPDISK_MESSAGE_STATS_RSP] = "stats response", + [TAPDISK_MESSAGE_DISK_INFO] = "disk info", + [TAPDISK_MESSAGE_DISK_INFO_RSP] = "disk info response", + [TAPDISK_MESSAGE_XENBLKIF_CONNECT] = "blkif connect", + [TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP] = "blkif connect response", + [TAPDISK_MESSAGE_XENBLKIF_DISCONNECT] = "blkif disconnect", + [TAPDISK_MESSAGE_XENBLKIF_DISCONNECT_RSP] + = "blkif disconnect response", + [TAPDISK_MESSAGE_EXIT] = "exit" + }; - case TAPDISK_MESSAGE_PID_RSP: - return "pid response"; - - case TAPDISK_MESSAGE_OPEN: - return "open"; - - case TAPDISK_MESSAGE_OPEN_RSP: - return "open response"; - - case TAPDISK_MESSAGE_PAUSE: - return "pause"; - - case TAPDISK_MESSAGE_PAUSE_RSP: - return "pause response"; - - case TAPDISK_MESSAGE_RESUME: - return "resume"; - - case TAPDISK_MESSAGE_RESUME_RSP: - return "resume response"; - - case TAPDISK_MESSAGE_CLOSE: - return "close"; - - case TAPDISK_MESSAGE_FORCE_SHUTDOWN: - return "force shutdown"; - - case TAPDISK_MESSAGE_CLOSE_RSP: - return "close response"; - - case TAPDISK_MESSAGE_ATTACH: - return "attach"; - - case TAPDISK_MESSAGE_ATTACH_RSP: - return "attach response"; - - case TAPDISK_MESSAGE_DETACH: - return "detach"; - - case TAPDISK_MESSAGE_DETACH_RSP: - return "detach response"; - - case TAPDISK_MESSAGE_LIST_MINORS: - return "list minors"; - - case TAPDISK_MESSAGE_LIST_MINORS_RSP: - return "list minors response"; - - case TAPDISK_MESSAGE_LIST: - return "list"; - - case TAPDISK_MESSAGE_LIST_RSP: - return "list response"; - - case TAPDISK_MESSAGE_EXIT: - return "exit"; - - default: + if (id < 1 || id > TAPDISK_MESSAGE_MAX) { return "unknown"; } + return msg_names[id]; } -#endif +#endif /* _TAPDISK_MESSAGE_H_ */
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 3 of 9 RFC v2] blktap3/libblktapctl: Introduce the tapdisk control header
This patch introduces the header file where tapdisk control-related structures and functions are declared. This file is based on the existing blktap2, with most changes coming from blktap2 living in github. Linux lists are replaced by BSD tail queues. Few functions are partly documented, most of them are not documented at all, this will be addressed by a future patch. diff --git a/tools/blktap2/control/tap-ctl.h b/tools/blktap3/control/tap-ctl.h copy from tools/blktap2/control/tap-ctl.h copy to tools/blktap3/control/tap-ctl.h --- a/tools/blktap2/control/tap-ctl.h +++ b/tools/blktap3/control/tap-ctl.h @@ -30,72 +30,180 @@ #include <syslog.h> #include <errno.h> +#include <sys/time.h> #include <tapdisk-message.h> +#include "blktap3.h" +/* + * TODO These are private, move to an internal header. + */ extern int tap_ctl_debug; #ifdef TAPCTL -#define DBG(_f, _a...) \ - do { \ +#define DBG(_f, _a...) \ + do { \ if (tap_ctl_debug) \ printf(_f, ##_a); \ } while (0) #define DPRINTF(_f, _a...) syslog(LOG_INFO, _f, ##_a) #define EPRINTF(_f, _a...) syslog(LOG_ERR, "tap-err:%s: " _f, __func__, ##_a) -#define PERROR(_f, _a...) syslog(LOG_ERR, "tap-err:%s: " _f ": %s", __func__, ##_a, \ - strerror(errno)) +#define PERROR(_f, _a...) syslog(LOG_ERR, "tap-err:%s: " _f ": %s", \ + __func__, ##_a, strerror(errno)) #endif -void tap_ctl_version(int *major, int *minor); -int tap_ctl_kernel_version(int *major, int *minor); +/** + * Contains information about a tapdisk process. + */ +typedef struct tap_list { -int tap_ctl_check_blktap(const char **message); -int tap_ctl_check_version(const char **message); -int tap_ctl_check(const char **message); + /** + * The process ID. + */ + pid_t pid; + /** + * TODO + */ + int minor; + + /** + * State of the VBD, specified in drivers/tapdisk-vbd.h. + */ + int state; + + /** + * TODO + */ + char *type; + + /** + * /path/to/file + */ + char *path; + + /** + * for linked lists + */ + TAILQ_ENTRY(tap_list) entry; +} tap_list_t; + +TAILQ_HEAD(tqh_tap_list, tap_list); + +/** + * Iterate over a list of struct tap_list elements. + */ +#define tap_list_for_each_entry(_pos, _head) \ + TAILQ_FOREACH(_pos, _head, entry) + +/** + * Iterate over a list of struct tap_list elements allowing deletions without + * having to restart the iteration. + */ +#define tap_list_for_each_entry_safe(_pos, _n, _head) \ + TAILQ_FOREACH_SAFE(_pos, _head, entry, _n) + +/** + * Connects to a tapdisk. + * + * @param /path/to/file of the control socket (e.g. + * /var/run/blktap-control/ctl/<pid> + * @param socket output parameter that receives the connection + * @returns 0 on success, an error code otherwise + */ int tap_ctl_connect(const char *path, int *socket); -int tap_ctl_connect_id(int id, int *socket); -int tap_ctl_read_message(int fd, tapdisk_message_t *message, int timeout); -int tap_ctl_write_message(int fd, tapdisk_message_t *message, int timeout); -int tap_ctl_send_and_receive(int fd, tapdisk_message_t *message, int timeout); -int tap_ctl_connect_send_and_receive(int id, - tapdisk_message_t *message, int timeout); + +/** + * Connects to a tapdisk. + * + * @param id the process ID of the tapdisk to connect to + * @param socket output parameter that receives the connection + * @returns 0 on success, an error code otherwise + */ +int tap_ctl_connect_id(const int id, int *socket); + +/** + * Reads from the tapdisk connection to the buffer. + * + * @param fd the file descriptor of the socket to read from + * @param buf buffer that receives the output + * @param sz size, in bytes, of the buffer + * @param timeout (optional) specifies the maximum time to wait for reading + * @returns 0 on success, an error code otherwise + */ +int tap_ctl_read_raw(const int fd, void *buf, const size_t sz, + struct timeval *timeout); + +int tap_ctl_read_message(int fd, tapdisk_message_t * message, + struct timeval *timeout); + +int tap_ctl_write_message(int fd, tapdisk_message_t * message, + struct timeval *timeout); + +int tap_ctl_send_and_receive(int fd, tapdisk_message_t * message, + struct timeval *timeout); + +int tap_ctl_connect_send_and_receive(int id, tapdisk_message_t * message, + struct timeval *timeout); + char *tap_ctl_socket_name(int id); -typedef struct { - int id; - pid_t pid; - int minor; - int state; - char *type; - char *path; -} tap_list_t; +int tap_ctl_list_pid(pid_t pid, struct tqh_tap_list *list); -int tap_ctl_get_driver_id(const char *handle); +/** + * Retrieves a list of all tapdisks. + * + * @param list output parameter that receives the list of tapdisks + * @returns 0 on success, an error code otherwise + */ +int tap_ctl_list(struct tqh_tap_list *list); -int tap_ctl_list(tap_list_t ***list); -void tap_ctl_free_list(tap_list_t **list); -int tap_ctl_find(const char *type, const char *path, tap_list_t *tap); +/** + * Deallocates a list of struct tap_list. + * + * @param list the tapdisk information structure to deallocate. + */ +void tap_ctl_list_free(struct tqh_tap_list *list); -int tap_ctl_allocate(int *minor, char **devname); -int tap_ctl_free(const int minor); +/** + * Creates a tapdisk process. + * + * TODO document parameters + * @param params + * @param flags + * @param prt_minor + * @param secondary + * @returns 0 on success, en error code otherwise + */ +int tap_ctl_create(const char *params, int flags, int prt_minor, + char *secondary); -int tap_ctl_create(const char *params, char **devname); -int tap_ctl_destroy(const int id, const int minor); +int tap_ctl_destroy(const int id, const int minor, int force, + struct timeval *timeout); +/* + * TODO The following functions are not currently used by anything else + * other than the tapdisk itself. Move to a private header? + */ int tap_ctl_spawn(void); pid_t tap_ctl_get_pid(const int id); int tap_ctl_attach(const int id, const int minor); int tap_ctl_detach(const int id, const int minor); -int tap_ctl_open(const int id, const int minor, const char *params); -int tap_ctl_close(const int id, const int minor, const int force); +int tap_ctl_open(const int id, const int minor, const char *params, + int flags, const int prt_minor, const char *secondary); -int tap_ctl_pause(const int id, const int minor); +int tap_ctl_close(const int id, const int minor, const int force, + struct timeval *timeout); + +int tap_ctl_pause(const int id, const int minor, struct timeval + *timeout); + int tap_ctl_unpause(const int id, const int minor, const char *params); -int tap_ctl_blk_major(void); +ssize_t tap_ctl_stats(pid_t pid, int minor, char *buf, size_t size); -#endif +int tap_ctl_stats_fwrite(pid_t pid, int minor, FILE * out); + +#endif /* __TAP_CTL_H__ */
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 4 of 9 RFC v2] blktap3/libblktapctl: Introduce listing running tapdisks functionality
This patch introduces tap-ctl-list.c, the file where listing tapdisks functionality is implemented. It is based on the existing blktap2 file with most changes coming from blktap2 living in github. I have not examined the changes in detail but it seems they are minor. Function tap_ctl_list needs to be amended because minors are not retrieved from sysfs (there''s no sysfs blktap directory any more). diff --git a/tools/blktap2/control/tap-ctl-list.c b/tools/blktap3/control/tap-ctl-list.c copy from tools/blktap2/control/tap-ctl-list.c copy to tools/blktap3/control/tap-ctl-list.c --- a/tools/blktap2/control/tap-ctl-list.c +++ b/tools/blktap3/control/tap-ctl-list.c @@ -34,23 +34,47 @@ #include <glob.h> #include "tap-ctl.h" -#include "blktap2.h" -#include "list.h" +#include "blktap3.h" + +/** + * Allocates and initializes a tap_list_t. + */ +static tap_list_t * +_tap_list_alloc(void) +{ + const size_t sz = sizeof(tap_list_t); + tap_list_t *tl; + + tl = malloc(sz); + if (!tl) + return NULL; + + tl->pid = -1; + tl->minor = -1; + tl->state = -1; + tl->type = NULL; + tl->path = NULL; + + return tl; +} static void -free_list(tap_list_t *entry) +_tap_list_free(tap_list_t * tl, struct tqh_tap_list *list) { - if (entry->type) { - free(entry->type); - entry->type = NULL; + if (tl->type) { + free(tl->type); + tl->type = NULL; } - if (entry->path) { - free(entry->path); - entry->path = NULL; + if (tl->path) { + free(tl->path); + tl->path = NULL; } - free(entry); + if (list) + TAILQ_REMOVE(list, tl, entry); + + free(tl); } int @@ -66,7 +90,7 @@ int len = ptr - params; *type = strndup(params, len); - *path = strdup(params + len + 1); + *path = strdup(params + len + 1); if (!*type || !*path) { free(*type); @@ -81,102 +105,26 @@ int return 0; } -static int -init_list(tap_list_t *entry, - int tap_id, pid_t tap_pid, int vbd_minor, int vbd_state, - const char *params) +void +tap_ctl_list_free(struct tqh_tap_list *list) { - int err = 0; + tap_list_t *tl, *n; - entry->id = tap_id; - entry->pid = tap_pid; - entry->minor = vbd_minor; - entry->state = vbd_state; - - if (params) - err = _parse_params(params, &entry->type, &entry->path); - - return err; -} - -void -tap_ctl_free_list(tap_list_t **list) -{ - tap_list_t **_entry; - - for (_entry = list; *_entry != NULL; ++_entry) - free_list(*_entry); - - free(list); -} - -static tap_list_t** -tap_ctl_alloc_list(int n) -{ - tap_list_t **list, *entry; - size_t size; - int i; - - size = sizeof(tap_list_t*) * (n+1); - list = malloc(size); - if (!list) - goto fail; - - memset(list, 0, size); - - for (i = 0; i < n; ++i) { - tap_list_t *entry; - - entry = malloc(sizeof(tap_list_t)); - if (!entry) - goto fail; - - memset(entry, 0, sizeof(tap_list_t)); - - list[i] = entry; - } - - return list; - -fail: - if (list) - tap_ctl_free_list(list); - - return NULL; -} - -static int -tap_ctl_list_length(const tap_list_t **list) -{ - const tap_list_t **_entry; - int n; - - n = 0; - for (_entry = list; *_entry != NULL; ++_entry) - n++; - - return n; -} - -static int -_tap_minor_cmp(const void *a, const void *b) -{ - return *(int*)a - *(int*)b; + tap_list_for_each_entry_safe(tl, n, list) + _tap_list_free(tl, list); } int -_tap_ctl_find_minors(int **_minorv) +_tap_ctl_find_tapdisks(struct tqh_tap_list *list) { - glob_t glbuf = { 0 }; + glob_t glbuf = { 0 }; const char *pattern, *format; - int *minorv = NULL, n_minors = 0; - int err, i; + int err, i, n_taps = 0; - pattern = BLKTAP2_SYSFS_DIR"/blktap*"; - format = BLKTAP2_SYSFS_DIR"/blktap%d"; + pattern = BLKTAP3_CONTROL_DIR "/" BLKTAP3_CONTROL_SOCKET "*"; + format = BLKTAP3_CONTROL_DIR "/" BLKTAP3_CONTROL_SOCKET "%d"; - n_minors = 0; - minorv = NULL; + TAILQ_INIT(list); err = glob(pattern, 0, NULL, &glbuf); switch (err) { @@ -186,337 +134,231 @@ int case GLOB_ABORTED: case GLOB_NOSPACE: err = -errno; - EPRINTF("%s: glob failed, err %d", pattern, err); - goto fail; - } - - minorv = malloc(sizeof(int) * glbuf.gl_pathc); - if (!minorv) { - err = -errno; + EPRINTF("%s: glob failed: %s", pattern, strerror(err)); goto fail; } for (i = 0; i < glbuf.gl_pathc; ++i) { + tap_list_t *tl; int n; - n = sscanf(glbuf.gl_pathv[i], format, &minorv[n_minors]); + tl = _tap_list_alloc(); + if (!tl) { + err = -ENOMEM; + goto fail; + } + + n = sscanf(glbuf.gl_pathv[i], format, &tl->pid); if (n != 1) - continue; + goto skip; - n_minors++; + tl->pid = tap_ctl_get_pid(tl->pid); + if (tl->pid < 0) + goto skip; + + TAILQ_INSERT_TAIL(list, tl, entry); + n_taps++; + continue; + + skip: + _tap_list_free(tl, NULL); } - qsort(minorv, n_minors, sizeof(int), _tap_minor_cmp); - -done: - *_minorv = minorv; + done: err = 0; - -out: - if (glbuf.gl_pathv) - globfree(&glbuf); - - return err ? : n_minors; - -fail: - if (minorv) - free(minorv); - - goto out; -} - -struct tapdisk { - int id; - pid_t pid; - struct list_head list; -}; - -static int -_tap_tapdisk_cmp(const void *a, const void *b) -{ - return ((struct tapdisk*)a)->id - ((struct tapdisk*)b)->id; -} - -int -_tap_ctl_find_tapdisks(struct tapdisk **_tapv) -{ - glob_t glbuf = { 0 }; - const char *pattern, *format; - struct tapdisk *tapv = NULL; - int err, i, n_taps = 0; - - pattern = BLKTAP2_CONTROL_DIR"/"BLKTAP2_CONTROL_SOCKET"*"; - format = BLKTAP2_CONTROL_DIR"/"BLKTAP2_CONTROL_SOCKET"%d"; - - n_taps = 0; - tapv = NULL; - - err = glob(pattern, 0, NULL, &glbuf); - switch (err) { - case GLOB_NOMATCH: - goto done; - - case GLOB_ABORTED: - case GLOB_NOSPACE: - err = -errno; - EPRINTF("%s: glob failed, err %d", pattern, err); - goto fail; - } - - tapv = malloc(sizeof(struct tapdisk) * glbuf.gl_pathc); - if (!tapv) { - err = -errno; - goto fail; - } - - for (i = 0; i < glbuf.gl_pathc; ++i) { - struct tapdisk *tap; - int n; - - tap = &tapv[n_taps]; - - err = sscanf(glbuf.gl_pathv[i], format, &tap->id); - if (err != 1) - continue; - - tap->pid = tap_ctl_get_pid(tap->id); - if (tap->pid < 0) - continue; - - n_taps++; - } - - qsort(tapv, n_taps, sizeof(struct tapdisk), _tap_tapdisk_cmp); - - for (i = 0; i < n_taps; ++i) - INIT_LIST_HEAD(&tapv[i].list); - -done: - *_tapv = tapv; - err = 0; - -out: + out: if (glbuf.gl_pathv) globfree(&glbuf); return err ? : n_taps; -fail: - if (tapv) - free(tapv); - + fail: + tap_ctl_list_free(list); goto out; } -struct tapdisk_list { - int minor; - int state; - char *params; - struct list_head entry; -}; - -int -_tap_ctl_list_tapdisk(int id, struct list_head *_list) +/** + * Retrieves all the VBDs a tapdisk is serving. + * + * @param pid the process ID of the tapdisk whose VBDs should be retrieved + * @param list output parameter that receives the list of VBD + * @returns 0 on success, an error code otherwise + */ +static int +_tap_ctl_list_tapdisk(pid_t pid, struct tqh_tap_list *list) { + struct timeval timeout = {.tv_sec = 10,.tv_usec = 0 }; tapdisk_message_t message; - struct list_head list; - struct tapdisk_list *tl, *next; + tap_list_t *tl; int err, sfd; - err = tap_ctl_connect_id(id, &sfd); + err = tap_ctl_connect_id(pid, &sfd); if (err) return err; memset(&message, 0, sizeof(message)); - message.type = TAPDISK_MESSAGE_LIST; + message.type = TAPDISK_MESSAGE_LIST; message.cookie = -1; - err = tap_ctl_write_message(sfd, &message, 2); + err = tap_ctl_write_message(sfd, &message, &timeout); if (err) return err; - INIT_LIST_HEAD(&list); + TAILQ_INIT(list); + do { - err = tap_ctl_read_message(sfd, &message, 2); + err = tap_ctl_read_message(sfd, &message, &timeout); if (err) { err = -EPROTO; - break; + goto fail; } if (message.u.list.count == 0) break; - tl = malloc(sizeof(struct tapdisk_list)); + tl = _tap_list_alloc(); if (!tl) { err = -ENOMEM; - break; + goto fail; } - tl->minor = message.u.list.minor; - tl->state = message.u.list.state; + tl->pid = pid; + tl->minor = message.u.list.minor; + tl->state = message.u.list.state; + if (message.u.list.path[0] != 0) { - tl->params = strndup(message.u.list.path, - sizeof(message.u.list.path)); - if (!tl->params) { - err = -errno; - break; + err = _parse_params(message.u.list.path, &tl->type, &tl->path); + if (err) { + _tap_list_free(tl, NULL); + goto fail; } - } else - tl->params = NULL; + } - list_add(&tl->entry, &list); + TAILQ_INSERT_HEAD(list, tl, entry); } while (1); - if (err) - list_for_each_entry_safe(tl, next, &list, entry) { - list_del(&tl->entry); - free(tl->params); - free(tl); - } + err = 0; + out: + close(sfd); + return 0; - close(sfd); - list_splice(&list, _list); - return err; -} - -void -_tap_ctl_free_tapdisks(struct tapdisk *tapv, int n_taps) -{ - struct tapdisk *tap; - - for (tap = tapv; tap < &tapv[n_taps]; ++tap) { - struct tapdisk_list *tl, *next; - - list_for_each_entry_safe(tl, next, &tap->list, entry) { - free(tl->params); - free(tl); - } - } - - free(tapv); + fail: + tap_ctl_list_free(list); + goto out; } int -_tap_list_join3(int n_minors, int *minorv, int n_taps, struct tapdisk *tapv, - tap_list_t ***_list) +tap_ctl_list(struct tqh_tap_list *list) { - tap_list_t **list, **_entry, *entry; - int i, _m, err; + struct tqh_tap_list minors, tapdisks, vbds; + tap_list_t *t, *next_t, *v, *next_v, *m, *next_m; + int err; - list = tap_ctl_alloc_list(n_minors + n_taps); - if (!list) { - err = -ENOMEM; + /* + * Find all minors, find all tapdisks, then list all minors + * they attached to. Output is a 3-way outer join. + */ + TAILQ_INIT(&minors); + + /* + * TODO There''s no blktap sysfs entry anymore, get rid of minors and + * rationalise the rest of this function. + */ +#if 0 + err = _tap_ctl_find_minors(&minors); + if (err < 0) { + EPRINTF("error finding minors: %s\n", strerror(err)); + goto fail; + } +#endif + + err = _tap_ctl_find_tapdisks(&tapdisks); + if (err < 0) { + EPRINTF("error finding tapdisks: %s\n", strerror(err)); goto fail; } - _entry = list; + TAILQ_INIT(list); - for (i = 0; i < n_taps; ++i) { - struct tapdisk *tap = &tapv[i]; - struct tapdisk_list *tl; + tap_list_for_each_entry_safe(t, next_t, &tapdisks) { - /* orphaned tapdisk */ - if (list_empty(&tap->list)) { - err = init_list(*_entry++, tap->id, tap->pid, -1, -1, NULL); - if (err) - goto fail; + err = _tap_ctl_list_tapdisk(t->pid, &vbds); + + /* + * TODO Don''t just swallow the error, print a warning, at least. + */ + if (err || TAILQ_EMPTY(&vbds)) { + TAILQ_MOVE_TAIL(t, &tapdisks, list, entry); continue; } - list_for_each_entry(tl, &tap->list, entry) { + tap_list_for_each_entry_safe(v, next_v, &vbds) { - err = init_list(*_entry++, - tap->id, tap->pid, - tl->minor, tl->state, tl->params); - if (err) - goto fail; + tap_list_for_each_entry_safe(m, next_m, &minors) + if (m->minor == v->minor) { + _tap_list_free(m, &minors); + break; + } - if (tl->minor >= 0) { - /* clear minor */ - for (_m = 0; _m < n_minors; ++_m) { - if (minorv[_m] == tl->minor) { - minorv[_m] = -1; - break; - } - } - } + TAILQ_MOVE_TAIL(v, &vbds, list, entry); } + + _tap_list_free(t, &tapdisks); } /* orphaned minors */ - for (_m = 0; _m < n_minors; ++_m) { - int minor = minorv[_m]; - if (minor >= 0) { - err = init_list(*_entry++, -1, -1, minor, -1, NULL); - if (err) - goto fail; - } - } - - /* free extraneous list entries */ - for (; *_entry != NULL; ++entry) { - free_list(*_entry); - *_entry = NULL; - } - - *_list = list; + TAILQ_CONCAT(list, &minors, entry); return 0; fail: - if (list) - tap_ctl_free_list(list); + tap_ctl_list_free(list); + + tap_ctl_list_free(&vbds); + tap_ctl_list_free(&tapdisks); + tap_ctl_list_free(&minors); return err; } int -tap_ctl_list(tap_list_t ***list) +tap_ctl_list_pid(pid_t pid, struct tqh_tap_list *list) { - int n_taps, n_minors, err, *minorv; - struct tapdisk *tapv, *tap; + tap_list_t *t; + int err; - n_taps = -1; - n_minors = -1; + t = _tap_list_alloc(); + if (!t) + return -ENOMEM; - err = n_minors = _tap_ctl_find_minors(&minorv); - if (err < 0) - goto out; - - err = n_taps = _tap_ctl_find_tapdisks(&tapv); - if (err < 0) - goto out; - - for (tap = tapv; tap < &tapv[n_taps]; ++tap) { - err = _tap_ctl_list_tapdisk(tap->id, &tap->list); - if (err) - goto out; + t->pid = tap_ctl_get_pid(pid); + if (t->pid < 0) { + _tap_list_free(t, NULL); + return 0; } - err = _tap_list_join3(n_minors, minorv, n_taps, tapv, list); + err = _tap_ctl_list_tapdisk(t->pid, list); -out: - if (n_taps > 0) - _tap_ctl_free_tapdisks(tapv, n_taps); + if (err || TAILQ_EMPTY(list)) + TAILQ_INSERT_TAIL(list, t, entry); - if (n_minors > 0) - free(minorv); - - return err; + return 0; } int -tap_ctl_find(const char *type, const char *path, tap_list_t *tap) +tap_ctl_find_minor(const char *type, const char *path) { - tap_list_t **list, **_entry; - int ret = -ENOENT, err; + struct tqh_tap_list list = TAILQ_HEAD_INITIALIZER(list); + tap_list_t *entry; + int minor, err; err = tap_ctl_list(&list); if (err) return err; - for (_entry = list; *_entry != NULL; ++_entry) { - tap_list_t *entry = *_entry; + minor = -1; + + tap_list_for_each_entry(entry, &list) { if (type && (!entry->type || strcmp(entry->type, type))) continue; @@ -524,13 +366,11 @@ tap_ctl_find(const char *type, const cha if (path && (!entry->path || strcmp(entry->path, path))) continue; - *tap = *entry; - tap->type = tap->path = NULL; - ret = 0; + minor = entry->minor; break; } - tap_ctl_free_list(list); + tap_ctl_list_free(&list); - return ret; + return minor >= 0 ? minor : -ENOENT; }
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 5 of 9 RFC v2] blktap3/libblktapctl: Introduce functionality used by tapback to instruct tapdisk to connect to the sring
This patch introduces functions tap_ctl_connect_xenblkif and tap_ctl_disconnect_xenblkif, that are used by the tapback daemon to instruct a running tapdisk process to connect to/disconnect from the shared ring. diff --git a/tools/blktap3/control/tap-ctl-xen.c b/tools/blktap3/control/tap-ctl-xen.c new file mode 100644 --- /dev/null +++ b/tools/blktap3/control/tap-ctl-xen.c @@ -0,0 +1,80 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include <stdio.h> +#include <string.h> + +#include "tap-ctl.h" +#include "tap-ctl-xen.h" + +int tap_ctl_connect_xenblkif(pid_t pid, int minor, domid_t domid, + int devid, const grant_ref_t * grefs, int order, + const evtchn_port_t port, int proto, const char *pool) +{ + tapdisk_message_t message; + int i, err; + + memset(&message, 0, sizeof(message)); + message.type = TAPDISK_MESSAGE_XENBLKIF_CONNECT; + message.cookie = minor; + + message.u.blkif.domid = domid; + message.u.blkif.devid = devid; + for (i = 0; i < 1 << order; i++) + message.u.blkif.gref[i] = grefs[i]; + message.u.blkif.order = order; + message.u.blkif.port = port; + message.u.blkif.proto = proto; + if (pool) + strncpy(message.u.blkif.pool, pool, sizeof(message.u.blkif.pool)); + else + message.u.blkif.pool[0] = 0; + + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); + if (err) + return err; + + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP) + err = -message.u.response.error; + else + err = -EINVAL; + + return err; +} + +int tap_ctl_disconnect_xenblkif(pid_t pid, int minor, domid_t domid, + int devid, struct timeval *timeout) +{ + tapdisk_message_t message; + int err; + + memset(&message, 0, sizeof(message)); + message.type = TAPDISK_MESSAGE_XENBLKIF_DISCONNECT; + message.cookie = minor; + message.u.blkif.domid = domid; + message.u.blkif.devid = devid; + + err = tap_ctl_connect_send_and_receive(pid, &message, timeout); + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP) + err = -message.u.response.error; + else + err = -EINVAL; + + return err; +} diff --git a/tools/blktap3/control/tap-ctl-xen.h b/tools/blktap3/control/tap-ctl-xen.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/control/tap-ctl-xen.h @@ -0,0 +1,61 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TAP_CTL_XEN_H__ +#define __TAP_CTL_XEN_H__ + +#include <xen/xen.h> +#include <xen/grant_table.h> +#include <xen/event_channel.h> + +/** + * Instructs a tapdisk to connect to the shared ring. + * + * TODO further explain parameters + * + * @param pid the process ID of the tapdisk that should connect to the shared + * ring + * @param minor the minor number of the virtual block device + * @param domid the domain ID of the guest VM + * @param devid the device ID + * @param grefs the grant references + * @param order number of grant references, expressed as a 2''s order + * @param port event channel port + * @param proto the protocol: native (XENIO_BLKIF_PROTO_NATIVE), + * x86 (XENIO_BLKIF_PROTO_X86_32), or x64 (XENIO_BLKIF_PROTO_X86_64) + * @param pool TODO page pool? + * @returns 0 on success, an error code otherwise + */ +int tap_ctl_connect_xenblkif(pid_t pid, int minor, domid_t domid, int devid, + const grant_ref_t * grefs, int order, evtchn_port_t port, int proto, + const char *pool); + +/** + * Instructs a tapdisk to disconnect from the shared ring. + * + * @param pid the process ID of the tapdisk that should disconnect + * @param minor the minor number of the virtual block device + * @param domid the ID of the guest VM + * @param devid the device ID of the virtual block device + * @param timeout timeout to wait, if NULL the function will wait indefinitely + */ +int tap_ctl_disconnect_xenblkif(pid_t pid, int minor, domid_t domid, + int devid, struct timeval *timeout); + +#endif /* __TAP_CTL_XEN_H__ */
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 6 of 9 RFC v2] blktap3/libblktapctl: Introduce block device control information retrieval functionality
This patch introduces function tap_ctl_info. It is used by the tapback daemon to retrieve the size of the block device, as well as the sector size, in order to communicate it to blkfront so that it can create the virtual block device. diff --git a/tools/blktap3/control/tap-ctl-info.c b/tools/blktap3/control/tap-ctl-info.c new file mode 100644 --- /dev/null +++ b/tools/blktap3/control/tap-ctl-info.c @@ -0,0 +1,47 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include <stdio.h> +#include <string.h> + +#include "tap-ctl.h" +#include "tap-ctl-info.h" + +int tap_ctl_info(pid_t pid, int minor, unsigned long long *sectors, + unsigned int *sector_size, unsigned int *info) +{ + tapdisk_message_t message; + int err; + + memset(&message, 0, sizeof(message)); + message.type = TAPDISK_MESSAGE_DISK_INFO; + message.cookie = minor; + + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); + if (err) + return err; + + if (message.type != TAPDISK_MESSAGE_DISK_INFO_RSP) + return -EINVAL; + + *sectors = message.u.image.sectors; + *sector_size = message.u.image.sector_size; + *info = message.u.image.info; + return err; +} diff --git a/tools/blktap3/control/tap-ctl-info.h b/tools/blktap3/control/tap-ctl-info.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/control/tap-ctl-info.h @@ -0,0 +1,36 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TAP_CTL_INFO_H__ +#define __TAP_CTL_INFO_H__ + +/** + * Retrieves virtual disk information from a tapdisk. + * + * @pid the process ID of the tapdisk process + * @minor the minor device number + * @sectors output parameter that receives the number of sectors + * @sector_size output parameter that receives the size of the sector + * @info TODO ? + * + */ +int tap_ctl_info(pid_t pid, int minor, unsigned long long *sectors, + unsigned int *sector_size, unsigned int *info); + +#endif /* __TAP_CTL_INFO_H__ */
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 7 of 9 RFC v2] blktap3/libblktapctl: Introduce tapdisk message exchange functionality
This patch introduces file conrol/tap-ctl-ipc.c, where the functionality of talking to a tapdisk process is implemented. This file is imported from the existing blktap2 implementation, with most changes coming from blktap2 living in github. diff --git a/tools/blktap2/control/tap-ctl-ipc.c b/tools/blktap3/control/tap-ctl-ipc.c copy from tools/blktap2/control/tap-ctl-ipc.c copy to tools/blktap3/control/tap-ctl-ipc.c --- a/tools/blktap2/control/tap-ctl-ipc.c +++ b/tools/blktap3/control/tap-ctl-ipc.c @@ -30,44 +30,33 @@ #include <unistd.h> #include <stdlib.h> #include <string.h> +#include <fcntl.h> #include <sys/un.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/socket.h> #include "tap-ctl.h" -#include "blktap2.h" +#include "blktap3.h" int tap_ctl_debug = 0; int -tap_ctl_read_message(int fd, tapdisk_message_t *message, int timeout) +tap_ctl_read_raw(int fd, void *buf, size_t size, struct timeval *timeout) { fd_set readfds; - int ret, len, offset; - struct timeval tv, *t; + size_t offset = 0; + int ret; - t = NULL; - offset = 0; - len = sizeof(tapdisk_message_t); - - if (timeout) { - tv.tv_sec = timeout; - tv.tv_usec = 0; - t = &tv; - } - - memset(message, 0, sizeof(tapdisk_message_t)); - - while (offset < len) { + while (offset < size) { FD_ZERO(&readfds); FD_SET(fd, &readfds); - ret = select(fd + 1, &readfds, NULL, NULL, t); + ret = select(fd + 1, &readfds, NULL, NULL, timeout); if (ret == -1) break; else if (FD_ISSET(fd, &readfds)) { - ret = read(fd, message + offset, len - offset); + ret = read(fd, buf + offset, size - offset); if (ret <= 0) break; offset += ret; @@ -75,34 +64,24 @@ tap_ctl_read_message(int fd, tapdisk_mes break; } - if (offset != len) { - EPRINTF("failure reading message\n"); + if (offset != size) { + EPRINTF("failure reading data %zd/%zd\n", offset, size); return -EIO; } - DBG("received ''%s'' message (uuid = %u)\n", - tapdisk_message_name(message->type), message->cookie); - return 0; } int -tap_ctl_write_message(int fd, tapdisk_message_t *message, int timeout) +tap_ctl_write_message(int fd, tapdisk_message_t * message, + struct timeval *timeout) { fd_set writefds; int ret, len, offset; - struct timeval tv, *t; - t = NULL; offset = 0; len = sizeof(tapdisk_message_t); - if (timeout) { - tv.tv_sec = timeout; - tv.tv_usec = 0; - t = &tv; - } - DBG("sending ''%s'' message (uuid = %u)\n", tapdisk_message_name(message->type), message->cookie); @@ -113,7 +92,7 @@ tap_ctl_write_message(int fd, tapdisk_me /* we don''t bother reinitializing tv. at worst, it will wait a * bit more time than expected. */ - ret = select(fd + 1, NULL, &writefds, NULL, t); + ret = select(fd + 1, NULL, &writefds, NULL, timeout); if (ret == -1) break; else if (FD_ISSET(fd, &writefds)) { @@ -134,7 +113,8 @@ tap_ctl_write_message(int fd, tapdisk_me } int -tap_ctl_send_and_receive(int sfd, tapdisk_message_t *message, int timeout) +tap_ctl_send_and_receive(int sfd, tapdisk_message_t * message, + struct timeval *timeout) { int err; @@ -161,7 +141,7 @@ tap_ctl_socket_name(int id) char *name; if (asprintf(&name, "%s/%s%d", - BLKTAP2_CONTROL_DIR, BLKTAP2_CONTROL_SOCKET, id) == -1) + BLKTAP3_CONTROL_DIR, BLKTAP3_CONTROL_SOCKET, id) == -1) return NULL; return name; @@ -216,13 +196,15 @@ tap_ctl_connect_id(int id, int *sfd) } err = tap_ctl_connect(name, sfd); + free(name); return err; } int -tap_ctl_connect_send_and_receive(int id, tapdisk_message_t *message, int timeout) +tap_ctl_connect_send_and_receive(int id, tapdisk_message_t * message, + struct timeval *timeout) { int err, sfd; @@ -235,3 +217,20 @@ tap_ctl_connect_send_and_receive(int id, close(sfd); return err; } + +int +tap_ctl_read_message(int fd, tapdisk_message_t * message, + struct timeval *timeout) +{ + size_t size = sizeof(tapdisk_message_t); + int err; + + err = tap_ctl_read_raw(fd, message, size, timeout); + if (err) + return err; + + DBG("received ''%s'' message (uuid = %u)\n", + tapdisk_message_name(message->type), message->cookie); + + return 0; +}
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 8 of 9 RFC v2] blktap3/libblktapctl: Introduce tapdisk spawn functionality
This patch imports file control/tap-ctl-spawn.c from the existing blktap2 implementation, with most changes coming from blktap2 living in github. Function tap-ctl-spawn is used for spawning a new tapdisk process in order to serve a new virtual block device. diff --git a/tools/blktap2/control/tap-ctl-spawn.c b/tools/blktap3/control/tap-ctl-spawn.c copy from tools/blktap2/control/tap-ctl-spawn.c copy to tools/blktap3/control/tap-ctl-spawn.c --- a/tools/blktap2/control/tap-ctl-spawn.c +++ b/tools/blktap3/control/tap-ctl-spawn.c @@ -31,15 +31,17 @@ #include <unistd.h> #include <stdlib.h> #include <string.h> +#include <signal.h> #include <sys/wait.h> #include "tap-ctl.h" -#include "blktap2.h" +#include "blktap3.h" +#include "_paths.h" static pid_t __tap_ctl_spawn(int *readfd) { - int err, child, channel[2]; + int child, channel[2]; char *tapdisk; if (pipe(channel)) { @@ -71,14 +73,21 @@ static pid_t close(channel[0]); close(channel[1]); - tapdisk = getenv("TAPDISK2"); + tapdisk = getenv("TAPDISK"); if (!tapdisk) - tapdisk = "tapdisk2"; + tapdisk = getenv("TAPDISK2"); - execlp(tapdisk, tapdisk, NULL); + if (tapdisk) { + execlp(tapdisk, tapdisk, NULL); + exit(errno); + } - EPRINTF("exec failed\n"); - exit(1); + execl(TAPDISK_EXECDIR "/" TAPDISK_EXEC, TAPDISK_EXEC, NULL); + + if (errno == ENOENT) + execl(TAPDISK_BUILDDIR "/" TAPDISK_EXEC, TAPDISK_EXEC, NULL); + + exit(errno); } pid_t @@ -90,7 +99,7 @@ tap_ctl_get_pid(const int id) memset(&message, 0, sizeof(message)); message.type = TAPDISK_MESSAGE_PID; - err = tap_ctl_connect_send_and_receive(id, &message, 2); + err = tap_ctl_connect_send_and_receive(id, &message, NULL); if (err) return err; @@ -119,6 +128,12 @@ tap_ctl_wait(pid_t child) if (WIFSIGNALED(status)) { int signo = WTERMSIG(status); EPRINTF("tapdisk2[%d] killed by signal %d\n", child, signo); + if (signo == SIGUSR1) + /* NB. there''s a race between tapdisk''s + * sigaction init and xen-bugtool shooting + * debug signals. If killed by something as + * innocuous as USR1, then retry. */ + return -EAGAIN; return -EINTR; } @@ -139,8 +154,8 @@ tap_ctl_get_child_id(int readfd) } errno = 0; - if (fscanf(f, BLKTAP2_CONTROL_DIR"/" - BLKTAP2_CONTROL_SOCKET"%d", &id) != 1) { + if (fscanf(f, BLKTAP3_CONTROL_DIR "/" + BLKTAP3_CONTROL_SOCKET "%d", &id) != 1) { errno = (errno ? : EINVAL); EPRINTF("parsing id failed: %d\n", errno); id = -1; @@ -158,13 +173,17 @@ tap_ctl_spawn(void) readfd = -1; + again: child = __tap_ctl_spawn(&readfd); if (child < 0) return child; err = tap_ctl_wait(child); - if (err) + if (err) { + if (err == -EAGAIN) + goto again; return err; + } id = tap_ctl_get_child_id(readfd); if (id < 0)
Thanos Makatos
2012-Dec-04 18:19 UTC
[PATCH 9 of 9 RFC v2] blktap3/libblktapctl: Introduce makefile that builds tapback-required libblktapctl functionality
This patch imports control/Makefile from the existing blktap2 implementation, building only the libblktapctl functionality required by the tapback daemon. The rest of the binaries/functionality will be introduced by a later patch. Defines TAPDISK_EXEC, TAPDISK_EXECDIR, and TAPDISK_BUILDDIR are used by tap-ctl-spawn, as it needs to know where the tapdisk binary is located in order to spawn a tapdisk process. Variables TAPDISK_EXEC and TAPDISK_EXECDIR are declared in config/StdGNU.mk and end up as #define''s using buildmakevars2file, just like libxl does. diff --git a/Config.mk b/Config.mk --- a/Config.mk +++ b/Config.mk @@ -147,7 +147,7 @@ define buildmakevars2file-closure $(foreach var, \ SBINDIR BINDIR LIBEXEC LIBDIR SHAREDIR PRIVATE_BINDIR \ XENFIRMWAREDIR XEN_CONFIG_DIR XEN_SCRIPT_DIR XEN_LOCK_DIR \ - XEN_RUN_DIR XEN_PAGING_DIR, \ + XEN_RUN_DIR XEN_PAGING_DIR TAPDISK_EXEC TAPDISK_EXECDIR, \ echo "$(var)=\"$($(var))\"" >>$(1).tmp;) \ $(call move-if-changed,$(1).tmp,$(1)) endef diff --git a/config/StdGNU.mk b/config/StdGNU.mk --- a/config/StdGNU.mk +++ b/config/StdGNU.mk @@ -77,3 +77,6 @@ ifeq ($(lto),y) CFLAGS += -flto LDFLAGS-$(clang) += -plugin LLVMgold.so endif + +TAPDISK_EXEC = "tapdisk" +TAPDISK_EXECDIR = $(LIBEXEC) diff --git a/tools/blktap2/control/Makefile b/tools/blktap3/control/Makefile copy from tools/blktap2/control/Makefile copy to tools/blktap3/control/Makefile --- a/tools/blktap2/control/Makefile +++ b/tools/blktap3/control/Makefile @@ -6,40 +6,45 @@ MINOR = 0 LIBNAME = libblktapctl LIBSONAME = $(LIBNAME).so.$(MAJOR) -IBIN = tap-ctl +genpath-target = $(call buildmakevars2file,_paths.h.tmp) +$(eval $(genpath-target)) -CFLAGS += -Werror -CFLAGS += -Wno-unused -CFLAGS += -I../include -I../drivers -CFLAGS += $(CFLAGS_xeninclude) -CFLAGS += $(CFLAGS_libxenctrl) -CFLAGS += -D_GNU_SOURCE -CFLAGS += -DTAPCTL +_paths.h: genpath + sed -e "s/\([^=]*\)=\(.*\)/#define \1 \2/g" $@.tmp >$@.2.tmp + rm -f $@.tmp + $(call move-if-changed,$@.2.tmp,$@) -CTL_OBJS := tap-ctl-ipc.o +override CFLAGS += \ + -I../include \ + -DTAPDISK_BUILDDIR=''"../drivers"'' \ + $(CFLAGS_xeninclude) \ + $(CFLAGS_libxenctrl) \ + -D_GNU_SOURCE \ + -DTAPCTL \ + -Wall \ + -Wextra \ + -Werror + +# FIXME cause trouble +override CFLAGS += \ + -Wno-type-limits \ + -Wno-missing-field-initializers \ + -Wno-sign-compare + CTL_OBJS += tap-ctl-list.o -CTL_OBJS += tap-ctl-allocate.o -CTL_OBJS += tap-ctl-free.o -CTL_OBJS += tap-ctl-create.o -CTL_OBJS += tap-ctl-destroy.o +CTL_OBJS += tap-ctl-info.o +CTL_OBJS += tap-ctl-xen.o +CTL_OBJS += tap-ctl-ipc.o CTL_OBJS += tap-ctl-spawn.o -CTL_OBJS += tap-ctl-attach.o -CTL_OBJS += tap-ctl-detach.o -CTL_OBJS += tap-ctl-open.o -CTL_OBJS += tap-ctl-close.o -CTL_OBJS += tap-ctl-pause.o -CTL_OBJS += tap-ctl-unpause.o -CTL_OBJS += tap-ctl-major.o -CTL_OBJS += tap-ctl-check.o + +tap-ctl-spawn.o: _paths.h CTL_PICS = $(patsubst %.o,%.opic,$(CTL_OBJS)) -OBJS = $(CTL_OBJS) tap-ctl.o PICS = $(CTL_PICS) LIB_STATIC = $(LIBNAME).a LIB_SHARED = $(LIBSONAME).$(MINOR) -IBIN = tap-ctl all: build @@ -51,27 +56,24 @@ build: $(IBIN) $(LIB_STATIC) $(LIB_SHARE $(LIBSONAME): $(LIB_SHARED) ln -sf $< $@ -tap-ctl: tap-ctl.o $(LIBNAME).so - $(CC) $(LDFLAGS) -o $@ $^ - $(LIB_STATIC): $(CTL_OBJS) $(AR) r $@ $^ $(LIB_SHARED): $(CTL_PICS) $(CC) $(LDFLAGS) -fPIC -Wl,$(SONAME_LDFLAG) -Wl,$(LIBSONAME) $(SHLIB_LDFLAGS) -rdynamic $^ -o $@ -install: $(IBIN) $(LIB_STATIC) $(LIB_SHARED) +install: $(LIB_STATIC) $(LIB_SHARED) $(INSTALL_DIR) -p $(DESTDIR)$(SBINDIR) - $(INSTALL_PROG) $(IBIN) $(DESTDIR)$(SBINDIR) $(INSTALL_DATA) $(LIB_STATIC) $(DESTDIR)$(LIBDIR) $(INSTALL_PROG) $(LIB_SHARED) $(DESTDIR)$(LIBDIR) ln -sf $(LIBSONAME) $(DESTDIR)$(LIBDIR)/$(LIBNAME).so ln -sf $(LIB_SHARED) $(DESTDIR)$(LIBDIR)/$(LIBSONAME) clean: - rm -f $(OBJS) $(PICS) $(DEPS) $(IBIN) $(LIB_STATIC) $(LIB_SHARED) + rm -f $(CTL_OBJS) $(PICS) $(DEPS) $(LIB_STATIC) $(LIB_SHARED) rm -f $(LIBNAME).so $(LIBSONAME) rm -f *~ + rm -f _paths.h .PHONY: all build clean install
Thanos Makatos
2013-Jan-11 15:40 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
Ping?> -----Original Message----- > From: Thanos Makatos [mailto:thanos.makatos@citrix.com] > Sent: 04 December 2012 18:20 > To: xen-devel@lists.xensource.com > Cc: Thanos Makatos > Subject: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of > blktap3 files > > blktap3 is a disk back-end driver. It is based on blktap2 but does not > require > the blktap/blkback kernel modules as it allows tapdisk to talk directly > to > blkfront. This primarily simplifies maintenance, and _may_ lead to > performance > improvements. blktap3 is based on a blktap2 fork maintained mostly by > Citrix > (it lives in github), so these changes are also imported, apart from > the > blktap3 ones. > > I''ve organised my upstream effort as follows: > 1. Upstream the smallest possible subset of blktap3 that will allow > guest VMs > to use RAW images backed by blktap3. This will enable early testing > on the > bits introduced by blktap3. > 2. Upstream the remaining of blktap3, most notably back-end drivers, > e.g. VHD > etc. > 3. Import bug fixes from blktap2 living in github. > 4. Import new features and optimisations from blktap2 living in github, > e.g. > the mirroring plug-in. > > blktap3 is made of the following components: > 1. blkfront (not a blktap3 component and already upstream): a virtual > block device driver in the guest VM that receives block I/O requests > and > forwards them to tapdisk via the shared ring. > 2. tapdisk: a user space process that receives block I/O requests from > blkfront, translates them to whatever the current backing file > format is > (i.e. RAW, VHD, qcow etc.), and performs the actual I/O. Apart from > block > I/O requests, the tapdisk also allows basic management of each > virtual > block device, e.g. a device may be temporarily paused. tapdisk > listens to > a loopback socket for such commands. The tap-ctl utility (explained > later) > can be used for managing the tapdisk. > 3. libtapback: a user space library that implements the functionality > required > to access the shared ring. It is used by tapdisk to obtain the block > I/O > requests forwarded by blkfront, and to produce the corresponding > responses. > This is the very "heart" of blktap3, it''s architecture will be > thoroughly > explained by the patch series that introduces it. > 4. tapback: a user space daemon that acts as the back-end of each > virtual > block device: it monitors XenStore for the block front-end''s state > changes, > creates/destroys the shared ring, and instructs the tapdisk to > connect > to/disconnect from the shared ring. It also communicates to the > block > front-end required parameters (e.g. block device size in sectors) > via > XenStore. > 5. libblktapctl: a user space library where the tapdisk management > functions > are implemented. > 6. tap-ctl: a user space utility that allows management of the tapdisk, > uses > libblktapctl. > > The tapdisk is spawned/destroyed by libxl when a domain is > created/destroyed, > in the exact same way as in blktap2. libxl uses libblktapctl for this. > > This patch series introduces a small subset of files required by > tapback (the > tapback daemon is introduced by the next patch series): > - basic blktap3 header files > - a rudimentary implementation of libblktapctl. Only the bits required > by > tapback to manage the tapdisk are introduced, the rest of this > library will > be introduced by later patches. > > --- > Changed since v1: > * In all patches the patch message has been improved. > * Patches 1, 5, and 6 use GPLv2. > * Patch 0: Basic explanation of blktap3''s fundamental components. > * Patch 9: Improved tools/blktap3/control/Makefile by moving hard coded > paths to config/StdGNU.mk.
Ian Campbell
2013-Jan-18 13:39 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote:> 2. tapdisk: a user space process that receives block I/O requests from > blkfront, translates them to whatever the current backing file format is > (i.e. RAW, VHD, qcow etc.), and performs the actual I/O. Apart from block > I/O requests, the tapdisk also allows basic management of each virtual > block device, e.g. a device may be temporarily paused. tapdisk listens to > a loopback socket for such commands. The tap-ctl utility (explained later) > can be used for managing the tapdisk.[...]> 4. tapback: a user space daemon that acts as the back-end of each virtual > block device: it monitors XenStore for the block front-end''s state changes, > creates/destroys the shared ring, and instructs the tapdisk to connect > to/disconnect from the shared ring. It also communicates to the block > front-end required parameters (e.g. block device size in sectors) via > XenStore.There is 1 tapdisk per VBD but how many tapbacks are there? One per VBD as well or one per domain or per driver domain? Is tapback involved in things after the ring has been setup and it has instructed tapdisk to use it? i.e. is it on the datapath at all? Ian.
Ian Campbell
2013-Jan-18 13:45 UTC
Re: [PATCH 1 of 9 RFC v2] blktap3: Introduce blktap3 headers
On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote:> This patch introduces basic blktap3 header files. > > diff --git a/tools/blktap3/include/blktap3.h b/tools/blktap3/include/blktap3.h > new file mode 100644 > --- /dev/null > +++ b/tools/blktap3/include/blktap3.h > @@ -0,0 +1,47 @@ > +/* > + * Copyright (C) 2012 Citrix Ltd. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version 2 > + * of the License, or (at your option) any later version.I see a mix of GPL and LGPL in this patch (compile.h is LGPL, this is GPL). That''s fine if that is what you meant but it will mean the result is GPL.> + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, > + * USA. > + * > + * Commonly used headers and definitions. > + */ > + > +#ifndef __BLKTAP_3_H__ > +#define __BLKTAP_3_H__ > + > +#include "compiler.h" > + > +/* TODO remove from other files */ > +#include <xen-external/bsd-sys-queue.h> > + > +#define BLKTAP3_CONTROL_NAME "blktap-control" > +#define BLKTAP3_CONTROL_DIR "/var/run/"BLKTAP3_CONTROL_NAME > +#define BLKTAP3_CONTROL_SOCKET "ctl" > + > +#define BLKTAP3_ENOSPC_SIGNAL_FILE "/var/run/tapdisk3-enospc" > + > +/* > + * TODO They may have to change due to macro namespacing. > + */ > +#define TAILQ_MOVE_HEAD(node, src, dst, entry) \ > + TAILQ_REMOVE(src, node, entry); \ > + TAILQ_INSERT_HEAD(dst, node, entry); > + > +#define TAILQ_MOVE_TAIL(node, src, dst, entry) \ > + TAILQ_REMOVE(src, node, entry); \ > + TAILQ_INSERT_TAIL(dst, node, entry); > + > +#endif /* __BLKTAP_3_H__ */ > diff --git a/tools/blktap3/include/compiler.h b/tools/blktap3/include/compiler.h > new file mode 100644 > --- /dev/null > +++ b/tools/blktap3/include/compiler.h > @@ -0,0 +1,42 @@ > +/* > + * Copyright (C) 2012 Citrix Ltd. > + * > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 > + * USA > + */ > + > +#ifndef __COMPILER_H__ > +#define __COMPILER_H__ > + > +#define likely(_cond) __builtin_expect(!!(_cond), 1) > +#define unlikely(_cond) __builtin_expect(!!(_cond), 0) > + > +/* > + * FIXME taken from list.h, do we need to mention anything about the license?In general a good idea to. I assume it is GPL which is not the license at the head of this file (which is LGPL). You could use the LGPL tools/libxl/libxl_internal.h:CONTAINER_OF instead, you could even change the name to containerof...> + */ > +#define containerof(_ptr, _type, _memb) \ > + ((_type*)((void*)(_ptr) - offsetof(_type, _memb))) > + > +#define __printf(a, b) __attribute__((format(printf, a, b))) > +#define __scanf(_f, _a) __attribute__((format (scanf, _f, _a))) > + > +#ifndef ARRAY_SIZEIs someone leaking this into your scope? Can we fix them?> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > +#endif /* ARRAY_SIZE */ > + > +#define UNUSED_PARAMETER(x) \ > + (void)(x); > + > +#endif /* __COMPILER_H__ */ > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2013-Jan-18 13:49 UTC
Re: [PATCH 2 of 9 RFC v2] blktap3/libblktapctl: Introduce tapdisk message types and structures
On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote:> + /** > + * Protocol to use: native, 32 bit, or 64 bit. > + * TODO Is this used for supporting a 32 bit guest on a 64 bit hypervisor?Sort of. The proto stuff is to support 32-bit dom0 talking to 64-bit domU and vice versa.> + TAPDISK_MESSAGE_DISK_INFO_RSP, > + TAPDISK_MESSAGE_XENBLKIF_CONNECT, > + TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP, > + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT, > + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT_RSP, > TAPDISK_MESSAGE_EXIT, > }; > > -static inline char * > -tapdisk_message_name(enum tapdisk_message_id id) > -{ > - switch (id) { > - case TAPDISK_MESSAGE_ERROR: > - return "error"; > +#define TAPDISK_MESSAGE_MAX TAPDISK_MESSAGE_EXITA nice trick here is to include TAPDISK_MESSAGE_MAX in the enum.
Ian Campbell
2013-Jan-18 13:54 UTC
Re: [PATCH 5 of 9 RFC v2] blktap3/libblktapctl: Introduce functionality used by tapback to instruct tapdisk to connect to the sring
On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote:> + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); > + if (err) > + return err; > + > + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP) > + err = -message.u.response.error; > + else > + err = -EINVAL; > + > + return err; > +} > + > +int tap_ctl_disconnect_xenblkif(pid_t pid, int minor, domid_t domid, > + int devid, struct timeval *timeout) > +{ > + tapdisk_message_t message; > + int err; > + > + memset(&message, 0, sizeof(message)); > + message.type = TAPDISK_MESSAGE_XENBLKIF_DISCONNECT; > + message.cookie = minor; > + message.u.blkif.domid = domid; > + message.u.blkif.devid = devid; > + > + err = tap_ctl_connect_send_and_receive(pid, &message, timeout);In the code above you had an "if err return err" here. I''d expect them to behave similarly, whichever is right. If this is a common pattern it might be something to consider pushing into tap_ctl_connect_send_and_receive.> + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP)CONNECT_RSP to a DISCONNECT message?> + err = -message.u.response.error; > + else > + err = -EINVAL; > + > + return err; > +}
Ian Campbell
2013-Jan-18 13:55 UTC
Re: [PATCH 6 of 9 RFC v2] blktap3/libblktapctl: Introduce block device control information retrieval functionality
> +int tap_ctl_info(pid_t pid, int minor, unsigned long long *sectors, > + unsigned int *sector_size, unsigned int *info) > +{ > + tapdisk_message_t message; > + int err; > + > + memset(&message, 0, sizeof(message)); > + message.type = TAPDISK_MESSAGE_DISK_INFO; > + message.cookie = minor; > + > + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); > + if (err) > + return err; > + > + if (message.type != TAPDISK_MESSAGE_DISK_INFO_RSP) > + return -EINVAL;Here''s a third pattern to go with the two in the previous patch ;-) Ian.
Ian Campbell
2013-Jan-18 13:59 UTC
Re: [PATCH 9 of 9 RFC v2] blktap3/libblktapctl: Introduce makefile that builds tapback-required libblktapctl functionality
On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote:> diff --git a/config/StdGNU.mk b/config/StdGNU.mk > --- a/config/StdGNU.mk > +++ b/config/StdGNU.mk > @@ -77,3 +77,6 @@ ifeq ($(lto),y) > CFLAGS += -flto > LDFLAGS-$(clang) += -plugin LLVMgold.so > endif > + > +TAPDISK_EXEC = "tapdisk" > +TAPDISK_EXECDIR = $(LIBEXEC)Do these need to be configurable? I''d be tempted to suggest that you just do #define TAPDISK_EXEC "tapdisk" #define TAPDISK_EXECDIR LIBEXEC in one of you headers. Ian.
Thanos Makatos
2013-Jan-18 16:37 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
> > 4. tapback: a user space daemon that acts as the back-end of each > virtual > > block device: it monitors XenStore for the block front-end''s state > changes, > > creates/destroys the shared ring, and instructs the tapdisk to > connect > > to/disconnect from the shared ring. It also communicates to the > block > > front-end required parameters (e.g. block device size in sectors) > via > > XenStore. > > There is 1 tapdisk per VBD but how many tapbacks are there? One per VBD > as well or one per domain or per driver domain?There is one tapback daemon in total, serving all VBDs/domains. The tapback daemon sets a XenStore watch to "backend/<device type>" to serve VBD creations for completely new domains, and then sets an additional watch to the front-end state path for each VBD. I guess there could be one tapback daemon detecting completely new domains, responsible for spawning tapback daemons designated to a specific domain/VBD. I''m not sure of the implications of this approach, though. (We could also have one thread designated to each domain/VBD.) Is there a particular reason why a single tapback daemon wouldn''t suffice?> > Is tapback involved in things after the ring has been setup and it has > instructed tapdisk to use it? i.e. is it on the datapath at all?The tapback daemon is involved only for running the Xenbus protocol when the front-end is set up/torn down -- no participation in the data path.> > Ian.
Ian Campbell
2013-Jan-18 16:46 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
On Fri, 2013-01-18 at 16:37 +0000, Thanos Makatos wrote:> > > 4. tapback: a user space daemon that acts as the back-end of each > > virtual > > > block device: it monitors XenStore for the block front-end''s state > > changes, > > > creates/destroys the shared ring, and instructs the tapdisk to > > connect > > > to/disconnect from the shared ring. It also communicates to the > > block > > > front-end required parameters (e.g. block device size in sectors) > > via > > > XenStore. > > > > There is 1 tapdisk per VBD but how many tapbacks are there? One per VBD > > as well or one per domain or per driver domain? > > There is one tapback daemon in total, serving all VBDs/domains.You mean one per backend domain I assume?> The tapback daemon sets a XenStore watch to "backend/<device type>" to > serve VBD creations for completely new domains, and then sets an > additional watch to the front-end state path for each VBD. I guess > there could be one tapback daemon detecting completely new domains, > responsible for spawning tapback daemons designated to a specific > domain/VBD. I''m not sure of the implications of this approach, though. > (We could also have one thread designated to each domain/VBD.) Is > there a particular reason why a single tapback daemon wouldn''t > suffice?No, it sounds reasonable. I was more concerned there might be one per VBD in which case it would have made sense to merge tapdisk and tapback.> > Is tapback involved in things after the ring has been setup and it has > > instructed tapdisk to use it? i.e. is it on the datapath at all? > > The tapback daemon is involved only for running the Xenbus protocol > when the front-end is set up/torn down -- no participation in the data > path.Oh good. Ian.
Thanos Makatos
2013-Jan-18 16:58 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
> -----Original Message----- > From: Ian Campbell > Sent: 18 January 2013 16:47 > To: Thanos Makatos > Cc: xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] [PATCH 0 of 9 RFC v2] blktap3: Introduce a > small subset of blktap3 files > > On Fri, 2013-01-18 at 16:37 +0000, Thanos Makatos wrote: > > > > 4. tapback: a user space daemon that acts as the back-end of each > > > virtual > > > > block device: it monitors XenStore for the block front-end''s > state > > > changes, > > > > creates/destroys the shared ring, and instructs the tapdisk to > > > connect > > > > to/disconnect from the shared ring. It also communicates to > the > > > block > > > > front-end required parameters (e.g. block device size in > sectors) > > > via > > > > XenStore. > > > > > > There is 1 tapdisk per VBD but how many tapbacks are there? One per > VBD > > > as well or one per domain or per driver domain? > > > > There is one tapback daemon in total, serving all VBDs/domains. > > You mean one per backend domain I assume?I haven''t thought of that, but I believe there can be only one tapback daemon across multiple driver domains. This is because tapback monitors XenStore path "backend/<device type>" (i.e. backend/vbd), so if there were multiple tapback daemon they would all try to serve this event. Is my understanding correct? If this is true then it''s a serious limitation for driver domains.> > > The tapback daemon sets a XenStore watch to "backend/<device type>" > to > > serve VBD creations for completely new domains, and then sets an > > additional watch to the front-end state path for each VBD. I guess > > there could be one tapback daemon detecting completely new domains, > > responsible for spawning tapback daemons designated to a specific > > domain/VBD. I''m not sure of the implications of this approach, > though. > > (We could also have one thread designated to each domain/VBD.) Is > > there a particular reason why a single tapback daemon wouldn''t > > suffice? > > No, it sounds reasonable. I was more concerned there might be one per > VBD in which case it would have made sense to merge tapdisk and > tapback. > > > > Is tapback involved in things after the ring has been setup and it > has > > > instructed tapdisk to use it? i.e. is it on the datapath at all? > > > > The tapback daemon is involved only for running the Xenbus protocol > > when the front-end is set up/torn down -- no participation in the > data > > path. > > Oh good.I''ll update the patch description with this information.> > Ian. >
Ian Campbell
2013-Jan-18 17:01 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
On Fri, 2013-01-18 at 16:58 +0000, Thanos Makatos wrote:> > -----Original Message----- > > From: Ian Campbell > > Sent: 18 January 2013 16:47 > > To: Thanos Makatos > > Cc: xen-devel@lists.xensource.com > > Subject: Re: [Xen-devel] [PATCH 0 of 9 RFC v2] blktap3: Introduce a > > small subset of blktap3 files > > > > On Fri, 2013-01-18 at 16:37 +0000, Thanos Makatos wrote: > > > > > 4. tapback: a user space daemon that acts as the back-end of each > > > > virtual > > > > > block device: it monitors XenStore for the block front-end''s > > state > > > > changes, > > > > > creates/destroys the shared ring, and instructs the tapdisk to > > > > connect > > > > > to/disconnect from the shared ring. It also communicates to > > the > > > > block > > > > > front-end required parameters (e.g. block device size in > > sectors) > > > > via > > > > > XenStore. > > > > > > > > There is 1 tapdisk per VBD but how many tapbacks are there? One per > > VBD > > > > as well or one per domain or per driver domain? > > > > > > There is one tapback daemon in total, serving all VBDs/domains. > > > > You mean one per backend domain I assume? > > I haven''t thought of that, but I believe there can be only one tapback > daemon across multiple driver domains. This is because tapback > monitors XenStore path "backend/<device type>" (i.e. backend/vbd), so > if there were multiple tapback daemon they would all try to serve this > event. Is my understanding correct? If this is true then it''s a > serious limitation for driver domains."backend/<device-type>" is a relative not absolute path so it is relative to the driver domains /local/domid/<DOMID>. I wouldn''t expect that a tapback process in one domain would be able to setup the necessary rings/evtchns in such a way that a tapdisk in another process could use them. Ian.
Thanos Makatos
2013-Jan-18 17:15 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
> > > > > There is 1 tapdisk per VBD but how many tapbacks are there? One > > > > > per > > > VBD > > > > > as well or one per domain or per driver domain? > > > > > > > > There is one tapback daemon in total, serving all VBDs/domains. > > > > > > You mean one per backend domain I assume? > > > > I haven''t thought of that, but I believe there can be only one > tapback > > daemon across multiple driver domains. This is because tapback > > monitors XenStore path "backend/<device type>" (i.e. backend/vbd), so > > if there were multiple tapback daemon they would all try to serve > this > > event. Is my understanding correct? If this is true then it''s a > > serious limitation for driver domains. > > "backend/<device-type>" is a relative not absolute path so it is > relative to the driver domains /local/domid/<DOMID>.Then there''s no problem with that. So, going back to your original question, there is one tapback daemon per driver domain. What I don''t understand, however, is how a front-end''s VBD creation request will end up to the correct driver domain. Does the front-end know under which /local/domid tree in XenStore it should write to? Is this explained somewhere?> > I wouldn''t expect that a tapback process in one domain would be able to > setup the necessary rings/evtchns in such a way that a tapdisk in > another process could use them.I guess a tapback daemon could delegate this operation to the tapback running at the appropriate driver domain via XenStore, no?> > Ian.
Ian Campbell
2013-Jan-18 17:25 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
On Fri, 2013-01-18 at 17:15 +0000, Thanos Makatos wrote:> > > > > > There is 1 tapdisk per VBD but how many tapbacks are there? One > > > > > > per > > > > VBD > > > > > > as well or one per domain or per driver domain? > > > > > > > > > > There is one tapback daemon in total, serving all VBDs/domains. > > > > > > > > You mean one per backend domain I assume? > > > > > > I haven''t thought of that, but I believe there can be only one > > tapback > > > daemon across multiple driver domains. This is because tapback > > > monitors XenStore path "backend/<device type>" (i.e. backend/vbd), so > > > if there were multiple tapback daemon they would all try to serve > > this > > > event. Is my understanding correct? If this is true then it''s a > > > serious limitation for driver domains. > > > > "backend/<device-type>" is a relative not absolute path so it is > > relative to the driver domains /local/domid/<DOMID>. > > Then there''s no problem with that. So, going back to your original > question, there is one tapback daemon per driver domain. > > What I don''t understand, however, is how a front-end''s VBD creation > request will end up to the correct driver domain. Does the front-end > know under which /local/domid tree in XenStore it should write to? Is > this explained somewhere?The frontend looks in devices/vbd/<foo>/backend which to get the full path to to the backend. The toolstack writes this and knows both the front- and back-end domain ids.> > > > > I wouldn''t expect that a tapback process in one domain would be able to > > setup the necessary rings/evtchns in such a way that a tapdisk in > > another process could use them. > > I guess a tapback daemon could delegate this operation to the tapback > running at the appropriate driver domain via XenStore, no?Yes. In fact I''m wondering why the tapback daemon does all the ring setup, xenbus negotiation anyway. It could just watch "backend/<device-type>" and when it sees a new backend/<device-type>/<domid>/<devid> fork a new "tapdisk --domain <domid> --devid <devid>" and have tapdisk take care of the protocol entirely. Saves a bunch of messaging back and forth I think. I don''t have a problem with the existing model, it just strikes me as a bit odd (but then I don''t really know the architecture) Ian.
Thanos Makatos
2013-Jan-18 17:43 UTC
Re: [PATCH 1 of 9 RFC v2] blktap3: Introduce blktap3 headers
> -----Original Message----- > From: Ian Campbell > Sent: 18 January 2013 13:46 > To: Thanos Makatos > Cc: xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] [PATCH 1 of 9 RFC v2] blktap3: Introduce > blktap3 headers > > On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote: > > This patch introduces basic blktap3 header files. > > > > diff --git a/tools/blktap3/include/blktap3.h > b/tools/blktap3/include/blktap3.h > > new file mode 100644 > > --- /dev/null > > +++ b/tools/blktap3/include/blktap3.h > > @@ -0,0 +1,47 @@ > > +/* > > + * Copyright (C) 2012 Citrix Ltd. > > + * > > + * This program is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU General Public License > > + * as published by the Free Software Foundation; either version 2 > > + * of the License, or (at your option) any later version. > > I see a mix of GPL and LGPL in this patch (compile.h is LGPL, this is > GPL). That''s fine if that is what you meant but it will mean the result > is GPL.That was accidental, I''ll make compiler.h''s license GPL.> > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301, > > + * USA. > > + * > > + * Commonly used headers and definitions. > > + */ > > + > > +#ifndef __BLKTAP_3_H__ > > +#define __BLKTAP_3_H__ > > + > > +#include "compiler.h" > > + > > +/* TODO remove from other files */ > > +#include <xen-external/bsd-sys-queue.h> > > + > > +#define BLKTAP3_CONTROL_NAME "blktap-control" > > +#define BLKTAP3_CONTROL_DIR "/var/run/"BLKTAP3_CONTROL_NAME > > +#define BLKTAP3_CONTROL_SOCKET "ctl" > > + > > +#define BLKTAP3_ENOSPC_SIGNAL_FILE "/var/run/tapdisk3-enospc" > > + > > +/* > > + * TODO They may have to change due to macro namespacing. > > + */ > > +#define TAILQ_MOVE_HEAD(node, src, dst, entry) \ > > + TAILQ_REMOVE(src, node, entry); \ > > + TAILQ_INSERT_HEAD(dst, node, entry); > > + > > +#define TAILQ_MOVE_TAIL(node, src, dst, entry) \ > > + TAILQ_REMOVE(src, node, entry); \ > > + TAILQ_INSERT_TAIL(dst, node, entry); > > + > > +#endif /* __BLKTAP_3_H__ */ > > diff --git a/tools/blktap3/include/compiler.h > b/tools/blktap3/include/compiler.h > > new file mode 100644 > > --- /dev/null > > +++ b/tools/blktap3/include/compiler.h > > @@ -0,0 +1,42 @@ > > +/* > > + * Copyright (C) 2012 Citrix Ltd. > > + * > > + * This library is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU Lesser General Public > > + * License as published by the Free Software Foundation; either > > + * version 2.1 of the License, or (at your option) any later > version. > > + * > > + * This library is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + * Lesser General Public License for more details. > > + * > > + * You should have received a copy of the GNU Lesser General Public > > + * License along with this library; if not, write to the Free > Software > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301 > > + * USA > > + */ > > + > > +#ifndef __COMPILER_H__ > > +#define __COMPILER_H__ > > + > > +#define likely(_cond) __builtin_expect(!!(_cond), 1) > > +#define unlikely(_cond) __builtin_expect(!!(_cond), 0) > > + > > +/* > > + * FIXME taken from list.h, do we need to mention anything about the > license? > > In general a good idea to. I assume it is GPL which is not the license > at the head of this file (which is LGPL). > > You could use the LGPL tools/libxl/libxl_internal.h:CONTAINER_OF > instead, you could even change the name to containerof...That would work, should I include tools/libxl/libxl_internal.h or just copy CONTAINER_OF''s definition?> > > + */ > > +#define containerof(_ptr, _type, _memb) \ > > + ((_type*)((void*)(_ptr) - offsetof(_type, _memb))) > > + > > +#define __printf(a, b) __attribute__((format(printf, a, b))) > > +#define __scanf(_f, _a) __attribute__((format (scanf, _f, _a))) > > + > > +#ifndef ARRAY_SIZE > > Is someone leaking this into your scope? Can we fix them?Ok.> > > +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > > +#endif /* ARRAY_SIZE */ > > + > > +#define UNUSED_PARAMETER(x) \ > > + (void)(x); > > + > > +#endif /* __COMPILER_H__ */ > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel >
Ian Campbell
2013-Jan-21 10:37 UTC
Re: [PATCH 1 of 9 RFC v2] blktap3: Introduce blktap3 headers
On Fri, 2013-01-18 at 17:43 +0000, Thanos Makatos wrote:> > > +/* > > > + * FIXME taken from list.h, do we need to mention anything about the > > license? > > > > In general a good idea to. I assume it is GPL which is not the license > > at the head of this file (which is LGPL). > > > > You could use the LGPL tools/libxl/libxl_internal.h:CONTAINER_OF > > instead, you could even change the name to containerof... > > That would work, should I include tools/libxl/libxl_internal.h or just > copy CONTAINER_OF''s definition?In the absence of any more central place to put it it would be better to copy. Or maybe move to libxc (not a great fit there though) Ian.
Thanos Makatos
2013-Jan-21 18:16 UTC
Re: [PATCH 9 of 9 RFC v2] blktap3/libblktapctl: Introduce makefile that builds tapback-required libblktapctl functionality
> On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote: > > diff --git a/config/StdGNU.mk b/config/StdGNU.mk > > --- a/config/StdGNU.mk > > +++ b/config/StdGNU.mk > > @@ -77,3 +77,6 @@ ifeq ($(lto),y) > > CFLAGS += -flto > > LDFLAGS-$(clang) += -plugin LLVMgold.so endif > > + > > +TAPDISK_EXEC = "tapdisk" > > +TAPDISK_EXECDIR = $(LIBEXEC) > > Do these need to be configurable? > > I''d be tempted to suggest that you just do > #define TAPDISK_EXEC "tapdisk" > #define TAPDISK_EXECDIR LIBEXEC > in one of you headers. > > Ian.They were originally specified in the Makefile and you suggested they were put elsewhere:> > diff --git a/tools/blktap2/control/Makefile > > b/tools/blktap3/control/Makefile copy from > > tools/blktap2/control/Makefile copy to tools/blktap3/control/Makefile > > --- a/tools/blktap2/control/Makefile > > +++ b/tools/blktap3/control/Makefile > > @@ -6,40 +6,36 @@ MINOR = 0 > > LIBNAME = libblktapctl > > LIBSONAME = $(LIBNAME).so.$(MAJOR) > > > > -IBIN = tap-ctl > > +override CFLAGS += \ > > + -I../include \ > > + -DTAPDISK_EXEC=''"tapdisk"'' \ > > + -DTAPDISK_EXECDIR=''"/usr/local/libexec"'' \ > > This should come from tools/configure and config/Tools.mk etc rather > than being hard coded here. > > By default it should likely be under /usr/lib/xen (aka LIBEXEC_DIR). > Check out buildmakevars2file in tools/Rules.mk and the use of it in > tools/libxl/Makefile -- you probably want to do something similar.Is the above still valid? I don''t see any reason why they should be configurable, so I''d agree they should be placed in a header file.
Thanos Makatos
2013-Jan-21 18:16 UTC
Re: [PATCH 2 of 9 RFC v2] blktap3/libblktapctl: Introduce tapdisk message types and structures
> On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote: > > + /** > > + * Protocol to use: native, 32 bit, or 64 bit. > > + * TODO Is this used for supporting a 32 bit guest on a 64 bit > hypervisor? > > Sort of. The proto stuff is to support 32-bit dom0 talking to 64-bit > domU and vice versa.I''ll update the comment.> > > + TAPDISK_MESSAGE_DISK_INFO_RSP, > > + TAPDISK_MESSAGE_XENBLKIF_CONNECT, > > + TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP, > > + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT, > > + TAPDISK_MESSAGE_XENBLKIF_DISCONNECT_RSP, > > TAPDISK_MESSAGE_EXIT, > > }; > > > > -static inline char * > > -tapdisk_message_name(enum tapdisk_message_id id) -{ > > - switch (id) { > > - case TAPDISK_MESSAGE_ERROR: > > - return "error"; > > +#define TAPDISK_MESSAGE_MAX TAPDISK_MESSAGE_EXIT > > A nice trick here is to include TAPDISK_MESSAGE_MAX in the enum.Ok.
Thanos Makatos
2013-Jan-21 18:16 UTC
Re: [PATCH 5 of 9 RFC v2] blktap3/libblktapctl: Introduce functionality used by tapback to instruct tapdisk to connect to the sring
> > + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); > > + if (err) > > + return err; > > + > > + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP) > > + err = -message.u.response.error; > > + else > > + err = -EINVAL; > > + > > + return err; > > +} > > + > > +int tap_ctl_disconnect_xenblkif(pid_t pid, int minor, domid_t domid, > > + int devid, struct timeval *timeout) { > > + tapdisk_message_t message; > > + int err; > > + > > + memset(&message, 0, sizeof(message)); > > + message.type = TAPDISK_MESSAGE_XENBLKIF_DISCONNECT; > > + message.cookie = minor; > > + message.u.blkif.domid = domid; > > + message.u.blkif.devid = devid; > > + > > + err = tap_ctl_connect_send_and_receive(pid, &message, timeout); > > In the code above you had an "if err return err" here. I''d expect them > to behave similarly, whichever is right.An omission, I''ll fix it.> > If this is a common pattern it might be something to consider pushing > into tap_ctl_connect_send_and_receive.I''ll put it in tap_ctl_send_and_receive as it fits better there.> > > + if (message.type == TAPDISK_MESSAGE_XENBLKIF_CONNECT_RSP) > > CONNECT_RSP to a DISCONNECT message?An omission, I''ll fix it.> > > + err = -message.u.response.error; > > + else > > + err = -EINVAL; > > + > > + return err; > > +} >
Thanos Makatos
2013-Jan-21 18:16 UTC
Re: [PATCH 6 of 9 RFC v2] blktap3/libblktapctl: Introduce block device control information retrieval functionality
> > +int tap_ctl_info(pid_t pid, int minor, unsigned long long *sectors, > > + unsigned int *sector_size, unsigned int *info) { > > + tapdisk_message_t message; > > + int err; > > + > > + memset(&message, 0, sizeof(message)); > > + message.type = TAPDISK_MESSAGE_DISK_INFO; > > + message.cookie = minor; > > + > > + err = tap_ctl_connect_send_and_receive(pid, &message, NULL); > > + if (err) > > + return err; > > + > > + if (message.type != TAPDISK_MESSAGE_DISK_INFO_RSP) > > + return -EINVAL; > > Here''s a third pattern to go with the two in the previous patch ;-) > > Ian.Ok.
Ian Campbell
2013-Jan-22 09:28 UTC
Re: [PATCH 9 of 9 RFC v2] blktap3/libblktapctl: Introduce makefile that builds tapback-required libblktapctl functionality
On Mon, 2013-01-21 at 18:16 +0000, Thanos Makatos wrote:> > On Tue, 2012-12-04 at 18:19 +0000, Thanos Makatos wrote: > > > diff --git a/config/StdGNU.mk b/config/StdGNU.mk > > > --- a/config/StdGNU.mk > > > +++ b/config/StdGNU.mk > > > @@ -77,3 +77,6 @@ ifeq ($(lto),y) > > > CFLAGS += -flto > > > LDFLAGS-$(clang) += -plugin LLVMgold.so endif > > > + > > > +TAPDISK_EXEC = "tapdisk" > > > +TAPDISK_EXECDIR = $(LIBEXEC) > > > > Do these need to be configurable? > > > > I''d be tempted to suggest that you just do > > #define TAPDISK_EXEC "tapdisk" > > #define TAPDISK_EXECDIR LIBEXEC > > in one of you headers. > > > > Ian. > > They were originally specified in the Makefile and you suggested they were put elsewhere:Hrm yes so I did. At the time I was imagining you wanted to control these separately from $(LIBEXEC) for some reason but looking at what you''d done here I don''t see why that should be the case.> > > > diff --git a/tools/blktap2/control/Makefile > > > b/tools/blktap3/control/Makefile copy from > > > tools/blktap2/control/Makefile copy to tools/blktap3/control/Makefile > > > --- a/tools/blktap2/control/Makefile > > > +++ b/tools/blktap3/control/Makefile > > > @@ -6,40 +6,36 @@ MINOR = 0 > > > LIBNAME = libblktapctl > > > LIBSONAME = $(LIBNAME).so.$(MAJOR) > > > > > > -IBIN = tap-ctl > > > +override CFLAGS += \ > > > + -I../include \ > > > + -DTAPDISK_EXEC=''"tapdisk"'' \ > > > + -DTAPDISK_EXECDIR=''"/usr/local/libexec"'' \ > > > > This should come from tools/configure and config/Tools.mk etc rather > > than being hard coded here. > > > > By default it should likely be under /usr/lib/xen (aka LIBEXEC_DIR). > > Check out buildmakevars2file in tools/Rules.mk and the use of it in > > tools/libxl/Makefile -- you probably want to do something similar. > > Is the above still valid? I don''t see any reason why they should be configurable, so I''d agree they should be placed in a header file.
Thanos Makatos
2013-Jan-22 17:29 UTC
Re: [PATCH 0 of 9 RFC v2] blktap3: Introduce a small subset of blktap3 files
> In fact I''m wondering why the tapback daemon does all the ring setup, > xenbus negotiation anyway. It could just watch "backend/<device-type>" > and when it sees a new backend/<device-type>/<domid>/<devid> fork a new > "tapdisk --domain <domid> --devid <devid>" and have tapdisk take care > of the protocol entirely. Saves a bunch of messaging back and forth I > think. > > I don''t have a problem with the existing model, it just strikes me as a > bit odd (but then I don''t really know the architecture) > > Ian.I''m not aware of any particular reason for this design option. However, this may turn out to be more interesting/complicated -- I''m including xen-api. In the existing blktap3 prototype, when libxl starts a domain it also spawns the tapdisk. When tapback (formerly known as xenio) detects in XenStore that a front-end wants to connect to the VBD, it needs to locate which tapdisk is designated to serve the corresponding back-end file in order to instruct the tapdisk to connect to the ring. It seems that there is a number of ways for tapback to figure out which tapdisk it must use. I''m not sure whether all these alternatives are possible (please correct me if not), or whether all the necessary information is already written to XenStore. * When libxl spawns the tapdisk, it instructs it to attach to the back-end file. Then, it associates the process ID of the tapdisk with the domain and device Ids (I assume by writing all this to XenStore). When tapback needs to figure out which tapdisk to use, it knows the domain and devices ids so all it has to do is to check XenStore for the corresponding tapdisk. * When libxl spawns the tapdisk, it supplies to it the device and domain ID. It can also tell it which back-end file to use, or write this to XenStore. Tapback can then query all running tapdisk processes and select the one having the specified domain/device ID. It seems that having tapback spawning the tapdisk is simpler, though tapback needs to tell tapdisk which back-end file to use, so it seems that libxl needs to associate the back-end file with the domain and device ids -- I guess this can only be achieved by having libxl writing all this to XenStore. Tapdisk will then take care of running the rest of the XenBus protocol. Which solution is best, given that: * multiple front-ends attached to the same VBD may be desirable * VHD snapshot/coalesce operations need to figure out which tapdisks to pause/refresh * various race conditions should be avoided (?) * other?